mirror of https://github.com/google-deepmind/deepmind-research.git (synced 2026-05-30 12:25:25 +08:00)
Update demo images for Gamma=1 experiment to match style for other images.
PiperOrigin-RevId: 291199661
committed by Diego de Las Casas
parent 70ede47463
commit bd042fd283
+8 -6
@@ -142,9 +142,10 @@ the exploit phase.<br>
For 10 replicas without TVT and with the same hyperparameters, we see consistent
low performance.<br>
# 
For 5 replicas with gamma equal to 1, performance of the RMA agent without TVT
is improved, but is unstable and never goes above 7.<br>
# 
For 10 replicas without TVT and with gamma equal to 1, performance of the RMA
agent without TVT is improved, but is unstable and never consistently goes above
6.<br>
# 

### Active-visual-match
Across 10 replicas, we found that the TVT agents get to a score of 10,
@@ -156,11 +157,12 @@ For 10 replicas without TVT and with the same hyperparamters, performance is
better than chance level but not at the maximum level, indicating that it is not
able to actively seek for information in the explore phase and instead must rely
on randomly encountering the information.<br>
# 
For 5 replicas with gamma equal to 1, performance of the RMA agent without TVT
# 
For 10 replicas wihtout TVT and with gamma equal to 1, performance of the RMA
agent without TVT
is considerably worse, suggesting the behavior learnt from later phases does not
result in undirected exploration in the first phase.
# 
# 

## Citing this work

Binary file not shown. Before: Size 55 KiB
Binary file not shown. Before: Size 42 KiB
Binary file not shown. After: Size 36 KiB
Binary file not shown. After: Size 50 KiB