mirror of
https://github.com/google-deepmind/deepmind-research.git
synced 2026-05-30 20:35:25 +08:00
Update introduction paragraph in README.md
PiperOrigin-RevId: 290061509
committed by
Diego de Las Casas
parent
d6385d6c13
commit
829d8444bb
@@ -1,11 +1,6 @@
 # Side effects penalties
 
-This is the code for the paper [Penalizing side effects using stepwise relative
-reachability](https://arxiv.org/abs/1806.01186) by Krakovna et al (2019). It
-implements a tabular Q-learning agent with different penalties for side effects.
-Each side effects penalty consists of a deviation measure (none, unreachability,
-relative reachability, or attainable utility) and a baseline (starting state,
-inaction, or stepwise inaction).
+Side effects are unnecessary disruptions to the agent's environment while completing a task. Instead of trying to explicitly penalize all possible side effects, we give the agent a general penalty for impacting the environment, defined as a deviation from some baseline state. For example, a reversibility penalty measures unreachability (deviation) of the starting state (baseline). This code implements a tabular Q-learning agent with different impact penalties. Each penalty consists of a deviation measure (none, unreachability, relative reachability, or attainable utility), a baseline (starting state, inaction, or stepwise inaction), and some other design choices. This is the code for the paper [Penalizing side effects using stepwise relative reachability](https://arxiv.org/abs/1806.01186) by Krakovna et al (2019).
 
 ## Instructions
 
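The reversibility penalty mentioned in the new introduction (unreachability of the starting state) can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the `beta` weight, and the toy transition table are all hypothetical. It also assumes a deterministic environment and a binary, undiscounted notion of reachability, which is a simplification.

```python
from collections import deque


def reachable(transitions, start, target):
    """Return True if `target` can be reached from `start` (breadth-first search).

    `transitions` maps state -> {action: next_state}; deterministic by assumption.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == target:
            return True
        for next_state in transitions.get(state, {}).values():
            if next_state not in seen:
                seen.add(next_state)
                queue.append(next_state)
    return False


def unreachability_penalty(transitions, current, baseline):
    """Deviation measure: 1.0 if the baseline state is no longer reachable, else 0.0."""
    return 0.0 if reachable(transitions, current, baseline) else 1.0


def penalized_reward(reward, transitions, current, baseline, beta=1.0):
    """Task reward minus the impact penalty, scaled by a hypothetical weight `beta`."""
    return reward - beta * unreachability_penalty(transitions, current, baseline)


# Hypothetical toy environment: a vase can be broken but never repaired.
TOY_TRANSITIONS = {
    "vase_intact": {"noop": "vase_intact", "break": "vase_broken"},
    "vase_broken": {"noop": "vase_broken"},
}

# Starting-state baseline: the agent began in "vase_intact".
print(penalized_reward(1.0, TOY_TRANSITIONS, "vase_intact", "vase_intact"))  # 1.0
print(penalized_reward(1.0, TOY_TRANSITIONS, "vase_broken", "vase_intact"))  # 0.0
```

In this sketch, breaking the vase makes the starting state unreachable, so the penalty cancels the task reward. Swapping the `baseline` argument for an inaction or stepwise inaction state would change what the deviation is measured against without changing the deviation measure itself.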