From 829d8444bba8c6d0b9f4bbfeb760bc22a7a1cbde Mon Sep 17 00:00:00 2001
From: Victoria Krakovna
Date: Thu, 16 Jan 2020 15:20:56 +0000
Subject: [PATCH] Update introduction paragraph in README.md

PiperOrigin-RevId: 290061509
---
 side_effects_penalties/README.md | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/side_effects_penalties/README.md b/side_effects_penalties/README.md
index 5cee43c..ddb33da 100644
--- a/side_effects_penalties/README.md
+++ b/side_effects_penalties/README.md
@@ -1,11 +1,6 @@
 # Side effects penalties
 
-This is the code for the paper [Penalizing side effects using stepwise relative
-reachability](https://arxiv.org/abs/1806.01186) by Krakovna et al (2019). It
-implements a tabular Q-learning agent with different penalties for side effects.
-Each side effects penalty consists of a deviation measure (none, unreachability,
-relative reachability, or attainable utility) and a baseline (starting state,
-inaction, or stepwise inaction).
+Side effects are unnecessary disruptions that an agent causes to its environment while completing a task. Instead of trying to explicitly penalize all possible side effects, we give the agent a general penalty for impacting the environment, defined as a deviation from some baseline state. For example, a reversibility penalty measures the unreachability (deviation) of the starting state (baseline). This code implements a tabular Q-learning agent with different impact penalties. Each penalty consists of a deviation measure (none, unreachability, relative reachability, or attainable utility), a baseline (starting state, inaction, or stepwise inaction), and some other design choices. This is the code for the paper [Penalizing side effects using stepwise relative reachability](https://arxiv.org/abs/1806.01186) by Krakovna et al. (2019).
 
 ## Instructions
 
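As a rough sketch of the structure the new introduction describes (an impact penalty composed of a deviation measure and a baseline), the pieces could fit together as below. This is illustrative only, assuming an unreachability deviation and a stepwise inaction baseline; all names are hypothetical and are not the actual API of this repository:

```python
# Hypothetical sketch: an impact penalty applies a deviation measure to a
# baseline state. None of these names come from the actual repository.


def unreachability_deviation(reachability, current_state, baseline_state):
  """Deviation = how unreachable the baseline is from the current state.

  `reachability[s][s_prime]` is an estimate in [0, 1] of how reachable
  s_prime is from s (1 = fully reachable, 0 = unreachable).
  """
  return 1.0 - reachability[current_state][baseline_state]


def stepwise_inaction_baseline(previous_state, noop_step):
  """Baseline = the state reached by taking the no-op action once."""
  return noop_step(previous_state)


def impact_penalty(previous_state, current_state, noop_step, reachability,
                   scale=0.1):
  """Combines a baseline with a deviation measure, as in the README."""
  baseline_state = stepwise_inaction_baseline(previous_state, noop_step)
  return scale * unreachability_deviation(reachability, current_state,
                                          baseline_state)
```

At each step, such a penalty would be subtracted from the task reward before the Q-learning update, so the agent trades off task progress against impact on the environment.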