How would you make an agent capable of solving complex hierarchical tasks? Imagine a problem that requires a collection of skills which are extremely hard to learn in one go from sparse rewards (e.g. complex object manipulation in robotics). One might therefore need to learn to generate a curriculum of simpler tasks, so that a student network can learn to perform the complex task efficiently.

Successor representations were introduced by Dayan in 1993 as a way to represent states for TD learning: two states count as “similar” when similar temporal sequences of states can be reached from them.
Dayan derived it in the tabular case, but let’s derive it assuming a feature vector $\phi$.
We assume that the reward function factorises linearly in these features, with a weight vector $w$: $$r(s) = \phi(s) \cdot w$$
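To see what the linear reward assumption buys us, here is a minimal NumPy sketch on a toy problem. Everything concrete in it is an assumption made for illustration: the 3-state transition matrix `P` (for some fixed policy), the discount factor `gamma`, the feature matrix `Phi` and the weights `w`. It computes the discounted expected future features of each state (called `Psi` here) in closed form and checks that `Psi @ w` gives the same values as solving the Bellman equation directly on $r(s) = \phi(s) \cdot w$.

```python
import numpy as np

# Hypothetical 3-state Markov chain under a fixed policy (each row sums to 1).
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.1, 0.0, 0.9]])
gamma = 0.95  # assumed discount factor

# Hypothetical feature map: row s holds phi(s), here with 2 features per state.
Phi = np.array([[1.0, 0.0],
                [0.5, 0.5],
                [0.0, 1.0]])
w = np.array([0.0, 1.0])  # assumed reward weights
r = Phi @ w               # r(s) = phi(s) . w

# Discounted expected future features:
#   psi(s) = E[ sum_t gamma^t phi(s_t) | s_0 = s ]
# which has the closed form (I - gamma P)^{-1} Phi.
A = np.eye(P.shape[0]) - gamma * P
Psi = np.linalg.solve(A, Phi)

# Value from the feature-based representation vs. the direct Bellman solution.
V_from_psi = Psi @ w
V_direct = np.linalg.solve(A, r)

print(np.allclose(V_from_psi, V_direct))
```

Because the reward only enters through `w`, changing the reward weights in this sketch means recomputing `Psi @ w`, without solving for `Psi` again.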

© 2017 Feryal Behbahani