Successor representations were introduced by Dayan in 1993 as a way of representing states so that their “similarity” for TD learning reflects the temporal sequence of states that can be reached from each of them.
Dayan derived the representation in the tabular case, but here let’s derive it assuming that each state $s$ is described by a feature vector $\phi(s)$.
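For reference, the tabular case can be sketched in a few lines. For a fixed policy with state-transition matrix $P$, the tabular SR $M(s, s') = \mathbb{E}\left[\sum_t \gamma^t \mathbb{1}(s_t = s') \mid s_0 = s\right]$ has the closed form $(I - \gamma P)^{-1}$, and values follow as $V = M r$. The 3-state chain, the discount `gamma` and the reward vector `r` below are hypothetical, chosen only for illustration.

```python
import numpy as np

def tabular_sr(P: np.ndarray, gamma: float) -> np.ndarray:
    """Tabular successor representation for a fixed policy.

    P[s, s'] is the probability of moving from state s to s' under the policy.
    Returns M = sum_t gamma^t P^t = (I - gamma P)^{-1}.
    """
    n = P.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P)

# Hypothetical deterministic 3-state cycle: 0 -> 1 -> 2 -> 0.
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
gamma = 0.9
M = tabular_sr(P, gamma)

# Hypothetical reward: only state 2 is rewarding.
r = np.array([0.0, 0.0, 1.0])

# Values are obtained by combining the SR with the reward vector.
V = M @ r
```

Because $P$ is row-stochastic, each row of `M` sums to $1/(1-\gamma)$, and `V` satisfies the Bellman equation $V = r + \gamma P V$.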
We assume that the reward function factorises linearly in these features, for some weight vector $w$: $$r(s) = \phi(s) \cdot w$$
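A minimal sketch of why this assumption is useful, under a hypothetical 3-state policy with 2-dimensional features: if $r(s) = \phi(s) \cdot w$, then $V(s) = \mathbb{E}\left[\sum_t \gamma^t \phi(s_t) \mid s_0 = s\right] \cdot w = \psi(s) \cdot w$, where $\psi$ is the feature-based analogue of the SR (often called successor features). The matrices `Phi` and `P` and the weights `w` are illustrative assumptions, and $\psi$ is computed exactly here rather than learned by TD.

```python
import numpy as np

# Hypothetical setup: 3 states, 2-dimensional features, fixed policy.
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])   # row s is phi(s)
w = np.array([0.5, -2.0])      # assumed reward weights
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])  # P[s, s'] under the policy
gamma = 0.9

# Linear reward factorisation: r(s) = phi(s) . w for every state.
r = Phi @ w

# Successor features psi(s) = E[sum_t gamma^t phi(s_t) | s_0 = s],
# computed in closed form as (I - gamma P)^{-1} Phi.
Psi = np.linalg.solve(np.eye(3) - gamma * P, Phi)

# Because the reward is linear in phi, so is the value: V(s) = psi(s) . w.
V_sf = Psi @ w

# Reference: solve the Bellman equation V = r + gamma P V directly.
V_direct = np.linalg.solve(np.eye(3) - gamma * P, r)
```

The two value estimates agree. The practical point is that $\psi$ depends only on the dynamics and the policy, so a change in reward only requires a new $w$.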

© 2017 Feryal Behbahani · Powered by the Academic theme for Hugo.