Successor Representations

Successor representations were introduced by Dayan in 1993, as a way to represent states by thinking of how “similarity” for TD learning is similar to the temporal sequence of states that can be reached from a given state. Dayan derived it in the tabular case, but let’s do it when assuming a feature vector $\phi$. We assumes that the reward function can be factorised linearly: $$r(s) = \phi(s) \cdot w$$