Automated Curriculum Learning for Reinforcement Learning

Abstract

How would you teach an artificial agent a complex problem that requires multiple skills, offers very sparse rewards, and is easy to fail at? Humans tackle such hard problems by splitting them into simpler ones and learning them one by one. We might even rely on a teacher to help split the task and decide when to tackle each sub-problem, so that we learn everything faster. We can do the same with neural networks by introducing a Teacher network that learns to generate a curriculum of simpler tasks, so that the Student network can ultimately learn to solve the complex task. In this work, we set out to train an automatic curriculum generator for solving complex sparse-reward tasks. The Teacher network tracks the progress of the Student network and proposes new tasks as a function of how well the Student is learning. We propose an environment, CraftEnv, for designing multi-step crafting minigames; it supports fully flexible, hierarchical crafting tasks covering a wide range of difficulty. We adapted a state-of-the-art distributed reinforcement learning algorithm, IMPALA [Espeholt, 2018], to train the Student network, while using an adversarial multi-armed bandit algorithm [Auer, 2003] for the Teacher network. We show that an automated curriculum helps our agent solve more complex tasks in our environment, where a random curriculum fails or learns too slowly. Moreover, the Teacher exhibits interesting task-proposal dynamics, varying the tasks presented to the agent according to their usefulness for driving learning. This allows the Student to learn incrementally, solving simple tasks first and progressing to more challenging ones. We analysed how different signals for quantifying student progress [Graves, 2017] (e.g. return gain, gradient prediction gain) affect the curriculum that the Teacher learns to propose. Finally, we demonstrate that this approach can accelerate learning and improve the interpretability of how the agent learns to perform complex tasks. Just like children, our model learns incrementally, solving simple tasks first and transferring to more complex settings. This work suggests interesting directions for future research: the Teacher could be extended to take other signals into account, e.g. safety requirements, through multi-objective optimisation. Furthermore, this approach could have implications for automated tutoring systems, where teaching could be adapted to individual students.
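To make the Teacher-Student loop concrete, below is a minimal Python sketch of an Exp3-style adversarial bandit teacher [Auer, 2003] whose arms are tasks and whose reward is a learning-progress signal such as return gain [Graves, 2017]. This is not the paper's implementation: the class name `Exp3Teacher`, the `train_student_on` stub (standing in for IMPALA updates on CraftEnv episodes), the value of `gamma`, and the clipping of the progress signal to [0, 1] are all illustrative assumptions.

```python
import math
import random


class Exp3Teacher:
    """Adversarial multi-armed bandit teacher (Exp3 [Auer, 2003]).

    Each arm is a task; the reward fed back is a learning-progress
    signal from the Student, e.g. return gain [Graves, 2017].
    """

    def __init__(self, n_tasks, gamma=0.3):
        self.n_tasks = n_tasks
        self.gamma = gamma                    # exploration rate
        self.log_w = [0.0] * n_tasks          # log-weights, for stability

    def _probs(self):
        m = max(self.log_w)                   # subtract max to avoid overflow
        w = [math.exp(lw - m) for lw in self.log_w]
        total = sum(w)
        return [(1.0 - self.gamma) * wi / total + self.gamma / self.n_tasks
                for wi in w]

    def propose_task(self):
        """Sample the next task to present to the Student."""
        return random.choices(range(self.n_tasks), weights=self._probs())[0]

    def update(self, task, reward):
        """Standard Exp3 update with an importance-weighted reward in [0, 1]."""
        p = self._probs()[task]
        self.log_w[task] += self.gamma * (reward / p) / self.n_tasks


def train_student_on(task):
    """Placeholder for training the Student (e.g. IMPALA on CraftEnv
    episodes of `task`) and returning its mean episode return."""
    return random.random()


# Hypothetical outer loop using return gain as the progress signal.
teacher = Exp3Teacher(n_tasks=5)
last_return = [0.0] * 5
for _ in range(1000):
    task = teacher.propose_task()
    ret = train_student_on(task)
    gain = min(1.0, max(0.0, ret - last_return[task]))  # clip to [0, 1]
    last_return[task] = ret
    teacher.update(task, gain)
```

Substituting a different quantity for the clipped return gain (for instance, the reduction in a gradient-prediction loss) would yield the other progress signals mentioned above, changing which tasks the Teacher comes to favour.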

Publication
13th Annual Women in Machine Learning Workshop (WiML), 2018
Date