Feryal Behbahani

Research Scientist

DeepMind

Biography

I am a Research Scientist at DeepMind working on Reinforcement Learning.

Previously, I was a Research Scientist leading the learning team at Latent Logic (now part of Waymo) where our team focused on Deep Reinforcement Learning and Learning from Demonstration techniques to generate human-like behaviour that can be applied to data-driven simulators, game engines and robotics.

I received my PhD from the Department of Computing at Imperial College London, where I studied Computational Neuroscience and Machine Learning at the Brain and Behaviour Lab. My main research investigated the underlying algorithms employed by the human brain for object representation and inference. I stayed on in the lab as a Postdoc to continue my research into the dynamics of uncertainty in sensorimotor perception. I previously obtained my MSc in Artificial Intelligence with distinction at Imperial College London.

I have also worked on several projects building machine learning solutions for a variety of problems as part of a technology consultancy start-up I co-founded. More recently, as a Visiting Postdoctoral Researcher at the BICV Group at Imperial College London, I worked on transfer learning and Deep Reinforcement Learning applied to the control of a Jaco robotic arm.

See what I’m doing now!

Interests

Reinforcement Learning
Meta Learning
Lifelong Learning
Representation learning
Imitation learning
Program Synthesis and Induction

Education

Ph.D. in Computing, 2016

Imperial College London
MSc in Artificial Intelligence with distinction, 2012

Imperial College London
BSc in Information Technology with First Class Honours (1st in class), 2010

Heriot-Watt University

Selected Publications

Acme: A Research Framework for Distributed Reinforcement Learning

Deep reinforcement learning has led to many recent and groundbreaking advancements. However, these advances have often come at the cost of both the scale and complexity of the underlying RL algorithms. Increases in complexity have in turn made it more difficult for researchers to reproduce published RL algorithms or rapidly prototype ideas. To address this, we introduce Acme, a tool to simplify the development of novel RL algorithms that is specifically designed to enable simple agent implementations that can be run at various scales of execution. Our aim is also to make the results of various RL algorithms developed in academia and industrial labs easier to reproduce and extend. To this end we are releasing baseline implementations of various algorithms, created using our framework. In this work we introduce the major design decisions behind Acme and show how these are used to construct these baselines. We also experiment with these agents at different scales of both complexity and computation, including distributed versions. Ultimately, we show that the design decisions behind Acme lead to agents that can be scaled both up and down and that, for the most part, greater levels of parallelization result in agents with equivalent performance, just faster.

Matt Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli, Sarah Henderson, Alex Novikov, Sergio Gómez Colmenarejo, Serkan Cabi, Caglar Gulcehre, Tom Le Paine, Andrew Cowie, Ziyu Wang, Bilal Piot, Nando de Freitas

Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation

Deep reinforcement learning has the potential to train robots to perform complex tasks in the real world without requiring accurate models of the robot or its environment. A practical approach is to train agents in simulation, and then transfer them to the real world. One popular method for achieving transferability is to use domain randomisation, which involves randomly perturbing various aspects of a simulated environment in order to make trained agents robust to the reality gap. However, less work has gone into understanding such agents - which are deployed in the real world - beyond task performance. In this work we examine such agents, through qualitative and quantitative comparisons between agents trained with and without visual domain randomisation. We train agents for Fetch and Jaco robots on a visuomotor control task and evaluate how well they generalise using different testing conditions. Finally, we investigate the internals of the trained agents by using a suite of interpretability techniques. Our results show that the primary outcome of domain randomisation is more robust, entangled representations, accompanied with larger weights with greater spatial structure; moreover, the types of changes are heavily influenced by the task setup and presence of additional proprioceptive inputs. Additionally, we demonstrate that our domain randomised agents require higher sample complexity, can overfit and more heavily rely on recurrent processing. Furthermore, even with an improved saliency method introduced in this work, we show that qualitative studies may not always correspond with quantitative measures, necessitating the combination of inspection tools in order to provide sufficient insights into the behaviour of trained agents.

Tianhong Dai, Kai Arulkumaran, Tamara Gerbert, Samyakh Tukra, Feryal Behbahani, Anil Anthony Bharath

Modular Meta-Learning with Shrinkage

The modular nature of deep networks allows some components to learn general features, while others learn more task-specific features. When a deep model is then fine-tuned on a new task, each component adapts differently. For example, the input layers of an image classification convnet typically adapt very little, while the output layers may change significantly. However, standard meta-learning approaches ignore this variability and either adapt all modules equally or hand-pick a subset to adapt. This can result in overfitting and wasted computation during adaptation. In this work, we develop techniques based on Bayesian shrinkage to meta-learn how task-independent each module is and to regularize it accordingly. We show that various recent meta-learning algorithms, such as MAML and Reptile, are special cases of our formulation in the limit of no regularization. Empirically, our approach discovers a small subset of modules to adapt, and improves performance. Notably, our method finds that the final layer is not always the best layer to adapt, contradicting standard practices in the literature.

Yutian Chen, Abram L. Friesen, Feryal Behbahani, Arnaud Doucet, David Budden, Matthew W. Hoffman, Nando de Freitas

Learning from Demonstration in the Wild

Learning from demonstration (LfD) is useful in settings where hand-coding behaviour or a reward function is impractical. It has succeeded in a wide range of problems but typically relies on artificially generated demonstrations or specially deployed sensors and has not generally been able to leverage the copious demonstrations available in the wild: those that capture behaviour that was occurring anyway using sensors that were already deployed for another purpose, e.g., traffic camera footage capturing demonstrations of natural behaviour of vehicles, cyclists, and pedestrians. We propose video to behaviour (ViBe), a new approach to learning models of road user behaviour that requires as input only unlabelled raw video data of a traffic scene collected from a single, monocular, uncalibrated camera with ordinary resolution. Our approach calibrates the camera, detects relevant objects, tracks them through time, and uses the resulting trajectories to perform LfD, yielding models of naturalistic behaviour. We apply ViBe to raw videos of a traffic intersection and show that it can learn purely from videos, without additional expert knowledge.

Feryal Behbahani, Kyriacos Shiarlis, Xi Chen, Vitaly Kurin, Sudhanshu Kasewa, Ciprian Stirbu, João Gomes, Supratik Paul, Frans A. Oliehoek, João Messias, Shimon Whiteson

ICRA 2019

Reverse-Engineering Human Visual and Haptic Perceptual Algorithms

Intelligent behaviour is fundamentally tied to the ability of the brain to make decisions in uncertain and dynamic environments. In neuroscience, the generative framework of Bayesian Decision Theory has emerged as a principled way to predict how the brain acts in the face of uncertainty. In the first part of my thesis, I study the question of how humans learn to perform a visual object categorisation task. I present a novel experimental paradigm to assess whether people use generative Bayesian principles as a general strategy. We found that humans indeed perform in a generative manner, but resort to approximate inference when faced with complex computations. In the second part, I consider how one would build a Bayesian ideal observer model of human haptic perception and object recognition, using MuJoCo as an environment. Our model can, using only noisy contact point information on the surface of the hand and noisy hand proprioception, simultaneously infer the shape of simple objects together with an estimation of the true hand pose in space. This is implemented using a recursive Bayesian estimation algorithm, inspired by simultaneous localisation and mapping (SLAM) methods in robotics, which can operate on computer-based physical simulations as well as experimental data from human subjects.

Feryal M P Behbahani

Thesis

Haptic SLAM: An Ideal Observer Model for Bayesian Inference of Object Shape and Hand Pose from Contact Dynamics

Dynamic tactile exploration enables humans to seamlessly estimate the shape of objects and distinguish them from one another in the complete absence of visual information. Such a blind tactile exploration allows integrating information of the hand pose and contacts on the skin to form a coherent representation of the object shape. A principled way to understand the underlying neural computations of human haptic perception is through normative modelling. We propose a Bayesian perceptual model for recursive integration of noisy proprioceptive hand pose with noisy skin-object contacts. The model simultaneously forms an optimal estimate of the true hand pose and a representation of the explored shape in an object-centred coordinate system. A classification algorithm can, thus, be applied in order to distinguish among different objects solely based on the similarity of their representations. This enables the comparison, in real time, of the shape of an object identified by human subjects with the shape of the same object predicted by our model using motion capture data. Therefore, our work provides a framework for a principled study of human haptic exploration of complex objects.

In Haptics: Perception, Devices, Control, and Applications. Lecture Notes in Computer Science (Finalist for best paper at EuroHaptics 2016).

Recent Publications

Acme: A Research Framework for Distributed Reinforcement Learning
Privileged Information Dropout in Reinforcement Learning
Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation
Modular Meta-Learning with Shrinkage
Learning from Demonstration in the Wild
Automated Curriculum Learning for Reinforcement Learning
Extending World Models for Multi-Agent Reinforcement Learning in MALMÖ
Human Visual classification reflects Bayesian generative representations (under submission)
Dynamics of uncertainty in sensorimotor estimation across time (under submission)
Reverse-Engineering Human Visual and Haptic Perceptual Algorithms

Recent Talks

Learning from Demonstration: Applications and challenges
Mon, Nov 26, 2018, Distinguished Speakers: Oxford Women in Computer Science
Automated Curriculum Learning for Reinforcement Learning
Sat, Jul 28, 2018, Jeju Deep Learning Summer School
What Would it Take to Train an Agent to Play with a Shape-Sorter?
Fri, Sep 22, 2017, RE-WORK Deep Learning Summit London

Projects

Tutorial on RL (EEML 2020)

The tutorial covers a number of important reinforcement learning (RL) algorithms, including policy iteration, Q-Learning, and Neural Fitted Q and DQN in JAX. In the first part, we will guide you through the general interaction between RL agents and environments, where the agents ought to take actions in order to maximize returns (i.e. cumulative reward). Next, we will implement Policy Iteration, SARSA, and Q-Learning for a simple tabular environment. The core ideas in the latter will be scaled to more complex MDPs through the use of function approximation. Lastly, we will provide a short introduction to deep reinforcement learning and the DQN algorithm.
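As a rough illustration of the tabular Q-Learning step mentioned above, here is a minimal sketch in plain Python. It is not the tutorial's code: the five-state chain environment, the `step` and `q_learning` helpers, and all hyperparameters are assumptions chosen only to keep the example small.

```python
import random


def step(state, action, n_states=5):
    """Hypothetical chain MDP: action 1 moves right, action 0 moves left.

    Reaching the last state gives reward 1 and ends the episode.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                action = rng.randrange(n_actions)
            else:
                # Exploit: pick a greedy action, breaking ties at random.
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])
            next_state, reward, done = step(state, action, n_states)
            # Bootstrapped target: no future value after a terminal step.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy action in each non-terminal state of the chain.
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
```

In this toy chain the greedy policy read off the learned table should move right in every non-terminal state, since that is the only way to reach the rewarding final state.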

Learning from Demonstration in the Wild

My blog post for our recent paper, which presents a novel method for learning from demonstration in the wild that can leverage the abundance of freely available videos of natural behaviour. We propose ViBe, a new approach to learning models of behaviour that requires as input only unlabelled raw video data. Our method calibrates the camera, detects relevant objects, tracks them reliably through time, and uses the resulting trajectories to learn policies via imitation. We introduce Horizon GAIL, an extension to GAIL that uses a novel curriculum to help stabilise learning.

Automated Curriculum Learning for Reinforcement Learning

In this project, I set out to train an automatic curriculum generator using a teacher network (a multi-armed bandit) that keeps track of the progress of the student network (IMPALA) and proposes new tasks as a function of how well the student is learning.
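The teacher and student loop described above can be sketched roughly as follows. This is not the project's code: the `BanditTeacher` class, the epsilon-greedy rule, the use of absolute score change as the bandit reward, and the `toy_student_train` stand-in for the student network are all assumptions made for illustration.

```python
import random


class BanditTeacher:
    """Hypothetical teacher: an epsilon-greedy bandit over tasks.

    Each arm's value tracks the student's recent learning progress,
    measured here as the absolute change in its score on that task.
    """

    def __init__(self, n_tasks, epsilon=0.2, step_size=0.3, seed=0):
        self.values = [0.0] * n_tasks
        self.last_scores = [0.0] * n_tasks
        self.epsilon = epsilon
        self.step_size = step_size
        self.rng = random.Random(seed)

    def choose_task(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        best = max(self.values)
        return self.rng.choice(
            [i for i, v in enumerate(self.values) if v == best])

    def update(self, task, score):
        progress = abs(score - self.last_scores[task])
        self.last_scores[task] = score
        # Exponential moving average of progress on this task.
        self.values[task] += self.step_size * (progress - self.values[task])


def toy_student_train(skills, task, rates):
    """Stand-in for training the student: skill on a task saturates at 1."""
    skills[task] += rates[task] * (1.0 - skills[task])
    return skills[task]


def run_curriculum(steps=200, seed=0):
    rates = [0.5, 0.1, 0.02]  # assumed per-task learning speeds
    skills = [0.0] * len(rates)
    teacher = BanditTeacher(len(rates), seed=seed)
    counts = [0] * len(rates)
    for _ in range(steps):
        task = teacher.choose_task()
        score = toy_student_train(skills, task, rates)
        teacher.update(task, score)
        counts[task] += 1
    return counts, skills, teacher.values


counts, skills, values = run_curriculum()
```

Rewarding the teacher with progress rather than raw score is one possible design choice: it steers sampling away from tasks the student has already mastered and from tasks on which it is not yet improving.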

Craft Environment

CraftEnv is a 2D crafting environment that supports a fully flexible setup of hierarchical tasks, with sparse rewards, in a fully procedural setting.

End-to-end control: A3C-MuJoCo

Applying end-to-end learning to pixel-driven control, where learning uses the Asynchronous Advantage Actor-Critic (A3C) method with sparse rewards.

Now

What I’m doing now

(This is a now page!)

I’ll be giving a lecture and a tutorial on Reinforcement Learning in August at MLSS.

Recently we organised a breakout session on Continual Reinforcement Learning at the ICML WiML Un-workshop, which was very exciting. We will release the session notes on our website.

I really enjoyed participating in the EEML Summer School and giving a tutorial on reinforcement learning. If you want to learn more, check out the tutorial here.

Our NeurIPS 2019 Workshop on Biological and Artificial Reinforcement Learning got accepted!

I’m very excited to announce that I have joined Nando de Freitas’ team at DeepMind as a Research Scientist.

I’m very excited to be a mentor for OpenAI’s Scholars in Winter!

I’m really looking forward to attending SOCML 2018 and moderating the session on Curiosity-Driven RL! Find the session notes here! We’re welcoming any feedback you might have as we are preparing a report to release soon!

This past summer, I had the chance to join Jeju DL Camp, held in beautiful Jeju Island in South Korea, and work on what I’m really passionate about: Automated Curriculum Learning for RL! You can find the open-source code and a brief summary here: Automated Curriculum Learning for RL and check the slides for the talk I gave at the TensorFlow Summit in Korea. I’m currently working on a blog post for the project and I’ll present this work at the WiML Workshop in December!

I’m also a mentor for WiML Workshop 2018 and very excited to meet all the amazing students doing incredible research in ML!

I co-organised a Workshop on Learning from Demonstrations for high-level robotic tasks at RSS 2018 in June. I’m currently writing a survey paper on the role of imitation for learning, investigating challenges and exciting future directions.

I’m currently reading The World as Laboratory: Experiments with Mice, Mazes and Men, which is a fascinating and at times disturbing account of animal and human experiments carried out during the 20th century!

Contact

feryal.mp@gmail.com

Recent Posts

Successor Representations
