# Selected Publications

### Learning from Demonstration in the Wild

Learning from demonstration (LfD) is useful in settings where hand-coding behaviour or a reward function is impractical. It has succeeded in a wide range of problems but typically relies on artificially generated demonstrations or specially deployed sensors and has not generally been able to leverage the copious demonstrations available in the wild: those that capture behaviour that was occurring anyway using sensors that were already deployed for another purpose, e.g., traffic camera footage capturing demonstrations of natural behaviour of vehicles, cyclists, and pedestrians. We propose video to behaviour (ViBe), a new approach to learning models of road user behaviour that requires as input only unlabelled raw video data of a traffic scene collected from a single, monocular, uncalibrated camera with ordinary resolution. Our approach calibrates the camera, detects relevant objects, tracks them through time, and uses the resulting trajectories to perform LfD, yielding models of naturalistic behaviour. We apply ViBe to raw videos of a traffic intersection and show that it can learn purely from videos, without additional expert knowledge.

### Reverse-Engineering Human Visual and Haptic Perceptual Algorithms

Intelligent behaviour is fundamentally tied to the ability of the brain to make decisions in uncertain and dynamic environments. In neuroscience, the generative framework of Bayesian Decision Theory has emerged as a principled way to predict how the brain acts in the face of uncertainty. In the first part of my thesis, I study the question of how humans learn to perform a visual object categorisation task. I present a novel experimental paradigm to assess whether people use generative Bayesian principles as a general strategy. We found that humans indeed perform in a generative manner, but resort to approximate inference when faced with complex computations. In the second part, I consider how one would build a Bayesian ideal observer model of human haptic perception and object recognition, using MuJoCo as an environment. Our model can, using only noisy contact point information on the surface of the hand and noisy hand proprioception, simultaneously infer the shape of simple objects together with an estimation of the true hand pose in space. This is implemented using a recursive Bayesian estimation algorithm, inspired by simultaneous localisation and mapping (SLAM) methods in robotics, which can operate on computer-based physical simulations as well as experimental data from human subjects.
Thesis

### Haptic SLAM: An Ideal Observer Model for Bayesian Inference of Object Shape and Hand Pose from Contact Dynamics

Dynamic tactile exploration enables humans to seamlessly estimate the shape of objects and distinguish them from one another in the complete absence of visual information. Such a blind tactile exploration allows integrating information of the hand pose and contacts on the skin to form a coherent representation of the object shape. A principled way to understand the underlying neural computations of human haptic perception is through normative modelling. We propose a Bayesian perceptual model for recursive integration of noisy proprioceptive hand pose with noisy skin–object contacts. The model simultaneously forms an optimal estimate of the true hand pose and a representation of the explored shape in an object-centred coordinate system. A classification algorithm can, thus, be applied in order to distinguish among different objects solely based on the similarity of their representations. This enables the comparison, in real time, of the shape of an object identified by human subjects with the shape of the same object predicted by our model using motion capture data. Therefore, our work provides a framework for a principled study of human haptic exploration of complex objects.
In Haptics: Perception, Devices, Control, and Applications. Lecture Notes in Computer Science (Finalist for best paper at EuroHaptics 2016).

# Recent Publications

• Learning from Demonstration in the Wild

• Automated Curriculum Learning for Reinforcement Learning

• Extending World Models for Multi-Agent Reinforcement Learning in MALMÖ

• Human Visual classification reflects Bayesian generative representations (under submission)

• Dynamics of uncertainty in sensorimotor estimation across time (under submission)

• Reverse-Engineering Human Visual and Haptic Perceptual Algorithms

• Haptic SLAM: An Ideal Observer Model for Bayesian Inference of Object Shape and Hand Pose from Contact Dynamics

• The emergence of decision boundaries is predicted by second order statistics of stimuli in visual categorization

• Haptic SLAM for Context-Aware Robotic Hand Prosthetics - Simultaneous Inference of Hand Pose and Object Shape Using Particle Filters

• Visual categorization reflects second order generative statistics

# Projects

#### Learning from Demonstration in the Wild

This is my blog post on our recent paper, which presents a novel method for learning from demonstration in the wild that can leverage the abundance of freely available videos of natural behaviour. We propose ViBe, a new approach to learning models of behaviour that requires as input only unlabelled raw video data. Our method calibrates the camera, detects relevant objects, tracks them reliably through time, and uses the resulting trajectories to learn policies via imitation. We introduce Horizon GAIL, an extension to GAIL that uses a novel curriculum to help stabilise learning.

#### Automated Curriculum Learning for Reinforcement Learning

In this project, I set out to train an automatic curriculum generator using a teacher network (a multi-armed bandit) that keeps track of the progress of the student network (IMPALA) and proposes new tasks as a function of how well the student is learning.
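The teacher–student loop can be illustrated with a minimal sketch. This is not the project's code: the class name `BanditTeacher`, the epsilon-greedy selection rule, and the use of the absolute change in the student's score as the learning-progress signal are all assumptions made for illustration, and the student is reduced to a score that the caller supplies.

```python
import random


class BanditTeacher:
    """Hypothetical epsilon-greedy bandit that picks tasks by learning progress."""

    def __init__(self, num_tasks, epsilon=0.1, step_size=0.3, seed=0):
        self.values = [0.0] * num_tasks        # running estimate of progress per task
        self.last_score = [None] * num_tasks   # last student score seen on each task
        self.epsilon = epsilon
        self.step_size = step_size
        self.rng = random.Random(seed)

    def propose_task(self):
        # Explore a random task with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        # Otherwise pick the task with the highest estimated progress,
        # breaking ties at random.
        best = max(self.values)
        candidates = [i for i, v in enumerate(self.values) if v == best]
        return self.rng.choice(candidates)

    def update(self, task, score):
        # Assumed progress signal: absolute change in the student's score
        # on this task since the last time it was attempted.
        previous = self.last_score[task]
        progress = 0.0 if previous is None else abs(score - previous)
        self.last_score[task] = score
        # Exponential moving average, so stale progress fades over time.
        self.values[task] += self.step_size * (progress - self.values[task])
        return progress
```

In use, each training round would call `propose_task()`, train and evaluate the student on that task, and pass the resulting score to `update()`. Tasks on which the student's score has stopped changing then drift towards zero value and are proposed less often.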

#### Craft Environment

CraftEnv is a 2D crafting environment that supports a fully flexible setup of hierarchical tasks, with sparse rewards, in a fully procedural setting.

#### End-to-end control: A3C-MuJoCo

End-to-end learning applied to pixel-driven control, where the agent is trained with the Asynchronous Advantage Actor-Critic (A3C) method under sparse rewards.

# Recent Posts

### Successor Representations

Successor representations were introduced by Dayan in 1993 as a way to represent states such that “similarity” for TD learning is defined by the temporal sequence of states that can be reached from a given state. Dayan derived it in the tabular case, but let’s do it assuming a feature vector $\phi$. We assume that the reward function can be factorised linearly: $$r(s) = \phi(s) \cdot w$$
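As a sketch of where this assumption leads (the discount factor $\gamma$, the policy $\pi$, and the symbol $\psi$ are notation introduced here, not taken from the text above), substituting the linear reward into the usual discounted value function lets $w$ be pulled out of the expectation:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t) \,\middle|\, s_0 = s\right] = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t) \,\middle|\, s_0 = s\right] \cdot w = \psi^{\pi}(s) \cdot w$$

Here $\psi^{\pi}(s)$ is the expected discounted sum of future features, often called the successor features. It satisfies a Bellman-style recursion, so it can in principle be learned with TD updates just like a value function:

$$\psi^{\pi}(s) = \phi(s) + \gamma\, \mathbb{E}_{\pi}\left[\psi^{\pi}(s_{1}) \,\middle|\, s_0 = s\right]$$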

# Now

What I’m doing now

(This is a now page!)

I’m very excited to be a mentor for OpenAI’s Scholars in Winter!

I’m really looking forward to attending SOCML 2018!

This past summer, I had the chance to join Jeju DL Camp, held in beautiful Jeju Island in South Korea, and work on what I’m really passionate about: Automated Curriculum Learning for RL! You can find the open-source code and a brief summary here: Automated Curriculum Learning for RL and check the slides for the talk I gave at the TensorFlow Summit in Korea. I’m currently working on a blog post for the project and I’ll present this work at the WiML Workshop in December!

I’m also a mentor for WiML Workshop 2018 and very excited to meet all the amazing students doing incredible research in ML!

I co-organised a Workshop on Learning from Demonstrations for high-level robotic tasks at RSS 2018 in June. I’m currently writing a survey paper on the role of imitation for learning, investigating challenges and exciting future directions.

I’m currently reading The World as Laboratory: Experiments with Mice, Mazes and Men, which is a fascinating and at times disturbing account of animal and human experiments carried out during the 20th century!