Machine learning news and insights

Emergent Tool Use from Multi-Agent Interaction on 17 September

We've observed agents discovering progressively more complex tool use while playing a simple game of hide-and-seek. Through training in our new simulated hide-and-seek environment, agents build a series of six distinct strategies and counterstrategies, some of which we did not know our environment supported. The self-supervised emergent complexity in this simple environment further suggests that multi-agent co-adaptation may one day...

AC-Teach: A Bayesian Actor-Critic Method for Policy Learning with an Ensemble of Suboptimal Teachers on 12 September

Reinforcement Learning (RL) algorithms have recently demonstrated impressive results in challenging problem domains such as robotic manipulation, Go, and Atari games. But, RL algorithms typically require a large number of interactions with the environment to train policies that solve new tasks, since they begin with no knowledge whatsoever about the task and rely on random exploration of their possible actions...

Policy Certificates and Minimax-Optimal PAC Bounds for Episodic Reinforcement Learning on 28 August

Figure 1: Comparison of existing algorithms without policy certificates (top) and with our proposed policy certificates (bottom). While in existing reinforcement learning the user has no information about how well the algorithm will perform in the next episode, we propose that algorithms output policy certificates before playing an episode to allow users to intervene if necessary.