Game Stack Blog

MineRL sample-efficient reinforcement learning challenge

[Image: Minecraft in first person, with a diamond block in front of the player, a torch in the left hand, and a stone pickaxe in the right.]

To unearth a diamond in the block-based open world of Minecraft requires the acquisition of materials and the construction of tools before any diamond mining can even begin. Players need to gather wood, which they'll use to make a wood pickaxe for mining stone underground. They'll use the stone to fashion a stone pickaxe and, with the tool upgrade, mine iron ore. They'll build a furnace for smelting the iron and use that to make the iron pickaxe they need to start their search for the precious gem. Each task results in a stronger tool and brings players closer to retrieving that coveted diamond. The efforts to organize competitions designed to advance the state of the art feel quite similar. In fact, the research process in general feels quite similar. As you gain more knowledge and experience, forge new and deeper collaborations, and leverage and improve resources—collectively, stronger tools—uncovering a gem or many of them in the form of breakthroughs and promising new directions to explore becomes more reachable.

After none of the submitted agents obtained a diamond in Minecraft during last year's MineRL competition, the sample-efficient reinforcement learning challenge is back and even better, thanks to an additional dataset, structural changes that we expect will make the contest more robust and more broadly appealing, and the addition of DeepMind and OpenAI to the organizing team. MineRL 2020 is the fourth competition based on Project Malmo, an experimentation platform using Minecraft to advance AI. MineRL, the brainchild of a team of researchers from Carnegie Mellon University, tackles an ambitious problem facing the machine learning community: an increasing demand for large amounts of computational resources to replicate state-of-the-art research. Solving this challenge is key to making AI more accessible. To encourage the kind of efficiency in the RL space that will help make that possible, MineRL participants are limited in the amount of data they can use and the time they can spend training an agent to complete the competition task of mining a diamond: no more than 8 million samples, over four days or less, on a single GPU machine.
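
To make the sample limit concrete, here is a minimal sketch of how a training loop might enforce a fixed interaction budget. It is only an illustration: `SampleBudget`, `run_training`, and `step_fn` are hypothetical names, not part of the MineRL or AIcrowd tooling, and the four-day, single-GPU limit is not modeled at all.

```python
class SampleBudget:
    """Counts environment interactions and refuses to exceed a fixed cap."""

    def __init__(self, max_samples: int = 8_000_000):
        # 8 million is the competition's stated sample limit.
        self.max_samples = max_samples
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.max_samples - self.used

    def spend(self, n: int = 1) -> None:
        """Record n interactions, raising if that would exceed the cap."""
        if n < 0:
            raise ValueError("n must be non-negative")
        if self.used + n > self.max_samples:
            raise RuntimeError("sample budget exhausted")
        self.used += n


def run_training(step_fn, budget: SampleBudget) -> int:
    """Call step_fn once per sample until the budget runs out.

    step_fn is a hypothetical stand-in for "take one environment step
    and update the agent". Returns the number of steps actually taken.
    """
    steps = 0
    while budget.remaining > 0:
        budget.spend(1)
        step_fn()
        steps += 1
    return steps
```

Charging the budget before each step, rather than after, means an agent can never take an interaction it has not already paid for, which is the property a fair comparison between submissions depends on.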

MineRL 2020, hosted and supported again by the competition platform AIcrowd, is part of the competition track at this year's Conference on Neural Information Processing Systems (NeurIPS 2020). Last year, the competition was also included in the conference lineup. Over 1,000 participants registered, and more than 50 people attended the affiliated NeurIPS workshop, during which the top teams presented their creative approaches. Microsoft is delighted to once again be among the organizers of a competition that is truly a result of great teamwork.

Leveling the playing field

At its core, the MineRL competition is about lowering the barrier to entry by encouraging the research community to devise solutions that don't require the ever-increasing amounts of samples and resources currently needed. That goal is what drew CMU PhD student Stephanie Milani to the organizing committee last year. Despite having played Minecraft and used Project Malmo as a research tool, Milani didn't participate in either of the first two Malmo competitions. For someone relatively new to ML research like herself, the challenges felt too "daunting," she said. Not so with MineRL, which brings together reinforcement and imitation learning with a large-scale dataset of human demonstrations. She had actually been involved in developing the dataset, helping with revisions to the dataset paper and contributing samples. When she heard there might be a competition around the dataset, she knew she wanted to be involved with organizing it. She was intrigued by its potential to promote sample efficiency, to allow people with limited resources to break into machine learning, and to help democratize AI. "Groups with access to massive computational resources can train their learning algorithms for thousands of years on the desired task; the average person cannot do that," she said. "Constraining the computational resources available to train the submitted algorithms is one step toward leveling the playing field: Everyone's algorithm is evaluated using the same number of environment interactions and computational resources."

Lead organizer William Guss, a CMU PhD student and research scientist at OpenAI, describes the benefits of lowering the barrier to entry as twofold: From a social justice perspective, it helps ensure the engineering and benefits of AI aren't concentrated among only those with access to large amounts of resources. From a science perspective, it means more diverse solutions. "We often see in science that the best innovations come from left field; those who can see the field from a higher purview than just recombining old ideas," said Guss.

Last year, the competition comprised a "demonstrations and environment" track in which participants were able to train their agent using the human demonstrations and 8 million Minecraft interactions. This year, the addition of a second track (human demonstrations only) not only addresses broader research interests but also effectively makes machine learning more accessible. As Guss explains, making a widely available dataset allows anyone who has access to an internet connection to leverage just the dataset without having to re-simulate an environment, which can be expensive. (Returning participants will also notice another change: the introduction of action and observation obfuscation, a technique through which the semantic mechanisms of the game are hidden using an autoencoder. The change, motivated by the use of hierarchical RL in last year's submissions, is designed to encourage domain-agnostic solutions.)
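
For a feel of what obfuscation does from a participant's point of view, here is a toy sketch. It is not the competition's method: instead of a learned autoencoder, it uses a fixed, secretly seeded permutation with sign flips as a simple stand-in, and the action names and `make_obfuscator` helper are hypothetical. What it shares with the real setup is the effect: the agent only ever sees opaque vectors, while the organizers can still decode them back into game actions.

```python
import random

# Hypothetical, human-readable action slots the agent is NOT allowed to see.
ACTION_NAMES = ["forward", "jump", "attack", "craft_pickaxe"]


def make_obfuscator(dim: int, seed: int):
    """Build an (encode, decode) pair from a secret seed.

    Stand-in for a learned autoencoder: a fixed permutation of the
    vector's slots combined with per-slot sign flips.
    """
    rng = random.Random(seed)
    perm = list(range(dim))
    rng.shuffle(perm)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]

    def encode(vec):
        # Slot i of the code holds slot perm[i] of the input, possibly negated.
        return [signs[i] * vec[perm[i]] for i in range(dim)]

    def decode(code):
        # Undo the sign flip and put each value back in its original slot.
        vec = [0.0] * dim
        for i in range(dim):
            vec[perm[i]] = signs[i] * code[i]
        return vec

    return encode, decode


encode, decode = make_obfuscator(len(ACTION_NAMES), seed=2020)
semantic_action = [1.0, 0.0, 1.0, 0.0]  # "forward" and "attack" together
opaque_action = encode(semantic_action)  # what a submitted agent would work with
```

Because the mapping hides which slot means "craft" or "attack," a solution that hard-codes a Minecraft-specific hierarchy has nothing to hang it on, which is the domain-agnostic pressure the organizers describe.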

Robust baselines and plenty of quality data

A key challenge in making competitions more accessible to people with different levels of interest, expertise, and resource access is preparing a good set of baselines that they can use to ramp up on the task and environment and then build on in their own solutions. That's actually what we learned in the previous Project Malmo–based competitions. Preferred Networks (PFN), a tech startup Microsoft has been collaborating with for years to make deep learning technologies more easily available, joined the organizing committee last year to help on that front.

The company's intensive work resulted in an extensive set of excellent baselines, which utilized its deep learning framework Chainer and included behavioral cloning, Deep Q-learning from Demonstrations (DQfD), Rainbow, and proximal policy optimization (PPO). These baselines were well received by participants; about 40% of the entries used Chainer code in their submissions. Some of the algorithms PFN made baselines for hadn't been replicated before. This increased the visibility of those algorithms, as well as the number of algorithms included in PFN's ChainerRL deep reinforcement learning library, resulting in a common resource for the research community. Developing the baselines required PFN to become an early tester of the MineRL competition platform, and the company worked closely with the CMU team and AIcrowd to validate and improve the platform and dataset. Milani and Guss both describe the company's involvement and contributions as crucial to the success and accessibility of the competition. We're lucky to have PFN return as part of the organizing committee in 2020; the company will again be preparing baselines, making adjustments to accommodate the observation obfuscation element of the competition.

The resource making the contest possible—the data—has undergone a change of its own for the 2020 competition. Last year, the CMU team released MineRL-v0. Built using a novel end-to-end platform for recording and automatically labeling samples, the dataset consists of more than 60 million frames of human demonstrations isolating four classes of structured, goal-based Minecraft tasks, most of which are required to mine a diamond. This year, the competition is also providing a "survival" dataset. It comprises millions of frames of human players freely exploring and interacting with Minecraft to accomplish whatever unique goals they've set, an indicator of the CMU team's vision of moving the competition toward more general problem-solving.

Competition diamonds: Compelling research contributions and organizer growth

Last year's competition was a success. We received great coverage and had a strong turnout in participation and workshop attendance, and the submissions were impressive, including the use of a discriminator soft actor critic and a hierarchical Deep Q-Network. The response reinforces how effective competitions can be in bringing together the expertise of academia, industry, and the larger research community to move research forward. In our case, it also attracted the insights of those outside of tech: Minecraft fans with no ML background. And the value extends beyond the sample-efficient RL solutions submitted.

The research community benefits from a growing and comprehensive dataset of quality human priors, a library of deep learning baselines, and a framework that can be used to explore new challenges, like multi-agent coordination, even after the submission period closes. As the competition and research evolve, I also see growth among members of the organizing committee. PFN credits making last year's baselines with accelerating the development of ChainerRL and has since announced it is migrating its deep learning research platform from Chainer to the widely used PyTorch framework, a move the company believes will take it in a more exciting direction in serving the research community. PFN will be using PFRL, its newly released deep RL library for PyTorch users, to implement the competition baselines. And since the 2019 contest, Guss and organizer Brandon Houghton have both moved on to opportunities to extend their research agenda in industry. I'm very glad to know the competition and the platform helped these committee members further develop their impact and their careers.

The Minecraft diamond may still be up for grabs, but as far as I'm concerned, the MineRL competition has already unearthed some gems, and my Microsoft Research colleagues and I feel privileged to be a part of this fantastic competition.

The MineRL competition organizing team

William H. Guss, OpenAI and Carnegie Mellon University
Brandon Houghton, OpenAI and Carnegie Mellon University
Stephanie Milani, Carnegie Mellon University
Nicholay Topin, Carnegie Mellon University
Ruslan Salakhutdinov, Carnegie Mellon University
John Schulman, OpenAI
Mario Ynocente Castro, Preferred Networks
Crissman Loomis, Preferred Networks
Keisuke Nakata, Preferred Networks
Shinya Shiroshita, Preferred Networks
Avinash Ummadisingu, Preferred Networks
Sharada Mohanty, AIcrowd
Sam Devlin, Microsoft Research
Noboru Sean Kuno, Microsoft Research
Oriol Vinyals, DeepMind

The MineRL competition advisory committee

Fei Fang, Carnegie Mellon University
Zachary Chase Lipton, Carnegie Mellon University
Manuela Veloso, Carnegie Mellon University and JPMorgan Chase
David Ha, Google Brain
Chelsea Finn, Google Brain and UC Berkeley
Anca Dragan, UC Berkeley
Sergey Levine, UC Berkeley