BASALT Minecraft competition aims to advance reinforcement learning

Deep reinforcement learning, a subfield of machine learning that combines reinforcement learning and deep learning, takes what’s known as a reward function and learns to maximize the expected total reward. This works remarkably well, enabling systems to figure out how to solve Rubik’s Cubes, beat world champions at chess, and more. But existing algorithms have a problem: They implicitly assume access to a perfect specification of the task. In reality, tasks don’t come prepackaged with rewards. Those rewards come from imperfect human reward designers, and it can be difficult to translate conceptual preferences into reward functions that environments can calculate.
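To make "expected total reward" concrete, here is a minimal sketch in Python. It is only an illustration, not code from BASALT or any particular library: the function names, the optional `discount` parameter, and the `sample_episode` callable are all assumptions introduced for this example.

```python
def total_reward(rewards, discount=1.0):
    """Sum the per-step rewards of one episode.

    `discount` is an assumption of this sketch; 1.0 gives the plain sum.
    """
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (discount ** step) * reward
    return total


def expected_total_reward(sample_episode, n_episodes=1000, discount=1.0):
    """Estimate the expected total reward by averaging over sampled episodes.

    `sample_episode` is a hypothetical callable that returns the list of
    rewards collected in one episode under the current behavior.
    """
    returns = [total_reward(sample_episode(), discount) for _ in range(n_episodes)]
    return sum(returns) / len(returns)
```

A learning algorithm adjusts the system’s behavior so that this average goes up. Everything in the calculation depends on the per-step rewards, which is why a poorly designed reward function leads to poorly chosen behavior.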

To address this problem, researchers at DeepMind and the University of California, Berkeley, have launched a competition called BASALT, in which the goal of an AI system must be communicated through demonstrations, preferences, or some other form of human feedback. BASALT is built on Minecraft, and systems entered in it must learn the details of specific tasks from human feedback, choosing among a wide variety of actions to perform.
