Based on the helicopter simulator from Andrew Ng's group, this domain requires agents to control a helicopter so that it hovers stably. Challenges include:
  • Dynamics: Wind effects and complicated nonlinear dynamics make this a challenging problem.
  • Explore / exploit: this domain includes the catastrophic event of crashing. Agents must explore carefully to avoid unrecoverable errors.

Get more details on the Helicopter domain.

Download the training Helicopter domain.

2. Polyathlon

Competitors must code a general-purpose RL agent. Agents are tested on a variety of MDPs that share no systematic structure with one another, which forces the agent to learn quickly and reason flexibly about general MDPs. Challenges include:
  • Explore / exploit: in a general MDP, the explore/exploit dilemma is key. Theoretical analyses exist for several algorithms that navigate this tradeoff, but which will perform best in practice?
  • Structure learning: is there structure in the space of rewards or state transitions?
  • Aggregation: can states be aggregated, either to learn an improved model or accelerate planning?
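
To make the explore/exploit tradeoff above concrete, here is a minimal sketch of one common baseline: tabular Q-learning with epsilon-greedy action selection. It is not the competition's agent interface. The class `EpsilonGreedyQAgent`, the toy `chain_step` MDP, and all parameter values are assumptions chosen purely for illustration.

```python
import random


class EpsilonGreedyQAgent:
    """Hypothetical tabular Q-learning agent (illustrative, not the competition API)."""

    def __init__(self, n_actions, epsilon=0.1, alpha=0.5, gamma=0.95, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon  # probability of taking a random (exploratory) action
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.rng = random.Random(seed)
        self.q = {}             # maps (state, action) -> estimated return

    def value(self, state, action):
        return self.q.get((state, action), 0.0)

    def greedy_action(self, state):
        # Exploit: pick the highest-valued action, breaking ties at random.
        values = [self.value(state, a) for a in range(self.n_actions)]
        best = max(values)
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def act(self, state):
        # Explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return self.greedy_action(state)

    def update(self, state, action, reward, next_state, done):
        # One-step Q-learning backup toward reward + gamma * max_a Q(next_state, a).
        target = reward
        if not done:
            target += self.gamma * max(
                self.value(next_state, a) for a in range(self.n_actions)
            )
        old = self.value(state, action)
        self.q[(state, action)] = old + self.alpha * (target - old)


def chain_step(state, action, length=5):
    """Toy chain MDP (assumed for illustration): action 1 moves right, action 0
    moves left; reaching the last state yields reward 1 and ends the episode."""
    if action == 1:
        next_state = min(state + 1, length - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == length - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, max_steps=100):
    agent = EpsilonGreedyQAgent(n_actions=2, epsilon=0.2)
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = agent.act(state)
            next_state, reward, done = chain_step(state, action)
            agent.update(state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return agent
```

Calling `train()` returns an agent whose `greedy_action` can be queried per state. A fixed epsilon is the simplest possible exploration rule; a competitive entry would likely need something that adapts to each unknown MDP, which is the open question the challenge list raises.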

Get more details on the Polyathlon domain.

Download the training Polyathlon domain.

3. Invasive species

Get more details on the Invasive species domain.

Download the training Invasive species domain.