In the future, robots will be used in many homes, offices and construction sites. It would be useful if these robots could learn to perform new tasks on the job with little dependence on human guidance and training. The polyathlon is meant to simulate this scenario: the agent faces a series of unknown tasks and must learn, online, how to solve each one without any prior task knowledge or pretraining. The polyathlon raises a number of interesting algorithmic challenges, such as transfer learning, feature construction, adaptive representations and parameter-free learning.

Observation Space: 6-dimensional, continuous-valued in [0,1]
Action Space: 6 discrete actions
Rewards: The reward range (which may be loose) is provided by the task spec.
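The interface above fits in a few lines of code. This is a minimal Python sketch, not the competition's actual API: `TaskSpec`, `check_observation`, `random_action` and `scale_reward` are hypothetical names, and the reward bounds are placeholder values standing in for whatever range a particular task spec declares.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical container for the fields the polyathlon task spec describes."""
    obs_dims: int = 6          # continuous observations, each in [0, 1]
    num_actions: int = 6       # discrete actions 0..5
    reward_min: float = -1.0   # placeholder bounds; the real range comes from the task spec
    reward_max: float = 1.0    # and may be loose


def check_observation(spec, obs):
    """True if obs has the declared length and every component lies in [0, 1]."""
    return len(obs) == spec.obs_dims and all(0.0 <= x <= 1.0 for x in obs)


def random_action(spec, rng):
    """Uniformly random discrete action, a common baseline."""
    return rng.randrange(spec.num_actions)


def scale_reward(spec, r):
    """Map a reward into [0, 1] using the declared range (0.0 if the range is empty)."""
    span = spec.reward_max - spec.reward_min
    return (r - spec.reward_min) / span if span > 0 else 0.0
```

Because the declared range may be loose, rewards rescaled this way stay inside [0, 1] but need not cover the whole interval.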

All problems are episodic.
All problems are roughly Markov.
Problems may have stochastic state transitions and reward functions.
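Since every task is episodic and must be learned online, an agent boils down to an act/update loop run episode after episode. The sketch below assumes tabular Q-learning with epsilon-greedy exploration over a uniform grid on [0,1]^6; this is a swapped-in baseline chosen for brevity, not a method prescribed by the polyathlon. `ChainEnv` is a hypothetical toy task in which only the first observation matters and actions 2 to 5 do nothing, and the hyperparameters (`alpha`, `gamma`, `epsilon`, `bins`) are illustrative guesses.

```python
import random
from collections import defaultdict


def discretize(obs, bins=4):
    """Map each component of an observation in [0, 1] to one of `bins` grid cells."""
    return tuple(min(int(x * bins), bins - 1) for x in obs)


class QAgent:
    """Epsilon-greedy tabular Q-learning over a uniform grid (illustrative baseline)."""

    def __init__(self, num_actions=6, alpha=0.1, gamma=0.95, epsilon=0.1, bins=4, seed=0):
        self.num_actions = num_actions
        self.alpha, self.gamma, self.epsilon, self.bins = alpha, gamma, epsilon, bins
        self.q = defaultdict(lambda: [0.0] * num_actions)  # action values, zero-initialized
        self.rng = random.Random(seed)

    def act(self, obs):
        """Explore with probability epsilon, else act greedily (ties broken at random)."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.num_actions)
        values = self.q[discretize(obs, self.bins)]
        best = max(values)
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def update(self, obs, action, reward, next_obs, done):
        """One-step Q-learning update; terminal transitions do not bootstrap."""
        s = discretize(obs, self.bins)
        target = reward
        if not done:
            target += self.gamma * max(self.q[discretize(next_obs, self.bins)])
        self.q[s][action] += self.alpha * (target - self.q[s][action])


def run_episode(env, agent, max_steps=200):
    """Run one episode, learning after every step; returns the undiscounted return."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        agent.update(obs, action, reward, next_obs, done)
        total += reward
        obs = next_obs
        if done:
            break
    return total


class ChainEnv:
    """Hypothetical toy task: walk right along a chain, reward -1 per step."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def _obs(self):
        # Only component 0 is informative; the other five are constant filler.
        return [self.pos / (self.length - 1)] + [0.5] * 5

    def reset(self):
        self.pos = 0
        return self._obs()

    def step(self, action):
        if action == 1:
            self.pos = min(self.pos + 1, self.length - 1)
        elif action == 0:
            self.pos = max(self.pos - 1, 0)
        # Actions 2-5 are redundant no-ops.
        done = self.pos == self.length - 1
        return self._obs(), -1.0, done
```

A uniform grid with 4 bins per dimension already has 4^6 = 4096 cells, and the hyperparameters have to be hand-tuned per task; this is where the feature construction, adaptive representation and parameter-free learning challenges come in.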

We've "normalized" a number of classic and new reinforcement learning problems so that they all look identical in terms of their task spec, so the agent will not easily be able to identify which domain an MDP represents. Some MDPs WILL have redundant actions and unnecessary observations, and some observations may be warped in unusual ways.
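One way such a normalization could work is sketched below. This is an assumption for illustration only; the actual transformation used for the polyathlon is not described here. The hypothetical `NormalizedTask` wrapper rescales a raw state using assumed per-variable bounds (`low` < `high`), warps each variable with a random monotone power, pads the vector to 6 dimensions with uninformative noise, hides the variable order behind a fixed permutation, and maps the 6 actions onto the raw domain's actions with a modulo, so several actions become redundant. The raw environment is assumed to expose `reset()`, `step(action)` returning `(state, reward, done)`, and a `num_actions` attribute.

```python
import random


class NormalizedTask:
    """Hypothetical wrapper presenting a raw domain through a polyathlon-style interface."""

    OBS_DIMS = 6
    NUM_ACTIONS = 6

    def __init__(self, raw_env, low, high, seed=0):
        self.raw_env = raw_env
        self.low, self.high = list(low), list(high)
        if len(self.low) > self.OBS_DIMS:
            raise ValueError("raw state has too many dimensions")
        self.rng = random.Random(seed)
        # A fixed random permutation hides which slot carries which variable.
        self.perm = list(range(self.OBS_DIMS))
        self.rng.shuffle(self.perm)
        # Per-variable exponent for a monotone warp u -> u**k, which keeps [0, 1].
        self.warp = [self.rng.choice([0.5, 1.0, 2.0]) for _ in self.low]

    def _normalize(self, state):
        vals = []
        for x, lo, hi, k in zip(state, self.low, self.high, self.warp):
            u = min(max((x - lo) / (hi - lo), 0.0), 1.0)  # rescale and clip into [0, 1]
            vals.append(u ** k)
        # Pad with pure-noise observations that carry no information.
        vals += [self.rng.random() for _ in range(self.OBS_DIMS - len(vals))]
        out = [0.0] * self.OBS_DIMS
        for i, v in enumerate(vals):
            out[self.perm[i]] = v
        return out

    def reset(self):
        return self._normalize(self.raw_env.reset())

    def step(self, action):
        # Map the 6 exposed actions onto the raw domain's actions; extras are redundant.
        raw_action = action % self.raw_env.num_actions
        state, reward, done = self.raw_env.step(raw_action)
        return self._normalize(state), reward, done
```

Because the warp is monotone and the noise is independent of the state, no information is destroyed, but an agent that assumes linear, equally relevant inputs will learn more slowly.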

Christos Dimitrakakis,
Apr 22, 2013, 1:19 PM