metaschool
provides a simple multi-task interface on top of OpenAI Gym environments.
Our hope is that it will simplify research in mutli-task, lifelong, and meta-reinforcement learning.
- Website: learnables.net/metaschool
- Slack: slack.learn2learn.net
- Mujoco Locomotion: https://arxiv.org/abs/1703.03400
- Meta-World: https://arxiv.org/abs/1910.10897
- MSR Jumping: https://arxiv.org/abs/1809.02591
- ... and more since metaschool is easily extensible.
At its essence, metaschool
builds on 3 basic classes:
EnvFactory
: A class to generate base Gym environments, given a configuration.WrapperFactory
: An (optional) class to generate wrappers, given a configuration.TaskConfig
: A simple dict-like object to configure tasks.
Now, say we have a base Gym environment DrivingEnv-v0
for training self-driving system in different conditions (locations, weather, car maker).
We can turn this environment into a set of multiple tasks as follows.
Note: we also use a GymTaskset
, which lets us automatically sample and keep track of tasks.
import metaschool as ms
class DrivingFactory(ms.EnvFactory): # defines how to sample new base environments
def make(self, config):
env = gym.make('DrivingEnv-v0', **config)
env = gym.wrappers.RecordEpisodeStatistics(env)
return env
def sample(self):
config = ms.TaskConfig(location='Town2', weather='Sunny') # TaskConfig is a dict-like configuration
config.color = random.choice(['Volvo', 'Mercedes', 'Audi', 'Hummer'])
return config
class ChangingHorizonFactory(ms.WrapperFactory): # let's us randomize base envs with wrappers
def wrap(self, env, config):
return gym.wrappers.TimeLimit(env, max_episode_steps=config.max_steps)
def sample(self, env=None):
return ms.TaskConfig(max_steps=random.randint(20, 200))
taskset = GymTaskset( # helps us create, replicate, and track tasks
env_factory=DrivingFactory(),
wrapper_factories=[ChangingHorizonFactory(), ],
)
for iteration in range(num_iterations): # learning over multiple tasks
train_task = taskset.sample() # train_task is a TimeLimit(RecordEpisodeStatistics(DrivingEnv)) with randomized configurations
learner.learn(train_task)
for seen_task in iter(taskset): # loops over all previously seen tasks
loss = learner.eval(test_task)
A human-readable changelog is available in the CHANGELOG.md file.
To cite this code in your academic publications, please use the following reference.
Arnold, Sebastien M. R. “metaschool: A Gym Interface for Multi-Task Reinforcement Learning”. 2022.
You can also use the following Bibtex entry.
@software{Arnold20222022,
author = {Arnold, Sebastien M. R.},
doi = {10.5281/zenodo.1234},
month = {12},
title = {{metaschool: A Gym Interface for Multi-Task Reinforcement Learning}},
url = {https://github.com/learnables/metaschool},
version = {0.0.1},
year = {2022}
}