`safe-adaptation-gym` is a benchmark suite for testing how agents perform in multitask and adaptation settings where safety is crucial.
It is based on Safety Gym and extends its 3 tasks to 8 different task prototypes.
For more details about the evaluation metrics and problem formulation, please see our research paper.
- Open a new terminal and `git clone` the repo.
- Create a new environment with your favorite environment manager (venv, conda). Make sure to set it up with `python >= 3.8.13`.
- Install dependencies with `cd safe-adaptation-gym && pip install .`.
- The following snippet demonstrates how to use `safe-adaptation-gym` on a specific task:
```python
import safe_adaptation_gym
from safe_adaptation_gym import tasks

# Define important parameters.
robot = 'point'
seed = 666
task_name = 'go_to_goal'
config = {'obstacles_size_noise_scale': 1.}
rgb_observation = False  # if True, use first-person-view image observations; otherwise only pseudo-lidar
render_options = {'camera_id': 'fixedfar', 'height': 320, 'width': 320}
render_lidar_and_collision = True  # render human-supportive visualization (slight computation slowdown)

# Make a new simulation environment.
env = safe_adaptation_gym.make(robot,
                               task_name,
                               seed,
                               config,
                               rgb_observation,
                               render_options,
                               render_lidar_and_collision)
policy = lambda obs: env.action_space.sample()  # define a uniformly random policy

# One RL interaction.
observation = env.reset()
action = policy(observation)
next_observation, reward, done, info = env.step(action)
cost = info.get('cost', 0.)

# Set a new task.
env.set_task(tasks.HaulBox())

# One RL interaction.
observation = env.reset()
action = policy(observation)
next_observation, reward, done, info = env.step(action)
```
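The snippet above shows a single interaction step. Collecting a full episode simply means stepping until the environment signals termination and accumulating the per-step cost reported in `info`. The following is a minimal sketch assuming the Gym-style reset/step interface used above; the helper name `run_episode` and its return values are illustrative, not part of the library API:

```python
def run_episode(env, policy):
    """Roll out one full episode and accumulate return and cost.

    A minimal sketch: the helper name and return values are illustrative,
    not part of the safe-adaptation-gym API. Assumes the Gym-style
    reset/step interface used in the snippet above.
    """
    observation = env.reset()
    episode_return, episode_cost, done = 0., 0., False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        episode_return += reward
        episode_cost += info.get('cost', 0.)  # safety violations are reported through `info`
    return episode_return, episode_cost


episode_return, episode_cost = run_episode(env, policy)
```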
In order to reproduce our results, define a task sampler:
```python
import safe_adaptation_gym
from safe_adaptation_gym import benchmark

# Define important parameters.
robot = 'doggo'
seed = 666
env = safe_adaptation_gym.make(robot, seed)
policy = lambda obs: env.action_space.sample()  # define a uniformly random policy
benchmark_name = 'multitask'  # can also be 'task_adaptation', in which case some task prototypes are held out
batch_size = 30
task_sampler = benchmark.make(benchmark_name, batch_size, seed)

# Iterate over training task prototypes.
for task_name, task in task_sampler.train_tasks:
    observation = env.reset(options={'task': task})  # a new task can also be set when resetting to a new episode
    action = policy(observation)
    next_observation, reward, done, info = env.step(action)
    cost = info.get('cost', 0.)

# Iterate over held-out task prototypes.
for task_name, task in task_sampler.test_tasks:
    observation = env.reset(options={'task': task})
    action = policy(observation)
    next_observation, reward, done, info = env.step(action)
    cost = info.get('cost', 0.)
```
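A typical evaluation on the held-out prototypes collects full episodes rather than single steps and reports the average return and cost per task. The sketch below is only an illustration: it reuses the illustrative `run_episode` helper from earlier, assumes the `task` objects yielded by the sampler can be passed to `set_task` just like the task instances in the single-task example, and the number of episodes per task is an arbitrary choice, not a value prescribed by the benchmark.

```python
# Number of evaluation episodes per task; an arbitrary choice for illustration.
EPISODES_PER_TASK = 5

results = {}
for task_name, task in task_sampler.test_tasks:
    env.set_task(task)  # assumption: sampled task objects work with `set_task`, as in the single-task example
    returns, costs = zip(*(run_episode(env, policy) for _ in range(EPISODES_PER_TASK)))
    results[task_name] = {'average_return': sum(returns) / len(returns),
                          'average_cost': sum(costs) / len(costs)}
print(results)
```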