Vectorized environments #1513
Conversation
Force-pushed da71bd1 to 359bc59
@tristandeleu That looks really great! One potential issue with the proposed API, to me, is: what if the user would like to wrap the "atomic" environments first, before wrapping them into a vectorized environment?
Thank you! You can wrap the individual environments inside the factory functions, before they get vectorized:

```python
import gym
from gym.vector import AsyncVectorEnv
from gym.wrappers import AtariPreprocessing

def make_env(env_id):
    def _make():
        env = gym.make(env_id)
        env = AtariPreprocessing(env)
        return env
    return _make

env_fns = [make_env('BreakoutDeterministic-v4') for _ in range(5)]
env = AsyncVectorEnv(env_fns)
observations = env.reset()
# observations.shape: (5, 84, 84)
```
```python
import gym
import numpy as np

env = gym.vector.make('CartPole-v1', num_envs=8, episodic=True)
observations = env.reset()
dones = np.zeros(env.num_envs, dtype=np.bool_)
while not dones.all():
    actions = env.action_space.sample()
    observations, rewards, dones, infos = env.step(actions)
```
@tristandeleu Thanks, this looks great! In fact we have bounced this idea internally for a while, great to see the initiative taken by the community :) Given the size of the PR, however, I'd like to take more time to read through it carefully; also putting @christopherhesse into the loop.
Having a standard vectorized environment in gym would be really nice. I'll write a few comments on this PR.
gym/vector/__init__.py (outdated)

```python
__all__ = ['AsyncVectorEnv', 'SyncVectorEnv', 'VectorEnv', 'make']

def make(id, num_envs=1, asynchronous=True, episodic=False, **kwargs):
```
When is episodic=True used? I can't reset individual environments, so I'd have to reset them all at once right?
Yes, `episodic=True` means that the individual environments do not reset automatically after finishing an episode. You get only a single episode of each environment, and you are responsible for resetting the environment yourself. This is meant to cover the frequent pattern

```python
while not done:
    env.step(action)
```
How does that cover that pattern? Seems like I want episodic=True in that case or else I'll step past the end of some environments.
Yes, in that case you'd like `episodic=True`. Here is an example:

```python
import gym
import numpy as np

env = gym.vector.make('CubeCrash-v0', 5, episodic=True)
observations = env.reset()
dones = np.zeros(env.num_envs, dtype=np.bool_)
# This terminates after getting 1 episode per environment
while not dones.all():
    actions = env.action_space.sample()
    observations, rewards, dones, _ = env.step(actions)

env = gym.vector.make('CubeCrash-v0', 5, episodic=False)
observations = env.reset()
dones = np.zeros(env.num_envs, dtype=np.bool_)
# This doesn't terminate
while not dones.all():
    actions = env.action_space.sample()
    observations, rewards, dones, _ = env.step(actions)
```
This allows you to either
- Get a fixed number of episodes. Let's say you have 5 environments and you want 20 episodes overall; with `episodic=True` you can just loop 4 times through this (and you are responsible for calling `reset`):

```python
env = gym.vector.make('CubeCrash-v0', 5, episodic=True)
for _ in range(4):
    observations = env.reset()
    dones = np.zeros(env.num_envs, dtype=np.bool_)
    while not dones.all():
        actions = env.action_space.sample()
        observations, rewards, dones, _ = env.step(actions)
```

- Or get an unbounded number of episodes, without having to reset the environment yourself. This is the behavior you get in baselines, and is the default one here.
I think this is a little dangerous, since it's easy to confuse the past-done "undefined" observations with the good ones in this case. What we normally do is: if you want 20 episodes, you use `episodic=False` and then count the number of dones (`done_count += np.sum(dones)`, for instance). Any thoughts @pzhokhov?
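The done-counting pattern described above can be sketched with a toy stand-in environment (`ToyAutoResetEnv` is hypothetical, not part of gym; it mimics the autoreset behavior of `episodic=False`):

```python
import random

class ToyAutoResetEnv:
    """Hypothetical single environment: episodes last 1-3 steps, and the
    environment resets itself when an episode ends (episodic=False style)."""
    def reset(self):
        self._t = 0
        self._length = random.randint(1, 3)
        return 0

    def step(self, action):
        self._t += 1
        done = self._t >= self._length
        obs = self.reset() if done else self._t  # autoreset past the end
        return obs, 0.0, done, {}

# Collect at least 20 episodes across 5 envs by counting done flags,
# instead of relying on episodic termination.
envs = [ToyAutoResetEnv() for _ in range(5)]
for e in envs:
    e.reset()

target_episodes = 20
done_count = 0
while done_count < target_episodes:
    dones = [e.step(None)[2] for e in envs]
    done_count += sum(dones)
```

Since at most `len(envs)` episodes can finish on the final step, `done_count` overshoots the target by at most 4 here; a real training loop would also have to discard the past-done observations of envs that already hit their quota.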
`episodic=True` seems dangerous to me as well (I left a separate comment in the code about that); while I can see cases where it is useful, I think a proper implementation will require env-specific details (like what is an OK observation after done, and what goes in the info dictionary).
gym/vector/async_vector_env.py (outdated)

```python
            return True
        return False

    def close_extras(self, timeout=None, terminate=False):
```
The existing `SubprocVecEnv` doesn't seem to need the `terminate` or `timeout` arguments. While I could see an argument for the timeout one, why add `terminate`?
Actually, what's the argument for timeout as well? We haven't needed it in baselines.
The idea behind `timeout` is to give `close` the possibility to shut down the individual environments gracefully, without having to wait too long (for example, if something went wrong in one of the individual environments and left it hanging forever). `terminate` is the extreme version of it, where the user wants to close the environment without having to wait at all.
Sorry, that wasn't a very clear question. I realize why they're there, it's just I haven't used an environment where they were necessary. It seems like something that would be handled by a wrapper instead of the environment vectorizer. Do you have a case where this is actually used?
I never had to use it, but I feel like leaving the user the ability to have finer control over this, if they need it, might be a good idea. I think it makes sense to have it in the base class for `AsyncVectorEnv`, especially for handling termination of processes, but I'm completely open to a wrapper-based solution if you have an idea. The default behavior is the same as in baselines.
Overall, this looks good. It simplifies some things but adds some complexity over the baselines implementation, and it's not clear that the extra complexity is necessary.

Also, it looks like the existing tests did not make it into this PR: https://github.com/openai/baselines/blob/master/baselines/common/vec_env/test_vec_env.py It would be nice to have those here as well.
Force-pushed ddae559 to b22a66b
I have removed the episodic and automatic restart features from the current PR, and have them ready for future PRs once this one is merged.
Awesome, thanks @tristandeleu! It looks to me that the only remaining item is to separate … The docstrings and tests look great btw.
Force-pushed b22a66b to 7a4efe4
I removed it.
LGTM! Thanks for all the effort, this looks very nice!

Thanks again @tristandeleu, having vectorized envs in gym is nice, and the cleanup over what baselines did is also good.

I filed some issues I found with this: https://github.com/openai/gym/labels/VectorEnv
* Initial version of vectorized environments
* Raise an exception in the main process if child process raises an exception
* Add list of exposed functions in vector module
* Use deepcopy instead of np.copy
* Add documentation for vector utils
* Add tests for copy in AsyncVectorEnv
* Add example in documentation for batch_space
* Add cloudpickle dependency in setup.py
* Fix __del__ in VectorEnv
* Check if all observation spaces are equal in AsyncVectorEnv
* Check if all observation spaces are equal in SyncVectorEnv
* Fix spaces non equality in SyncVectorEnv for Python 2
* Handle None parameter in create_empty_array
* Fix check_observation_space with spaces equality
* Raise an exception when operations are out of order in AsyncVectorEnv
* Add version requirement for cloudpickle
* Use a state instead of binary flags in AsyncVectorEnv
* Use numpy.zeros when initializing observations in vectorized environments
* Remove poll from public API in AsyncVectorEnv
* Remove close_extras from VectorEnv
* Add test between AsyncVectorEnv and SyncVectorEnv
* Remove close in check_observation_space
* Add documentation for seed and close
* Refactor exceptions for AsyncVectorEnv
* Close pipes if the environment raises an error
* Add tests for out of order operations
* Change default argument in create_empty_array to np.zeros
* Add get_attr and set_attr methods to VectorEnv
* Improve consistency in SyncVectorEnv
One of the most useful features in openai/baselines is the ability to run multiple environments in parallel with the `SubprocVecEnv` wrapper. However, in order to benefit from this feature, it is necessary to install the whole baselines package, which includes some heavy dependencies like Tensorflow. To avoid this dependency, starting a new project often requires copy/pasting the snippet of code for `SubprocVecEnv`. I am proposing to add this functionality to Gym, so that this feature can be used directly out of the box.

TL;DR

- `VectorEnv`, inheriting from `gym.Env`. This ensures that the vectorized environments are still valid instances of `gym.Env`. In particular, the `step` method still returns a tuple `(observations, rewards, dones, infos)`. However, the difference is that `rewards` and `dones` are now arrays (instead of scalars).
- `SyncVectorEnv` and `AsyncVectorEnv`. Roughly speaking, `SyncVectorEnv` is equivalent to `DummyVecEnv`, and `AsyncVectorEnv` is a unified API for both `SubprocVecEnv` and `ShmemVecEnv`. `SyncVectorEnv` runs the environments serially. `AsyncVectorEnv` uses `multiprocessing`, and runs the environments in parallel.
- `AsyncVectorEnv` has the possibility to use shared memory across processes, to improve the efficiency of data transfer if observations are large (e.g. images). It supports all types of Gym spaces, including `Dict` and `Tuple`.
- `AsyncVectorEnv` includes low-level control over the `reset` and `step` operations. In particular, these methods include an optional `timeout` argument, to shut down the processes after a certain period, to avoid any hanging process.
- `gym.vector.make()` is the equivalent of `gym.make()`, with multiple copies of the environment wrapped in a vectorized environment.

EDIT — Bonus: `max_retries` argument in `AsyncVectorEnv`. This automatically restarts processes that failed, up to a certain maximum number of retries.