Vectorized environments #1513
Conversation
Force-pushed da71bd1 to 359bc59
@tristandeleu That looks really great! One potential issue with the proposed API, to me, is: what if the user would like to wrap the "atomic" environments first, before wrapping them into a vectorized environment?
Thank you! You can wrap the individual environments inside the factory functions, before they get vectorized:

```python
import gym
from gym.vector import AsyncVectorEnv
from gym.wrappers import AtariPreprocessing

def make_env(env_id):
    def _make():
        env = gym.make(env_id)
        env = AtariPreprocessing(env)
        return env
    return _make

env_fns = [make_env('BreakoutDeterministic-v4') for _ in range(5)]
env = AsyncVectorEnv(env_fns)
observations = env.reset()
# observations.shape: (5, 84, 84)
```
```python
import gym
import numpy as np

env = gym.vector.make('CartPole-v1', num_envs=8, episodic=True)
observations = env.reset()
dones = np.zeros(env.num_envs, dtype=np.bool_)
while not dones.all():
    actions = env.action_space.sample()
    observations, rewards, dones, infos = env.step(actions)
```
@tristandeleu Thanks, this looks great! In fact we have bounced this idea internally for a while, great to see the initiative taken by the community :) Given the size of the PR, however, I'd like to take more time to read through it carefully; also putting @christopherhesse into the loop.
Having a standard vectorized environment in gym would be really nice. I'll write a few comments on this PR.
gym/vector/__init__.py (outdated)

```python
__all__ = ['AsyncVectorEnv', 'SyncVectorEnv', 'VectorEnv', 'make']

def make(id, num_envs=1, asynchronous=True, episodic=False, **kwargs):
```
When is episodic=True used? I can't reset individual environments, so I'd have to reset them all at once right?
Yes, `episodic=True` means that the individual environments do not reset automatically after finishing an episode. You get only a single episode of each environment, and you are responsible for resetting the environment yourself. This is meant to cover the frequent pattern

```python
while not done:
    env.step(action)
```
How does that cover that pattern? Seems like I want episodic=True in that case or else I'll step past the end of some environments.
Yes, in that case you'd like `episodic=True`. Here is an example:

```python
import gym
import numpy as np

env = gym.vector.make('CubeCrash-v0', 5, episodic=True)
observations = env.reset()
dones = np.zeros(env.num_envs, dtype=np.bool_)
# This terminates after getting 1 episode per environment
while not dones.all():
    actions = env.action_space.sample()
    observations, rewards, dones, _ = env.step(actions)

env = gym.vector.make('CubeCrash-v0', 5, episodic=False)
observations = env.reset()
dones = np.zeros(env.num_envs, dtype=np.bool_)
# This doesn't terminate
while not dones.all():
    actions = env.action_space.sample()
    observations, rewards, dones, _ = env.step(actions)
```
This allows you to either
- Get a fixed number of episodes. Let's say you have 5 environments and you want 20 episodes overall; with `episodic=True` you can just loop 4 times through this (and you are responsible for calling `reset`):

```python
env = gym.vector.make('CubeCrash-v0', 5, episodic=True)
for _ in range(4):
    observations = env.reset()
    dones = np.zeros(env.num_envs, dtype=np.bool_)
    while not dones.all():
        actions = env.action_space.sample()
        observations, rewards, dones, _ = env.step(actions)
```

- Or get an unbounded number of episodes, without having to reset the environment yourself. This is the behavior you get in baselines, and is the default one here.
I think this is a little dangerous, since it's easy to confuse the past-done "undefined" observations with the good ones in this case. What we normally do is: if you want 20 episodes, you use `episodic=False` and then count the number of dones (`done_count += np.sum(dones)`, for instance). Any thoughts @pzhokhov?
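The done-counting pattern described above can be sketched with a toy stand-in environment (`ToyAutoResetEnv` is hypothetical, not part of gym; it mimics the autoreset behavior of `episodic=False`):

```python
import random

class ToyAutoResetEnv:
    """Hypothetical single environment: episodes last 1-3 steps, and the
    environment resets itself when an episode ends (episodic=False style)."""
    def reset(self):
        self._t = 0
        self._length = random.randint(1, 3)
        return 0

    def step(self, action):
        self._t += 1
        done = self._t >= self._length
        obs = self.reset() if done else self._t  # autoreset past the end
        return obs, 0.0, done, {}

# Collect at least 20 episodes across 5 envs by counting done flags,
# instead of relying on episodic termination.
envs = [ToyAutoResetEnv() for _ in range(5)]
for e in envs:
    e.reset()

target_episodes = 20
done_count = 0
while done_count < target_episodes:
    dones = [e.step(None)[2] for e in envs]
    done_count += sum(dones)
```

Since at most `len(envs)` episodes can finish on the final step, `done_count` overshoots the target by at most 4 here; a real training loop would also have to discard the past-done observations of envs that already hit their quota.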
`episodic=True` seems dangerous to me as well (I left a separate comment in the code about that); while I can see cases where it is useful, I think a proper implementation will require env-specific details (like what is an OK observation after done, and what goes in the info dictionary).
gym/vector/async_vector_env.py (outdated)

```python
            return True
        return False

    def close_extras(self, timeout=None, terminate=False):
```
The existing `SubprocVecEnv` doesn't seem to need the `terminate` or `timeout` arguments. While I could see an argument for the timeout one, why add `terminate`?
Actually, what's the argument for timeout as well? We haven't needed it in baselines.
The idea behind `timeout` is to give `close` the possibility to shut down the individual environments gracefully, without having to wait too long (for example, if something went wrong in one of the individual environments and left it hanging forever). `terminate` is the extreme version of it, where the user wants to close the environment without having to wait at all.
Sorry, that wasn't a very clear question. I realize why they're there, it's just I haven't used an environment where they were necessary. It seems like something that would be handled by a wrapper instead of the environment vectorizer. Do you have a case where this is actually used?
I never had to use it, but I feel like leaving the user the ability to have finer control over this, if they need it, might be a good idea. I think it makes sense to have it in the base class for `AsyncVectorEnv`, especially for handling termination of processes, but I'm completely open to a wrapper-based solution if you have an idea. The default behavior is the same as in baselines.
Overall, this looks good. It simplifies some things but adds some complexity over the baselines implementation, and it's not clear that the extra complexity is necessary.

Also, it looks like the existing tests did not make it into this PR: https://github.com/openai/baselines/blob/master/baselines/common/vec_env/test_vec_env.py It would be nice to have those here as well.
Force-pushed ddae559 to b22a66b
I have removed the episodic and automatic restart features from the current PR, and have them ready for future PRs once this one is merged.
Awesome, thanks @tristandeleu! It looks to me that the only remaining item is to separate … The docstrings and tests look great btw.
Force-pushed b22a66b to 7a4efe4
I removed it.
LGTM! Thanks for all the effort, this looks very nice!

Thanks again @tristandeleu, having vectorized envs in gym is nice, and the cleanup over what baselines did is also good.

I filed some issues I found with this: https://github.com/openai/gym/labels/VectorEnv
* Initial version of vectorized environments
* Raise an exception in the main process if child process raises an exception
* Add list of exposed functions in vector module
* Use deepcopy instead of np.copy
* Add documentation for vector utils
* Add tests for copy in AsyncVectorEnv
* Add example in documentation for batch_space
* Add cloudpickle dependency in setup.py
* Fix __del__ in VectorEnv
* Check if all observation spaces are equal in AsyncVectorEnv
* Check if all observation spaces are equal in SyncVectorEnv
* Fix spaces non equality in SyncVectorEnv for Python 2
* Handle None parameter in create_empty_array
* Fix check_observation_space with spaces equality
* Raise an exception when operations are out of order in AsyncVectorEnv
* Add version requirement for cloudpickle
* Use a state instead of binary flags in AsyncVectorEnv
* Use numpy.zeros when initializing observations in vectorized environments
* Remove poll from public API in AsyncVectorEnv
* Remove close_extras from VectorEnv
* Add test between AsyncVectorEnv and SyncVectorEnv
* Remove close in check_observation_space
* Add documentation for seed and close
* Refactor exceptions for AsyncVectorEnv
* Close pipes if the environment raises an error
* Add tests for out of order operations
* Change default argument in create_empty_array to np.zeros
* Add get_attr and set_attr methods to VectorEnv
* Improve consistency in SyncVectorEnv
One of the most useful features in openai/baselines is the ability to run multiple environments in parallel with the `SubprocVecEnv` wrapper. However, in order to benefit from this feature, it is necessary to install the whole baselines package, which includes some heavy dependencies like Tensorflow. To avoid this dependency, starting a new project often requires copy/pasting the snippet of code for `SubprocVecEnv`. I am proposing to add this functionality to Gym, so that this feature can be used directly out of the box.

TL;DR

- `VectorEnv`, inheriting from `gym.Env`. This ensures that the vectorized environments are still valid instances of `gym.Env`. In particular, the `step` method still returns a tuple `(observations, rewards, dones, infos)`. However, the difference is that `rewards` and `dones` are now arrays (instead of scalars).
- `SyncVectorEnv` and `AsyncVectorEnv`. Roughly speaking, `SyncVectorEnv` is equivalent to `DummyVecEnv`, and `AsyncVectorEnv` is a unified API for both `SubprocVecEnv` and `ShmemVecEnv`. `SyncVectorEnv` runs the environments serially. `AsyncVectorEnv` uses `multiprocessing`, and runs the environments in parallel.
- `AsyncVectorEnv` has the possibility to use shared memory across processes, to improve the efficiency of data transfer if observations are large (e.g. images). It supports all types of Gym spaces, including `Dict` and `Tuple`.
- `AsyncVectorEnv` includes low-level control over the `reset` and `step` operations. In particular, these methods include an optional `timeout` argument, to shut down the processes after a certain period, to avoid any hanging process.
- `gym.vector.make()` is the equivalent of `gym.make()`, with multiple copies of the environment wrapped in a vectorized environment.

EDIT — Bonus: `max_retries` argument in `AsyncVectorEnv`. This automatically restarts processes that failed, up to a certain maximum number of retries.