Running episodes in parallel #21


Closed
Coolnesss opened this issue Dec 5, 2019 · 4 comments

@Coolnesss

Coolnesss commented Dec 5, 2019

Hey, library looks great.

I was wondering how to run multiple episodes at the same time using multiple workers. The Runner wrapper doesn't seem to support specifying a number of workers; for example, increasing the episodes value in the PPO example seems to run those episodes sequentially.

Any documentation on this would be appreciated.

@seba-1511
Member

Hello,

Thanks for your interest in cherry! Currently, there are two ways to run episodes in parallel.

Use torch.distributed, as done in this example. If the ExperienceReplay doesn't need to be shared across processes, I'd recommend this approach, which is fully supported.
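
For instance, a generic sketch of this pattern (not the linked example itself; the stand-in loss and hyperparameters are placeholders) could look like:

import os
import torch as th
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # each process owns its own environment and replay; only gradients are shared
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group('gloo', rank=rank, world_size=world_size)

    policy = th.nn.Linear(4, 2)
    optimizer = th.optim.SGD(policy.parameters(), lr=1e-2)

    # in practice, gather a local replay (e.g. with Runner) and compute a
    # policy-gradient loss here; a stand-in loss keeps the sketch short
    loss = policy(th.randn(8, 4)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()

    # average gradients so every worker takes the same update step
    for p in policy.parameters():
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad.div_(world_size)
    optimizer.step()

if __name__ == '__main__':
    mp.spawn(worker, args=(2,), nprocs=2)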

Use gym.vector, which will run environments in parallel via multi-processing (c.f. the PR). In this case, Runner will handle the vectorized environments, and there are two behaviors depending on whether you pass steps or episodes to env.run():

  1. Using episodes: all episodes are gathered in parallel, and the replay returned by env.run() is "flattened", i.e. exactly the same as if you hadn't used gym.vector. This is well supported.
  2. Using steps: this is tricky, as it's not clear how to automatically "flatten" the replay. In this case, the replay stores the vectorized transitions, i.e. the state of a transition will have shape (num_envs, *state_dims). All examples should work with this way of parallelizing too, but I haven't extensively tested it. (A short sketch of both modes follows below.)
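
As a rough illustration of the two modes, a minimal sketch (assuming CartPole and a random policy; not tested) could look like:

import gym
import cherry as ch
from cherry.envs import Runner

# four copies of CartPole, each in its own worker process
vec_env = gym.vector.AsyncVectorEnv([lambda: gym.make('CartPole-v0')] * 4)
env = Runner(ch.envs.Torch(vec_env))
policy = lambda state: ch.totensor(vec_env.action_space.sample())

# episodes: the returned replay is flattened, as with a single environment
flat_replay = env.run(policy, episodes=4)

# steps: the replay keeps vectorized transitions,
# e.g. a state of shape (num_envs, *state_dims)
vector_replay = env.run(policy, steps=10)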

Note that some environment wrappers will raise errors when using gym.vector (e.g. the VisdomLogger doesn't know how to render vectorized environments). I'm still working on those, and any help with implementation or testing is very welcome.

Hope this helps,
Séb

PS: Obviously, a third way is to use gym.vector and write your own data-gathering function instead of using the Runner wrapper (sketched below). Depending on your setup, this might actually be a simple solution.
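
For example, such a loop could look roughly like this (an untested sketch that stores the vectorized transitions directly in an ExperienceReplay, which tensorizes its arguments):

import gym
import cherry as ch

vec_env = gym.vector.AsyncVectorEnv([lambda: gym.make('CartPole-v0')] * 4)
replay = ch.ExperienceReplay()

state = vec_env.reset()
for _ in range(100):
    action = vec_env.action_space.sample()
    next_state, reward, done, info = vec_env.step(action)
    # each stored field is vectorized, e.g. state has shape (num_envs, ...)
    replay.append(state, action, reward, next_state, done)
    state = next_state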

@Coolnesss
Author

Hey, thanks for the comments. Upon looking at the options I decided to go with gym.vector (which I hadn't heard of before, so thanks for that!). However, since my env has a gym.spaces.Tuple observation space with differently shaped arrays, I get an error. Here's a quick reproducible example (sorry about the thread divergence, I can post another issue if you like).

import cherry as ch
from cherry.envs import Runner
import gym
from gym.spaces import Tuple, Box

class ExampleEnv(gym.Env):
    def __init__(self):
        # Tuple observation: a (10, 10) matrix and a length-10 vector
        self.observation_space = Tuple((
            Box(-1, 1, shape=(10, 10)),
            Box(-1, 1, shape=(10,))
        ))
        self.action_space = Box(-1, 1, shape=(1,))

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        # random observation, constant reward, episode ends immediately
        return self.observation_space.sample(), 1, True, {}

def make_env():
    return ExampleEnv()

# two copies of the env, each in its own worker process
vector_env = gym.vector.AsyncVectorEnv([make_env] * 2)

env = ch.envs.Torch(vector_env)
policy = lambda x: ch.totensor(vector_env.action_space.sample())
env = Runner(env)
replay = env.run(policy, episodes=1)
print(replay)

Error:

Traceback (most recent call last):
  File "cherry_tuple.py", line 29, in <module>
    replay = env.run(policy, episodes=1)
  File "/anaconda3/lib/python3.6/site-packages/cherry/envs/runner_wrapper.py", line 145, in run
    replay.append(old_state, action, reward, state, done, **info)
  File "/anaconda3/lib/python3.6/site-packages/cherry/experience_replay.py", line 303, in append
    sars = Transition(ch.totensor(state),
  File "/anaconda3/lib/python3.6/site-packages/cherry/_torch.py", line 41, in totensor
    array = th.cat([totensor(x) for x in array], dim=0)
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 4 and 3 at /Users/distiller/project/conda/conda-bld/pytorch_1565272679438/work/aten/src/TH/generic/THTensor.cpp:680

Does the Runner wrapper support Tuple observations, or am I doing something wrong?

@seba-1511
Member

Thanks for the reproducible issue. As of now, Runner does not support Tuple observations. The problem is actually not with the wrapper but with the ExperienceReplay, which expects the state to be tensorable (a list, ndarray, or tensor).

If you only need one entry of the state tuple, you can use a wrapper as in the grid-world example. If you do need the full tuple, a solution is to implement your own loop that gathers experience; when you call replay.append(), unwrap the tuple and pass each of its components as part of the optional info dictionary (see the sketch below).
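
Something along these lines (a rough, untested sketch using the ExampleEnv from above; the keyword names matrix_obs / next_matrix_obs are made up):

import cherry as ch

env = ExampleEnv()  # the Tuple-observation env from above
replay = ch.ExperienceReplay()

state = env.reset()
done = False
while not done:
    action = env.action_space.sample()
    next_state, reward, done, info = env.step(action)
    # store one component as the "state", and stash the other one
    # through the optional info keywords of replay.append()
    replay.append(state[1],
                  action,
                  reward,
                  next_state[1],
                  done,
                  matrix_obs=ch.totensor(state[0]),
                  next_matrix_obs=ch.totensor(next_state[0]))
    state = next_state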

@Coolnesss
Author

Thanks, closing this as resolved.
