Running episodes in parallel #21


Closed
Coolnesss opened this issue Dec 5, 2019 · 4 comments

@Coolnesss

Coolnesss commented Dec 5, 2019

Hey, library looks great.

I was wondering how to run multiple episodes at the same time using multiple workers. The Runner wrapper doesn't seem to support specifying a number of workers; for example, increasing the episodes value in the PPO example seems to run those episodes sequentially.

Any documentation on this would be appreciated.

@seba-1511
Member

Hello,

Thanks for your interest in cherry! Currently, there are two ways to run episodes in parallel.

Use torch.distributed, as done in this example. If the ExperienceReplay doesn't need to be shared across processes, I'd recommend this approach, which is fully supported.
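
For instance, a generic sketch of this pattern (not the linked example itself; the stand-in loss and hyperparameters are placeholders) could look like:

import os
import torch as th
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # each process owns its own environment and replay; only gradients are shared
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group('gloo', rank=rank, world_size=world_size)

    policy = th.nn.Linear(4, 2)
    optimizer = th.optim.SGD(policy.parameters(), lr=1e-2)

    # in practice, gather a local replay (e.g. with Runner) and compute a
    # policy-gradient loss here; a stand-in loss keeps the sketch short
    loss = policy(th.randn(8, 4)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()

    # average gradients so every worker takes the same update step
    for p in policy.parameters():
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad.div_(world_size)
    optimizer.step()

if __name__ == '__main__':
    mp.spawn(worker, args=(2,), nprocs=2)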

Use gym.vector, which will run environments in parallel via multi-processing (c.f. the PR). In this case, Runner will handle the vectorized environments, and there are two behaviors depending on whether you pass steps or episodes to env.run():

  1. Using episodes: all episodes are gathered in parallel, and the replay returned by env.run() is "flattened", i.e. exactly the same as if you hadn't used gym.vector. This is well supported.
  2. Using steps: this is tricky, as it's not clear how to automatically "flatten" the replay. In this case, the replay stores the vectorized transitions, i.e. the state of a transition will have shape (num_envs, *state_dims). All examples should work with this way of parallelizing too, but I haven't extensively tested it. (A short sketch of both modes follows below.)
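
As a rough illustration of the two modes, a minimal sketch (assuming CartPole and a random policy; not tested) could look like:

import gym
import cherry as ch
from cherry.envs import Runner

# four copies of CartPole, each in its own worker process
vec_env = gym.vector.AsyncVectorEnv([lambda: gym.make('CartPole-v0')] * 4)
env = Runner(ch.envs.Torch(vec_env))
policy = lambda state: ch.totensor(vec_env.action_space.sample())

# episodes: the returned replay is flattened, as with a single environment
flat_replay = env.run(policy, episodes=4)

# steps: the replay keeps vectorized transitions,
# e.g. a state of shape (num_envs, *state_dims)
vector_replay = env.run(policy, steps=10)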

Note that some environment wrappers will raise errors when using gym.vector (e.g. the VisdomLogger doesn't know how to render vectorized environments). I'm still working on those, and any help with implementation or testing is very welcome.

Hope this helps,
Séb

PS: Obviously, a third way is to use gym.vector and write your own data-gathering function instead of using the Runner wrapper (sketched below). Depending on your setup, this might actually be a simple solution.
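
For example, such a loop could look roughly like this (an untested sketch that stores the vectorized transitions directly in an ExperienceReplay, which tensorizes its arguments):

import gym
import cherry as ch

vec_env = gym.vector.AsyncVectorEnv([lambda: gym.make('CartPole-v0')] * 4)
replay = ch.ExperienceReplay()

state = vec_env.reset()
for _ in range(100):
    action = vec_env.action_space.sample()
    next_state, reward, done, info = vec_env.step(action)
    # each stored field is vectorized, e.g. state has shape (num_envs, ...)
    replay.append(state, action, reward, next_state, done)
    state = next_state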

@Coolnesss
Author

Hey, thanks for the comments. Upon looking at the options I decided to go with gym.vector (which I hadn't heard of before, so thanks for that!). However, since my env has a gym.spaces.Tuple observation space with differently shaped arrays, I get an error. Here's a quick reproducible example (sorry about the thread divergence, I can post another issue if you like).

import cherry as ch
from cherry.envs import Runner
import gym
from gym.spaces import Tuple, Box

class ExampleEnv(gym.Env):
    def __init__(self):
        # Tuple observation: a (10, 10) matrix and a length-10 vector
        self.observation_space = Tuple((
            Box(-1, 1, shape=(10, 10)),
            Box(-1, 1, shape=(10,))
        ))
        self.action_space = Box(-1, 1, shape=(1,))

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        # random observation, constant reward, episode ends immediately
        return self.observation_space.sample(), 1, True, {}

def make_env():
    return ExampleEnv()

# two copies of the env, each in its own worker process
vector_env = gym.vector.AsyncVectorEnv([make_env] * 2)

env = ch.envs.Torch(vector_env)
policy = lambda x: ch.totensor(vector_env.action_space.sample())
env = Runner(env)
replay = env.run(policy, episodes=1)
print(replay)

Error:

Traceback (most recent call last):
  File "cherry_tuple.py", line 29, in <module>
    replay = env.run(policy, episodes=1)
  File "/anaconda3/lib/python3.6/site-packages/cherry/envs/runner_wrapper.py", line 145, in run
    replay.append(old_state, action, reward, state, done, **info)
  File "/anaconda3/lib/python3.6/site-packages/cherry/experience_replay.py", line 303, in append
    sars = Transition(ch.totensor(state),
  File "/anaconda3/lib/python3.6/site-packages/cherry/_torch.py", line 41, in totensor
    array = th.cat([totensor(x) for x in array], dim=0)
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 4 and 3 at /Users/distiller/project/conda/conda-bld/pytorch_1565272679438/work/aten/src/TH/generic/THTensor.cpp:680

Does the Runner wrapper support Tuple observations, or am I doing something wrong?

@seba-1511
Member

Thanks for the reproducible issue. As of now, Runner does not support Tuple observations. The problem is actually not with the wrapper but with the ExperienceReplay, which expects the state to be tensorable (a list, ndarray, or tensor).

If you only need one entry of the state tuple, you can use a wrapper as in the grid-world example. If you do need the full tuple, a solution is to implement your own loop that gathers experience; when you call replay.append(), unwrap the tuple and pass each of its components as part of the optional info dictionary (see the sketch below).
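
Something along these lines (a rough, untested sketch using the ExampleEnv from above; the keyword names matrix_obs / next_matrix_obs are made up):

import cherry as ch

env = ExampleEnv()  # the Tuple-observation env from above
replay = ch.ExperienceReplay()

state = env.reset()
done = False
while not done:
    action = env.action_space.sample()
    next_state, reward, done, info = env.step(action)
    # store one component as the "state", and stash the other one
    # through the optional info keywords of replay.append()
    replay.append(state[1],
                  action,
                  reward,
                  next_state[1],
                  done,
                  matrix_obs=ch.totensor(state[0]),
                  next_matrix_obs=ch.totensor(next_state[0]))
    state = next_state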

@Coolnesss
Author

Thanks, closing this as resolved.
