
Conversation

@bjg2 (Contributor) commented Feb 6, 2019

NOTE: This is the beginning of the pull request, so we can align. No unit tests were run, nor have the changes been tested beyond what we need for our use-case.

What do these changes do?

  • Adds the config remote_worker_envs to run environments as remote Ray processes and to step all environments in one worker in parallel. This is important for our use-case, as we're working with big environments (like SC2) and stepping through them is relatively expensive.
  • Adds the config entangled_worker_envs to work with one physical environment as many logical environments (batched step). This is important for our use-case, as we have a Unity environment where batching several logical environments gives many performance/resource-management benefits. These logical environments are disconnected: a done in one environment does not end the others, each should reset right away, and they don't share an episode, so MultiAgentEnv does not work for us.

Environment issues

These changes work, but they are not the prettiest - for example, even though the remote_worker_envs config is set, env_creator needs to create a remote environment only if the provided env_context's remote field is True (the env is sometimes built remote, sometimes not). Aside from that, there is the already mentioned issue that environments are sometimes created only to obtain the observation/action space, so all of that init work may be wasted. Proposal for fixing these issues: change the register env interface so it takes (env_name, env_class, function mapping env_context -> env_constructor_params, function mapping env_context -> (obs_space, action_space)). That way no unneeded environments would be created (as space data is already provided), and creating remote environments could be handled by the rllib framework (the user should only set remote_worker_envs to true); a sketch follows below the current creation code.

Current remote env creation:

def build_env(env_context):
    cls = ray.remote(num_cpus=1)(Sc2Env).remote if env_context.remote else Sc2Env
    return cls(env_context.copy(), env_context.worker_index, env_context.vector_index)
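A hedged sketch of the proposed registration interface (all names below are hypothetical and do not exist in RLlib today; this only illustrates the idea):

import ray

_ENV_REGISTRY = {}

def register_env_v2(env_name, env_class, make_ctor_params, make_spaces):
    """Register an env without instantiating it just to read its spaces.

    make_ctor_params: env_context -> kwargs passed to env_class.
    make_spaces:      env_context -> (observation_space, action_space).
    """
    _ENV_REGISTRY[env_name] = (env_class, make_ctor_params, make_spaces)

def create_env(env_name, env_context):
    # With the class available, rllib itself could decide whether to build
    # the env locally or as a remote Ray actor.
    env_class, make_ctor_params, _ = _ENV_REGISTRY[env_name]
    ctor = (ray.remote(num_cpus=1)(env_class).remote
            if env_context.remote else env_class)
    return ctor(**make_ctor_params(env_context))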

@bjg2 mentioned this pull request Feb 6, 2019
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11616/

@ericl (Contributor) left a comment

The remote env option makes sense to me. Isn't the entangled env the same as VectorEnv though? Is the only difference that the class subclasses gym.Env instead of VectorEnv?

def _make_evaluator(self, cls, env_creator, policy_graph, worker_index,
config):
config, remote_worker_envs=False,
entangled_worker_envs=False):
Contributor:

These args can be passed as part of config, right?

Contributor Author:

Removed entangled_worker_envs.

remote_worker_envs is hardcoded to False for the local_evaluator, so it doesn't create ray workers when it doesn't need to. Also, you can see that the dummy env in each evaluator is created with remote=False for the same reason.

A lot of problems boil down to the issue that an env needs to be created just for dummy values. I encourage opening a new environment register interface, as mentioned in the pull request description, and deprecating the current one (printing a deprecation warning when used and removing it after a few releases)...

Contributor:

I see. So one reason it makes sense to put under config is that you might want to run with num_workers: 0 with remote envs.

I think you can instead have the remote env wrapper initialize the remote workers only on reset(), similar to the rest of the env adapters. I agree the registration interface could use re-thinking though for the future.
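A minimal sketch of that lazy-initialization idea (illustrative only, not the code in this PR; it assumes a make_env function that builds one gym env per index):

import ray
from ray.rllib.env.vector_env import VectorEnv


@ray.remote(num_cpus=1)
class _EnvActor(object):
    """Hosts one user env in its own Ray process."""

    def __init__(self, make_env, index):
        self.env = make_env(index)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


class _LazyRemoteVectorEnv(VectorEnv):
    """Remote actors are created on the first reset(), not in __init__,
    so an evaluator that never samples never allocates them."""

    def __init__(self, make_env, num_envs):
        self.make_env = make_env
        self.num_envs = num_envs
        self.actors = None

    def vector_reset(self):
        if self.actors is None:
            self.actors = [
                _EnvActor.remote(self.make_env, i)
                for i in range(self.num_envs)
            ]
        return ray.get([a.reset.remote() for a in self.actors])

    def reset_at(self, index):
        return ray.get(self.actors[index].reset.remote())

    def vector_step(self, actions):
        results = ray.get(
            [a.step.remote(actions[i]) for i, a in enumerate(self.actors)])
        obs, rews, dones, infos = zip(*results)
        return list(obs), list(rews), list(dones), list(infos)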

Contributor Author:

I don't get how / why you would run with num_workers: 0? The local evaluator is used for the optimizer and the remote evaluators are used for sampling, why would you want to mix those?

Contributor Author:

The local evaluator currently creates num_envs_per_worker environments too many (it should not create any). It would be too costly to request that many ray workers, and it could even hang if they couldn't be scheduled.

if not isinstance(env, BaseEnv):
if isinstance(env, MultiAgentEnv):
# NOTE: Probably should handle remote / entangled envs in
# _MultiAgentEnvToAsync as well
Contributor:

Can we raise NotImplementedError here if those options are enabled?

Contributor Author:

Done.

@PublicAPI
class EntangledEnv(gym.Env):
"""Interface for one physical environment that hosts
several logical environments."""
Contributor:

How does this differ from VectorEnv?

Contributor Author:

You have a point - it does not. My mistake, I didn't realize there is an env I could use out of the box.


for env in range(self.envs):
assert isinstance(env, ray.actor.ActorClass), \
"Your environment needs to be ray remote environment"
Contributor:

Instead of requiring the env to be decorated as remote, you could add the decorator here on the fly (i.e., ray.remote(cls)). That way, it becomes possible to toggle between remote / non-remote with just the use_remote flag.
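For illustration, a toggle along those lines could look like this (a sketch, assuming the env class is available; as the reply below notes, here we only have a creator function):

import ray

def make_env_instances(env_cls, env_config, num_envs, use_remote):
    # Wrap the class with ray.remote only when requested, so the same code
    # path covers both in-process and remote envs.
    if use_remote:
        remote_cls = ray.remote(num_cpus=1)(env_cls)
        return [remote_cls.remote(env_config) for _ in range(num_envs)]
    return [env_cls(env_config) for _ in range(num_envs)]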

Contributor Author:

I mentioned that problem in the description. We don't have the environment class, just a function that maps env_context to an env object. Even if we wanted to create an env object and deduce the class from it (which is very ugly), we would need to know how to map env_context to env constructor arguments, which is info we do not have.

The only solution I can think of is changing the register env interface as mentioned in the pull request description.

Contributor:

Got it, this seems fine then.

self.entangled_envs_num = entangled_envs_num

def with_vector_index(self, vector_index):
def align(self, env_config=None, worker_index=None, vector_index=None,
Contributor:

What does this function do?

Contributor Author:

Bad naming, agree. Renamed to copy_with_overrides.

rew_batch.append(rew)
done_batch.append(done)
info_batch.append(info)
return obs_batch, rew_batch, done_batch, info_batch
Contributor:

I think we need some way of destroying these remote actors when the agent is stopped, otherwise the actors may be leaked. (I haven't checked if they actually are).

Contributor Author:

The problem is, I don't know how we can do that if the framework doesn't start the ray workers itself. And then we come back to the issues we were talking about earlier...

But I'm not sure this is even needed - it seems those actors do get closed (they no longer exist after a keyboard interrupt). Other ugly things do happen, though - everything hangs if you request more environments than ray can schedule, etc. (since this is outside the framework, rllib does not handle it).

Contributor:

Makes sense for now, I think you only run into the issue if running multiple experiments on the same cluster.

Re: resources: I think that can be resolved with the right resource requests (at least when running in Tune):

Basically, if you have remote envs enabled, then the agent should request max(1, num_workers) * num_envs_per_worker * num_remote_envs extra_cpu in default_resource_request(). That way you won't run into this resource allocation deadlock. However this only applies if running in Tune.

Alternatively, I think it would also work to create these remote envs with num_cpus=0.
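A rough sketch of such a resource request (using Tune's Resources; the function name and the exact factor for remote envs are assumptions, here one extra CPU per env per worker):

from ray.tune.trial import Resources

def remote_env_resource_request(cf):
    # cf is the full agent config dict. One extra CPU is reserved per
    # remotely hosted env per evaluator; alternatively the env actors
    # could be created with num_cpus=0 and this term dropped.
    remote_envs = cf["num_envs_per_worker"] if cf["remote_worker_envs"] else 0
    return Resources(
        cpu=1,
        gpu=cf["num_gpus"],
        extra_cpu=cf["num_cpus_per_worker"] * cf["num_workers"] +
        max(1, cf["num_workers"]) * remote_envs,
        extra_gpu=cf["num_gpus_per_worker"] * cf["num_workers"])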

Contributor Author:

Yeah, I figured out those ways around the issue; I just meant that we don't control the remote environments, and we don't even raise an exception when there are no resources to build them - it just hangs.

@bjg2 (Contributor Author) commented Feb 7, 2019

The remote env option makes sense to me. Isn't the entangled env the same as VectorEnv though? Is the only difference that the class subclasses gym.Env instead of VectorEnv?

Yes. My blunder, I didn't realize I could just use that one.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11637/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11641/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11647/

@ericl (Contributor) left a comment

Could you also add a quick regression test in test_policy_evaluator.py? You could also have a standalone test script if it doesn't fit into that file, and add the entry to run_multi_node_tests.sh

},

# Whether environments are in remote process or not
"remote_worker_envs": False,
Contributor:

Can you move this to under the Environment config section?

Contributor Author:

Done.


output_creator=output_creator)
output_creator=output_creator,
remote_worker_envs=remote_worker_envs,
entangled_worker_envs=entangled_worker_envs)
Contributor:

Should use config (and remove entangled_)

Contributor Author:

Commented about config above, and entangled no longer exists (the comment is on an outdated snippet).

return self.envs


class _RemoteVectorizedGymEnv(_VectorizedGymEnv):
Contributor:

Consider extending VectorEnv directly, since you don't seem to use much of the functionality of VectorizedGymEnv

Contributor Author:

Well, I do reuse the constructor and get_unwrapped, no need to copy those. I would leave it like this.



@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11687/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11692/

@ericl (Contributor) commented Feb 8, 2019

I pushed a change to auto-wrap remote envs. I think this will make it a lot easier to use since you don't need to modify existing envs to turn on the flag, let me know if it works for you. Also added a test.

Btw, you can run scripts/format.sh to auto-lint the files.
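For reference, with the auto-wrapping in place, enabling the option only needs the config flag (illustrative snippet for this Ray version's PPOAgent, assuming gym's CartPole is installed):

import ray
from ray.rllib.agents.ppo import PPOAgent

ray.init()
agent = PPOAgent(env="CartPole-v0", config={
    "num_workers": 2,
    "num_envs_per_worker": 4,
    # Each of the 4 envs per worker is auto-wrapped as its own Ray actor
    # and stepped in parallel.
    "remote_worker_envs": True,
})
agent.train()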

@ericl self-assigned this Feb 8, 2019
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11698/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11700/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11701/

"Creating throwaway env to get action and obs space. To avoid "
"resource overheads, your env should defer any expensive "
"initialization to reset().")
dummy = make_env(0)
@bjg2 (Contributor Author) commented Feb 11, 2019:

You shouldn't need a dummy env here, as action_space / observation_space are already provided as arguments from the dummy env created in the policy evaluator constructor.

Contributor Author:

And if creating a dummy env, calling dummy.close() at the end might be a good idea?

Contributor:

Right, fixed.

return self.env.reset()

def step(self, action):
return self.env.step(action)
Contributor Author:

Is it possible to call close() for a remote env? SC2 environments start an SC2 server, which is a separate process, and I guess the correct way to stop it in these situations would be calling the close method (though I see them dying after a keyboard interrupt).

Contributor:

There's python atexit which I think should work. If not, we can add close() hooks (but I don't know if this is as reliable in case of errors).
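A tiny sketch of the atexit approach (hypothetical wrapper name; the assumption is that the env's own close() tears down the external SC2 server):

import atexit

class ClosingEnvWrapper(object):
    """Registers the wrapped env's close() to run when the worker exits."""

    def __init__(self, env):
        self.env = env
        atexit.register(self.env.close)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)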

@bjg2 (Contributor Author) commented Feb 11, 2019

I pushed a change to auto-wrap remote envs. I think this will make it a lot easier to use since you don't need to modify existing envs to turn on the flag, let me know if it works for you. Also added a test.

Btw, you can run scripts/format.sh to auto-lint the files.

Don't know why I didn't think of wrapping :( Looks and works great!

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11770/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11791/

@ericl (Contributor) commented Feb 12, 2019

  File "/ray/python/ray/rllib/test/test_nested_spaces.py", line 319, in <lambda>
    self.doTestNestedTuple(lambda _: BaseEnv.to_base_env(NestedTupleEnv()))
  File "/ray/python/ray/rllib/env/base_env.py", line 96, in to_base_env
    observation_space=env.observation_space)
  File "/ray/python/ray/rllib/env/vector_env.py", line 35, in wrap
    observation_space)
  File "/ray/python/ray/rllib/env/vector_env.py", line 96, in __init__
    self.envs.append(self.make_env(len(self.envs)))
TypeError: 'NoneType' object is not callable

@bjg2 (Contributor Author) commented Feb 13, 2019

  File "/ray/python/ray/rllib/test/test_nested_spaces.py", line 319, in <lambda>
    self.doTestNestedTuple(lambda _: BaseEnv.to_base_env(NestedTupleEnv()))
  File "/ray/python/ray/rllib/env/base_env.py", line 96, in to_base_env
    observation_space=env.observation_space)
  File "/ray/python/ray/rllib/env/vector_env.py", line 35, in wrap
    observation_space)
  File "/ray/python/ray/rllib/env/vector_env.py", line 96, in __init__
    self.envs.append(self.make_env(len(self.envs)))
TypeError: 'NoneType' object is not callable

Fixed failing tests.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11883/

@ericl merged commit 0e37ac6 into ray-project:master Feb 13, 2019
@ericl (Contributor) commented Feb 13, 2019

Tests look good, thanks for contributing this!

@bjg2 deleted the env-changes branch February 25, 2019 09:50