[wingman -> rllib] Remote and entangled environments #3968
Conversation
Test FAILed.
ericl left a comment
The remote env option makes sense to me. Isn't the entangled env the same as VectorEnv though? Is the only difference that the class subclasses gym.Env instead of VectorEnv?
python/ray/rllib/agents/agent.py
Outdated
    def _make_evaluator(self, cls, env_creator, policy_graph, worker_index,
-                       config):
+                       config, remote_worker_envs=False,
+                       entangled_worker_envs=False):
These args can be passed as part of config right?
Removed entangled_worker_envs.
remote_worker_envs is hardcoded to False for the local_evaluator, so it doesn't create ray workers when it doesn't need to. Also, you can see that the dummy env in each evaluator is created with remote=False for the same reason.
A lot of the problems boil down to the fact that an env needs to be created just to obtain dummy values. I encourage introducing the new environment registration interface mentioned in the pull request description and deprecating the current one (printing a deprecation warning when it is used and removing it after a few releases)...
I see. So one reason it makes sense to put this under config is that you might want to run with num_workers: 0 with remote envs.
I think you can instead have the remote env wrapper initialize the remote workers only on reset(), similar to the rest of the env adapters. I agree the registration interface could use re-thinking though for the future.
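As an illustration of that lazy-initialization idea, here is a minimal sketch (the class and helper names are hypothetical, not RLlib's actual implementation; it assumes ray.init() has been called and that make_env is picklable):

    import ray


    @ray.remote(num_cpus=0)
    class _RemoteEnvActor:
        """Hypothetical Ray actor wrapping a single gym-style env."""

        def __init__(self, make_env, index):
            self.env = make_env(index)

        def reset(self):
            return self.env.reset()

        def step(self, action):
            return self.env.step(action)


    class LazyRemoteVectorEnv:
        """Sketch: create the remote actors only on the first reset(), so a
        throwaway instance (e.g. one built just for space discovery) never
        spawns Ray workers."""

        def __init__(self, make_env, num_envs):
            self.make_env = make_env
            self.num_envs = num_envs
            self.actors = None  # deferred until first vector_reset()

        def vector_reset(self):
            if self.actors is None:
                self.actors = [
                    _RemoteEnvActor.remote(self.make_env, i)
                    for i in range(self.num_envs)
                ]
            return ray.get([a.reset.remote() for a in self.actors])

        def vector_step(self, actions):
            results = ray.get([
                a.step.remote(act) for a, act in zip(self.actors, actions)
            ])
            obs, rews, dones, infos = zip(*results)
            return list(obs), list(rews), list(dones), list(infos)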
I don't get how / why you would run with num_workers: 0. The local evaluator is used by the optimizer and the remote evaluators are used for sampling, so why would you want to mix those?
The local evaluator currently creates num_envs_per_worker environments too many (it should not create any). It would be too costly to request that many ray workers, and it could even hang if they couldn't be scheduled.
python/ray/rllib/env/base_env.py
Outdated
    if not isinstance(env, BaseEnv):
        if isinstance(env, MultiAgentEnv):
            # NOTE: Probably should handle remote / entangled envs in
            # _MultiAgentEnvToAsync as well
Can we raise NotImplementedError here if those options are enabled?
Done.
@PublicAPI
class EntangledEnv(gym.Env):
    """Interface for one physical environment that hosts
    several logical environments."""
How does this differ from VectorEnv?
You have a point - it does not. My mistake, I didn't realize there was an env class I could use out of the box.
python/ray/rllib/env/vector_env.py
Outdated
    for env in range(self.envs):
        assert isinstance(env, ray.actor.ActorClass), \
            "Your environment needs to be ray remote environment"
Instead of requiring the env to be decorated as remote, you could add the decorator here on the fly (i.e., ray.remote(cls)). That way, it becomes possible to toggle between remote / non-remote with just the use_remote flag.
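A quick sketch of that on-the-fly decoration (env_cls and make_env_instances are placeholder names for illustration, not actual RLlib code):

    import ray


    def make_env_instances(env_cls, num_envs, use_remote):
        """Sketch: toggle remote vs. in-process envs with a single flag by
        applying ray.remote() to the class at runtime, instead of requiring
        the user to decorate the class themselves."""
        if use_remote:
            remote_cls = ray.remote(env_cls)  # decorate the class on the fly
            return [remote_cls.remote() for _ in range(num_envs)]
        return [env_cls() for _ in range(num_envs)]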
I mentioned that problem in the description. We don't have the environment class, just a function that maps an env_context to an env object. Even if we created an env object and deduced the class from it (which is very ugly), we would still need to know how to map the env_context to the env constructor arguments, which is information we do not have.
The only solution I can think of is changing the register env interface as mentioned in the pull request description.
Got it, this seems fine then.
python/ray/rllib/env/env_context.py
Outdated
    self.entangled_envs_num = entangled_envs_num

-   def with_vector_index(self, vector_index):
+   def align(self, env_config=None, worker_index=None, vector_index=None,
What does this function do?
Bad naming, agree. Renamed to copy_with_overrides.
python/ray/rllib/env/vector_env.py
Outdated
        rew_batch.append(rew)
        done_batch.append(done)
        info_batch.append(info)
    return obs_batch, rew_batch, done_batch, info_batch
I think we need some way of destroying these remote actors when the agent is stopped, otherwise the actors may be leaked. (I haven't checked if they actually are).
The problem is, I don't know how we can do that if the framework doesn't start the ray workers itself. And then we come back to the issues we were talking about earlier...
But I'm not sure; it seems that those do get closed (they no longer exist after a keyboard interrupt). Other ugly things are happening though - everything hangs if you request more environments than ray can schedule, etc. (because this happens outside the framework, rllib does not handle it).
Makes sense for now, I think you only run into the issue if running multiple experiments on the same cluster.
Re: resources: I think that can be resolved with the right resource requests (at least when running in Tune):
Basically, if you have remote envs enabled, then the agent should request max(1, num_workers) * num_envs_per_worker * num_remote_envs extra_cpu in default_resource_request(). That way you won't run into this resource allocation deadlock. However, this only applies when running in Tune.
Alternatively, I think it would also work to create these remote envs with num_cpus=0.
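A rough sketch of that resource accounting (the Resources import path and the config keys reflect this era of the codebase and are assumptions; num_remote_envs is taken here as the number of remote envs each vectorized env spawns):

    from ray.tune.trial import Resources


    def default_resource_request_sketch(cf):
        """Sketch of the suggested default_resource_request() accounting:
        reserve one extra CPU per remote env actor so that, when running
        under Tune, the trial's resource request covers the env actors and
        scheduling cannot deadlock. `cf` is the merged agent config."""
        extra_env_cpus = 0
        if cf.get("remote_worker_envs"):
            extra_env_cpus = (max(1, cf["num_workers"])
                              * cf["num_envs_per_worker"]
                              * cf.get("num_remote_envs", 1))
        return Resources(
            cpu=1,
            gpu=cf.get("num_gpus", 0),
            extra_cpu=cf["num_workers"] + extra_env_cpus,
            extra_gpu=0)

The alternative mentioned above, creating the env actors with num_cpus=0, sidesteps this accounting entirely at the cost of over-subscribing CPUs.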
Yeah, I figured out those ways around the issue; I just wanted to point out that we don't control the remote environments and we don't even raise an exception if there are no resources when building them - it just hangs.
Yes. My blunder, I didn't realize I could just use that one.
Test FAILed.
ericl left a comment
Could you also add a quick regression test in test_policy_evaluator.py? You could also have a standalone test script if it doesn't fit into that file, and add the entry to run_multi_node_tests.sh
python/ray/rllib/agents/agent.py
Outdated
    },

    # Whether environments are in remote process or not
    "remote_worker_envs": False,
Can you move this to under the Environment config section?
Done.
python/ray/rllib/agents/agent.py
Outdated
-           output_creator=output_creator)
+           output_creator=output_creator,
+           remote_worker_envs=remote_worker_envs,
+           entangled_worker_envs=entangled_worker_envs)
Should use config (and remove entangled_)
I commented about config above, and entangled no longer exists (this comment is on an outdated snippet).
        return self.envs


class _RemoteVectorizedGymEnv(_VectorizedGymEnv):
Consider extending VectorEnv directly, since you don't seem to use much of the functionality of VectorizedGymEnv
Well, I do reuse the constructor and get_unwrapped, so there's no need to copy those. I would leave it like this.
Test FAILed.
I pushed a change to auto-wrap remote envs. I think this will make it a lot easier to use since you don't need to modify existing envs to turn on the flag, let me know if it works for you. Also added a test. Btw, you can run …
Test FAILed.
python/ray/rllib/env/vector_env.py
Outdated
        "Creating throwaway env to get action and obs space. To avoid "
        "resource overheads, your env should defer any expensive "
        "initialization to reset().")
    dummy = make_env(0)
You shouldn't need a dummy env here, as action_space / observation_space are already provided as arguments from the dummy env created in the policy evaluator constructor.
And if a dummy env is created, calling dummy.close() at the end might be a good idea?
Right, fixed.
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)
Is it possible to call close() on a remote env? SC2 environments start an SC2 server, which is a separate process, and I guess the correct way to stop it in these situations would be to call the close method (though I see them dying after a keyboard interrupt).
There's Python's atexit, which I think should work. If not, we can add close() hooks (but I don't know if those would be as reliable in case of errors).
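A minimal sketch of the atexit approach inside a user-defined env (placeholder class, not part of RLlib):

    import atexit
    import subprocess


    class ExternalServerEnv:
        """Placeholder env that spawns an external server process, similar
        to the SC2 case described above."""

        def __init__(self, server_cmd):
            self.server = subprocess.Popen(server_cmd)
            # Ensure the server is torn down even if close() is never called
            # explicitly, e.g. when the remote actor process exits.
            atexit.register(self.close)

        def close(self):
            if self.server.poll() is None:
                self.server.terminate()
                self.server.wait()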
Don't know why I didn't think of wrapping :( Looks and works great!
Test FAILed.
Fixed failing tests.
Test PASSed.
Tests look good, thanks for contributing this!
NOTE: This is an early version of the pull request, opened so we can align on the approach. No unit tests have been run, and the changes have not been tested more broadly than we needed.
What do these changes do?
Environment issues
These changes work, but they are not the prettiest - for example, even though the remote_worker_envs config is set, env_creator needs to create a remote environment only when the provided env_context's remote field is True (the env is sometimes built as remote and sometimes not). Aside from that, there is the already-mentioned issue that environments are sometimes created only to obtain the observation/action space, so all of the init work may be wasted. Proposal for fixing these issues: change the register env interface so it takes (env_name, env_class, a function mapping env_context -> env_constructor_params, a function mapping env_context -> (obs_space, action_space)). That way no unneeded environments would be created (since the space data is already provided), and creating remote environments could be handled by the rllib framework itself (the user would only set remote_worker_envs to true). A sketch of this proposed signature follows below.
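A hypothetical sketch of that proposed registration signature (the function name and registry are illustrative only, not the actual RLlib API):

    _ENV_REGISTRY = {}  # illustrative registry, for this sketch only


    def register_env_v2(env_name, env_class, params_fn, spaces_fn):
        """Hypothetical new interface: because the env class and its spaces
        are known up front, rllib could skip creating throwaway envs for
        space discovery and could wrap env_class with ray.remote() itself
        when remote_worker_envs is enabled."""
        _ENV_REGISTRY[env_name] = {
            "class": env_class,
            "params_fn": params_fn,    # env_context -> env constructor kwargs
            "spaces_fn": spaces_fn,    # env_context -> (obs_space, action_space)
        }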
Current remote env creation: