Wingman IMPALA changes #3945
Conversation
Merged build finished. Test FAILed.
Test FAILed.
ericl
left a comment
The remote env stuff looks pretty clean. Shall we add those in a separate PR?
I think we should try to make the vtrace changes more surgical though, ideally touching little or none of vtrace.py.
log_rhos = [t - b for t,
            b in zip(target_action_log_probs, behaviour_action_log_probs)]
log_rhos = [tf.convert_to_tensor(l, dtype=tf.float32) for l in log_rhos]
log_rhos = tf.reduce_sum(tf.stack(log_rhos), axis=0)
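As a sketch of what the diff above computes, here is the same per-dimension log-rho combination reproduced with NumPy instead of TensorFlow. This is illustrative only, not the PR's code; the example shapes (two MultiDiscrete action dimensions, three timesteps) and probability values are assumptions.

```python
import numpy as np

# Hypothetical data: one list entry per MultiDiscrete action dimension,
# each holding the selected action log-probs over T=3 timesteps.
target_action_log_probs = [np.log([0.5, 0.25, 0.5]), np.log([0.2, 0.4, 0.1])]
behaviour_action_log_probs = [np.log([0.5, 0.5, 0.25]), np.log([0.4, 0.4, 0.2])]

# Per-dimension log importance ratios, then summed across action dimensions,
# mirroring the tf.reduce_sum(tf.stack(...), axis=0) in the diff.
log_rhos = [t - b for t, b in zip(target_action_log_probs,
                                  behaviour_action_log_probs)]
log_rhos = np.sum(np.stack(log_rhos), axis=0)
```

Summing log-probabilities across dimensions is equivalent to multiplying the per-dimension importance ratios, which is the joint ratio when the sub-actions are independent.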
IIUC, the change to support MultiDiscrete is mainly combining the independent log probabilities. In this case, is it possible to move the summations outside of vtrace.py to minimize the changes?
Correct. The summation of the independent log probabilities needs to happen inside vtrace.py: it comes after the policy values are selected by action for the behaviour and target policies, and before they are used in the _from_importance_weights method (lines 119-133).
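A rough sketch of the order of operations this comment describes, with hypothetical helper names (not the PR's actual functions): per-dimension log-probs are first selected by action, then summed across MultiDiscrete dimensions, and only the combined log-rhos reach the importance-weight computation.

```python
import numpy as np

def select_log_probs(logits, actions):
    # Log-softmax over the last axis, then pick the log-prob of each
    # taken action (one row of logits per timestep).
    logp = logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))
    return np.take_along_axis(logp, actions[..., None], axis=-1)[..., 0]

def combined_log_rhos(target_logits, behaviour_logits, actions):
    # One (logits, actions) pair per MultiDiscrete dimension. The sum
    # across dimensions must happen here, after the per-action selection
    # and before the importance-weight computation uses the result.
    per_dim = [select_log_probs(t, a) - select_log_probs(b, a)
               for t, b, a in zip(target_logits, behaviour_logits, actions)]
    return np.sum(np.stack(per_dim), axis=0)
```

When the target and behaviour policies coincide, the combined log-rhos are zero, i.e. the importance weights are all one.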
python/ray/rllib/env/vector_env.py
return self.envs


class _RemoteVectorizedGymEnv(_VectorizedGymEnv):
nice!
Test FAILed.
Split into three MRs:
NOTE: This is the beginning of the pull request, so we can align. No unit tests were run, and the changes have not been tested more broadly than our use case requires.
What do these changes do?
Enables the MultiDiscrete actions space in IMPALA.
Adds the MultiCategorical action distribution - we use IMPALA with that distribution - our model outputs categorical logits for each action. Possibly not every action space works with every action distribution; the tested combinations are Discrete with DiagGaussian (the default) and MultiDiscrete with MultiCategorical. Note that the model actually outputs a 1-dimensional tensor, which is reshaped into per-action logits according to the provided MultiDiscrete action space.
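A sketch of the reshape described above: a flat logits vector is split into one logits block per MultiDiscrete sub-action. `nvec` mirrors the per-dimension sizes of a gym MultiDiscrete space; the splitting helper here is an assumption for illustration, not the PR's MultiCategorical implementation.

```python
import numpy as np

def split_logits(flat_logits, nvec):
    """Split a flat logits vector into one block per sub-action.

    nvec -- number of categories for each MultiDiscrete dimension,
    so the blocks have sizes nvec[0], nvec[1], ...
    """
    out, start = [], 0
    for n in nvec:
        out.append(flat_logits[start:start + n])
        start += n
    return out
```

Each block then parameterizes an independent Categorical distribution, and the joint log-probability is the sum of the per-block log-probabilities.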
VTrace is replaced with an implementation that also works for the MultiDiscrete action space.
Adds the remote_worker_envs config to start each environment as a remote Ray process and to step all environments in one worker in parallel. This is important for our use case, as we work with big environments (like SC2) where stepping is relatively expensive. There are a few problems with this approach I couldn't align with the current interface: