Skip to content

Conversation

@bjg2
Copy link
Contributor

@bjg2 bjg2 commented Feb 4, 2019

NOTE: This is the beginning of the pull request, so we can align. No unit tests were run, nor the changes are tested broader than we need them.

What do these changes do?

  • Enables the MultiDiscrete actions space in IMPALA.

  • Adds the MultiCategorical action distribution - we use IMPALA with that distribution - out model outputs categorical logits for each action. Possibly not all action spaces work for all action distributions, tested is that Discrete works with DiagGaussian (default) and that MultiDiscrete works with categorical (MultiCategorical). Note that model actually outputs a 1-dimensional tensor, that is reshaped to logit sizes according to provided MultiDiscrete action space.

  • VTrace is replaced with implementation that works for MultiDiscrete action space as well.

  • Adds the config remote_worker_envs to start environment as remote ray process, and to step all environments in one worker in parallel. This is important for our use-case, as we're working with big environments (like SC2) and stepping through them is relatively expensive. There are few problems with this approach I couldn't align with the current interface:

  1. To use remote environments, user should change the config and change the registered environment function to return remote class, if remote field in environment context is true. Since user provides env_creator function, I didn't know how to create the remote env given that it might use specific constructor args in its env_creator.
  2. As sometimes envs are created for real use (reset/step) and sometimes only for dummy (observation/action space), default dummy env is created with remote = False, others are created as set.
  • Stats gathering in AsyncSamplesOptimizer is improved, as samples_throughput and train_throughput were giving bad values (mostly 0, with periodical peaks, instead of steady graph).

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11532/
Test FAILed.

@ericl ericl self-assigned this Feb 5, 2019
Copy link
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remote env stuff looks pretty clean. Shall we add those in a separate PR?

I think we should try to make the vtrace changes more surgical though, ideally not needing to touch most or all of vtrace.py

log_rhos = [t - b for t,
b in zip(target_action_log_probs, behaviour_action_log_probs)]
log_rhos = [tf.convert_to_tensor(l, dtype=tf.float32) for l in log_rhos]
log_rhos = tf.reduce_sum(tf.stack(log_rhos), axis=0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, the change to support MultiDiscrete is mainly combining the independent log probabilities. In this case, is it possible to move the summations outside of vtrace.py to minimize the changes?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. The summation of the independent log probabilities needs to be done inside vtrace.py since it comes after selecting policy values using actions for the behaviour and target policies and before its use in _from_importance_weights method (lines 119-133).

return self.envs


class _RemoteVectorizedGymEnv(_VectorizedGymEnv):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/11612/
Test FAILed.

@bjg2
Copy link
Contributor Author

bjg2 commented Feb 6, 2019

The remote env stuff looks pretty clean. Shall we add those in a separate PR?

I think we should try to make the vtrace changes more surgical though, ideally not needing to touch most or all of vtrace.py

Split into three MRs:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants