Wingman IMPALA changes #3945
Conversation
Merged build finished. Test FAILed.
Test FAILed.
ericl
left a comment
The remote env stuff looks pretty clean. Shall we add those in a separate PR?
I think we should try to make the vtrace changes more surgical though, ideally touching little or none of vtrace.py.
log_rhos = [t - b for t,
            b in zip(target_action_log_probs, behaviour_action_log_probs)]
log_rhos = [tf.convert_to_tensor(l, dtype=tf.float32) for l in log_rhos]
log_rhos = tf.reduce_sum(tf.stack(log_rhos), axis=0)
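As a sketch of what the diff above computes, here is the same per-dimension log-rho combination reproduced with NumPy instead of TensorFlow. This is illustrative only, not the PR's code; the example shapes (two MultiDiscrete action dimensions, three timesteps) and probability values are assumptions.

```python
import numpy as np

# Hypothetical data: one list entry per MultiDiscrete action dimension,
# each holding the selected action log-probs over T=3 timesteps.
target_action_log_probs = [np.log([0.5, 0.25, 0.5]), np.log([0.2, 0.4, 0.1])]
behaviour_action_log_probs = [np.log([0.5, 0.5, 0.25]), np.log([0.4, 0.4, 0.2])]

# Per-dimension log importance ratios, then summed across action dimensions,
# mirroring the tf.reduce_sum(tf.stack(...), axis=0) in the diff.
log_rhos = [t - b for t, b in zip(target_action_log_probs,
                                  behaviour_action_log_probs)]
log_rhos = np.sum(np.stack(log_rhos), axis=0)
```

Summing log-probabilities across dimensions is equivalent to multiplying the per-dimension importance ratios, which is the joint ratio when the sub-actions are independent.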
IIUC, the change to support MultiDiscrete is mainly combining the independent log probabilities. In this case, is it possible to move the summations outside of vtrace.py to minimize the changes?
Correct. The summation of the independent log probabilities needs to happen inside vtrace.py: it comes after the policy values are selected by action for the behaviour and target policies, and before they are used in the _from_importance_weights method (lines 119-133).
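A rough sketch of the order of operations this comment describes, with hypothetical helper names (not the PR's actual functions): per-dimension log-probs are first selected by action, then summed across MultiDiscrete dimensions, and only the combined log-rhos reach the importance-weight computation.

```python
import numpy as np

def select_log_probs(logits, actions):
    # Log-softmax over the last axis, then pick the log-prob of each
    # taken action (one row of logits per timestep).
    logp = logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))
    return np.take_along_axis(logp, actions[..., None], axis=-1)[..., 0]

def combined_log_rhos(target_logits, behaviour_logits, actions):
    # One (logits, actions) pair per MultiDiscrete dimension. The sum
    # across dimensions must happen here, after the per-action selection
    # and before the importance-weight computation uses the result.
    per_dim = [select_log_probs(t, a) - select_log_probs(b, a)
               for t, b, a in zip(target_logits, behaviour_logits, actions)]
    return np.sum(np.stack(per_dim), axis=0)
```

When the target and behaviour policies coincide, the combined log-rhos are zero, i.e. the importance weights are all one.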
python/ray/rllib/env/vector_env.py
return self.envs


class _RemoteVectorizedGymEnv(_VectorizedGymEnv):
nice!
Test FAILed.
Split into three MRs:
NOTE: This is the beginning of the pull request, so we can align. No unit tests were run, and the changes have not been tested more broadly than our use case requires.
What do these changes do?
Enables the MultiDiscrete actions space in IMPALA.
Adds the MultiCategorical action distribution - we use IMPALA with that distribution - our model outputs categorical logits for each action. Possibly not every action space works with every action distribution; the tested combinations are Discrete with DiagGaussian (the default) and MultiDiscrete with MultiCategorical. Note that the model actually outputs a 1-dimensional tensor, which is reshaped into per-action logits according to the provided MultiDiscrete action space.
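A sketch of the reshape described above: a flat logits vector is split into one logits block per MultiDiscrete sub-action. `nvec` mirrors the per-dimension sizes of a gym MultiDiscrete space; the splitting helper here is an assumption for illustration, not the PR's MultiCategorical implementation.

```python
import numpy as np

def split_logits(flat_logits, nvec):
    """Split a flat logits vector into one block per sub-action.

    nvec -- number of categories for each MultiDiscrete dimension,
    so the blocks have sizes nvec[0], nvec[1], ...
    """
    out, start = [], 0
    for n in nvec:
        out.append(flat_logits[start:start + n])
        start += n
    return out
```

Each block then parameterizes an independent Categorical distribution, and the joint log-probability is the sum of the per-block log-probabilities.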
VTrace is replaced with an implementation that also works for the MultiDiscrete action space.
Adds the remote_worker_envs config to start each environment as a remote Ray process and to step all environments in one worker in parallel. This is important for our use case, as we work with big environments (like SC2) where stepping is relatively expensive. There are a few problems with this approach I couldn't align with the current interface: