[RLlib] Attention Net prep PR #1: Smaller cleanups. #12447
validate_spaces=validate_spaces,
before_init=before_init_fn,
after_init=setup_late_mixins,
before_loss_init=setup_late_mixins,
This is the more accurate kwarg to use (torch did not have a loss init step before, so this is new). The old after_init still works exactly as before, so this does not break the API.
# RNN case: Mask away 0-padded chunks at end of time axis.
if state:
    max_seq_len = tf.reduce_max(train_batch["seq_lens"])
    # Derive max_seq_len from the data itself, not from the seq_lens
Prep for attention nets, where taking a dynamic max over the given sequence lengths is not allowed.
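To illustrate the difference between the two ways of obtaining max_seq_len, here is a minimal sketch in plain Python rather than TensorFlow. It assumes the batch is a flattened run of B * T zero-padded rows, so that T can be read off the data's shape instead of being computed from the values in seq_lens. The helper names max_seq_len_from_data and sequence_mask are hypothetical and are not taken from this PR.

```python
def max_seq_len_from_data(num_rows, seq_lens):
    """Padded length T, derived from the shape: total rows / number of sequences."""
    batch_size = len(seq_lens)
    if batch_size == 0 or num_rows % batch_size != 0:
        raise ValueError("num_rows must be a whole multiple of len(seq_lens)")
    return num_rows // batch_size


def sequence_mask(seq_lens, max_seq_len):
    """Nested list of shape [B][max_seq_len]; True marks real (non-padded) steps."""
    return [[t < n for t in range(max_seq_len)] for n in seq_lens]


# Assumed toy batch: three sequences, each zero-padded to 4 timesteps.
seq_lens = [2, 1, 3]
num_rows = 12

# Value-dependent: changes with the contents of seq_lens (3 here).
max_from_lens = max(seq_lens)

# Shape-dependent: always equals the padded length (4 here).
max_from_data = max_seq_len_from_data(num_rows, seq_lens)

# The flattened mask lines up row for row with the padded batch.
mask = sequence_mask(seq_lens, max_from_data)
flat_mask = [valid for row in mask for valid in row]
```

In this toy batch the value-based maximum (3) is shorter than the actual padding (4), so a mask built from it would not line up with the 12 flattened rows, whereas the shape-derived value always does.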
episode._set_last_observation(agent_id, filtered_obs)
episode._set_last_raw_obs(agent_id, raw_obs)
episode._set_last_info(agent_id, infos[env_id].get(agent_id, {}))
# Infos from the environment.
Adding "infos" to the collector, if required.
The current attention net trajectory view PR (#11729) is too large (>1000 lines added).
Therefore, I'm moving smaller preparatory and cleanup changes into ~2 pre-PRs. This is the first of these.
Why are these changes needed?
Related issue number
Checks
scripts/format.sh to lint the changes in this PR.