[RLlib] Attention Net prep PR #2: Smaller cleanups. #12449
  # OBS are already shifted by -1 (the initial obs starts one ts
  # before all other data columns).
- shift = view_req.shift - \
+ shift = view_req.data_rel_pos - \
Renamed this b/c this will support (in the upcoming PRs) not just a single shift (int), but also:
- list of ints (include not just one ts in this view, but several)
- a range string, e.g. "-50:-1" (will be used by attention nets and Atari framestacking).
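To make the three planned forms concrete, here is a minimal sketch of how they could be normalized into an explicit list of relative timestep offsets. `normalize_rel_pos` is a hypothetical helper, not part of RLlib, and treating both ends of the range string as inclusive is an assumption; the upcoming PRs may define the semantics differently.

```python
from typing import List, Union


def normalize_rel_pos(data_rel_pos: Union[int, List[int], str]) -> List[int]:
    """Expand a relative-position spec into an explicit list of offsets.

    Hypothetical helper for illustration only (not an RLlib API).
    Assumption: a range string "a:b" includes both a and b.
    """
    if isinstance(data_rel_pos, int):
        # Single shift, e.g. -1 means "one timestep back".
        return [data_rel_pos]
    if isinstance(data_rel_pos, str):
        start_str, sep, stop_str = data_rel_pos.partition(":")
        if not sep:
            raise ValueError(f"Expected 'start:stop', got {data_rel_pos!r}")
        start, stop = int(start_str), int(stop_str)
        if start > stop:
            raise ValueError(f"Range start must be <= stop: {data_rel_pos!r}")
        return list(range(start, stop + 1))
    # Otherwise assume an iterable of ints (several timesteps in one view).
    return [int(p) for p in data_rel_pos]


offsets = normalize_rel_pos("-50:-1")
print(len(offsets), offsets[0], offsets[-1])
```

Under the inclusive-range assumption, "-50:-1" covers the 50 timesteps preceding the current one, which is the kind of window framestacking or an attention memory would read.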
- def add_init_obs(self, episode_id: EpisodeID, agent_id: AgentID,
-                  env_id: EnvID, init_obs: TensorType,
+ def add_init_obs(self, episode_id: EpisodeID, agent_index: int,
- agent_id vs agent_idx was a bug
- added timestep
  data_col: Optional[str] = None,
  space: gym.Space = None,
- shift: Union[int, List[int]] = 0,
+ data_rel_pos: Union[int, List[int]] = 0,
Why not keep it as shift? It seems to be intuitive
I liked shift, too. The problem is, there will also be an abs_pos soon (see attention net PRs). So I wanted to distinguish between these two concepts.
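One way to picture the distinction, as a rough sketch only: a relative position is resolved against the current timestep, while an absolute position indexes the buffer directly. Both function names below are hypothetical, and the absolute-position semantics are an assumption, since `abs_pos` is not defined in this PR.

```python
from typing import List, Sequence


def gather_relative(column: Sequence, t: int, rel_positions: List[int]) -> list:
    """Return the rows at offsets relative to the current timestep t.

    Hypothetical illustration; raises instead of wrapping around on
    out-of-range indices.
    """
    out = []
    for p in rel_positions:
        idx = t + p
        if not 0 <= idx < len(column):
            raise IndexError(f"Relative position {p} at t={t} is out of range")
        out.append(column[idx])
    return out


def gather_absolute(column: Sequence, abs_positions: List[int]) -> list:
    """Return the rows at fixed indices, independent of the current timestep.

    Assumed semantics for an absolute position; not taken from the PR.
    """
    return [column[i] for i in abs_positions]


rewards = list(range(10, 20))  # stand-in data column with 10 timesteps
print(gather_relative(rewards, t=5, rel_positions=[-1, 0]))
print(gather_absolute(rewards, abs_positions=[0, 5]))
```

The relative lookup returns different rows as `t` advances, whereas the absolute lookup always returns the same rows, which is why a single name like `shift` would blur the two.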
whether to create those new envs in remote processes instead of
in the current process. This adds overheads, but can make sense
if your envs are expensive to step/reset (e.g., for StarCraft).
Use this cautiously, overheads are significant!
👍
The current attention net trajectory view PR (#11729) is too large (>1000 lines added).
Therefore, I'm moving smaller preparatory and cleanup changes into 3 pre-PRs. This is the second of them. Only review it once #12447 has been merged.
Why are these changes needed?
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.