Fixes Termination Manager logging to report aggregated percentage of environments done due to each term. #3107
6244bfa to
45ff89d
Compare
  The corresponding termination term value. Shape is (num_envs,).
  """
- return self._term_dones[name]
+ return self._term_dones[name, self._term_names.index(name)]
The indexing here is not great. We use the value returned here to give termination rewards in different environments. Calling .index on every reward computation may cause slowdowns, since it searches the list in O(n) time.
Ahh, if you like the single-tensor approach, I can also store a name -> idx dict and make the lookup here O(1). I wasn't aware of termination rewards (my bad!) and thought this function would be called infrequently. I should do a global search next time rather than assume!
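The name -> idx dict mentioned in this reply could look roughly like the sketch below. It is not the PR's code: `TermIndex`, `_name_to_idx`, and `column_of` are hypothetical names, and only the idea (build the map once, then look names up in O(1)) comes from the comment.

```python
class TermIndex:
    """Hypothetical helper mapping termination term names to column indices."""

    def __init__(self, term_names):
        self._term_names = list(term_names)
        # Built once at setup. Dict lookups are O(1) on average,
        # whereas list.index() scans the list on every call.
        self._name_to_idx = {name: i for i, name in enumerate(self._term_names)}

    def column_of(self, name):
        # Raises KeyError for an unknown term name.
        return self._name_to_idx[name]


index = TermIndex(["time_out", "base_contact"])
column = index.column_of("base_contact")  # column 1 of a (num_envs, num_terms) buffer
```

The returned column could then be used to slice a single (num_envs, num_terms) buffer without searching the name list on each call.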
- self._term_dones = dict()
- for term_name in self._term_names:
-     self._term_dones[term_name] = torch.zeros(self.num_envs, device=self.device, dtype=torch.bool)
+ self._term_dones = torch.zeros((self.num_envs, len(self._term_names)), device=self.device, dtype=torch.bool)
Is there a reason to change this as a dict of tensors to a single tensor?
I thought the operation last_episode_done_stats = self._term_dones.float().mean(dim=0) was very nice and optimized, which is why I did it. But if you think the dict is clearer, I can revert : ))
  for i, key in enumerate(self._term_names):
      # store information
-     extras["Episode_Termination/" + key] = torch.count_nonzero(self._term_dones[key][env_ids]).item()
+     extras["Episode_Termination/" + key] = last_episode_done_stats[i].item()
If you want the ratio, isn't that simply the following?
self._term_dones[key][env_ids].sum() / len(env_ids)
Thanks for reviewing!
I guess using env_ids would give the ratio among the resetting environments only. I thought reporting stats over all environments would be a bit nicer, since the user can verify from the graph that all terms sum to 1. Of course, you can do self._term_dones[key].sum() / self.env.num_envs as well.
But if this is the approach I'm after, doing it in one tensor operation seems quite nice, both in speed and in memory use.
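To make the two denominators in this exchange concrete, here is a minimal sketch. Plain Python lists stand in for the (num_envs, num_terms) boolean tensor, and `term_fractions` is a hypothetical helper, not part of the PR.

```python
def term_fractions(term_dones, env_ids=None):
    """Fraction of environments flagged done by each term.

    term_dones: one row of bools per environment, one column per term
                (a stand-in for a (num_envs, num_terms) bool tensor).
    env_ids:    optional subset of environment indices; None means all.
    """
    rows = term_dones if env_ids is None else [term_dones[i] for i in env_ids]
    num_terms = len(term_dones[0])
    # Column-wise mean; assumes at least one row is selected.
    return [sum(row[t] for row in rows) / len(rows) for t in range(num_terms)]


# 4 environments, 2 terms (columns: time_out, base_contact).
term_dones = [
    [True, False],
    [False, True],
    [False, True],
    [True, False],
]
resetting = [1, 2]  # environments being reset on this step

over_all = term_fractions(term_dones)                   # [0.5, 0.5]
over_resetting = term_fractions(term_dones, resetting)  # [0.0, 1.0]
```

The fractions over all environments sum to 1 here only because every environment has exactly one term recorded; that property is an assumption of the example, not a guarantee.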
25d4594 to
c220593
Compare
Test Results Summary: 2 419 tests, 2 011 ✅, 2h 22m 39s ⏱️. Results for commit a0767c6. ♻️ This comment has been updated with latest results.
ea5fa9a to
41ccce3
Compare
@Mayankm96 I benchmarked the task: velocity rough Anymal C, 4096 envs, 1000 steps. Before change: After change: One step total: 80.187 ms. The cost of this PR is about 0.04 ms out of 80 ms per step, or 0.05%, mostly because compute needs to modify the other terms' done values to update the done buffer correctly. I think this is reasonable.
5ace435 to
a0767c6
Compare
…environments done due to each term. (isaac-sim#3107)

# Description

Currently, the Termination Manager writes the current step's done count for each term whenever a reset is detected. This leads to two problems:

1. Users see different counts just by varying num_envs.
2. The count can be over- or under-reported depending on when the reset happens, as pointed out in isaac-sim#2977 (thanks, @Kyu3224).

The bug arises because we write the current step's status into a buffer that is supposed to record episodic dones. So instead of overwriting the entire buffer with the current values, the update now preserves the old values of non-resetting environments, and instead of reporting a count, we report the percentage of environments that were done due to each term.

Test on Isaac-Velocity-Rough-Anymal-C-v0

Before fix:
<img width="786" height="323" alt="Screenshot from 2025-08-06 22-16-20" src="https://github.com/user-attachments/assets/4838d612-7f0e-4232-a07e-688b547e91db" />
Red: num_envs = 4096, Orange: num_envs = 1024

After fix:
<img width="786" height="323" alt="Screenshot from 2025-08-06 22-16-12" src="https://github.com/user-attachments/assets/e6e55c21-17ed-42ca-8d94-a19d08611f86" />
Red: num_envs = 4096, Orange: num_envs = 1024

Note that curves of the same color were run with the same seed and matched exactly; the only difference is the data reported for termination. The percentage version conveys much more clearly how the agent currently fails and what percentage of agents fail, and it shows that increasing num_envs to 4096 helps the agent learn to avoid termination by `base_contact` much more quickly than num_envs = 1024. That is hard to tell from the first image.

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there
…ng (#3745)

# Description

This PR fixes the issue where get_done_term returned the last-episode value rather than the current-step value. The values used for get_term should be different from those used for logging, and mixing the two leads to non-intuitive behavior:

- Using per-step values for logging leads to the overcounting and undercounting reported in #2977.
- Using last-episode values for get_term leads to the misalignment with expectations reported in #3720.

Fixes #2977 #3720

---

The logging behavior remains *mostly* the same as in #3107, and this PR also gets rid of the weird overwriting behavior (yay). I get exactly the same termination curve as #3107 when running on `Isaac-Velocity-Rough-Anymal-C-v0`.

Here is a benchmark summary for 1000 steps of `Isaac-Velocity-Rough-Anymal-C-v0` with 4096 envs:

| | `termination.compute` | `termination.reset` |
| --- | --- | --- |
| Before #3107 | 0.229 ms | 0.007 ms |
| PR #3107 | 0.274 ms | 0.004 ms |
| This PR | 0.258 ms | 0.004 ms |

We actually see an improvement over #3107, because the expensive maintenance of last_episode_value is done only once per compute (#3107 computes last_episode_value for every term).

## Type of change

- Bug fix (non-breaking change which fixes an issue)

## Checklist

- [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Signed-off-by: Kelly Guo <[email protected]>
Co-authored-by: Kelly Guo <[email protected]>
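The separation described in this commit message, per-step values for get_term and last-episode values for logging, can be sketched as follows. This is an illustration under stated assumptions, not the Isaac Lab implementation: `TerminationRecord` and its attributes are hypothetical, and plain Python lists stand in for the tensors.

```python
class TerminationRecord:
    """Hypothetical sketch: per-step and last-episode values kept apart."""

    def __init__(self, num_envs, term_names):
        self.term_names = list(term_names)
        self._idx = {name: i for i, name in enumerate(self.term_names)}
        num_terms = len(self.term_names)
        self.step_dones = [[False] * num_terms for _ in range(num_envs)]
        self.last_episode_dones = [[False] * num_terms for _ in range(num_envs)]

    def compute(self, values):
        """values: one row of bools per environment for the current step."""
        # The per-step buffer always reflects the current step.
        self.step_dones = [list(row) for row in values]
        # The last-episode buffer is refreshed once per compute call,
        # and only for environments where some term fired on this step.
        for env, row in enumerate(self.step_dones):
            if any(row):
                self.last_episode_dones[env] = list(row)

    def get_term(self, name):
        """Current-step value of one term for every environment."""
        col = self._idx[name]
        return [row[col] for row in self.step_dones]

    def log_fractions(self):
        """Fraction of environments whose last episode ended due to each term."""
        n = len(self.last_episode_dones)
        return {
            name: sum(row[i] for row in self.last_episode_dones) / n
            for i, name in enumerate(self.term_names)
        }


record = TerminationRecord(num_envs=2, term_names=["time_out", "base_contact"])
record.compute([[False, True], [False, False]])   # env 0 hits base_contact
record.compute([[False, False], [False, False]])  # nothing fires on the next step

current = record.get_term("base_contact")  # [False, False]
logged = record.log_fractions()            # {"time_out": 0.0, "base_contact": 0.5}
```

After the second step the per-step query no longer reports the termination, while the logged fraction still reflects how environment 0's last episode ended.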
Description
Currently, the Termination Manager writes the current step's done count for each term whenever a reset is detected. This leads to two problems:
1. Users see different counts just by varying num_envs.
2. The count can be over- or under-reported depending on when the reset happens, as pointed out in isaac-sim#2977.
The bug arises because we write the current step's status into a buffer that is supposed to record episodic dones. So instead of overwriting the entire buffer with the current values, the update now preserves the old values of non-resetting environments, and instead of reporting a count, we report the percentage of environments that were done due to each term.
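The update rule described here can be sketched as below. This is only an illustration of the idea, assuming that an environment's record should change only on a step where one of its terms fires; `update_episode_dones` is a hypothetical function, and plain Python lists stand in for the boolean tensor.

```python
def update_episode_dones(episode_dones, step_dones):
    """Update the episodic record in place from this step's term values.

    Rows where no term fired this step keep their old record; rows where
    at least one term fired are overwritten with the current step's values.
    """
    for env, row in enumerate(step_dones):
        if any(row):
            episode_dones[env] = list(row)
    return episode_dones


# 3 environments, 2 terms (columns: time_out, base_contact).
episode_dones = [[False, True], [True, False], [False, False]]
step_dones = [[False, False], [False, True], [False, False]]  # only env 1 terminates

update_episode_dones(episode_dones, step_dones)
# env 0 keeps its earlier base_contact record; env 1 now records base_contact.

num_envs = len(episode_dones)
fractions = [sum(row[t] for row in episode_dones) / num_envs for t in range(2)]
```

Because the reported value is a fraction of all environments rather than a count, it stays comparable when num_envs changes.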
Test on Isaac-Velocity-Rough-Anymal-C-v0
Before fix:

Red: num_envs = 4096, Orange: num_envs = 1024
After fix:
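Note that curves of the same color were run with the same seed and matched exactly; the only difference is the data reported for termination. The percentage version conveys much more clearly how the agent currently fails and what percentage of agents fail, and it shows that increasing num_envs to 4096 helps the agent learn to avoid termination by `base_contact` much more quickly than num_envs = 1024. That is hard to tell from the first image.
Checklist
- [x] I have run the `pre-commit` checks with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there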