Skip to content

Conversation

@ooctipus
Copy link
Collaborator

@ooctipus ooctipus commented Aug 7, 2025

Description

Currently Termination Manager write current step's done count for each term if reset is detected. This leads to two problem.

  1. User sees different counts just by varying num_envs
  2. the count can be over-count or under-count depending on when reset happens, as pointed out in [Bug Report] Termination Overcounting Caused by Missing Log Buffer Reset in manager_based_rl_env.py #2977 (Thanks, @Kyu3224)

The cause of the bug is because we are reporting current step status into a buffer that suppose to record episodic done. So instead of write the entire buffer base on current value, we ask the update to respect the non-reseting environment's old value, and instead of reporting count, we report percentage of environment that was done due to the particular term.

Test on Isaac-Velocity-Rough-Anymal-C-v0

Before fix:
Screenshot from 2025-08-06 22-16-20
Red: num_envs = 4096, Orange: num_envs = 1024

After fix:

Screenshot from 2025-08-06 22-16-12 Red: num_envs = 4096, Orange: num_envs = 1024

Note that curve of the same color ran on same seed, and curves matched exactly, the only difference is the data gets reported in termination. The percentage version is a lot more clear in conveying how agent currently fails, and how much percentage of agent fails, and shows that increasing num_envs to 4096 helps improve agent avoiding termination by base_contact much quicker than num_envs=1024. Such message is a bit hard to tell in first image.

Checklist

  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@ooctipus ooctipus force-pushed the fix/termination_reporting branch from 6244bfa to 45ff89d Compare August 7, 2025 07:47
@ooctipus ooctipus changed the title Fixed Termination Manager logging to report aggregated percentage of environments done due to each term. Fixes Termination Manager logging to report aggregated percentage of environments done due to each term. Aug 7, 2025
The corresponding termination term value. Shape is (num_envs,).
"""
return self._term_dones[name]
return self._term_dones[name, self._term_names.index(name)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indexing here is not great. We use the value from here to give termination rewards in different environments. Doing the .index every time for reward computation may lead to slow downs (as it searches over the list in O(n) fashion).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh if you like the single tensor approach I can also store one more name -> idx dict, and make change here O(1), I wasn't aware of termination rewards, (my bad!!) and thought this function may be used infrequently. I should do a global search next time rather than assume!

self._term_dones = dict()
for term_name in self._term_names:
self._term_dones[term_name] = torch.zeros(self.num_envs, device=self.device, dtype=torch.bool)
self._term_dones = torch.zeros((self.num_envs, len(self._term_names)), device=self.device, dtype=torch.bool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to change this as a dict of tensors to a single tensor?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it last_episode_done_stats = self._term_dones.float().mean(dim=0) this operation is very nice and optimized, thats why I did it, but if you think dict is more clear I can revert back : ))

for i, key in enumerate(self._term_names):
# store information
extras["Episode_Termination/" + key] = torch.count_nonzero(self._term_dones[key][env_ids]).item()
extras["Episode_Termination/" + key] = last_episode_done_stats[i].item()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want the ratio, isn't that simply?

self._term_dones[key][env_ids].sum() / len(env_ids)

Copy link
Collaborator Author

@ooctipus ooctipus Aug 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reveiwing!
I guess, doing with env_ids will be viewing ratio of resetting environments. I thought maybe report stats of all environment can be a bit more nicer as user can verify from the graph that all terms sum up to 1. Of course you can do self._term_dones[key].sum() / self.env.num_envs as well

But it seems like if this approach is what I after, do it in one tensor operation seems quite nice, both speed wise and memory utility wise.

@ooctipus ooctipus force-pushed the fix/termination_reporting branch from 25d4594 to c220593 Compare August 10, 2025 18:28
@github-actions
Copy link

github-actions bot commented Aug 10, 2025

Test Results Summary

2 419 tests   2 011 ✅  2h 22m 39s ⏱️
   90 suites    408 💤
    1 files        0 ❌

Results for commit a0767c6.

♻️ This comment has been updated with latest results.

@ooctipus ooctipus force-pushed the fix/termination_reporting branch 2 times, most recently from ea5fa9a to 41ccce3 Compare August 11, 2025 08:55
@ooctipus
Copy link
Collaborator Author

ooctipus commented Aug 13, 2025

@Mayankm96
I did my best to optimize the operations in compute and reset,
index all changed from O(n) to O(1) -> this part not captured in benchmark below

benchmarked task: velocity rough anymal c, 4096 envs, 1000 steps

before change :
one step total: 79.038

| termination.compute     |       0.229 ms|
| termination.reset       |       0.007 ms|

after change:

one step total: 80.187 ms

| termination.compute     |       0.274 ms|  slower due to one extra operation
| termination.reset       |       0.004 ms|  twice as fast

the cost for this PR is about 0.04 ms / 80, 0.05%, mostly due to compute needs modify other terms's done to correctly update the done buffer. I think this is reasonable

@ooctipus ooctipus force-pushed the fix/termination_reporting branch from 5ace435 to a0767c6 Compare August 13, 2025 20:55
@ooctipus ooctipus merged commit 8dabd3f into main Aug 14, 2025
10 checks passed
@ooctipus ooctipus deleted the fix/termination_reporting branch August 14, 2025 01:04
StephenWelch pushed a commit to StephenWelch/IsaacLab that referenced this pull request Aug 17, 2025
…environments done due to each term. (isaac-sim#3107)

# Description

Currently Termination Manager write current step's done count for each
term if reset is detected. This leads to two problem.
1. User sees different counts just by varying num_envs
2. the count can be over-count or under-count depending on when reset
happens, as pointed out in isaac-sim#2977 (Thanks, @Kyu3224)

The cause of the bug is because we are reporting current step status
into a buffer that suppose to record episodic done. So instead of write
the entire buffer base on current value, we ask the update to respect
the non-reseting environment's old value, and instead of reporting
count, we report percentage of environment that was done due to the
particular term.

Test on Isaac-Velocity-Rough-Anymal-C-v0

Before fix:
<img width="786" height="323" alt="Screenshot from 2025-08-06 22-16-20"
src="https://github.com/user-attachments/assets/4838d612-7f0e-4232-a07e-688b547e91db"
/>
Red: num_envs = 4096, Orange: num_envs = 1024

After fix:

<img width="786" height="323" alt="Screenshot from 2025-08-06 22-16-12"
src="https://github.com/user-attachments/assets/e6e55c21-17ed-42ca-8d94-a19d08611f86"
/>
Red: num_envs = 4096, Orange: num_envs = 1024

Note that curve of the same color ran on same seed, and curves matched
exactly, the only difference is the data gets reported in termination.
The percentage version is a lot more clear in conveying how agent
currently fails, and how much percentage of agent fails, and shows that
increasing num_envs to 4096 helps improve agent avoiding termination by
`base_contact` much quicker than num_envs=1024. Such message is a bit
hard to tell in first image.

<!--
Example:

| Before | After |
| ------ | ----- |
| _gif/png before_ | _gif/png after_ |

To upload images to a PR -- simply drag and drop an image while in edit
mode and it should upload the image directly. You can then paste that
source into the above before/after sections.
-->

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

<!--
As you go through the checklist above, you can mark something as done by
putting an x character in it

For example,
- [x] I have done this task
- [ ] I have not done this task
-->
george-nehma pushed a commit to george-nehma/IsaacLab-Dreamerv3 that referenced this pull request Oct 24, 2025
…environments done due to each term. (isaac-sim#3107)

# Description

Currently Termination Manager write current step's done count for each
term if reset is detected. This leads to two problem.
1. User sees different counts just by varying num_envs
2. the count can be over-count or under-count depending on when reset
happens, as pointed out in isaac-sim#2977 (Thanks, @Kyu3224)

The cause of the bug is because we are reporting current step status
into a buffer that suppose to record episodic done. So instead of write
the entire buffer base on current value, we ask the update to respect
the non-reseting environment's old value, and instead of reporting
count, we report percentage of environment that was done due to the
particular term.

Test on Isaac-Velocity-Rough-Anymal-C-v0

Before fix:
<img width="786" height="323" alt="Screenshot from 2025-08-06 22-16-20"
src="https://github.com/user-attachments/assets/4838d612-7f0e-4232-a07e-688b547e91db"
/>
Red: num_envs = 4096, Orange: num_envs = 1024

After fix:

<img width="786" height="323" alt="Screenshot from 2025-08-06 22-16-12"
src="https://github.com/user-attachments/assets/e6e55c21-17ed-42ca-8d94-a19d08611f86"
/>
Red: num_envs = 4096, Orange: num_envs = 1024

Note that curve of the same color ran on same seed, and curves matched
exactly, the only difference is the data gets reported in termination.
The percentage version is a lot more clear in conveying how agent
currently fails, and how much percentage of agent fails, and shows that
increasing num_envs to 4096 helps improve agent avoiding termination by
`base_contact` much quicker than num_envs=1024. Such message is a bit
hard to tell in first image.

<!--
Example:

| Before | After |
| ------ | ----- |
| _gif/png before_ | _gif/png after_ |

To upload images to a PR -- simply drag and drop an image while in edit
mode and it should upload the image directly. You can then paste that
source into the above before/after sections.
-->

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

<!--
As you go through the checklist above, you can mark something as done by
putting an x character in it

For example,
- [x] I have done this task
- [ ] I have not done this task
-->
kellyguo11 added a commit that referenced this pull request Nov 7, 2025
…ng (#3745)

# Description

This PR fixes the issue where get_done_term returned last episode value
rather than current step value.

This PR realizes values used for get_term should be different from that
used for logging, and mixed useage leads to non-intuitive behavior.

using per-step value for logging leads to overcounting and undercounting
reported in #2977
using last-episode value for get_term leads to misalignment with
expectation reported in #3720

Fixes #2977 #3720 

---

The logging behavior remains *mostly* the same as #3107, and and also
got rid of the weird overwriting behavior(yay).
I get exactly the same termination curve as #3107 when run on
`Isaac-Velocity-Rough-Anymal-C-v0`

Here is a benchmark summary with 1000 steps running
`Isaac-Velocity-Rough-Anymal-C-v0 ` with 4096 envs

Before #3107: 

`| termination.compute     |       0.229 ms|`
`| termination.reset       |       0.007 ms|`

PR #3107: 

`| termination.compute     |       0.274 ms|`
`| termination.reset       |       0.004 ms|`

This PR: 

`| termination.compute     |       0.258 ms|`
`| termination.reset       |       0.004 ms|`

We actually see improvement, this is due to the fact that expensive
maintenance of last_episode_value is only computed once per
compute(#3107 computes last_episode_value for every term)

## Type of change

- Bug fix (non-breaking change which fixes an issue)

## Checklist

- [x] I have read and understood the [contribution
guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------

Signed-off-by: Kelly Guo <[email protected]>
Co-authored-by: Kelly Guo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants