Separates per-step termination and last-episode termination bookkeeping #3745
+25
−8
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
This PR fixes the issue where get_done_term returned last episode value rather than current step value.
This PR realizes values used for get_term should be different from that used for logging, and mixed useage leads to non-intuitive behavior.
using per-step value for logging leads to overcounting and undercounting reported in #2977
using last-episode value for get_term leads to misalignment with expectation reported in #3720
Fixes #2977 #3720
Type of change
The logging behavior remains mostly the same as #3107, and and also got rid of the weird overwriting behavior(yay).
I get exactly the same termination curve as #3107 when run on
Isaac-Velocity-Rough-Anymal-C-v0
Here is a benchmark summary with 1000 steps running
Isaac-Velocity-Rough-Anymal-C-v0
with 4096 envsBefore #3107:
| termination.compute | 0.229 ms|
| termination.reset | 0.007 ms|
PR #3107:
| termination.compute | 0.274 ms|
| termination.reset | 0.004 ms|
This PR:
| termination.compute | 0.258 ms|
| termination.reset | 0.004 ms|
We actually see improvement, this is due to the fact that expensive maintenance of last_episode_value is only computed once per compute(#3107 computes last_episode_value for every term)
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist
pre-commit
checks with./isaaclab.sh --format
config/extension.toml
fileCONTRIBUTORS.md
or my name already exists there