Describe the bug
In the current implementation of `manager_based_rl_env.py`, the `self.extras['log']` buffer retains termination statistics from previous timesteps if no environment reset occurs during a rollout. Since rsl_rl's logging system averages these values per rollout, this leads to duplicated counting of terminations and inflated termination ratios.

Specifically, even when `reset_env_ids` is empty, the termination logs are neither cleared nor updated, so the same termination event is counted repeatedly across subsequent timesteps.

This is demonstrated in the attached logs and visualizations, where the termination counts exceed the values expected from the actual number of resets.
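To make the effect concrete, here is a minimal, self-contained sketch (illustrative only; this is not rsl_rl's actual aggregation code) of how averaging per-step log values over a rollout inflates the ratio when a stale termination value is carried forward:

```python
# Minimal, self-contained illustration of the inflated average (not rsl_rl code).
# Assume one environment terminates via "term2" at step 2 of a 7-step rollout and the
# per-step value stored under Episode_Termination/term2 is never cleared afterwards.

# per-step values if the stale entry were cleared after it has been logged once
cleared = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# per-step values when the entry written at the reset is carried forward unchanged
stale = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

# a per-rollout average of the collected values then reports:
print(sum(cleared) / len(cleared))  # 0.1428... -> expected ratio of 1/7
print(sum(stale) / len(stale))      # 0.8571... -> inflated ratio seen in the logs
```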
Steps to reproduce
I conducted experiments using an environment that logs 2 types of rewards and 3 types of terminations, excluding the time-out termination. The setup uses the `rsl_rl` framework with PPO, with 200 environments and `num_steps_per_env = 7`.
Unfortunately, I’m unable to share the specific environment due to it being part of a proprietary research platform, but I believe the issue is not limited to this particular environment.
`self.extras['log']` is passed to the `log` function in `on_policy_runner.py` of `rsl_rl`, where the logged values are averaged and printed. However, if the environment does not reset, the previous termination and log values are reused in the averaging, which leads to inaccurate logging.
I will attach a log visualization comparing the original code with the version I modified for debugging, to better illustrate this behavior.
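For reference, the relevant reset block in `manager_based_rl_env.py` is: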
```python
# -- reset envs that terminated/timed-out and log the episode information
reset_env_ids = self.reset_buf.nonzero(as_tuple=False).squeeze(-1)
if len(reset_env_ids) > 0:
    # trigger recorder terms for pre-reset calls
    self.recorder_manager.record_pre_reset(reset_env_ids)
    self._reset_idx(reset_env_ids)
    # update articulation kinematics
    self.scene.write_data_to_sim()
    self.sim.forward()
    # if sensors are added to the scene, make sure we render to reflect changes in reset
    if self.sim.has_rtx_sensors() and self.cfg.rerender_on_reset:
        self.sim.render()
    # trigger recorder terms for post-reset calls
    self.recorder_manager.record_post_reset(reset_env_ids)
```
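Note that when `reset_env_ids` is empty, this entire block is skipped, so `self.extras['log']` is left untouched and still holds the values written during the most recent reset.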
Test code that I used:
print("BEFORE:", end=" ")
print(", ".join(
f"{k}: {', '.join(f'{float(x):.4f}' for x in (v.tolist() if v.numel() > 1 else [v.item()]))}"
for k, v in self.extras["log"].items()
))
print(f"RESET? : {reset_env_ids.cpu().numpy()}")
if len(reset_env_ids) > 0:
# trigger recorder terms for pre-reset calls
self.recorder_manager.record_pre_reset(reset_env_ids)
self._reset_idx(reset_env_ids)
print("RESET:", end=" ")
print(", ".join(
f"{k}: {', '.join(f'{float(x):.4f}' for x in (v.tolist() if v.numel() > 1 else [v.item()]))}"
for k, v in self.extras["log"].items()
))
# update articulation kinematics
self.scene.write_data_to_sim()
self.sim.forward()
# if sensors are added to the scene, make sure we render to reflect changes in reset
if self.sim.has_rtx_sensors() and self.cfg.rerender_on_reset:
self.sim.render()
# trigger recorder terms for post-reset calls
self.recorder_manager.record_post_reset(reset_env_ids)
else:
print("NO RESET:", end=" ")
print(", ".join(
f"{k}: {', '.join(f'{float(x):.4f}' for x in (v.tolist() if v.numel() > 1 else [v.item()]))}"
for k, v in self.extras["log"].items()
))
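With this instrumentation, the contents of `self.extras["log"]` and `reset_env_ids` are printed at every step, so it is easy to see when stale termination values persist across steps in which no reset occurs.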
For the logs below, note that the labels `BEFORE:`, `RESET? :`, `RESET:`, and `NO RESET:` occur sequentially within a single rollout.
In `manager_based_rl_env.py`, a reset happened at the second step for environment 137 due to termination condition 2. After that timestep, no resets occurred until the iteration ended. However, `self.extras['log']` continued to report the previous value of `Episode_Termination/term2` as 1.0000.
This caused the overall log at the bottom to show a termination ratio of 0.8571 for that termination condition.
In reality, since 200 environments perform rollouts for 7 timesteps and only one termination happened, the termination ratio should be 1/7 ≈ 0.1428.
Yet, because the stale value is never cleared, the termination is counted repeatedly, resulting in an inflated termination ratio.

Similarly, in this figure, at the 4th timestep, environments [29, 166, 192] terminated due to termination condition 3. Then, at the 6th timestep, environment [7] terminated under the same condition and was reset.
However, at the 5th timestep, the log still shows 3 terminated environments. Because of this, the per-step counts for condition 3 over the rollout sum to 1 + 3 + 3 + 1 + 1 = 9, resulting in a recorded termination ratio of 9/7 ≈ 1.2857.
This again indicates duplicated counting of terminations before proper resets occur.

In summary, this behavior is expected to cause more terminations to be recorded than actually occur. That is, even when `reset_env_ids` is empty, the stale entries in `self.extras['log']` need to be properly reset so that they are not averaged again.
Since `rsl_rl` relies on this information, if this behavior is unintended and considered a bug, I am willing to submit a PR with my fix.
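Purely to make the suggestion concrete, below is a rough sketch of one possible direction (an assumption on my side, not necessarily the fix that would land in a PR; the key prefix `Episode_Termination/` is taken from the logs above): zero out the stale termination entries whenever no environment is reset in the current step.

```python
# Illustrative sketch only -- not the actual Isaac Lab code.
# Idea: termination statistics should only be contributed at the step where the
# corresponding reset happens, so previously logged entries are zeroed otherwise.
reset_env_ids = self.reset_buf.nonzero(as_tuple=False).squeeze(-1)
if len(reset_env_ids) > 0:
    # existing behaviour from the block above (reset, recorder, kinematics, rendering)
    self._reset_idx(reset_env_ids)
    # ...
else:
    # no reset this step: zero out stale termination statistics so the logger does
    # not average them again; multiplying by 0 keeps the original tensor/float type
    for key in self.extras["log"]:
        if key.startswith("Episode_Termination/"):
            self.extras["log"][key] = self.extras["log"][key] * 0
```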
System Info
Describe the characteristics of your environment:
- Commit: 0b826f7
- Isaac Sim Version: 4.5.0
- OS: Ubuntu 20.04
- GPU: RTX 4060 Ti 16GB
- CUDA: 11.8
- GPU Driver: 550.135
Additional context
Add any other context about the problem here.
Checklist
- [x] I have checked that there is no similar issue in the repo (required)
- [x] I have checked that the issue is not in running Isaac Sim itself and is related to the repo
Acceptance Criteria
Add the criteria for which this task is considered done. If not known at issue creation time, you can add this once the issue is assigned.
- Criteria 1
- Criteria 2