Describe the bug
In the current implementation of `manager_based_rl_env.py`, the `self.extras['log']` buffer retains termination statistics from previous timesteps if no environment reset occurs during a rollout. Since rsl_rl's logging system averages these values per rollout, this leads to duplicated counting of terminations and inflated termination ratios.

Specifically, even when `reset_env_ids` is empty, the termination logs are neither cleared nor updated, so the same termination event is counted repeatedly across subsequent timesteps.

This is demonstrated in the attached logs and visualizations, where the termination counts exceed the values expected from the actual number of resets.
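To make the effect concrete, here is a minimal, self-contained sketch (illustrative only; this is not rsl_rl's actual aggregation code) of how averaging per-step log values over a rollout inflates the ratio when a stale termination value is carried forward:

```python
# Minimal, self-contained illustration of the inflated average (not rsl_rl code).
# Assume one environment terminates via "term2" at step 2 of a 7-step rollout and the
# per-step value stored under Episode_Termination/term2 is never cleared afterwards.

# per-step values if the stale entry were cleared after it has been logged once
cleared = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# per-step values when the entry written at the reset is carried forward unchanged
stale = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

# a per-rollout average of the collected values then reports:
print(sum(cleared) / len(cleared))  # 0.1428... -> expected ratio of 1/7
print(sum(stale) / len(stale))      # 0.8571... -> inflated ratio seen in the logs
```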
Steps to reproduce
I conducted experiments using an environment that logs 2 types of rewards and 3 types of terminations, excluding the time-out termination. The setup uses the `rsl_rl` framework with PPO, with 200 environments and `num_steps_per_env = 7`.
Unfortunately, I’m unable to share the specific environment due to it being part of a proprietary research platform, but I believe the issue is not limited to this particular environment.
`self.extras['log']` is passed to the `log` function in `on_policy_runner.py` of `rsl_rl`, where the logged values are averaged and printed. However, if the environment does not reset, the previous termination and log values are reused in the averaging, which leads to inaccurate logging.
I will attach a log visualization comparing the original code with the version I modified for debugging, to better illustrate this behavior.
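For reference, the relevant reset block in `manager_based_rl_env.py` is: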
```python
# -- reset envs that terminated/timed-out and log the episode information
reset_env_ids = self.reset_buf.nonzero(as_tuple=False).squeeze(-1)
if len(reset_env_ids) > 0:
    # trigger recorder terms for pre-reset calls
    self.recorder_manager.record_pre_reset(reset_env_ids)
    self._reset_idx(reset_env_ids)
    # update articulation kinematics
    self.scene.write_data_to_sim()
    self.sim.forward()
    # if sensors are added to the scene, make sure we render to reflect changes in reset
    if self.sim.has_rtx_sensors() and self.cfg.rerender_on_reset:
        self.sim.render()
    # trigger recorder terms for post-reset calls
    self.recorder_manager.record_post_reset(reset_env_ids)
```
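Note that when `reset_env_ids` is empty, this entire block is skipped, so `self.extras['log']` is left untouched and still holds the values written during the most recent reset.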
Test code that I used:
print("BEFORE:", end=" ")
print(", ".join(
f"{k}: {', '.join(f'{float(x):.4f}' for x in (v.tolist() if v.numel() > 1 else [v.item()]))}"
for k, v in self.extras["log"].items()
))
print(f"RESET? : {reset_env_ids.cpu().numpy()}")
if len(reset_env_ids) > 0:
# trigger recorder terms for pre-reset calls
self.recorder_manager.record_pre_reset(reset_env_ids)
self._reset_idx(reset_env_ids)
print("RESET:", end=" ")
print(", ".join(
f"{k}: {', '.join(f'{float(x):.4f}' for x in (v.tolist() if v.numel() > 1 else [v.item()]))}"
for k, v in self.extras["log"].items()
))
# update articulation kinematics
self.scene.write_data_to_sim()
self.sim.forward()
# if sensors are added to the scene, make sure we render to reflect changes in reset
if self.sim.has_rtx_sensors() and self.cfg.rerender_on_reset:
self.sim.render()
# trigger recorder terms for post-reset calls
self.recorder_manager.record_post_reset(reset_env_ids)
else:
print("NO RESET:", end=" ")
print(", ".join(
f"{k}: {', '.join(f'{float(x):.4f}' for x in (v.tolist() if v.numel() > 1 else [v.item()]))}"
for k, v in self.extras["log"].items()
))
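With this instrumentation, the contents of `self.extras["log"]` and `reset_env_ids` are printed at every step, so it is easy to see when stale termination values persist across steps in which no reset occurs.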
For the logs below, note that the labels `BEFORE:`, `RESET? :`, `RESET:`, and `NO RESET:` occur sequentially within a single rollout.
In `manager_based_rl_env.py`, a reset happened at the second step for environment 137 due to termination condition 2. After that timestep, no resets occurred until the iteration ended. However, `self.extras['log']` continued to report the previous value of `Episode_Termination/term2` as 1.0000.
This caused the overall log at the bottom to show a termination ratio of 0.8571 for that termination condition.
In reality, since 200 environments perform rollouts for 7 timesteps and only one termination happened, the termination ratio should be 1/7 ≈ 0.1428.
Yet, because the stale value is never cleared, the termination is counted repeatedly, resulting in an inflated termination ratio.

Similarly, in this figure, at the 4th timestep, environments [29, 166, 192] terminated due to termination condition 3. Then, at the 6th timestep, environment [7] terminated under the same condition and was reset.
However, at the 5th timestep, the log still shows 3 terminated environments. Because of this, the per-step counts for condition 3 over the rollout sum to 1 + 3 + 3 + 1 + 1 = 9, resulting in a recorded termination ratio of 9/7 ≈ 1.2857.
This again indicates duplicated counting of terminations before proper resets occur.

In summary, this behavior is expected to cause more terminations to be recorded than actually occur. That is, even when `reset_env_ids` is empty, the stale entries in `self.extras['log']` need to be properly reset so that they are not averaged again.
Since `rsl_rl` relies on this information, if this behavior is unintended and considered a bug, I am willing to submit a PR with my fix.
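Purely to make the suggestion concrete, below is a rough sketch of one possible direction (an assumption on my side, not necessarily the fix that would land in a PR; the key prefix `Episode_Termination/` is taken from the logs above): zero out the stale termination entries whenever no environment is reset in the current step.

```python
# Illustrative sketch only -- not the actual Isaac Lab code.
# Idea: termination statistics should only be contributed at the step where the
# corresponding reset happens, so previously logged entries are zeroed otherwise.
reset_env_ids = self.reset_buf.nonzero(as_tuple=False).squeeze(-1)
if len(reset_env_ids) > 0:
    # existing behaviour from the block above (reset, recorder, kinematics, rendering)
    self._reset_idx(reset_env_ids)
    # ...
else:
    # no reset this step: zero out stale termination statistics so the logger does
    # not average them again; multiplying by 0 keeps the original tensor/float type
    for key in self.extras["log"]:
        if key.startswith("Episode_Termination/"):
            self.extras["log"][key] = self.extras["log"][key] * 0
```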
System Info
Describe the characteristics of your environment:
- Commit: 0b826f7
- Isaac Sim Version: 4.5.0
- OS: Ubuntu 20.04
- GPU: RTX 4060 Ti 16GB
- CUDA: 11.8
- GPU Driver: 550.135
Additional context
Add any other context about the problem here.
Checklist
- [x] I have checked that there is no similar issue in the repo (required)
- [x] I have checked that the issue is not in running Isaac Sim itself and is related to the repo
Acceptance Criteria
Add the criteria for which this task is considered done. If not known at issue creation time, you can add this once the issue is assigned.
- Criteria 1
- Criteria 2