[Bug Report] Termination Overcounting Caused by Missing Log Buffer Reset in manager_based_rl_env.py #2977

@Kyu3224

Description

Describe the bug

In the current implementation of manager_based_rl_env.py, the self.extras['log'] buffer retains termination statistics from previous timesteps whenever no environment reset occurs during a rollout. Since rsl_rl's logging system averages these values per rollout, this leads to double-counting of terminations and inflated termination ratios.

Specifically, even when reset_env_ids is empty, the termination logs are neither cleared nor updated, so the same termination event is counted again on subsequent timesteps.

This issue is demonstrated in the attached logs and visualizations, where termination counts exceed the expected values based on actual resets.

Steps to reproduce

I conducted experiments using an environment that logs 2 types of rewards and 3 types of terminations, excluding the timeout termination. The setup uses PPO from the rsl_rl framework, with 200 environments and num_steps_per_env = 7.

Unfortunately, I’m unable to share the specific environment because it is part of a proprietary research platform, but I believe the issue is not limited to this particular environment.

self.extras['log'] is passed to the log function in on_policy_runner.py of rsl_rl, where the logged values are averaged and printed. However, if no environment resets, the previous termination and log values are reused in the averaging, which leads to inaccurate logging.
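
A minimal toy sketch of this effect (assumptions: seven collected per-step log dicts and a plain arithmetic mean, mimicking but not reproducing rsl_rl's actual log function):

    # Toy reproduction of the averaging effect (plain Python, NOT actual
    # rsl_rl code). One rollout of num_steps_per_env = 7; a single
    # environment terminates via "term2" at step 2 and the log buffer is
    # never cleared afterwards.
    step_logs = []
    for step in range(1, 8):
        if step < 2:
            value = 0.0  # before the termination (assume a cleared buffer)
        else:
            value = 1.0  # step 2: the real event; steps 3-7: the stale copy
        step_logs.append({"Episode_Termination/term2": value})

    # rsl_rl-style per-rollout mean over the collected per-step logs
    mean = sum(d["Episode_Termination/term2"] for d in step_logs) / len(step_logs)
    print(f"logged ratio:   {mean:.4f}")   # 0.8571, the inflated value seen below
    print(f"expected ratio: {1 / 7:.4f}")  # 0.1429, one termination over 7 steps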

I will attach a log visualization comparing the original code with the version I modified for debugging, to better illustrate this behavior.

Current code

        # -- reset envs that terminated/timed-out and log the episode information
        reset_env_ids = self.reset_buf.nonzero(as_tuple=False).squeeze(-1)
        if len(reset_env_ids) > 0:
            # trigger recorder terms for pre-reset calls
            self.recorder_manager.record_pre_reset(reset_env_ids)

            self._reset_idx(reset_env_ids)
            # update articulation kinematics
            self.scene.write_data_to_sim()
            self.sim.forward()

            # if sensors are added to the scene, make sure we render to reflect changes in reset
            if self.sim.has_rtx_sensors() and self.cfg.rerender_on_reset:
                self.sim.render()

            # trigger recorder terms for post-reset calls
            self.recorder_manager.record_post_reset(reset_env_ids)
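
To make the control flow above concrete, here is a hypothetical, self-contained stand-in (ToyEnv and step_bookkeeping are my names, not IsaacLab code) showing why the log dict goes stale: it is only replaced on steps where at least one environment resets.

    import torch

    class ToyEnv:
        """Hypothetical stand-in for the step bookkeeping (not real IsaacLab code)."""

        def __init__(self, num_envs: int = 4):
            self.reset_buf = torch.zeros(num_envs, dtype=torch.bool)
            self.extras = {"log": {}}

        def _reset_idx(self, env_ids: torch.Tensor) -> None:
            # a fresh log dict is written here -- and only here
            self.extras["log"] = {"Episode_Termination/term2": float(len(env_ids))}

        def step_bookkeeping(self) -> None:
            reset_env_ids = self.reset_buf.nonzero(as_tuple=False).squeeze(-1)
            if len(reset_env_ids) > 0:
                self._reset_idx(reset_env_ids)
            # on non-reset steps self.extras["log"] is left untouched, so the
            # previous step's termination counts are re-reported downstream

    env = ToyEnv()
    env.reset_buf[1] = True   # one env terminates this step
    env.step_bookkeeping()
    print(env.extras["log"])  # {'Episode_Termination/term2': 1.0}
    env.reset_buf[:] = False  # no resets on the next step
    env.step_bookkeeping()
    print(env.extras["log"])  # same stale value again -> overcounting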

Test code that I used

        print("BEFORE:", end=" ")
        print(", ".join(
            f"{k}: {', '.join(f'{float(x):.4f}' for x in (v.tolist() if v.numel() > 1 else [v.item()]))}"
            for k, v in self.extras["log"].items()
        ))
        print(f"RESET? : {reset_env_ids.cpu().numpy()}")
        if len(reset_env_ids) > 0:
            # trigger recorder terms for pre-reset calls
            self.recorder_manager.record_pre_reset(reset_env_ids)

            self._reset_idx(reset_env_ids)
            print("RESET:", end=" ")
            print(", ".join(
                f"{k}: {', '.join(f'{float(x):.4f}' for x in (v.tolist() if v.numel() > 1 else [v.item()]))}"
                for k, v in self.extras["log"].items()
            ))
            # update articulation kinematics
            self.scene.write_data_to_sim()
            self.sim.forward()

            # if sensors are added to the scene, make sure we render to reflect changes in reset
            if self.sim.has_rtx_sensors() and self.cfg.rerender_on_reset:
                self.sim.render()

            # trigger recorder terms for post-reset calls
            self.recorder_manager.record_post_reset(reset_env_ids)
        else:
            print("NO RESET:", end=" ")
            print(", ".join(
                f"{k}: {', '.join(f'{float(x):.4f}' for x in (v.tolist() if v.numel() > 1 else [v.item()]))}"
                for k, v in self.extras["log"].items()
            ))
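
As an aside, the repeated print expression in the snippet above could be factored into a small helper; a sketch (the helper name _fmt_log is mine):

    import torch

    def _fmt_log(tag: str, log: dict) -> None:
        """Print every entry of an extras['log']-style dict on one line."""
        def fmt(v):
            vals = v.tolist() if torch.is_tensor(v) and v.numel() > 1 else [float(v)]
            return ", ".join(f"{float(x):.4f}" for x in vals)
        print(f"{tag}: " + ", ".join(f"{k}: {fmt(v)}" for k, v in log.items()))

    # usage at the three call sites:
    # _fmt_log("BEFORE", self.extras["log"])
    # _fmt_log("RESET", self.extras["log"])
    # _fmt_log("NO RESET", self.extras["log"])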

For the logs below, note that the labels BEFORE:, RESET?, RESET:, and NO RESET appear sequentially within a single rollout.

In manager_based_rl_env.py, a reset happened at the second step for environment number 137 due to termination condition 2. After that timestep, no resets occurred until the iteration ended. However, self.extras['log'] continued to record the previous value of Episode_Termination/term2 as 1.0000.

This caused the overall log at the bottom to show a termination ratio of 0.8571 for that termination condition (6/7: the stale 1.0 entered the average on six of the seven steps).

In reality, since 200 environments perform rollouts for 7 timesteps and only one termination happened, the termination ratio should be recorded as 1/7 ≈ 0.1429.

Yet, due to the missing reset, the termination is counted repeatedly, resulting in an inflated termination ratio.

[Image: per-step log output showing Episode_Termination/term2 remaining at 1.0000 on every step after the reset at step 2]

Similarly, in this figure, at the 4th timestep, environments [29, 166, 192] terminated due to termination condition 3. Then, at the 6th timestep, environment [7] terminated under the same condition and was reset.

However, at the 5th timestep, the log still shows 3 terminated environments. Because of this, the total count of terminations due to condition 3 in the rollout sums to 1 + 3 + 3 + 1 + 1 = 9, resulting in a recorded termination ratio of 9/7 ≈ 1.2857.

This again indicates duplicated counting of terminations before proper resets occur.

[Image: per-step log output showing the stale count of 3 for termination condition 3 carried into the 5th timestep]

In summary, this behavior causes more terminations to be recorded than actually occur. That is, even when reset_env_ids is empty (i.e., no environment resets in that step), the buffer in self.extras needs to be reset properly.
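
One possible shape for the fix, as a hedged sketch (this mirrors my debugging patch, not necessarily what a final PR should look like; it assumes the Episode_Termination/ key prefix written by the termination manager):

        reset_env_ids = self.reset_buf.nonzero(as_tuple=False).squeeze(-1)
        if len(reset_env_ids) > 0:
            # ... existing reset path, which repopulates self.extras["log"] ...
            self._reset_idx(reset_env_ids)
        else:
            # no reset this step: zero the stale termination entries so that
            # rsl_rl's per-rollout average does not count them again
            for key in self.extras.get("log", {}):
                if key.startswith("Episode_Termination/"):
                    self.extras["log"][key] = 0.0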

Since rsl_rl relies on this information, if this behavior is unintended and considered a bug, I am willing to submit a PR with my fix.

System Info

Describe the characteristic of your environment:

  • Commit: 0b826f7
  • Isaac Sim Version: 4.5.0
  • OS: Ubuntu 20.04
  • GPU: RTX 4060 Ti 16GB
  • CUDA: 11.8
  • GPU Driver: 550.135

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)
  • [x] I have checked that the issue is not in running Isaac Sim itself and is related to the repo
