[Bug Report] TerminationManager._term_dones not updated per step (stale values during rollout) #3720

@Weifan408

Describe the bug

Hi, thank you for your excellent work.
I noticed that in the current implementation of TerminationManager.compute(), the internal buffer self._term_dones is only written for environments in which a termination term fires on that step. For every other environment, and on every step where nothing terminates, the buffer is left untouched.

Current Behavior

def compute(self) -> torch.Tensor:
        """Computes the termination signal as union of individual terms.

        This function calls each termination term managed by the class and performs a logical OR operation
        to compute the net termination signal.

        Returns:
            The combined termination signal of shape (num_envs,).
        """
        # reset computation
        self._truncated_buf[:] = False
        self._terminated_buf[:] = False
        # iterate over all the termination terms
        for i, term_cfg in enumerate(self._term_cfgs):
            value = term_cfg.func(self._env, **term_cfg.params)
            # store timeout signal separately
            if term_cfg.time_out:
                self._truncated_buf |= value
            else:
                self._terminated_buf |= value
            # add to episode dones
            rows = value.nonzero(as_tuple=True)[0]  # indexing is cheaper than boolean advance indexing
            if rows.numel() > 0: 
                self._term_dones[rows] = False
                self._term_dones[rows, i] = True
        # return combined termination signal
        return self._truncated_buf | self._terminated_buf

Here, self._term_dones is only modified when rows.numel() > 0, i.e., when a specific termination condition is triggered.
During the rest of the rollout (normal running steps), self._term_dones keeps the values from the previous episode, and is therefore stale.
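
The staleness can be reproduced without Isaac Lab. The sketch below is a pure-Python stand-in, not the library code: ToyTerminationManager, term_dones, and the list-of-lists layout are hypothetical names chosen for illustration, and only the two assignment lines inside the loop mirror the snippet above.

```python
class ToyTerminationManager:
    """Hypothetical stand-in that mirrors only the _term_dones update logic."""

    def __init__(self, num_envs, num_terms):
        # one row per environment, one column per termination term
        self.term_dones = [[False] * num_terms for _ in range(num_envs)]

    def compute(self, term_values):
        """term_values[i][env] is the boolean output of term i for that env."""
        for i, value in enumerate(term_values):
            rows = [env for env, fired in enumerate(value) if fired]
            for env in rows:
                # mirrors: self._term_dones[rows] = False
                self.term_dones[env] = [False] * len(self.term_dones[env])
                # mirrors: self._term_dones[rows, i] = True
                self.term_dones[env][i] = True


manager = ToyTerminationManager(num_envs=2, num_terms=2)

# step 1: term 0 fires in env 0
manager.compute([[True, False], [False, False]])
after_termination = [row[:] for row in manager.term_dones]

# step 2: no term fires in any env, so no row is rewritten
manager.compute([[False, False], [False, False]])
after_quiet_step = [row[:] for row in manager.term_dones]

# env 0 still reports term 0 as done although nothing fired on step 2
print(after_termination == after_quiet_step)
```

After the second call, the row for env 0 still marks term 0 as done, which is the value a per-step query would read back.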

Consequences

For logging (e.g. inside reset()), this behavior is harmless, because the buffer still correctly summarizes the last episode:

        last_episode_done_stats = self._term_dones.float().mean(dim=0)

However, for runtime queries like:

get_term(name)
get_active_iterable_terms(env_idx)

the returned tensors do not reflect the termination terms at the current step, but the values recorded when each environment last terminated.
This can be problematic if a user or module expects real-time termination signals at every environment step.

Expected Behavior

self._term_dones (or a new buffer) should be updated at every compute() call, so that get_term(name) returns the correct, up-to-date boolean tensor for the current environment step, even when no episode termination occurs.

Suggestion

A minimal fix could be to explicitly clear the term buffer before the loop:

self._term_dones[:] = False
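
As an alternative along the lines of the "new parameter" option mentioned under Expected Behavior, the sketch below keeps two buffers: one overwritten on every step for runtime queries, and one that only changes when an environment terminates. It is a pure-Python illustration under stated assumptions, not a patch for Isaac Lab: the class and the names step_dones and last_episode_dones are hypothetical. Unlike the original snippet, it also keeps every term that fires on the same step instead of only the last one.

```python
class ToyTerminationManagerWithStepBuffer:
    """Hypothetical sketch: separate per-step and last-episode buffers."""

    def __init__(self, num_envs, num_terms):
        # refreshed on every compute() call, intended for runtime queries
        self.step_dones = [[False] * num_terms for _ in range(num_envs)]
        # changed only when an env terminates, intended for episode logging
        self.last_episode_dones = [[False] * num_terms for _ in range(num_envs)]

    def compute(self, term_values):
        """term_values[i][env] is the boolean output of term i for that env."""
        # overwrite the per-step buffer with the current term outputs
        for i, value in enumerate(term_values):
            for env, fired in enumerate(value):
                self.step_dones[env][i] = bool(fired)
        # copy a row into the logging buffer only if some term fired for that env
        for env, row in enumerate(self.step_dones):
            if any(row):
                self.last_episode_dones[env] = row[:]
        # combined termination signal per env
        return [any(row) for row in self.step_dones]


fixed = ToyTerminationManagerWithStepBuffer(num_envs=2, num_terms=2)
first_signal = fixed.compute([[True, False], [False, False]])   # term 0 fires in env 0
second_signal = fixed.compute([[False, False], [False, False]])  # quiet step
```

If the logging line quoted under Consequences is meant to summarize each environment's most recent episode, clearing a single shared buffer on every step would change what it reports; keeping the two concerns in separate buffers avoids that trade-off at the cost of extra memory.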

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have checked that the issue is not in running Isaac Sim itself and is related to the repo
