15 changes: 15 additions & 0 deletions trl/models/activation_offloading.py
@@ -538,6 +538,21 @@ def hook(outputs, inputs):
        unpack_tensor = unpack_tensor_with_streams if self.use_streams else unpack_tensor_single_stream
        super().__init__(pack_tensor, unpack_tensor)

    def __enter__(self):
        # Drop stale state from any prior step where saved tensors didn't unpack
        # (e.g. MoE expert paths under torch.compile). is_first_forward_call only
        # resets when tracker empties during backward, so leaked entries pin GPU
        # memory across iterations -> linear VRAM leak
        self.tracker.clear()
        self.storage_to_tensor_id.clear()
        if self.use_streams:
            self.fwd_stash.clear()
            self.bwd_tensor_stash.clear()
            self.bwd_ev_stash.clear()
        self.is_first_forward_call = True
        self.is_first_backward_call = True
        return super().__enter__()

Added __enter__ is dead code, shadowed by later definition

Medium Severity

The newly added __enter__ at line 541 is dead code: a second __enter__ definition already exists at line 588 in the same class. In Python, a later definition in a class body silently replaces an earlier one with the same name, so the new method never executes. The existing __enter__ at line 588 already performs the same cleanup (plus a tensor_id reset and BNB cache clearing), so the duplicate is confusing and invites future developers to modify the wrong one.
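The shadowing behaviour described in this comment can be reproduced in isolation. The sketch below is illustrative only: `OffloadSketch` and `duplicate_methods` are hypothetical names, not part of trl, and the class merely mimics the shape of a context manager that defines `__enter__` twice. It shows that only the later definition runs, and how a small `ast` walk could flag such duplicates.

```python
import ast
import textwrap

# Hypothetical stand-in for a class that defines __enter__ twice.
SOURCE = textwrap.dedent('''
    class OffloadSketch:
        def __init__(self):
            self.calls = []

        def __enter__(self):  # earlier definition, replaced below
            self.calls.append("first")
            return self

        def __exit__(self, *exc):
            return False

        def __enter__(self):  # later definition rebinds the name
            self.calls.append("second")
            return self
''')


def duplicate_methods(source):
    """Return {class_name: [method names defined more than once]}."""
    dupes = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        seen, repeated = set(), []
        for item in node.body:
            if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if item.name in seen and item.name not in repeated:
                    repeated.append(item.name)
                seen.add(item.name)
        if repeated:
            dupes[node.name] = repeated
    return dupes


namespace = {}
exec(SOURCE, namespace)  # build the class from the source string
with namespace["OffloadSketch"]() as ctx:
    pass

print(ctx.calls)                  # only the later __enter__ ran
print(duplicate_methods(SOURCE))  # classes with repeated method names
```

Note that this naive check would also flag legitimate same-name definitions such as a `@property` paired with its `@name.setter`, so its output is a prompt for review rather than a verdict.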


Reviewed by Cursor Bugbot for commit de13b45.


    def update_model_params(self, model: nn.Module):
        """
        Update the set of parameter storage pointers from the model. This allows filtering out model parameters during