[FSDP] Add Masked importance sampling#1122
Conversation

Force-pushed commit "rewrite masked sum" (branch updated from 68bf817 to 51954ed).
Add masked importance sampling for FSDP backend (THUDM#1063).
@PopSoda2002 Hi, could you please review this? Thank you!
```diff
     calculates PPO-style clipped policy gradient loss. For GSPO, gathers
     full sequences via context-parallel all-gather before computing per-sample
-    KL. Optionally applies TIS (Temporal Importance Sampling) correction and
+    KL. Optionally applies TIS (Truncated Importance Sampling) correction and
```
PopSoda2002
left a comment
In my opinion, this PR should not be so large:
- Delete the script and test it locally.
- We can add an argument like `use-mis` and implement the MIS func.
- Do not change the code with a big diff, since it's just a small func; currently the code is harder to read and may introduce a higher potential for bugs.
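The suggested flag could look like the following minimal argparse sketch. Note that `--use-mis` is the reviewer's proposed name, not an existing slime argument:

```python
import argparse

# Hypothetical flag per the review suggestion; `--use-mis` is illustrative only.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--use-mis",
    action="store_true",
    help="Enable masked importance sampling (MIS) correction",
)

# argparse maps "--use-mis" to the attribute `use_mis`
args = parser.parse_args(["--use-mis"])
print(args.use_mis)  # True when the flag is passed, False otherwise
```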
slime/backends/fsdp_utils/actor.py (outdated)
```python
def _compute_tis_weights(
    self,
    old_log_probs: torch.Tensor,
    rollout_log_probs: torch.Tensor,
    loss_masks: list[torch.Tensor],
    response_lengths: list[int],
) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    """Compute importance sampling weights for TIS/MIS.

    Supports both token-level and sequence-level aggregation, and truncate/mask modes.
    """
    tis_mode = self.args.tis_mode if self.args.tis_mode is not None else "truncate"
    tis_level = self.args.tis_level if self.args.tis_level is not None else "token"
    tis_clip_low = self.args.tis_clip_low if self.args.tis_clip_low is not None else 0.1
    tis_clip_high = self.args.tis_clip if self.args.tis_clip is not None else 2.0

    log_ratio = old_log_probs - rollout_log_probs

    # Calculate raw TIS weights based on level
    if tis_level == "token":
        tis = torch.exp(log_ratio)
    elif tis_level == "sequence":
        tis_list = []
        for seq_log_ratio, mask in zip(log_ratio.split(response_lengths, dim=0), loss_masks, strict=False):
            seq_mask = mask.to(seq_log_ratio.device)
            sum_log_ratio = masked_sum(seq_log_ratio, seq_mask, expand=True)
            seq_tis = torch.exp(sum_log_ratio)
            tis_list.append(seq_tis)
        tis = torch.cat(tis_list, dim=0)
    else:
        raise ValueError(f"Unsupported tis_level: {tis_level}")

    # Apply mode (truncate or mask)
    if tis_mode == "truncate":
        tis_clip = torch.clamp(tis, min=tis_clip_low, max=tis_clip_high)
    elif tis_mode == "mask":
        mask = (tis >= tis_clip_low) & (tis <= tis_clip_high)
        tis_clip = tis * mask.float()
    else:
        raise ValueError(f"Unsupported tis_mode: {tis_mode}")

    tis_clipfrac = tis_clip != tis

    return tis_clip, tis, tis_clipfrac
```
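To see the truncate vs mask behavior from `_compute_tis_weights` in isolation, here is a minimal standalone sketch (not part of the PR; clip bounds match the defaults above):

```python
import torch

# Importance ratios for four tokens; default clip range is [0.1, 2.0]
tis = torch.tensor([0.05, 0.5, 1.0, 3.0])
tis_clip_low, tis_clip_high = 0.1, 2.0

# "truncate" mode: clamp outlier ratios into the clip range
truncated = torch.clamp(tis, min=tis_clip_low, max=tis_clip_high)

# "mask" mode: zero out tokens whose ratio falls outside the clip range
keep = (tis >= tis_clip_low) & (tis <= tis_clip_high)
masked = tis * keep.float()

# truncated -> [0.1, 0.5, 1.0, 2.0]; masked -> [0.0, 0.5, 1.0, 0.0]
# Both modes flag the same tokens as clipped (the first and last here)
clipfrac = (truncated != tis).float().mean()  # 0.5 for this example
```

Truncate keeps a (bounded) gradient contribution from outlier tokens, while mask drops them from the loss entirely.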
Are you sure that this function should be put under fsdp_utils? You can refer to where we put similar function of Megatron.
slime/backends/fsdp_utils/actor.py (outdated)
```python
def vanilla_tis_function_fsdp(
    args,
    *,
    pg_loss: torch.Tensor,
    train_log_probs: list[torch.Tensor],
    rollout_log_probs: list[torch.Tensor],
    loss_masks: list[torch.Tensor],
    **kwargs,
) -> tuple[torch.Tensor, list[torch.Tensor], dict[str, torch.Tensor]]:
    """Apply TIS off-policy correction using importance sampling.

    Parameters:
        args: Arguments containing TIS settings.
        pg_loss: Policy gradient loss tensor of shape [total_seq_len - 1].
        train_log_probs: List of tensors containing training log-probabilities
            for each sequence.
        rollout_log_probs: List of tensors containing rollout log-probabilities
            for each sequence.
        loss_masks: List of tensors containing loss masks for each sequence.
    """
    rollout_log_probs_flat = torch.cat(rollout_log_probs, dim=0)
    train_log_probs_flat = torch.cat(train_log_probs, dim=0)

    tis = torch.exp(train_log_probs_flat - rollout_log_probs_flat)
    tis_abs = (tis - 1).abs()

    tis_clip_low = args.tis_clip_low if args.tis_clip_low is not None else 0.1
    tis_clip_high = args.tis_clip if args.tis_clip is not None else 2.0
    tis_clip = torch.clamp(tis, min=tis_clip_low, max=tis_clip_high)
    tis_clipfrac = (tis_clip != tis).float()

    metrics = {
        "tis": tis.clone().detach(),
        "tis_clipfrac": tis_clipfrac.clone().detach(),
        "tis_abs": tis_abs.clone().detach(),
    }
    pg_loss = pg_loss * tis_clip

    return pg_loss, loss_masks, metrics
```
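For intuition, the core ratio-and-clamp step of the function above can be exercised with dummy tensors. This is a standalone sketch; the `args` object here is a stand-in, not slime's real argument container:

```python
import types

import torch

# Stand-in for slime's args; only the two clip settings are used here
args = types.SimpleNamespace(tis_clip_low=0.1, tis_clip=2.0)

# Two sequences with per-token log-probs under the training and rollout policies
train_log_probs = [torch.log(torch.tensor([0.5, 0.9])), torch.log(torch.tensor([0.2]))]
rollout_log_probs = [torch.log(torch.tensor([0.4, 0.9])), torch.log(torch.tensor([0.8]))]
pg_loss = torch.ones(3)

# Core of the TIS correction: exp(train - rollout), clamped, then reweight the loss
tis = torch.exp(torch.cat(train_log_probs) - torch.cat(rollout_log_probs))
tis_clip = torch.clamp(tis, min=args.tis_clip_low, max=args.tis_clip)
weighted_loss = pg_loss * tis_clip

# tis is approximately [1.25, 1.0, 0.25]; all ratios lie inside [0.1, 2.0],
# so nothing is clipped and tis_clipfrac would be zero for this batch
```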
As I mentioned, maybe these functions should not be put here. We want to keep actor.py as clean as possible.
Hi @zijiexia, can we just remove `vanilla_tis_function_fsdp` (it should not be used, since we can specify `custom-tis-function-path` to use `compute_mis_weights` in `examples/train_infer_mismatch_helper/mis.py`) and `_compute_tis_weights` (it is not used now)?
Hi @GuanxingLu, I've made the following changes:
- Cleaned up the unused functions in `actor.py`.
- Moved `vanilla_tis_function_fsdp` to `ppo_utils.py`, as I think we do need it, following the same pattern as Megatron (see `slime/backends/megatron_utils/loss.py`, line 526 at 461fc8a).
- Moved `compute_mis_weights_fsdp` to `mis.py`.

I didn't add a new `use-mis` arg, as I'm trying to follow the same parameter system as `mis.yaml`. Could you take a look at it and let me know what you think before I mark it back to ready for review? Thanks!
Hi @PopSoda2002 @zhaochenyang20, thanks for the review, I've refactored the code accordingly. Thank you!
Thanks! Sorry for the late reply. Let me and Huapeng review this. @PopSoda2002
PopSoda2002
left a comment
It looks pretty nice now! Thanks for your great work!
Nicely done, Zijie! @zijiexia

I think @GuanxingLu also made important contributions to this PR. cc @zhaochenyang20 😂

@GuanxingLu started this before I joined, so most of the credit should go to him.

Appreciate it, we all contributed a lot, happy to contribute!
Co-authored-by: Guanxing Lu <747398423@qq.com>
Add masked importance sampling at both token level and sequence level, as in #1063.
Results from @GuanxingLu:
Summary:
Run with 4xH200 GPUs (using `examples/train_infer_mismatch_helper/run-qwen3-4b-fsdp-mis.sh`):

Unfortunately, the original mismatch between the training and rollout engines is quite marginal, so MIS has essentially no effect here. A pytest script was therefore added to test the functionality.
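A pytest-style check in that spirit might look like the following. The names are illustrative, not the PR's actual test file:

```python
import torch


def masked_is_weights(log_ratio, clip_low=0.1, clip_high=2.0):
    """Toy MIS: zero out importance weights that leave [clip_low, clip_high]."""
    ratio = torch.exp(log_ratio)
    keep = (ratio >= clip_low) & (ratio <= clip_high)
    return ratio * keep.float()


def test_mis_masks_out_of_range_ratios():
    log_ratio = torch.log(torch.tensor([0.05, 1.0, 3.0]))
    weights = masked_is_weights(log_ratio)
    # Ratios outside [0.1, 2.0] are masked to exactly zero
    assert weights[0].item() == 0.0
    assert weights[2].item() == 0.0
    # An in-range ratio passes through unchanged
    assert torch.isclose(weights[1], torch.tensor(1.0))
```

A synthetic check like this exercises the masking logic even when the real train/rollout mismatch is too small to show an effect in end-to-end training.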