Refactoring training inference importance sampling with seqeunce/geometry level by zhaochenyang20 · Pull Request #429 · THUDM/slime

zhaochenyang20 · 2025-10-06T04:22:05Z

Many thanks to the authors of the paper "When Speed Kills Stability: Demystifying RL Collapse from the Training-Inference Mismatch" for their contribution.

This PR refactors the importance sampling (IS) functionality, replacing the legacy --use-tis parameter with a more flexible --use-train-infer-is parameter system. It introduces multiple aggregation levels and processing modes to better handle training-inference mismatch.

Parameter System Refactoring

Removed legacy parameters: --use-tis, --tis-clip, --tis-clip-low
New parameters:
- --use-train-infer-is: Enable training-inference importance sampling
- --train-infer-is-level: Aggregation level (token/sequence/geometric)
- --train-infer-is-mode: Processing mode (truncate/mask/clip)
- --train-infer-is-lower-bound/--train-infer-is-upper-bound: Weight bounds
- --train-infer-is-veto-threshold: Catastrophic token threshold

Aggregation Levels

Token Level (default):

- Computes importance weights independently for each token
- Formula: w_i = exp(log π_train(x_i) - log π_rollout(x_i))
- Characteristics: Biased but computationally simple, suitable for most scenarios

Sequence Level:

- Uses the product of all token weights as the sequence weight
- Formula: w_seq = exp(Σ(log π_train(x_i) - log π_rollout(x_i)))
- Characteristics: Unbiased but high variance, suitable for sequence-level optimization

Geometric Level:

- Uses the geometric mean of the token weights as the sequence weight
- Formula: w_seq = exp(mean(log π_train(x_i) - log π_rollout(x_i)))
- Characteristics: Biased but low variance, balances bias and variance
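
As a rough illustration of the three formulas above, the sketch below computes the weights for a single sequence in plain Python. It is not the code in this PR: the function name `is_weights`, the list-based inputs, and the choice to broadcast a sequence-level weight back to every token are assumptions made for the example.

```python
import math


def is_weights(train_logp, rollout_logp, level="token"):
    """Return one importance weight per token of a single sequence.

    train_logp / rollout_logp: per-token log-probabilities of the sampled
    tokens under the training policy and the rollout (inference) policy.
    Hypothetical helper for illustration only.
    """
    log_ratio = [t - r for t, r in zip(train_logp, rollout_logp)]
    if level == "token":
        # w_i = exp(log pi_train(x_i) - log pi_rollout(x_i))
        return [math.exp(d) for d in log_ratio]
    if level == "sequence":
        # Product of token ratios, computed as exp of the summed log-ratios.
        w_seq = math.exp(sum(log_ratio))
    elif level == "geometric":
        # Geometric mean of token ratios: exp of the mean log-ratio.
        w_seq = math.exp(sum(log_ratio) / len(log_ratio))
    else:
        raise ValueError(f"unknown level: {level}")
    # Assumption: the sequence weight is shared by every token.
    return [w_seq] * len(log_ratio)


# Token ratios here are 2, 2 and 0.5.
train_logp = [math.log(0.5), math.log(0.5), math.log(0.25)]
rollout_logp = [math.log(0.25), math.log(0.25), math.log(0.5)]

for level in ("token", "sequence", "geometric"):
    print(level, is_weights(train_logp, rollout_logp, level))
```

With these inputs the sequence-level weight is about 2.0 (the product 2 · 2 · 0.5), while the geometric-level weight is about 1.26 (the cube root of 2), which shows how the geometric mean damps the length-dependent growth of the product.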

Processing Modes

Truncate Mode (TIS):

- Caps weights that exceed the upper bound at the upper bound
- Maintains the original TIS behavior, suitable for variance control

Mask Mode (MIS):

- Sets weights outside the [lower, upper] range to zero
- More aggressive filtering strategy, suitable for handling extreme mismatches

Clip Mode (CIS):

- Constrains weights to the [lower, upper] range
- Balanced truncation strategy
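
The difference between the three modes is easiest to see on a few concrete weights. The following is a minimal sketch, not the PR's implementation: `apply_mode` is a hypothetical helper, the default bounds are taken from the usage example in this description, and treating the bounds as inclusive is an assumption.

```python
def apply_mode(weights, mode, lower=0.5, upper=2.0):
    """Post-process importance weights with one of the three modes.

    Hypothetical helper for illustration; bounds are assumed inclusive.
    """
    if mode == "truncate":
        # TIS: only the upper bound is enforced.
        return [min(w, upper) for w in weights]
    if mode == "mask":
        # MIS: weights outside [lower, upper] contribute nothing.
        return [w if lower <= w <= upper else 0.0 for w in weights]
    if mode == "clip":
        # CIS: weights are clamped into [lower, upper].
        return [min(max(w, lower), upper) for w in weights]
    raise ValueError(f"unknown mode: {mode}")


weights = [0.1, 1.0, 3.0]
print(apply_mode(weights, "truncate"))  # [0.1, 1.0, 2.0]
print(apply_mode(weights, "mask"))      # [0.0, 1.0, 0.0]
print(apply_mode(weights, "clip"))      # [0.5, 1.0, 2.0]
```

Note that truncate mode leaves the very small weight 0.1 untouched, whereas mask mode drops it and clip mode raises it to the lower bound.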

Others

Catastrophic token detection: Detects and filters sequences containing catastrophic tokens via --train-infer-is-veto-threshold
Monitoring metrics: Adds training/inference perplexity, KL divergence, the K3 KL estimator, and more
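
For readers unfamiliar with these two items, here is a small sketch in plain Python. It makes two assumptions that this description does not spell out: a token counts as catastrophic when its ratio π_train/π_rollout falls below the veto threshold, and the K3 estimator is the standard (r - 1) - log r form with r = π_train/π_rollout evaluated on tokens sampled from the rollout policy. The helper names are hypothetical.

```python
import math


def veto_sequence(train_logp, rollout_logp, threshold=1e-3):
    """Return True if any token ratio pi_train/pi_rollout is below threshold.

    Assumed veto rule for illustration; compared in log space for stability.
    """
    log_threshold = math.log(threshold)
    return any(t - r < log_threshold for t, r in zip(train_logp, rollout_logp))


def k3_kl(train_logp, rollout_logp):
    """Mean K3 estimate of KL(pi_rollout || pi_train) over sampled tokens.

    Per token: (r - 1) - log r with r = pi_train / pi_rollout, which is
    always non-negative and zero when the two policies agree.
    """
    total = 0.0
    for t, r in zip(train_logp, rollout_logp):
        log_ratio = t - r
        total += math.exp(log_ratio) - 1.0 - log_ratio
    return total / len(train_logp)


rollout_logp = [math.log(0.5), math.log(0.5)]
ok_train_logp = [math.log(0.4), math.log(0.6)]
bad_train_logp = [math.log(0.4), math.log(1e-5)]  # second ratio is 2e-5

print(veto_sequence(ok_train_logp, rollout_logp))   # False
print(veto_sequence(bad_train_logp, rollout_logp))  # True
print(k3_kl(ok_train_logp, rollout_logp))
```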

Usage

# Using geometric mean + mask mode
--use-train-infer-is \
--train-infer-is-level geometric \
--train-infer-is-mode mask \
--train-infer-is-lower-bound 0.5 \
--train-infer-is-upper-bound 2.0 \
--train-infer-is-veto-threshold 1e-3

zhaochenyang20 · 2025-10-06T04:37:36Z

/gemini review

yitianlian · 2025-10-08T01:59:25Z

slime/utils/tis.py

+    seq_mean = masked_mean(tis_weights, eos_mask, dim=-1)
+    metrics["tis_seq_mean"] = seq_mean.mean()
+    metrics["tis_seq_std"] = seq_mean.std()
+    metrics["tis_seq_max"] = seq_mean.max()
+    metrics["tis_seq_min"] = seq_mean.min()


I think it might have a problem when cp>1.

yitianlian · 2025-10-08T02:08:19Z

slime/backends/fsdp_utils/actor.py

-                tis_clip = torch.clamp(
-                    tis, min=getattr(self.args, "tis_clip_low", 0.1), max=getattr(self.args, "tis_clip", 2.0)
+                # Build eos mask from loss masks
+                eos_mask = torch.cat(loss_masks, dim=0).to(device=log_probs.device)


I think this might have a problem when cp > 1, because the loss mask is not split by cp_size while logp is. You can reuse the implementation of sum_of_sample_mean in cp_utils.
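
To make the concern above concrete, here is a toy sketch in plain Python. It assumes a simple contiguous split of each sequence across context-parallel ranks, which may not match the layout slime actually uses, and `local_chunk` and `masked_mean` are hypothetical stand-ins rather than the helpers in cp_utils.

```python
def local_chunk(values, cp_rank, cp_size):
    """Contiguous slice of a full sequence owned by one CP rank.

    Assumes len(values) is divisible by cp_size (illustration only).
    """
    chunk = len(values) // cp_size
    return values[cp_rank * chunk:(cp_rank + 1) * chunk]


def masked_mean(values, mask):
    """Mean of values where mask == 1; lengths must match."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    kept = sum(mask)
    return sum(v * m for v, m in zip(values, mask)) / kept if kept else 0.0


cp_size, cp_rank = 2, 1
full_weights = [1.0, 1.0, 1.0, 1.0, 2.0, 4.0, 2.0, 4.0]
full_mask = [0, 0, 0, 1, 1, 1, 1, 1]  # prompt tokens masked out

# Each rank only holds its own slice of the per-token values.
local_weights = local_chunk(full_weights, cp_rank, cp_size)

# Pairing local values with the unsplit mask is a length mismatch.
try:
    masked_mean(local_weights, full_mask)
except ValueError as err:
    print("mismatch:", err)

# Slicing the mask the same way as the values gives a valid local result.
local_mask = local_chunk(full_mask, cp_rank, cp_size)
print(masked_mean(local_weights, local_mask))  # 3.0
```

Even with a correctly sliced mask, this is only the local mean on one rank; a per-sequence statistic still needs the partial sums and counts combined across ranks.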

zhaochenyang20 added 8 commits October 5, 2025 23:35

fix quick start docs in zh/en

30e01df

Update run-qwen3-30B-A3B.sh

82695df

[Importance sampling] seperate importance sampling as a function

fc0ec27

Merge branch 'THUDM:main' into importance_sampling

2ee67b6

fix lint

35f414c

Merge branch 'THUDM:main' into main

901da8d

fix lint of main

19649f9

adding pre-commit as a CI flow

81ffa47

zhaochenyang20 closed this Oct 6, 2025

zhaochenyang20 added 6 commits October 6, 2025 04:52

only pre-commit with yml

243cc44

fix up pre commit

0b164f3

unigy local pre-commit with third party

0717171

rebase with main for lint

0d27716

fix lint with main

984c724

adding kl metircs

0695c73

zhaochenyang20 reopened this Oct 6, 2025

zhaochenyang20 force-pushed the importance_sampling branch from 5459cfc to 0695c73 October 6, 2025 21:56

zhaochenyang20 added 6 commits October 6, 2025 22:21

fix type custing for metrics

cac9fa9

comments to compute_tis_weights

e7cf0c2

[lint] tis comment

2924975

refactor clip mode in sequence level

7963809

[test] geometric level

6f36eef

adding metrics to new tis

3cc4982

yitianlian reviewed Oct 8, 2025

zhaochenyang20 added 4 commits October 9, 2025 02:12

[log probs in 1D]

d60a595

stash with main

5cac6e0

slice tis with slice_log_prob_with_cp

71194c3

[todo] filter out catastrophic tokens

92e6e97

zhaochenyang20 changed the title ~~[WIP] Importance sampling~~ Refactoring training inference importance sampling with seqeunce/geometry level Oct 15, 2025

zhaochenyang20 and others added 12 commits October 15, 2025 06:18

logging a whole sequence

8a8c44c

rebase with main

4326b28

Update run-qwen3-30B-A3B.sh

52a401e

create test scripts

ac4e63a

revert change in qwen3 30B sh

4df0724

remove two tests sh

7b01369

add kl metrics

4455479

fix comment

3319f38

Merge branch 'main' into importance_sampling

ce541be

Merge pull request #3 from guapisolo/tis

1d35f45

Further modify tis

adding kl metrics

bf8df31

Merge branch 'main' into importance_sampling

e7a88e3

zhuzilin reviewed Oct 16, 2025

docs/en/get_started/usage.md

examples/train_infer_mismatch_helper/run-qwen3-4b-mis.sh

slime/backends/fsdp_utils/actor.py (outdated)

examples/train_infer_mismatch_helper/mis.py

guapisolo and others added 5 commits October 17, 2025 04:02

Merge branch 'main' into cytis

8249244

revert changes to use_tis

fa4606d

move to examples and use yaml for custom args parsing

a66753f

Merge pull request #5 from guapisolo/tis

5809cd1

Move mis to examples

fix small bug

fffeab9

yitianlian reviewed Oct 17, 2025

slime/utils/tis.py (outdated)

remove tis file

9e4bf7c

fzyzcjy reviewed Oct 18, 2025

slime/backends/megatron_utils/loss.py (outdated)

give vanilla tis func

eb7711c

zhuzilin approved these changes Oct 20, 2025

zhuzilin merged commit 46e2cd4 into THUDM:main Oct 20, 2025
4 checks passed

nanjiangwill pushed a commit to nanjiangwill/slime that referenced this pull request Oct 22, 2025

Refactoring training inference importance sampling with seqeunce/geom…

6275b74

…etry level (THUDM#429) Co-authored-by: Jiajun Li <guapisolo@gmail.com>

llltttwww pushed a commit to llltttwww/slime that referenced this pull request Nov 30, 2025

Refactoring training inference importance sampling with seqeunce/geom…

ecd351c

…etry level (THUDM#429) Co-authored-by: Jiajun Li <guapisolo@gmail.com>

zijiexia mentioned this pull request Dec 18, 2025

[FSDP] Add Masked importance sampling #1122

Merged

zhaochenyang20 mentioned this pull request Feb 24, 2026

[Template] Code Review Style Guide zhaochenyang20/sglang-diffusion-routing#32

Open

Yangruipis pushed a commit to rednote-ai/slime that referenced this pull request Feb 28, 2026

Refactoring training inference importance sampling with seqeunce/geom…

70b7de2

…etry level (THUDM#429) Co-authored-by: Jiajun Li <guapisolo@gmail.com>

zhuzilin merged 70 commits into THUDM:main from zhaochenyang20:importance_sampling

6 participants
