
Fix missing restore_weights_before_loading in CompressedTensorsFusedMoEMethod #21795

Merged
yueming-yuan merged 1 commit into sglang-miles from fix/compressed-tensors-moe-restore-weights on Mar 31, 2026

Conversation

@yueming-yuan
Collaborator

Summary

  • The quantization refactor in [2/N] Quantization Refactor: Compressed tensors MoE schemes #17503 introduced CompressedTensorsFusedMoEMethod as a unified quant_method for all compressed-tensors MoE schemes, delegating to layer.scheme. However, restore_weights_before_loading was not forwarded.
  • This causes INT4 weight updates to fail in RL training (miles): post_process_weights with restore_weights_before_load=True skips FusedMoE modules because hasattr(quant_method, "restore_weights_before_loading") returns False, so the INT4 packed weights are never restored to full size before the new weights are loaded.
  • Error: RuntimeError: start (0) + length (1536) exceeds dimension size (768) in FusedMoE._load_w13
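The failing check can be sketched as follows. This is a hypothetical paraphrase of the post_process_weights behavior described above, not the real sglang-miles implementation; the function signature and helper classes are assumptions:

```python
# Hypothetical sketch of the skip described above: modules whose quant_method
# lacks the hook are passed over, so packed INT4 weights stay at their packed
# size (e.g. 768 instead of 1536) and the subsequent load crashes.
def post_process_weights(modules, restore_weights_before_load=False):
    restored = []
    for module in modules:
        quant_method = getattr(module, "quant_method", None)
        if restore_weights_before_load and hasattr(
            quant_method, "restore_weights_before_loading"
        ):
            quant_method.restore_weights_before_loading(module)
            restored.append(module)
    return restored
```

Before this fix, CompressedTensorsFusedMoEMethod failed the hasattr check even when its layer.scheme was CompressedTensorsWNA16MoE, so FusedMoE modules were silently excluded from the restore pass.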

Fix

Add restore_weights_before_loading to CompressedTensorsFusedMoEMethod that delegates to layer.scheme, with a hasattr guard since only CompressedTensorsWNA16MoE implements this method.
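A minimal sketch of this delegation, assuming the class layout described in the PR text (the scheme internals and the second scheme class are hypothetical stand-ins):

```python
class CompressedTensorsWNA16MoE:
    """Stand-in for the only scheme that implements the restore hook."""

    def restore_weights_before_loading(self, layer):
        # The real implementation re-allocates the packed INT4 parameters
        # to their full size so new HF weights can be loaded into them.
        layer.restored = True


class CompressedTensorsOtherMoE:  # hypothetical scheme without the hook
    pass


class CompressedTensorsFusedMoEMethod:
    """Unified quant_method that delegates to layer.scheme."""

    def restore_weights_before_loading(self, layer):
        # hasattr guard: only CompressedTensorsWNA16MoE implements this
        # method, so for other schemes this is a no-op instead of an error.
        if hasattr(layer.scheme, "restore_weights_before_loading"):
            layer.scheme.restore_weights_before_loading(layer)
```

Forwarding the method also makes hasattr(quant_method, "restore_weights_before_loading") return True, so post_process_weights no longer skips FusedMoE modules.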

Test plan

  • Run INT4 rollout CI (e2e/megatron/test_qwen3_30B_A3B.py with MILES_TEST_USE_INT4_ROLLOUT=1)

…oEMethod

The quantization refactor in #17503 introduced CompressedTensorsFusedMoEMethod
as a unified quant_method for all compressed-tensors MoE schemes, delegating
to layer.scheme. However, restore_weights_before_loading was not forwarded.

This causes INT4 weight updates to fail in RL training: post_process_weights
with restore_weights_before_load=True skips FusedMoE modules because
hasattr(quant_method, "restore_weights_before_loading") returns False.
The INT4 packed weights (768) are never restored to full size, so
load_weights tries to narrow full-size HF weights (1536) into packed
parameters and crashes with:
  RuntimeError: start (0) + length (1536) exceeds dimension size (768)

Add the missing delegation with a hasattr guard since only
CompressedTensorsWNA16MoE implements restore_weights_before_loading.

@yueming-yuan yueming-yuan merged commit af90309 into sglang-miles Mar 31, 2026
1 check passed
@yueming-yuan yueming-yuan deleted the fix/compressed-tensors-moe-restore-weights branch March 31, 2026 23:56
