fix: preserve parameter attrs during weight reload (FP8 block + MXFP4 MoE) #2
fergusfinn wants to merge 32 commits into main from
Conversation
`replace_parameter()` was only copying the `weight_loader` attribute from old parameters to new ones, dropping other custom attributes like `quant_method`. This causes two reload failures:

1. FP8 block-wise MoE models (e.g. Step-3.5-Flash-FP8): after `process_weights_after_loading` calls `replace_parameter` on scale params, the `quant_method` attribute is lost. On reload, `FusedMoE.weight_loader` reads `quant_method=None` and raises `quant method must be one of ['tensor', 'channel', 'group', 'block']`.
2. MXFP4 MoE models (e.g. openai/gpt-oss-20b): `marlin_utils_fp4.py` uses raw `setattr()` to replace parameters, dropping `weight_loader` entirely. On reload, params fall back to `default_weight_loader`, which doesn't accept MoE kwargs (`weight_name`, `shard_id`, `expert_id`), raising `default_weight_loader() got an unexpected keyword argument`.

Fix: make `replace_parameter()` copy all `__dict__` attributes from old params, and use `replace_parameter()` in `marlin_utils_fp4.py` instead of raw `setattr()`.

Fixes vllm-project#23577

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
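To illustrate the fix, here is a minimal pure-Python sketch (not vLLM's actual implementation; `FakeParam` is a hypothetical stand-in for `torch.nn.Parameter`, which likewise stores custom attributes in its instance `__dict__`). It shows how copying the whole `__dict__` preserves attributes like `quant_method` that the old weight_loader-only copy dropped:

```python
class FakeParam:
    """Hypothetical stand-in for torch.nn.Parameter: a plain object that,
    like a tensor, carries custom attributes in its instance __dict__."""
    def __init__(self, data):
        self.data = data

def replace_parameter(params, name, new_param):
    """Sketch of the fixed replace_parameter(): carry over *all* custom
    instance attributes from the old parameter, not just weight_loader."""
    old = params.get(name)
    if old is not None:
        # Preserve every custom attr (weight_loader, quant_method, ...)
        # without clobbering the new parameter's own data.
        preserved = {k: v for k, v in old.__dict__.items() if k != "data"}
        new_param.__dict__.update(preserved)
    params[name] = new_param

# Demo: attrs set during initial load survive the replacement.
params = {"w2_weight_scale": FakeParam([1.0])}
params["w2_weight_scale"].quant_method = "block"
params["w2_weight_scale"].weight_loader = lambda *a, **kw: None

replace_parameter(params, "w2_weight_scale", FakeParam([2.0]))
print(params["w2_weight_scale"].quant_method)  # block
print(params["w2_weight_scale"].data)          # [2.0]
```

With the old behavior (copying only `weight_loader`), `quant_method` would be missing after replacement, which is exactly what `FusedMoE.weight_loader` trips over on reload.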
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a6e725f5f4
```diff
-    weight_loader = old_param.weight_loader
-    set_weight_attrs(new_param, {"weight_loader": weight_loader})
+    if old_param is not None:
+        new_param.__dict__.update(old_param.__dict__)
```
Preserve `weight_loader` property when replacing params

`replace_parameter` now copies `old_param.__dict__`, but many quantized weights are `BasevLLMParameter` subclasses where `weight_loader` is exposed via a class property backed by `_weight_loader` (`vllm/model_executor/parameter.py`), not an instance `weight_loader` field. After replacement the new object is a plain `torch.nn.Parameter`, so copying `__dict__` keeps `_weight_loader` but drops the accessible `param.weight_loader`. This breaks reload paths that expect `param.weight_loader` (for example, `step3p5.py` calls it directly with shard/expert kwargs) and can regress to the same reload failures this change is trying to fix.
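The failure mode described here can be reproduced with a small pure-Python sketch (the classes below are hypothetical stand-ins, not vLLM's real ones): a class-level `property` lives on the class, not in the instance `__dict__`, so copying `__dict__` onto a plain object carries the private `_weight_loader` field but loses the public accessor:

```python
class BasevLLMParameter:
    """Sketch of a parameter class exposing weight_loader via a class
    property backed by _weight_loader (as in vllm/model_executor/parameter.py)."""
    def __init__(self, weight_loader):
        self._weight_loader = weight_loader

    @property
    def weight_loader(self):
        return self._weight_loader

class PlainParam:
    """Stand-in for a plain torch.nn.Parameter (no such property)."""

old = BasevLLMParameter(weight_loader=lambda param, w: w)
new = PlainParam()

# What the patched replace_parameter does: copy the instance __dict__.
new.__dict__.update(old.__dict__)

print(hasattr(new, "_weight_loader"))       # True: private field was copied
print(getattr(new, "weight_loader", None))  # None: the property stayed on the old class
```

Reload code that reads `param.weight_loader` would therefore see `None` on the replaced parameter, even though `_weight_loader` survived the copy.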
Fixes sleep/wake crash for MXFP4 MoE models (e.g. openai/gpt-oss-20b): `default_weight_loader() got an unexpected keyword argument 'weight_name'` The MXFP4 Marlin code in vLLM uses raw setattr() instead of replace_parameter(), so custom parameter attributes (weight_loader etc.) are dropped after process_weights_after_loading. On warm wake, reload_weights falls back to default_weight_loader which doesn't accept the kwargs. Patch from doublewordai/vllm#2.
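The kwarg mismatch is easy to see in isolation. A minimal sketch (the loader below mimics only the signature of vLLM's default fallback, not its body): a loader taking `(param, loaded_weight)` cannot accept the MoE routing kwargs that `FusedMoE` passes, so the fallback raises `TypeError` on warm wake:

```python
def default_weight_loader(param, loaded_weight):
    """Sketch of the default fallback loader's signature: it accepts only
    (param, loaded_weight) and no MoE routing kwargs."""
    param["data"] = loaded_weight

param = {}
try:
    # FusedMoE passes routing kwargs that the default loader can't accept.
    default_weight_loader(param, [0.0], weight_name="w13_weight",
                          shard_id="w1", expert_id=0)
except TypeError as e:
    print(type(e).__name__)  # TypeError
```

Preserving the original MoE-aware `weight_loader` on the replaced parameter avoids this fallback entirely.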
Summary
Fixes two weight reload crashes that break sleep/wake (L2) and RL weight reloading for quantized MoE models:

- FP8 block-wise MoE (e.g. `stepfun-ai/Step-3.5-Flash-FP8`): `quant method must be one of ['tensor', 'channel', 'group', 'block']`
- MXFP4 MoE (e.g. `openai/gpt-oss-20b`): `default_weight_loader() got an unexpected keyword argument 'weight_name'`

Both have the same root cause: custom parameter attributes are dropped when `process_weights_after_loading` replaces parameters with new `nn.Parameter` objects.

Root cause

`replace_parameter()` in `utils.py` creates a new `nn.Parameter` and only copies `weight_loader` from the old parameter. Other attributes like `quant_method` (needed by FP8 block-wise) are silently dropped. Additionally, `marlin_utils_fp4.py` bypasses `replace_parameter()` entirely, using raw `setattr()`, which copies nothing.

After the initial load completes, these attributes are gone. When `reload_weights()` restores the model from saved metadata and tries to re-run weight loading, the missing attributes cause crashes:

- FP8 block-wise: `FusedMoE.weight_loader` reads `getattr(param, "quant_method", None)` → `None` → falls through all branches → `ValueError`
- MXFP4: params have no `weight_loader` → fall back to `default_weight_loader(param, loaded_weight)`, which doesn't accept `weight_name`/`shard_id`/`expert_id` kwargs → `TypeError`

Changes

- `utils.py`: `replace_parameter()` now copies all `__dict__` attributes from the old parameter, not just `weight_loader`
- `marlin_utils_fp4.py`: replace raw `nn.Parameter()` + `setattr()` with `replace_parameter()` so MXFP4 Marlin gets the same attribute preservation

Upstream references

- `replace_parameter` was introduced in vllm-project/vllm#28480

Test plan

- `stepfun-ai/Step-3.5-Flash-FP8` TP=2
- `openai/gpt-oss-20b` MXFP4
- `test_reload.py` (includes FP8_BLOCK and FP8_DYNAMIC reload tests)

🤖 Generated with Claude Code