
fix: preserve parameter attrs during weight reload (FP8 block + MXFP4 MoE)#2

Open
fergusfinn wants to merge 32 commits into main from fix/reload-preserve-param-attrs

Conversation

@fergusfinn

Summary

Fixes two weight reload crashes that break sleep/wake (L2) and RL weight reloading for quantized MoE models:

  • FP8 block-wise MoE (e.g. stepfun-ai/Step-3.5-Flash-FP8): fails with `quant method must be one of ['tensor', 'channel', 'group', 'block']`
  • MXFP4 MoE (e.g. openai/gpt-oss-20b): fails with `default_weight_loader() got an unexpected keyword argument 'weight_name'`

Both have the same root cause: custom parameter attributes are dropped when process_weights_after_loading replaces parameters with new nn.Parameter objects.

Root cause

replace_parameter() in utils.py creates a new nn.Parameter and only copies weight_loader from the old parameter. Other attributes like quant_method (needed by FP8 block-wise) are silently dropped. Additionally, marlin_utils_fp4.py bypasses replace_parameter() entirely, using raw setattr() which copies nothing.
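
As a hedged sketch (not vLLM's exact code; names and signatures here are assumptions), the pre-fix behavior can be reproduced like this — only `weight_loader` survives the swap, so `quant_method` silently disappears:

```python
import torch
from torch import nn

def replace_parameter_old(mod: nn.Module, name: str, new_data: torch.Tensor) -> None:
    """Pre-fix sketch: only weight_loader is carried over to the new Parameter."""
    old_param = getattr(mod, name)
    new_param = nn.Parameter(new_data, requires_grad=False)
    new_param.weight_loader = old_param.weight_loader  # the ONLY attr copied
    setattr(mod, name, new_param)

layer = nn.Linear(2, 2)
layer.weight.weight_loader = lambda p, w: None   # custom loader attr
layer.weight.quant_method = "block"              # needed by FP8 block-wise reload

replace_parameter_old(layer, "weight", torch.ones(2, 2))
print(getattr(layer.weight, "quant_method", None))  # → None: attribute dropped
```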

After the initial load completes, these attributes are gone. When reload_weights() restores the model from saved metadata and tries to re-run weight loading, the missing attributes cause crashes:

  1. FP8: FusedMoE.weight_loader reads getattr(param, "quant_method", None), gets None → falls through all branches → ValueError
  2. MXFP4: Params have no weight_loader → fall back to default_weight_loader(param, loaded_weight), which doesn't accept the weight_name/shard_id/expert_id kwargs → TypeError
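
The second failure mode can be reproduced in isolation with a minimal sketch (hypothetical stand-ins, not vLLM's actual implementation): a fallback loader with a fixed two-argument signature rejects the MoE keyword arguments.

```python
def default_weight_loader(param, loaded_weight):
    """Generic fallback loader: fixed signature, no MoE-specific kwargs."""
    param["data"] = loaded_weight

param = {"data": None}  # stand-in for an nn.Parameter

# Dense reload path: works fine.
default_weight_loader(param, [1.0, 2.0])

# MoE reload path passes extra kwargs; without a param-specific
# weight_loader attribute, the fallback raises TypeError.
try:
    default_weight_loader(param, [1.0, 2.0],
                          weight_name="experts.w13_weight",
                          shard_id="w1", expert_id=0)
except TypeError as exc:
    print(exc)  # → default_weight_loader() got an unexpected keyword argument 'weight_name'
```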

Changes

  1. utils.py: replace_parameter() now copies all __dict__ attributes from the old parameter, not just weight_loader
  2. marlin_utils_fp4.py: Replace raw nn.Parameter() + setattr() with replace_parameter() so MXFP4 Marlin gets the same attribute preservation
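
The fixed replace_parameter() can be sketched as follows (an approximation of the change, not the exact upstream code): copy the whole instance `__dict__` so every custom attribute survives the swap.

```python
import torch
from torch import nn

def replace_parameter(mod: nn.Module, name: str, new_data: torch.Tensor) -> None:
    """Post-fix sketch: preserve all custom attrs, not just weight_loader."""
    old_param = getattr(mod, name, None)
    new_param = nn.Parameter(new_data, requires_grad=False)
    if old_param is not None:
        # quant_method, weight_loader, and any other instance attrs carry over.
        new_param.__dict__.update(old_param.__dict__)
    setattr(mod, name, new_param)

layer = nn.Linear(2, 2)
layer.weight.weight_loader = lambda p, w: None
layer.weight.quant_method = "block"

replace_parameter(layer, "weight", torch.ones(2, 2))
print(layer.weight.quant_method)  # → block: attribute preserved for reload
```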

Upstream references

Test plan

  • L2 sleep/wake cycle with stepfun-ai/Step-3.5-Flash-FP8 TP=2
  • L2 sleep/wake cycle with openai/gpt-oss-20b MXFP4
  • Verify initial model loading is unaffected (no regression)
  • Run upstream test_reload.py (includes FP8_BLOCK and FP8_DYNAMIC reload tests)

🤖 Generated with Claude Code

robertgshaw2-redhat and others added 30 commits January 26, 2026 12:37
njhill and others added 2 commits February 3, 2026 20:28
replace_parameter() was only copying the weight_loader attribute from
old parameters to new ones, dropping other custom attributes like
quant_method. This causes two reload failures:

1. FP8 block-wise MoE models (e.g. Step-3.5-Flash-FP8): after
   process_weights_after_loading calls replace_parameter on scale
   params, the quant_method attribute is lost. On reload,
   FusedMoE.weight_loader reads quant_method=None and raises
   "quant method must be one of ['tensor', 'channel', 'group', 'block']"

2. MXFP4 MoE models (e.g. openai/gpt-oss-20b): marlin_utils_fp4.py
   uses raw setattr() to replace parameters, dropping weight_loader
   entirely. On reload, params fall back to default_weight_loader which
   doesn't accept MoE kwargs (weight_name, shard_id, expert_id),
   raising "default_weight_loader() got an unexpected keyword argument"

Fix: make replace_parameter() copy all __dict__ attributes from old
params, and use replace_parameter() in marlin_utils_fp4.py instead of
raw setattr().

Fixes vllm-project#23577

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6e725f5f4


```diff
-    weight_loader = old_param.weight_loader
-    set_weight_attrs(new_param, {"weight_loader": weight_loader})
+    if old_param is not None:
+        new_param.__dict__.update(old_param.__dict__)
```

P1: Preserve weight_loader property when replacing params

replace_parameter now copies old_param.__dict__, but many quantized weights are BasevLLMParameter subclasses where weight_loader is exposed via a class property backed by _weight_loader (vllm/model_executor/parameter.py), not an instance weight_loader field. After replacement the new object is a plain torch.nn.Parameter, so copying __dict__ keeps _weight_loader but drops accessible param.weight_loader; this breaks reload paths that expect param.weight_loader (for example step3p5.py calls it directly with shard/expert kwargs) and can regress to the same reload failures this change is trying to fix.
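
The pitfall Codex describes can be shown with minimal stand-in classes (hypothetical; `BasevLLMParameter` here only mimics the property pattern in vllm/model_executor/parameter.py): copying `__dict__` onto a plain object carries `_weight_loader` but not the class-level `weight_loader` property.

```python
class BasevLLMParameter:  # stand-in: weight_loader exposed via a class property
    def __init__(self, weight_loader):
        self._weight_loader = weight_loader

    @property
    def weight_loader(self):
        return self._weight_loader

class PlainParameter:  # stand-in for a bare torch.nn.Parameter
    pass

old = BasevLLMParameter(weight_loader=lambda *a, **k: None)
new = PlainParameter()
new.__dict__.update(old.__dict__)  # what the fixed replace_parameter does

print(hasattr(new, "_weight_loader"))  # → True: instance attr was copied
print(hasattr(new, "weight_loader"))   # → False: the property lived on the class
```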


fergusfinn added a commit to doublewordai/llmux that referenced this pull request Feb 20, 2026
Fixes sleep/wake crash for MXFP4 MoE models (e.g. openai/gpt-oss-20b):
`default_weight_loader() got an unexpected keyword argument 'weight_name'`

The MXFP4 Marlin code in vLLM uses raw setattr() instead of
replace_parameter(), so custom parameter attributes (weight_loader etc.)
are dropped after process_weights_after_loading. On warm wake,
reload_weights falls back to default_weight_loader which doesn't
accept the kwargs.

Patch from doublewordai/vllm#2.


Development

Successfully merging this pull request may close these issues.

[Bug]: default_weight_loader receives unexpected weight_name kwarg in GPT-OSS load_weights