fix: preserve parameter attrs during weight reload (FP8 block + MXFP4 MoE) #2
fergusfinn wants to merge 32 commits into main from
Conversation
`replace_parameter()` was only copying the `weight_loader` attribute from old parameters to new ones, dropping other custom attributes like `quant_method`. This causes two reload failures:

1. FP8 block-wise MoE models (e.g. Step-3.5-Flash-FP8): after `process_weights_after_loading` calls `replace_parameter` on scale params, the `quant_method` attribute is lost. On reload, `FusedMoE.weight_loader` reads `quant_method=None` and raises `quant method must be one of ['tensor', 'channel', 'group', 'block']`.
2. MXFP4 MoE models (e.g. openai/gpt-oss-20b): `marlin_utils_fp4.py` uses raw `setattr()` to replace parameters, dropping `weight_loader` entirely. On reload, params fall back to `default_weight_loader`, which doesn't accept MoE kwargs (`weight_name`, `shard_id`, `expert_id`), raising `default_weight_loader() got an unexpected keyword argument`.

Fix: make `replace_parameter()` copy all `__dict__` attributes from old params, and use `replace_parameter()` in `marlin_utils_fp4.py` instead of raw `setattr()`.

Fixes vllm-project#23577

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
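To illustrate the fix, here is a minimal pure-Python sketch (not vLLM's actual implementation; `FakeParam` is a hypothetical stand-in for `torch.nn.Parameter`, which likewise stores custom attributes in its instance `__dict__`). It shows how copying the whole `__dict__` preserves attributes like `quant_method` that the old weight_loader-only copy dropped:

```python
class FakeParam:
    """Hypothetical stand-in for torch.nn.Parameter: a plain object that,
    like a tensor, carries custom attributes in its instance __dict__."""
    def __init__(self, data):
        self.data = data

def replace_parameter(params, name, new_param):
    """Sketch of the fixed replace_parameter(): carry over *all* custom
    instance attributes from the old parameter, not just weight_loader."""
    old = params.get(name)
    if old is not None:
        # Preserve every custom attr (weight_loader, quant_method, ...)
        # without clobbering the new parameter's own data.
        preserved = {k: v for k, v in old.__dict__.items() if k != "data"}
        new_param.__dict__.update(preserved)
    params[name] = new_param

# Demo: attrs set during initial load survive the replacement.
params = {"w2_weight_scale": FakeParam([1.0])}
params["w2_weight_scale"].quant_method = "block"
params["w2_weight_scale"].weight_loader = lambda *a, **kw: None

replace_parameter(params, "w2_weight_scale", FakeParam([2.0]))
print(params["w2_weight_scale"].quant_method)  # block
print(params["w2_weight_scale"].data)          # [2.0]
```

With the old behavior (copying only `weight_loader`), `quant_method` would be missing after replacement, which is exactly what `FusedMoE.weight_loader` trips over on reload.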
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a6e725f5f4
```diff
-    weight_loader = old_param.weight_loader
-    set_weight_attrs(new_param, {"weight_loader": weight_loader})
+    if old_param is not None:
+        new_param.__dict__.update(old_param.__dict__)
```
Preserve `weight_loader` property when replacing params

`replace_parameter` now copies `old_param.__dict__`, but many quantized weights are `BasevLLMParameter` subclasses where `weight_loader` is exposed via a class property backed by `_weight_loader` (`vllm/model_executor/parameter.py`), not an instance `weight_loader` field. After replacement the new object is a plain `torch.nn.Parameter`, so copying `__dict__` keeps `_weight_loader` but drops the accessible `param.weight_loader`. This breaks reload paths that expect `param.weight_loader` (for example, `step3p5.py` calls it directly with shard/expert kwargs) and can regress to the same reload failures this change is trying to fix.
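The failure mode described here can be reproduced with a small pure-Python sketch (the classes below are hypothetical stand-ins, not vLLM's real ones): a class-level `property` lives on the class, not in the instance `__dict__`, so copying `__dict__` onto a plain object carries the private `_weight_loader` field but loses the public accessor:

```python
class BasevLLMParameter:
    """Sketch of a parameter class exposing weight_loader via a class
    property backed by _weight_loader (as in vllm/model_executor/parameter.py)."""
    def __init__(self, weight_loader):
        self._weight_loader = weight_loader

    @property
    def weight_loader(self):
        return self._weight_loader

class PlainParam:
    """Stand-in for a plain torch.nn.Parameter (no such property)."""

old = BasevLLMParameter(weight_loader=lambda param, w: w)
new = PlainParam()

# What the patched replace_parameter does: copy the instance __dict__.
new.__dict__.update(old.__dict__)

print(hasattr(new, "_weight_loader"))       # True: private field was copied
print(getattr(new, "weight_loader", None))  # None: the property stayed on the old class
```

Reload code that reads `param.weight_loader` would therefore see `None` on the replaced parameter, even though `_weight_loader` survived the copy.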
Fixes sleep/wake crash for MXFP4 MoE models (e.g. openai/gpt-oss-20b): `default_weight_loader() got an unexpected keyword argument 'weight_name'` The MXFP4 Marlin code in vLLM uses raw setattr() instead of replace_parameter(), so custom parameter attributes (weight_loader etc.) are dropped after process_weights_after_loading. On warm wake, reload_weights falls back to default_weight_loader which doesn't accept the kwargs. Patch from doublewordai/vllm#2.
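The kwarg mismatch is easy to see in isolation. A minimal sketch (the loader below mimics only the signature of vLLM's default fallback, not its body): a loader taking `(param, loaded_weight)` cannot accept the MoE routing kwargs that `FusedMoE` passes, so the fallback raises `TypeError` on warm wake:

```python
def default_weight_loader(param, loaded_weight):
    """Sketch of the default fallback loader's signature: it accepts only
    (param, loaded_weight) and no MoE routing kwargs."""
    param["data"] = loaded_weight

param = {}
try:
    # FusedMoE passes routing kwargs that the default loader can't accept.
    default_weight_loader(param, [0.0], weight_name="w13_weight",
                          shard_id="w1", expert_id=0)
except TypeError as e:
    print(type(e).__name__)  # TypeError
```

Preserving the original MoE-aware `weight_loader` on the replaced parameter avoids this fallback entirely.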
Summary
Fixes two weight reload crashes that break sleep/wake (L2) and RL weight reloading for quantized MoE models:

- FP8 block-wise MoE (e.g. `stepfun-ai/Step-3.5-Flash-FP8`): `quant method must be one of ['tensor', 'channel', 'group', 'block']`
- MXFP4 MoE (e.g. `openai/gpt-oss-20b`): `default_weight_loader() got an unexpected keyword argument 'weight_name'`

Both have the same root cause: custom parameter attributes are dropped when `process_weights_after_loading` replaces parameters with new `nn.Parameter` objects.

Root cause

`replace_parameter()` in `utils.py` creates a new `nn.Parameter` and only copies `weight_loader` from the old parameter. Other attributes like `quant_method` (needed by FP8 block-wise) are silently dropped. Additionally, `marlin_utils_fp4.py` bypasses `replace_parameter()` entirely, using raw `setattr()`, which copies nothing.

After the initial load completes, these attributes are gone. When `reload_weights()` restores the model from saved metadata and tries to re-run weight loading, the missing attributes cause crashes:

- FP8 block-wise: `FusedMoE.weight_loader` reads `getattr(param, "quant_method", None)` → `None` → falls through all branches → `ValueError`
- MXFP4: params have no `weight_loader` → fall back to `default_weight_loader(param, loaded_weight)`, which doesn't accept `weight_name`/`shard_id`/`expert_id` kwargs → `TypeError`

Changes

- `utils.py`: `replace_parameter()` now copies all `__dict__` attributes from the old parameter, not just `weight_loader`
- `marlin_utils_fp4.py`: replace raw `nn.Parameter()` + `setattr()` with `replace_parameter()` so MXFP4 Marlin gets the same attribute preservation

Upstream references

- `replace_parameter` was introduced in vllm-project/vllm#28480

Test plan

- `stepfun-ai/Step-3.5-Flash-FP8` TP=2
- `openai/gpt-oss-20b` MXFP4
- `test_reload.py` (includes FP8_BLOCK and FP8_DYNAMIC reload tests)

🤖 Generated with Claude Code