Fix nvfp4 weight update #18085
Conversation
[2026-02-02 00:13:21] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 1083, in update_weights_from_disk
model = model_load_weights(self.model, iter)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 1073, in model_load_weights
loader.load_weights_and_postprocess(model, iter, target_device)
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 692, in load_weights_and_postprocess
quant_method.process_weights_after_loading(module)
File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/modelopt_quant.py", line 1510, in process_weights_after_loading
("w13", layer.w13_weight_scale),
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1964, in __getattr__
raise AttributeError(
AttributeError: 'FusedMoE' object has no attribute 'w13_weight_scale'. Did you mean: 'w13_weight_scale_2'?
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3063, in run_scheduler_process
scheduler.event_loop_overlap()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1103, in event_loop_overlap
self.process_input_requests(recv_reqs)
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1322, in process_input_requests
output = self._request_dispatcher(recv_req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/utils.py", line 507, in __call__
return fn(obj)
^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler_update_weights_mixin.py", line 50, in update_weights_from_disk
success, message = self.tp_worker.update_weights_from_disk(recv_req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 93, in update_weights_from_disk
success, message = self.model_runner.update_weights_from_disk(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 1091, in update_weights_from_disk
self.model = model_load_weights(self.model, iter)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 1073, in model_load_weights
loader.load_weights_and_postprocess(model, iter, target_device)
File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 692, in load_weights_and_postprocess
quant_method.process_weights_after_loading(module)
File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/modelopt_quant.py", line 1510, in process_weights_after_loading
("w13", layer.w13_weight_scale),
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1964, in __getattr__
raise AttributeError(
AttributeError: 'FusedMoE' object has no attribute 'w13_weight_scale'. Did you mean: 'w13_weight_scale_2'?
[2026-02-02 00:13:21] SIGQUIT received. signum=None, frame=None. It usually means one child failed.
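The failure mode in the log above can be reproduced in miniature: if post-processing deletes a parameter after the first weight load, a second load that expects the same attribute raises the same `AttributeError`. The sketch below is a hypothetical repro (the `FakeMoE` class and this simplified `process_weights_after_loading` are illustrative stand-ins, not the actual SGLang code):

```python
import torch
from torch import nn

class FakeMoE(nn.Module):
    # Hypothetical stand-in for FusedMoE, for illustration only.
    def __init__(self):
        super().__init__()
        self.w13_weight_scale = nn.Parameter(torch.ones(4))
        self.w13_weight_scale_2 = nn.Parameter(torch.ones(1))

def process_weights_after_loading(layer):
    # Pre-fix pattern: consume the scale, then delete the original attribute.
    fused = layer.w13_weight_scale.max()
    del layer.w13_weight_scale  # the attribute no longer exists afterwards
    return fused

layer = FakeMoE()
process_weights_after_loading(layer)      # first load succeeds
try:
    process_weights_after_loading(layer)  # reload fails like the log above
except AttributeError as e:
    print(type(e).__name__)
```

This is why removing the `del` statements (so originals survive past the first load) is central to making `/update_weights_from_disk` repeatable.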
Code Review
This pull request effectively addresses the issue with updating nvfp4 weights. The introduction of the _copy_or_rebind_param helper function is a solid approach to manage parameter updates in a way that is friendly to both CUDA graphs and hot reloading. Its consistent application across the codebase for replacing direct Parameter assignments is well-executed. A crucial part of the fix is the removal of del statements that previously discarded original weights and scales, which is essential for enabling weight updates. The changes also improve robustness by handling different checkpoint formats and reusing existing parameter objects to enhance stability. The code quality is high, and the solution is well-aligned with the stated goal of the PR.
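The copy-or-rebind pattern described above can be sketched roughly as follows. This is a minimal hypothetical helper illustrating the idea, assuming PyTorch semantics; the PR's actual `_copy_or_rebind_param` may differ in signature and edge-case handling:

```python
import torch
from torch import nn

def copy_or_rebind_param(module: nn.Module, name: str, new_tensor: torch.Tensor) -> None:
    """Update module.<name> in place when layouts match, else rebind a fresh Parameter."""
    old = getattr(module, name, None)
    if (
        isinstance(old, nn.Parameter)
        and old.shape == new_tensor.shape
        and old.dtype == new_tensor.dtype
        and old.device == new_tensor.device
    ):
        # In-place copy reuses the existing storage, so tensor addresses
        # captured by a CUDA graph remain valid after the update.
        with torch.no_grad():
            old.copy_(new_tensor)
    else:
        # Layout changed (e.g. a different checkpoint format): rebind.
        setattr(module, name, nn.Parameter(new_tensor, requires_grad=False))

# Usage: a same-shape update keeps the parameter's storage address stable.
layer = nn.Linear(4, 4, bias=False)
ptr = layer.weight.data_ptr()
copy_or_rebind_param(layer, "weight", torch.ones(4, 4))
assert layer.weight.data_ptr() == ptr
```

The in-place branch is what makes hot reloading safe under CUDA graphs, while the rebind branch covers checkpoints whose scale layouts differ from the currently loaded ones.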
After the recent refactoring, the weight sync remains broken in main. After the merge, this fix still works as expected: BEFORE / AFTER
/tag-and-rerun-ci again
Explicitly testing:
This PR fixes both backends.
Thanks for the data. Btw,
@guapisolo Jiajun will help review this PR.
/rerun-failed-ci
I've already reviewed this PR, but I'm not sure whether it's a good implementation, so I asked brayden for help.
@zianglih Please check the CI failures, thanks
Head branch was pushed to by a user without write access
@Fridge003 The previous OOM fp4 CI failure is fixed by 1d00042 (#18085 (comment)). Thanks!
Motivation
@HumansAnd
The existing nvfp4 /update_weights_from_disk endpoint does not work. This PR fixes it.
This feature is a prerequisite of nvfp4 RL in miles: radixark/miles#546
Modifications
Introduce a copy_or_rebind_param helper for in-place weight updates, keeping CUDA graphs stable.
Accuracy Tests
Testing on nvidia/Qwen3-30B-A3B-NVFP4
Testing on PTQ nvfp4 checkpoint from radixark/miles#536
Note: complete accuracy depends on #18012.
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci