[NPU] fix some npu error with OffloaderV2 #19541
Merged
Refactor weight data handling for NPU compatibility.
ping1jing2
approved these changes
Apr 30, 2026
vguduruTT
pushed a commit
to vguduruTT/sglang
that referenced
this pull request
May 2, 2026
Co-authored-by: Jianzhao Xu <xujianchao@huawei.com> Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
LucQueen
pushed a commit
to LucQueen/sglang
that referenced
this pull request
May 12, 2026
Co-authored-by: Jianzhao Xu <xujianchao@huawei.com> Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
Motivation
When --offload-mode is set to meta or sharded_gpu with OffloaderV2, it does not work on NPU. This PR fixes several errors so that these modes are supported on NPU.
Modifications
1. Fix the weight_loader attribute of params going missing in the function _move_param_to_meta.
2. Skip the format cast for meta tensors.
3. Make tensors contiguous before they are processed.
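The first two modifications can be illustrated with a minimal sketch. It is not the code from this PR: the helper names `move_param_to_meta_keep_attrs` and `format_cast_unless_meta`, and the `cast_fn` argument standing in for the NPU format-cast call, are assumptions made for illustration. It only shows the idea of copying `weight_loader` onto the replacement parameter and skipping the cast for tensors that have no storage.

```python
import torch
from torch import nn


def move_param_to_meta_keep_attrs(module, name, attrs=("weight_loader",)):
    """Replace module.<name> with a meta-device parameter, keeping custom attrs.

    Hypothetical helper: a freshly created Parameter does not carry
    Python attributes set on the old one, so they are copied explicitly.
    """
    old = getattr(module, name)
    new = nn.Parameter(
        torch.empty_like(old, device="meta"),
        requires_grad=old.requires_grad,
    )
    for attr in attrs:
        if hasattr(old, attr):
            setattr(new, attr, getattr(old, attr))
    setattr(module, name, new)
    return new


def format_cast_unless_meta(tensor, cast_fn):
    """Apply cast_fn unless the tensor lives on the meta device.

    cast_fn is a placeholder for a device-specific format cast; a meta
    tensor has no data, so there is nothing to cast.
    """
    if tensor.is_meta:
        return tensor
    return cast_fn(tensor)


# Usage on a small layer with a dummy loader attached to its weight.
layer = nn.Linear(4, 2)
layer.weight.weight_loader = lambda param, loaded: None
move_param_to_meta_keep_attrs(layer, "weight")

cast_calls = []


def fake_cast(t):
    # Stand-in for a real format cast; only records that it was called.
    cast_calls.append(tuple(t.shape))
    return t


meta_result = format_cast_unless_meta(layer.weight, fake_cast)
real_result = format_cast_unless_meta(torch.ones(2), fake_cast)
```

In this sketch the cast is invoked only for the real tensor, and the meta parameter still exposes `weight_loader` for later weight loading.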
For this configuration:
python -m sglang.launch_server \
    --model-path /home/weights/deepseekv3-lite-base-latest \
    --host 127.0.0.1 \
    --port 8080 \
    --attention-backend ascend \
    --mem-fraction-static 0.9 \
    --base-gpu-id 14 \
    --tp 1 \
    --dp 2 \
    --offload-num-in-group 1 \
    --offload-prefetch-step 1 \
    --offload-mode sharded_gpu \
    --offload-group-size 4 \
    --disable-cuda-graph
and this request:
curl --location 'http://127.0.0.1:8080/generate' --header 'Content-Type:application/json' --data '{"text": "The captial of France is", "sampling_params": {"temperature": 0, "max_new_tokens": 20}}'
Before:




After the first modification:
After the second modification:
After the third modification:
The output is normal.
The third modification also solved a separate OffloaderV1 accuracy error:



During the forward pass, torch.Tensor.to(device) changes the memory layout of non-contiguous tensors, which causes an accuracy error for MoE models in OffloaderV1 mode with TP=2.
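The idea behind the third modification can be sketched as follows. The helper name `to_device_contiguous` is hypothetical and the example runs on CPU, where the layout issue described above does not reproduce; it only shows forcing a contiguous layout before the device transfer so that the values and their order stay well defined.

```python
import torch


def to_device_contiguous(tensor, device):
    """Return the tensor on `device`, making it contiguous first.

    Hypothetical helper: a transposed or sliced view is not contiguous,
    so it is materialized in row-major order before the transfer.
    """
    if not tensor.is_contiguous():
        tensor = tensor.contiguous()
    return tensor.to(device)


# A transposed view shares storage with its base and is not contiguous.
weight = torch.arange(6, dtype=torch.float32).reshape(2, 3).t()
moved = to_device_contiguous(weight, "cpu")
```

The values are unchanged; only the strides differ, which is what a kernel that assumes a dense row-major layout depends on.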
For this configuration
curl --location 'http://127.0.0.1:8080/generate' --header 'Content-Type:application/json' --data '{"text": "The captial of France is", "sampling_params": {"temperature": 0, "max_new_tokens": 20}}'
Before:
After:
Accuracy Tests
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label,/rerun-failed-ci,/tag-and-rerun-ci