[NPU] fix some npu error with OffloaderV2 #19541

Merged
sglang-npu-bot merged 3 commits into sgl-project:main from Hide-on-bushsh:bug
Apr 30, 2026

@Hide-on-bushsh
Contributor

@Hide-on-bushsh Hide-on-bushsh commented Feb 28, 2026

Motivation

When --offload-mode is set to meta or sharded_gpu in the OffloaderV2 feature, it does not work on NPU. This PR fixes several errors so that those modes are supported on NPU.

Modifications

1. Fix the weight_loader attribute of params going missing in the function _move_param_to_meta.
2. Skip the format cast for meta tensors.
3. Make tensors contiguous before processing.
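The first two modifications can be illustrated with a minimal PyTorch sketch. This is not the code from this PR: the helper names `move_param_to_meta_sketch` and `maybe_format_cast`, and the `cast_fn` argument, are hypothetical and only show the idea of carrying `weight_loader` over to the meta-device parameter and bypassing a format cast when the tensor has no real storage.

```python
import torch
from torch import nn


def move_param_to_meta_sketch(module: nn.Module, name: str) -> None:
    """Replace a parameter with a meta-device placeholder (hypothetical helper)."""
    old = getattr(module, name)
    new = nn.Parameter(
        torch.empty_like(old, device="meta"),
        requires_grad=old.requires_grad,
    )
    # A fresh Parameter does not inherit Python-level attributes,
    # so copy weight_loader explicitly if the old parameter had one.
    if hasattr(old, "weight_loader"):
        new.weight_loader = old.weight_loader
    setattr(module, name, new)


def maybe_format_cast(tensor: torch.Tensor, cast_fn):
    """Apply cast_fn only to tensors with real storage (hypothetical helper).

    cast_fn stands in for a device-specific format cast; it is an assumption
    of this sketch, not an API taken from the PR.
    """
    if tensor.is_meta:
        return tensor  # nothing to cast: meta tensors carry no data
    return cast_fn(tensor)


layer = nn.Linear(4, 2)
layer.weight.weight_loader = lambda param, loaded: param.data.copy_(loaded)
move_param_to_meta_sketch(layer, "weight")
```

After the call, `layer.weight` lives on the meta device and still exposes `weight_loader`, so a later weight-loading pass can find it.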

For this configuration:
python -m sglang.launch_server \
--model-path /home/weights/deepseekv3-lite-base-latest \
--host 127.0.0.1 \
--port 8080 \
--attention-backend ascend \
--mem-fraction-static 0.9 \
--base-gpu-id 14 \
--tp 1 \
--dp 2 \
--offload-num-in-group 1 \
--offload-prefetch-step 1 \
--offload-mode sharded_gpu \
--offload-group-size 4 \
--disable-cuda-graph

curl --location 'http://127.0.0.1:8080/generate' --header 'Content-Type:application/json' --data '{"text": "The captial of France is", "sampling_params": {"temperature": 0, "max_new_tokens": 20}}'
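For scripted checks, the same request can be sent from Python with only the standard library. This is a rough sketch: the function names `build_generate_request` and `send` are made up for illustration, and the URL, prompt, and sampling parameters are copied from the curl command above.

```python
import json
import urllib.request


def build_generate_request(base_url: str = "http://127.0.0.1:8080"):
    """Build the POST request for the /generate endpoint used above."""
    payload = {
        "text": "The captial of France is",  # prompt kept verbatim from the curl command
        "sampling_params": {"temperature": 0, "max_new_tokens": 20},
    }
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(send(build_generate_request()))` prints the raw response body.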

Before:
image
After the first modification:
image
After the second modification:
image
After the third modification, the output is normal:
image

The third modification also solved a separate OffloaderV1 accuracy error.
During the forward pass, torch.Tensor.to(device) changes the layout of non-contiguous tensors, which causes an accuracy error for MoE models in OffloaderV1 mode with TP 2.
For this configuration:
image
curl --location 'http://127.0.0.1:8080/generate' --header 'Content-Type:application/json' --data '{"text": "The captial of France is", "sampling_params": {"temperature": 0, "max_new_tokens": 20}}'
Before:
image
After:
image
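The idea behind the third modification can be sketched as follows. This is an illustrative example, not the PR's code: `to_device_contiguous` is a hypothetical helper, and the CPU is used as the target device so the snippet runs anywhere. On an NPU the target would be the NPU device instead.

```python
import torch


def to_device_contiguous(tensor: torch.Tensor, device) -> torch.Tensor:
    """Make the memory layout explicit before moving the tensor (hypothetical helper)."""
    # contiguous() returns the tensor itself when it is already contiguous,
    # otherwise a row-major copy, so the device copy never sees odd strides.
    return tensor.contiguous().to(device)


# A transposed view shares storage with its base and is not contiguous.
weight = torch.arange(6, dtype=torch.float32).reshape(2, 3).t()
moved = to_device_contiguous(weight, "cpu")
```

The values of `moved` match `weight`, but its strides are those of a plain row-major tensor, which removes any dependence on how the backend handles non-contiguous inputs during the transfer.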

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions Bot added the npu label Feb 28, 2026
@Hide-on-bushsh Hide-on-bushsh force-pushed the bug branch 2 times, most recently from 4b00598 to ad27b94 Compare March 5, 2026 09:25
@Hide-on-bushsh Hide-on-bushsh changed the title from "[NPU] fix som npu error with OffloaderV2" to "[NPU] fix some npu error with OffloaderV2" Mar 6, 2026
@github-actions github-actions Bot added the quant LLM Quantization label Mar 16, 2026
@ping1jing2
Collaborator

/tag-and-rerun-ci

@ping1jing2
Collaborator

/rerun-failed-ci

1 similar comment
@sglang-npu-bot
Collaborator

/rerun-failed-ci

@sglang-npu-bot
Collaborator

/rerun-failed-ci

2 similar comments
@sglang-npu-bot
Collaborator

/rerun-failed-ci

@sglang-npu-bot
Collaborator

/rerun-failed-ci

@Hide-on-bushsh Hide-on-bushsh force-pushed the bug branch 2 times, most recently from 8971d9c to d1cbd63 Compare April 3, 2026 03:36
@sglang-npu-bot
Collaborator

/rerun-failed-ci

2 similar comments
@sglang-npu-bot
Collaborator

/rerun-failed-ci

@sglang-npu-bot
Collaborator

/rerun-failed-ci

Jianzhao Xu and others added 2 commits April 23, 2026 15:21
Refactor weight data handling for NPU compatibility.
@Hide-on-bushsh
Contributor Author

/rerun-failed-ci

4 similar comments
@Hide-on-bushsh
Contributor Author

/rerun-failed-ci

@Hide-on-bushsh
Contributor Author

/rerun-failed-ci

@Hide-on-bushsh
Contributor Author

/rerun-failed-ci

@Hide-on-bushsh
Contributor Author

/rerun-failed-ci

@ping1jing2 ping1jing2 self-assigned this Apr 30, 2026
@sglang-npu-bot sglang-npu-bot merged commit aa74911 into sgl-project:main Apr 30, 2026
562 of 627 checks passed
vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request May 2, 2026
Co-authored-by: Jianzhao Xu <xujianchao@huawei.com>
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
LucQueen pushed a commit to LucQueen/sglang that referenced this pull request May 12, 2026
Co-authored-by: Jianzhao Xu <xujianchao@huawei.com>
Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
Labels

npu quant LLM Quantization run-ci

3 participants