[NPU] fix some npu error with OffloaderV2 by Hide-on-bushsh · Pull Request #19541 · sgl-project/sglang

Hide-on-bushsh · 2026-02-28T02:50:06Z

Motivation

When --offload-mode is set to meta or sharded_gpu with the OffloaderV2 feature, it does not work on NPU. This PR fixes several errors so that these modes are supported on NPU.

Modifications

1. Fix the weight_loader attribute of params going missing in the function _move_param_to_meta.
2. Skip the format cast for meta tensors.
3. Make tensors contiguous before processing them.
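The first two modifications can be illustrated with a minimal sketch. This is not the code from this PR: `move_param_to_meta`, `maybe_format_cast`, `cast_fn`, and `hypothetical_weight_loader` are hypothetical names used only for illustration, and the sketch assumes that `weight_loader` is stored as a plain Python attribute on the parameter. It shows why a freshly constructed meta parameter loses such attributes unless they are copied over, and how a format cast can be skipped for tensors that have no storage.

```python
import torch


def move_param_to_meta(module: torch.nn.Module, name: str) -> torch.nn.Parameter:
    """Swap module.<name> for a meta-device parameter, keeping custom attributes."""
    old_param = getattr(module, name)
    new_param = torch.nn.Parameter(
        torch.empty_like(old_param, device="meta"), requires_grad=False
    )
    # Custom attributes such as weight_loader live in the parameter's __dict__
    # and are not carried over to a newly constructed Parameter, so copy them.
    new_param.__dict__.update(old_param.__dict__)
    setattr(module, name, new_param)
    return new_param


def maybe_format_cast(tensor: torch.Tensor, cast_fn) -> torch.Tensor:
    """Apply a device-specific format cast, skipping meta tensors (no storage)."""
    if tensor.is_meta:
        return tensor
    return cast_fn(tensor)


def hypothetical_weight_loader(param, loaded_weight):
    # Stand-in for a real weight loader; only used to show attribute survival.
    param.data.copy_(loaded_weight)


layer = torch.nn.Linear(4, 2, bias=False)
layer.weight.weight_loader = hypothetical_weight_loader
meta_weight = move_param_to_meta(layer, "weight")
```

After the swap, `layer.weight` is a meta tensor that still exposes `weight_loader`, and `maybe_format_cast` returns it untouched without calling `cast_fn`. Here `cast_fn` stands in for whatever format-cast routine the NPU backend applies.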

For this configuration:
python -m sglang.launch_server \
  --model-path /home/weights/deepseekv3-lite-base-latest \
  --host 127.0.0.1 \
  --port 8080 \
  --attention-backend ascend \
  --mem-fraction-static 0.9 \
  --base-gpu-id 14 \
  --tp 1 \
  --dp 2 \
  --offload-num-in-group 1 \
  --offload-prefetch-step 1 \
  --offload-mode sharded_gpu \
  --offload-group-size 4 \
  --disable-cuda-graph

curl --location 'http://127.0.0.1:8080/generate' --header 'Content-Type:application/json' --data '{"text": "The captial of France is", "sampling_params": {"temperature": 0, "max_new_tokens": 20}}'

Before:

After the first modification:

After the second modification:

After the third modification:
The output is normal.

The third modification also solved a separate OffloaderV1 accuracy error.
During the forward pass, torch.Tensor.to(device) changes the layout of non-contiguous tensors, which caused an accuracy error for the MoE model in OffloaderV1 mode when running with TP2.
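A minimal sketch of the idea behind the third modification, assuming the fix amounts to calling `.contiguous()` before the device transfer. The helper name `to_device_contiguous` is hypothetical and not taken from this PR. The example runs on CPU, so it cannot reproduce the NPU layout problem itself; it only shows how a non-contiguous view is normalized to a dense row-major layout before being moved.

```python
import torch

base = torch.arange(6, dtype=torch.float32).reshape(2, 3)
view = base.t()  # transposed view: same storage, swapped strides


def to_device_contiguous(tensor: torch.Tensor, device: str) -> torch.Tensor:
    """Copy into a dense row-major layout first, then transfer to the device."""
    return tensor.contiguous().to(device)


# "cpu" keeps the sketch runnable anywhere; a real run would pass the NPU device.
moved = to_device_contiguous(view, "cpu")
```

The values are unchanged by `.contiguous()`; only the memory layout differs, which is what makes the transferred tensor's layout predictable regardless of how the source view was strided.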
For this configuration:

curl --location 'http://127.0.0.1:8080/generate' --header 'Content-Type:application/json' --data '{"text": "The captial of France is", "sampling_params": {"temperature": 0, "max_new_tokens": 20}}'
Before:

After:

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.

gemini-code-assist · 2026-02-28T02:50:09Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

ping1jing2 · 2026-03-18T20:00:40Z

/tag-and-rerun-ci

ping1jing2 · 2026-03-25T07:20:01Z

/rerun-failed-ci

sglang-npu-bot · 2026-03-30T07:47:52Z

/rerun-failed-ci

sglang-npu-bot · 2026-03-31T12:51:52Z

/rerun-failed-ci

sglang-npu-bot · 2026-04-01T03:25:57Z

/rerun-failed-ci

sglang-npu-bot · 2026-04-02T02:10:15Z

/rerun-failed-ci

sglang-npu-bot · 2026-04-15T06:38:21Z

/rerun-failed-ci

sglang-npu-bot · 2026-04-16T02:31:59Z

/rerun-failed-ci

sglang-npu-bot · 2026-04-20T06:21:08Z

/rerun-failed-ci

Refactor weight data handling for NPU compatibility.

Hide-on-bushsh · 2026-04-27T03:27:04Z

/rerun-failed-ci

Hide-on-bushsh · 2026-04-27T14:07:10Z

/rerun-failed-ci

Hide-on-bushsh · 2026-04-28T01:26:57Z

/rerun-failed-ci

Hide-on-bushsh · 2026-04-28T11:50:54Z

/rerun-failed-ci

Hide-on-bushsh · 2026-04-29T02:05:27Z

/rerun-failed-ci

Co-authored-by: Jianzhao Xu <xujianchao@huawei.com> Co-authored-by: sglang-npu-bot <sglangnpu@163.com>

Hide-on-bushsh requested review from iforgetmyname and ping1jing2 as code owners February 28, 2026 02:50

github-actions Bot added the npu label Feb 28, 2026

Hide-on-bushsh force-pushed the bug branch 2 times, most recently from 4b00598 to ad27b94 Compare March 5, 2026 09:25

Hide-on-bushsh changed the title ~~[NPU] fix som npu error with OffloaderV2~~ [NPU] fix some npu error with OffloaderV2 Mar 6, 2026

Hide-on-bushsh requested review from AniZpZ, BBuf, Edwardf0t1, FlamingoPg, HaiShaw, b8zhong and ch-wan as code owners March 16, 2026 08:22

github-actions Bot added the quant LLM Quantization label Mar 16, 2026

Hide-on-bushsh force-pushed the bug branch from caf66b0 to d5dfe9e Compare March 16, 2026 08:25

github-actions Bot added the run-ci label Mar 18, 2026

Hide-on-bushsh force-pushed the bug branch 3 times, most recently from a371579 to 03a2450 Compare March 23, 2026 07:02

YZY00Raiser mentioned this pull request Mar 24, 2026

[Bug][Func][SDV] Offloading function exist bug Ascend/sglang#64

Closed

5 tasks

Hide-on-bushsh force-pushed the bug branch from 414b827 to ff8b90b Compare March 30, 2026 11:43

Hide-on-bushsh force-pushed the bug branch 2 times, most recently from 8971d9c to d1cbd63 Compare April 3, 2026 03:36

Hide-on-bushsh force-pushed the bug branch from 7aab8e4 to d22a3c5 Compare April 7, 2026 09:16

Jianzhao Xu and others added 2 commits April 23, 2026 15:21

[NPU] fix some npu error with OffloaderV2

ffa71ee

Improve NPU memory usage

983d083

Refactor weight data handling for NPU compatibility.

Hide-on-bushsh force-pushed the bug branch from 3629962 to 983d083 Compare April 23, 2026 07:21

Merge branch 'main' into bug

482f15b

ping1jing2 self-assigned this Apr 30, 2026

ping1jing2 approved these changes Apr 30, 2026

sglang-npu-bot merged commit aa74911 into sgl-project:main Apr 30, 2026
562 of 627 checks passed

vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request May 2, 2026

[NPU] fix some npu error with OffloaderV2 (sgl-project#19541)

8f8622f

Co-authored-by: Jianzhao Xu <xujianchao@huawei.com> Co-authored-by: sglang-npu-bot <sglangnpu@163.com>

LucQueen pushed a commit to LucQueen/sglang that referenced this pull request May 12, 2026

[NPU] fix some npu error with OffloaderV2 (sgl-project#19541)

22dbf4a

Co-authored-by: Jianzhao Xu <xujianchao@huawei.com> Co-authored-by: sglang-npu-bot <sglangnpu@163.com>
