
Conversation


@knukiban commented Mar 20, 2025

Motivation

ServerArgs.__post_init__ is not idempotent and is called twice when running bench_offline_throughput, causing chunked_prefill_size to be halved multiple times.
Additionally, https://github.com/sgl-project/sglang/pull/3964/files#diff-700b5118b493d60d7b5994857f5f1e6a7e842ad702392b8ab199945764dfc8edR1144 introduced a port collision (fixed in #4648, though less robustly).

> python -m sglang.bench_offline_throughput --model-path=deepseek-ai/DeepSeek-V2-Lite  --trust-remote-code --enable-dp-attention --dp-size=2 --tp-size=2
INFO 03-20 00:47:17 __init__.py:190] Automatically detected platform cuda.
The following error message 'operation scheduled before its operands' can be ignored.
DP attention is enabled. The chunked prefill size is adjusted to 4096 to avoid MoE kernel issues.
DP attention is enabled. The chunked prefill size is adjusted to 2048 to avoid MoE kernel issues.
server_args=ServerArgs(..., schedule_conservativeness=0.09, ...)

... a bunch of lines ...

zmq.error.ZMQError: Address already in use (addr='tcp://127.0.0.1:30236')
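
For context, here is a minimal sketch of why the second __post_init__ call compounds the adjustment. ToyServerArgs is a hypothetical stand-in for illustration only, not the real ServerArgs:

```python
from dataclasses import dataclass


@dataclass
class ToyServerArgs:
    # Hypothetical stand-in for ServerArgs, used only to illustrate the bug.
    chunked_prefill_size: int = 8192
    enable_dp_attention: bool = False

    def __post_init__(self):
        # The adjustment mutates the field unconditionally, so every extra
        # call halves it again: 8192 -> 4096 -> 2048 -> ...
        if self.enable_dp_attention:
            self.chunked_prefill_size //= 2


args = ToyServerArgs(enable_dp_attention=True)  # __post_init__ runs once: 4096
args.__post_init__()  # a second call, as on the benchmark path: 2048
print(args.chunked_prefill_size)  # 2048
```

The same compounding shows up in the log above: schedule_conservativeness=0.09 is consistent with a 0.3 factor being applied twice.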

Modifications

  • Use the user-specified --chunked-prefill-size or --schedule-conservativeness value when it is set explicitly; the defaults used by sglang.launch_server are unchanged from main.
  • Make ServerArgs.__post_init__() idempotent, so that sglang.bench_offline_throughput ends up with the same values as sglang.launch_server (see the sketch after this list).
  • Fix port incrementing when generating PortArgs with --enable-dp-attention.
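
A sketch of the idempotency idea (hypothetical names and defaults, not the actual patch): keep unset CLI values as None so __post_init__ fills in a default at most once, and repeated calls are no-ops:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyServerArgs:
    # None means "not set by the user"; under this scheme the CLI default
    # for these flags would also be None.
    chunked_prefill_size: Optional[int] = None
    schedule_conservativeness: Optional[float] = None
    enable_dp_attention: bool = False

    def __post_init__(self):
        # Fill in defaults only for unset fields: an explicit user value is
        # never overridden, and calling this method again changes nothing.
        if self.chunked_prefill_size is None:
            self.chunked_prefill_size = 4096 if self.enable_dp_attention else 8192
        if self.schedule_conservativeness is None:
            self.schedule_conservativeness = 0.3 if self.enable_dp_attention else 1.0
```

For the PortArgs collision, the idea is to give each DP rank its own offset from the base ZMQ port instead of reusing the same address (again a sketch with assumed names, not the merged code):

```python
def dp_rank_port(base_port: int, dp_rank: int, ports_per_rank: int = 4) -> int:
    # Give each DP rank a disjoint port range derived from the base port,
    # so the per-rank ZMQ sockets no longer bind to the same address.
    return base_port + dp_rank * ports_per_rank
```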

Result: with this change the benchmark completes and uses the default chunked prefill size of 4096 on 2 GPUs.

> python -m sglang.bench_offline_throughput --model-path=deepseek-ai/DeepSeek-V2-Lite  --trust-remote-code --enable-dp-attention --dp-size=2 --tp-size=2
INFO 03-20 00:56:45 __init__.py:190] Automatically detected platform cuda.
The following error message 'operation scheduled before its operands' can be ignored.
DP attention is enabled. The chunked prefill size is adjusted to 4096 to avoid MoE kernel issues.
server_args=ServerArgs(..., schedule_conservativeness=0.3, ...)

@merrymercy self-assigned this Mar 28, 2025
@github-actions

This pull request has been automatically closed due to inactivity. Please feel free to reopen it if needed.
