
Conversation


@knukiban commented Mar 20, 2025

Motivation

ServerArgs.__post_init__ is not idempotent and is called twice when running bench_offline_throughput, causing chunked_prefill_size to be halved multiple times.
Additionally, https://github.com/sgl-project/sglang/pull/3964/files#diff-700b5118b493d60d7b5994857f5f1e6a7e842ad702392b8ab199945764dfc8edR1144 introduced a port collision (fixed in #4648, though less robustly).

> python -m sglang.bench_offline_throughput --model-path=deepseek-ai/DeepSeek-V2-Lite  --trust-remote-code --enable-dp-attention --dp-size=2 --tp-size=2
INFO 03-20 00:47:17 __init__.py:190] Automatically detected platform cuda.
The following error message 'operation scheduled before its operands' can be ignored.
DP attention is enabled. The chunked prefill size is adjusted to 4096 to avoid MoE kernel issues.
DP attention is enabled. The chunked prefill size is adjusted to 2048 to avoid MoE kernel issues.
server_args=ServerArgs(..., schedule_conservativeness=0.09, ...)

... a bunch of lines ...

zmq.error.ZMQError: Address already in use (addr='tcp://127.0.0.1:30236')
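
For context, here is a minimal sketch of why the second __post_init__ call compounds the adjustment. ToyServerArgs is a hypothetical stand-in for illustration only, not the real ServerArgs:

```python
from dataclasses import dataclass


@dataclass
class ToyServerArgs:
    # Hypothetical stand-in for ServerArgs, used only to illustrate the bug.
    chunked_prefill_size: int = 8192
    enable_dp_attention: bool = False

    def __post_init__(self):
        # The adjustment mutates the field unconditionally, so every extra
        # call halves it again: 8192 -> 4096 -> 2048 -> ...
        if self.enable_dp_attention:
            self.chunked_prefill_size //= 2


args = ToyServerArgs(enable_dp_attention=True)  # __post_init__ runs once: 4096
args.__post_init__()  # a second call, as on the benchmark path: 2048
print(args.chunked_prefill_size)  # 2048
```

The same compounding shows up in the log above: schedule_conservativeness=0.09 is consistent with a 0.3 factor being applied twice.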

Modifications

  • Use the user-specified --chunked-prefill-size or --schedule-conservativeness value when it is set explicitly; the defaults used by sglang.launch_server are unchanged from main.
  • Make ServerArgs.__post_init__() idempotent, so that sglang.bench_offline_throughput ends up with the same values as sglang.launch_server (see the sketch after this list).
  • Fix port incrementing when generating PortArgs with --enable-dp-attention.
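
A sketch of the idempotency idea (hypothetical names and defaults, not the actual patch): keep unset CLI values as None so __post_init__ fills in a default at most once, and repeated calls are no-ops:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyServerArgs:
    # None means "not set by the user"; under this scheme the CLI default
    # for these flags would also be None.
    chunked_prefill_size: Optional[int] = None
    schedule_conservativeness: Optional[float] = None
    enable_dp_attention: bool = False

    def __post_init__(self):
        # Fill in defaults only for unset fields: an explicit user value is
        # never overridden, and calling this method again changes nothing.
        if self.chunked_prefill_size is None:
            self.chunked_prefill_size = 4096 if self.enable_dp_attention else 8192
        if self.schedule_conservativeness is None:
            self.schedule_conservativeness = 0.3 if self.enable_dp_attention else 1.0
```

For the PortArgs collision, the idea is to give each DP rank its own offset from the base ZMQ port instead of reusing the same address (again a sketch with assumed names, not the merged code):

```python
def dp_rank_port(base_port: int, dp_rank: int, ports_per_rank: int = 4) -> int:
    # Give each DP rank a disjoint port range derived from the base port,
    # so the per-rank ZMQ sockets no longer bind to the same address.
    return base_port + dp_rank * ports_per_rank
```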

Result: with this change the benchmark completes and uses the default chunked prefill size of 4096 on 2 GPUs.

> python -m sglang.bench_offline_throughput --model-path=deepseek-ai/DeepSeek-V2-Lite  --trust-remote-code --enable-dp-attention --dp-size=2 --tp-size=2
INFO 03-20 00:56:45 __init__.py:190] Automatically detected platform cuda.
The following error message 'operation scheduled before its operands' can be ignored.
DP attention is enabled. The chunked prefill size is adjusted to 4096 to avoid MoE kernel issues.
server_args=ServerArgs(..., schedule_conservativeness=0.3, ...)

@merrymercy self-assigned this Mar 28, 2025
@github-actions

This pull request has been automatically closed due to inactivity. Please feel free to reopen it if needed.
