Add Qwen3.5 FP8 B300 MTP config with spec v2 #247

Open
faradawn wants to merge 1 commit into sgl-project:main from faradawn:qwen35-b300-fp8-mtp

@faradawn
Contributor

Add EAGLE speculative decoding (MTP) support for Qwen3.5-397B-A17B-FP8 on B300:

- Update TP from 2 to 4.
- Prepend SGLANG_ENABLE_SPEC_V2=1 for H200/B300 FP8 MTP (supersedes #240 for the H200 half).
- Add B300-specific flags: --enable-symm-mem, --moe-runner-backend flashinfer_trtllm, --chunked-prefill-size 16384, --max-prefill-tokens 16384, --stream-interval 50, --scheduler-recv-interval 10.

Based on SemiAnalysisAI/InferenceX#1035.
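For reviewers who want to see how the settings above could fit together, here is a minimal sketch that assembles a launch command from them. It is not the config file this PR changes. The helper names (`build_b300_mtp_launch`, `render_shell`), the `Qwen/` repo prefix in the model path, and the use of `--speculative-algorithm EAGLE` are assumptions for illustration. The remaining flag names and values are taken from the description, and any further speculative-decoding parameters the real config may set are omitted.

```python
import shlex

# Assumed Hugging Face repo id; the PR only names the model "Qwen3.5-397B-A17B-FP8".
MODEL_PATH = "Qwen/Qwen3.5-397B-A17B-FP8"


def build_b300_mtp_launch(model_path: str = MODEL_PATH) -> tuple[dict[str, str], list[str]]:
    """Return (env, argv) for a hypothetical B300 FP8 MTP launch."""
    # Environment variable the PR prepends for H200/B300 FP8 MTP.
    env = {"SGLANG_ENABLE_SPEC_V2": "1"}
    argv = [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp", "4",  # tensor parallelism raised from 2 to 4
        "--speculative-algorithm", "EAGLE",  # assumption: how MTP is selected
        # B300-specific flags listed in the PR description:
        "--enable-symm-mem",
        "--moe-runner-backend", "flashinfer_trtllm",
        "--chunked-prefill-size", "16384",
        "--max-prefill-tokens", "16384",
        "--stream-interval", "50",
        "--scheduler-recv-interval", "10",
    ]
    return env, argv


def render_shell(env: dict[str, str], argv: list[str]) -> str:
    """Render env and argv as a single shell line with the env vars prepended."""
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}"


if __name__ == "__main__":
    print(render_shell(*build_b300_mtp_launch()))
```

Keeping the environment separate from the argument list mirrors the "prepend" wording in the description: the variable is set in front of the command rather than passed as a flag.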

Add EAGLE speculative decoding support for Qwen3.5-397B-A17B-FP8 on B300: update TP from 2→4, prepend SGLANG_ENABLE_SPEC_V2=1 for H200/B300 FP8 MTP, and add B300-specific flags (--enable-symm-mem, --moe-runner-backend, --chunked-prefill-size, --max-prefill-tokens, --stream-interval, --scheduler-recv-interval). Based on SemiAnalysisAI/InferenceX#1035.

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>