
docs: add Hunyuan 3 Preview cookbook #23532

Merged
Qiaolin-Yu merged 2 commits into sgl-project:main from JustinTong0323:feat/hy3-preview-cookbook on Apr 23, 2026

Conversation

@JustinTong0323
Collaborator

Summary

Adds a cookbook entry for Tencent Hunyuan 3 Preview (Hy3-preview / Hy3-preview-FP8 / Hy3-preview-Base) under docs_new/cookbook/autoregressive/Tencent/.

  • Doc (cookbook/autoregressive/Tencent/Hunyuan3-Preview.mdx): 5-section Mintlify recipe covering model intro, install/Docker tags, deployment tips, invocation examples with real output (basic, hybrid thinking high/none, non-stream + streaming tool call), and benchmarks.
  • Interactive generator (src/snippets/autoregressive/hunyuan3-preview-deployment.jsx): NVIDIA A100/H100/H200/B200/B300/GB300 × FP8/BF16 + hunyuan parsers + MTP toggle (prepends SGLANG_ENABLE_SPEC_V2=1); Blackwell hardware unconditionally adds --attention-backend trtllm_mha (branching sketched after this list).
  • Nav (docs.json): adds a Tencent group under Autoregressive Models.
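
For orientation, here is a minimal Python mirror of the branching the generator implements, assuming only what the bullet above states; the real generator is JSX and also fills in the model path, --tp, and quantization flags, so this is illustrative rather than the shipped snippet:

```python
# Illustrative Python mirror of hunyuan3-preview-deployment.jsx's branching.
# Flag names come from the PR description above; model path and --tp are omitted here.
def build_launch_command(hardware: str, enable_mtp: bool) -> str:
    blackwell = {"B200", "B300", "GB300"}
    parts = []
    if enable_mtp:
        # MTP toggle: prepend the env var, then add the speculative-decoding flag pair below
        parts.append("SGLANG_ENABLE_SPEC_V2=1")
    parts += ["python3", "-m", "sglang.launch_server",
              "--tool-call-parser", "hunyuan",
              "--reasoning-parser", "hunyuan"]
    if enable_mtp:
        parts += ["--speculative-algorithm", "EAGLE"]
    if hardware in blackwell:
        # Blackwell hardware unconditionally gets the TRT-LLM MHA attention backend
        parts += ["--attention-backend", "trtllm_mha"]
    return " ".join(parts)

print(build_launch_command("B200", enable_mtp=True))
```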

Model characteristics surfaced in the doc

  • HYV3 MoE: 80 layers (1 dense + 79 MoE), 192 routed experts + 1 shared, 8 active/token, ~276B total / ~20B active
  • 256K context (262,144 positions), hybrid thinking via reasoning_effort, built-in MTP draft module
  • Tool-call grammar: <tool_call> / <arg_key> / <arg_value> — uses --tool-call-parser hunyuan and --reasoning-parser hunyuan
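
For context, a hedged client-side sketch of how the hybrid thinking knob and the parsed output surface through the OpenAI-compatible API; the base URL, model name, and the reasoning_content field are placeholders/assumptions here, and the authoritative request/response examples are in Section 4 of the doc:

```python
# Hypothetical client call against a locally launched server (placeholder URL and model name).
# reasoning_effort is forwarded via extra_body; with --reasoning-parser hunyuan the thinking
# text is expected to come back separately from the final answer, and with
# --tool-call-parser hunyuan the <tool_call>/<arg_key>/<arg_value> grammar is parsed
# server-side into structured tool_calls instead of raw text.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="hy3-preview",  # placeholder; use the actual served model name
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    extra_body={"reasoning_effort": "high"},  # hybrid thinking: high/medium/low/none
)
msg = resp.choices[0].message
print(getattr(msg, "reasoning_content", None))  # thinking text, if the parser exposes it
print(msg.content)                              # final answer
```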

Benchmarks included

  • GSM8K: 95.0% (200 Q, 5-shot) on 4× H200 (reproduction sketch after this list)
  • MMLU: 82.5% average (all 57 subjects, 5-shot)
  • Tool-Call Accuracy (MiniMax-Provider-Verifier): 100% Query-Success, 98.02% ToolCalls-Match, 96.43% Schema-Accuracy
  • bench_serving low / high concurrency (TTFT, TPOT, throughput) on 4× H200
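
A rough idea of the kind of harness behind the GSM8K row referenced above; this is an illustrative sketch rather than the script used for the reported 95.0%, and the prompt format, answer extraction, and dataset handling are simplifying assumptions:

```python
# Rough 5-shot GSM8K-style exact-match harness against the OpenAI-compatible endpoint.
# Not the benchmark script behind the reported numbers; details here are assumptions.
import re
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

def last_number(text: str) -> str | None:
    # Take the last number in the completion as the predicted answer.
    nums = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return nums[-1] if nums else None

def evaluate(few_shot_prompt: str, questions: list[dict]) -> float:
    correct = 0
    for q in questions:  # e.g. 200 held-out questions, as in the table above
        resp = client.chat.completions.create(
            model="hy3-preview",  # placeholder served model name
            messages=[{"role": "user",
                       "content": f"{few_shot_prompt}\nQuestion: {q['question']}\nAnswer:"}],
            temperature=0.0,
        )
        if last_number(resp.choices[0].message.content) == q["answer"]:
            correct += 1
    return correct / len(questions)
```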

Notes

  • HYV3ForCausalLM, hunyuan tool-call / reasoning parsers, and the MTP draft loader are not yet upstream. This PR only adds documentation; it assumes the corresponding model-code / parser PRs land separately before the Hy3-preview weights are public.
  • Docker tags in Section 2 (lmsysorg/sglang:hy3-preview{,-cu130}) are placeholders for the release-specific image naming.
  • License row in Section 1 is a TODO pending final HuggingFace model-card publication.

Test plan

  • python3 -c "import json; json.load(open('docs_new/docs.json'))" — nav JSON parses
  • Local build succeeded in the migrated cookbook layout (sgl-cookbook site, pre-migration)
  • mint dev preview on docs_new/ to visually verify the interactive generator and table rendering
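
A slightly stronger variant of the first check, which also confirms the new Tencent group landed in the navigation; it searches recursively rather than assuming a particular docs.json schema, since the nav layout is not reproduced in this PR description:

```python
# Hedged sanity check: parse docs_new/docs.json and confirm a "Tencent" group exists
# anywhere in the navigation tree. Deliberately schema-agnostic.
import json

def contains_group(node, name: str) -> bool:
    if isinstance(node, dict):
        if node.get("group") == name:
            return True
        return any(contains_group(v, name) for v in node.values())
    if isinstance(node, list):
        return any(contains_group(v, name) for v in node)
    return False

nav = json.load(open("docs_new/docs.json"))
assert contains_group(nav, "Tencent"), "Tencent group missing from navigation"
```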

- Section 1: MoE architecture (~276B / ~20B active), hybrid thinking
  (reasoning_effort high/medium/low/none), 256K context, MTP
- Section 2: Docker image table (lmsysorg/sglang:hy3-preview{,-cu130})
- Section 3: Interactive Hunyuan3PreviewDeployment jsx generator
  (NVIDIA A100/H100/H200/B200/B300/GB300, FP8/BF16,
  Blackwell auto-injects --attention-backend trtllm_mha,
  MTP toggle prepends SGLANG_ENABLE_SPEC_V2=1 and
  --speculative-algorithm EAGLE flags)
- Section 4: Real invocation outputs (simple completion, thinking
  high-effort, instant mode, non-stream tool call, streaming tool call;
  a streaming tool-call client sketch follows below)
- Section 5: GSM8K (95.0%), MMLU (82.5%), tool-call accuracy via
  MiniMax-Provider-Verifier, low/high-concurrency bench_serving
  results on 4x H200
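
For reference, a hedged sketch of what Section 4's streaming tool-call example amounts to on the client side; the tool schema and model name below are invented for illustration, and the actual outputs shown in the doc may differ:

```python
# Illustrative streaming tool-call request; tool definition and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

stream = client.chat.completions.create(
    model="hy3-preview",  # placeholder served model name
    messages=[{"role": "user", "content": "What's the weather in Shenzhen?"}],
    tools=tools,
    stream=True,
)
# With --tool-call-parser hunyuan, the <tool_call>/<arg_key>/<arg_value> grammar is parsed
# server-side and arrives as incremental tool_call deltas rather than raw tagged text.
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        tc = delta.tool_calls[0]
        if tc.function and tc.function.name:
            print("tool:", tc.function.name)
        if tc.function and tc.function.arguments:
            print(tc.function.arguments, end="", flush=True)
```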

- Remove Hy3-preview-FP8 from Available Models list
- Remove FP8 Hardware Requirements block (Section 3.2)
- JSX generator: drop A100/H100 hardware (BF16 won't fit single-node
  on 80GB GPUs) and drop the FP8 quantization option
- All deploy commands switch from '--tp 4' to '--tp 8' (H200 BF16 default;
  see the sketch after this list)
- Docker table: narrow to H200/B200 and B300/GB300
- Section 5 benchmarks: replace FP8-Testing numbers with TODO
  placeholders for BF16 re-measure
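
With those changes, the baseline deploy command (H200, BF16, --tp 8) should look roughly like the following; the model path is a placeholder until the weights are public, and the exact command is defined by the doc and the generator, not here:

```python
# Plausible post-review default launch for H200 BF16; assembled only to show the flags
# mentioned in this PR in one place. Model path is a placeholder.
import subprocess

cmd = [
    "python3", "-m", "sglang.launch_server",
    "--model-path", "tencent/Hy3-preview",  # placeholder until the weights are public
    "--tp", "8",
    "--tool-call-parser", "hunyuan",
    "--reasoning-parser", "hunyuan",
    "--host", "0.0.0.0",
    "--port", "30000",
]
subprocess.run(cmd, check=True)
```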
@Qiaolin-Yu merged commit 4868e36 into sgl-project:main on Apr 23, 2026
42 checks passed
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026
LucQueen pushed a commit to LucQueen/sglang that referenced this pull request May 12, 2026
