
docs: add Hunyuan 3 Preview cookbook #23532

Merged
Qiaolin-Yu merged 2 commits into sgl-project:main from JustinTong0323:feat/hy3-preview-cookbook on Apr 23, 2026

Conversation

@JustinTong0323
Collaborator

Summary

Adds a cookbook entry for Tencent Hunyuan 3 Preview (Hy3-preview / Hy3-preview-FP8 / Hy3-preview-Base) under docs_new/cookbook/autoregressive/Tencent/.

  • Doc (cookbook/autoregressive/Tencent/Hunyuan3-Preview.mdx): 5-section Mintlify recipe covering model intro, install/Docker tags, deployment tips, invocation examples with real output (basic, hybrid thinking high/none, non-stream + streaming tool call), and benchmarks.
  • Interactive generator (src/snippets/autoregressive/hunyuan3-preview-deployment.jsx): NVIDIA A100/H100/H200/B200/B300/GB300 × FP8/BF16 + hunyuan parsers + MTP toggle (prepends SGLANG_ENABLE_SPEC_V2=1); Blackwell hardware unconditionally adds --attention-backend trtllm_mha (branching sketched after this list).
  • Nav (docs.json): adds a Tencent group under Autoregressive Models.
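
For orientation, here is a minimal Python mirror of the branching the generator implements, assuming only what the bullet above states; the real generator is JSX and also fills in the model path, --tp, and quantization flags, so this is illustrative rather than the shipped snippet:

```python
# Illustrative Python mirror of hunyuan3-preview-deployment.jsx's branching.
# Flag names come from the PR description above; model path and --tp are omitted here.
def build_launch_command(hardware: str, enable_mtp: bool) -> str:
    blackwell = {"B200", "B300", "GB300"}
    parts = []
    if enable_mtp:
        # MTP toggle: prepend the env var, then add the speculative-decoding flag pair below
        parts.append("SGLANG_ENABLE_SPEC_V2=1")
    parts += ["python3", "-m", "sglang.launch_server",
              "--tool-call-parser", "hunyuan",
              "--reasoning-parser", "hunyuan"]
    if enable_mtp:
        parts += ["--speculative-algorithm", "EAGLE"]
    if hardware in blackwell:
        # Blackwell hardware unconditionally gets the TRT-LLM MHA attention backend
        parts += ["--attention-backend", "trtllm_mha"]
    return " ".join(parts)

print(build_launch_command("B200", enable_mtp=True))
```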

Model characteristics surfaced in the doc

  • HYV3 MoE: 80 layers (1 dense + 79 MoE), 192 routed experts + 1 shared, 8 active/token, ~276B total / ~20B active
  • 256K context (262,144 positions), hybrid thinking via reasoning_effort, built-in MTP draft module
  • Tool-call grammar: <tool_call> / <arg_key> / <arg_value> — uses --tool-call-parser hunyuan and --reasoning-parser hunyuan
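
For context, a hedged client-side sketch of how the hybrid thinking knob and the parsed output surface through the OpenAI-compatible API; the base URL, model name, and the reasoning_content field are placeholders/assumptions here, and the authoritative request/response examples are in Section 4 of the doc:

```python
# Hypothetical client call against a locally launched server (placeholder URL and model name).
# reasoning_effort is forwarded via extra_body; with --reasoning-parser hunyuan the thinking
# text is expected to come back separately from the final answer, and with
# --tool-call-parser hunyuan the <tool_call>/<arg_key>/<arg_value> grammar is parsed
# server-side into structured tool_calls instead of raw text.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="hy3-preview",  # placeholder; use the actual served model name
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    extra_body={"reasoning_effort": "high"},  # hybrid thinking: high/medium/low/none
)
msg = resp.choices[0].message
print(getattr(msg, "reasoning_content", None))  # thinking text, if the parser exposes it
print(msg.content)                              # final answer
```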

Benchmarks included

  • GSM8K: 95.0% (200 Q, 5-shot) on 4× H200 (reproduction sketch after this list)
  • MMLU: 82.5% average (all 57 subjects, 5-shot)
  • Tool-Call Accuracy (MiniMax-Provider-Verifier): 100% Query-Success, 98.02% ToolCalls-Match, 96.43% Schema-Accuracy
  • bench_serving low / high concurrency (TTFT, TPOT, throughput) on 4× H200
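
A rough idea of the kind of harness behind the GSM8K row referenced above; this is an illustrative sketch rather than the script used for the reported 95.0%, and the prompt format, answer extraction, and dataset handling are simplifying assumptions:

```python
# Rough 5-shot GSM8K-style exact-match harness against the OpenAI-compatible endpoint.
# Not the benchmark script behind the reported numbers; details here are assumptions.
import re
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

def last_number(text: str) -> str | None:
    # Take the last number in the completion as the predicted answer.
    nums = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return nums[-1] if nums else None

def evaluate(few_shot_prompt: str, questions: list[dict]) -> float:
    correct = 0
    for q in questions:  # e.g. 200 held-out questions, as in the table above
        resp = client.chat.completions.create(
            model="hy3-preview",  # placeholder served model name
            messages=[{"role": "user",
                       "content": f"{few_shot_prompt}\nQuestion: {q['question']}\nAnswer:"}],
            temperature=0.0,
        )
        if last_number(resp.choices[0].message.content) == q["answer"]:
            correct += 1
    return correct / len(questions)
```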

Notes

  • HYV3ForCausalLM, hunyuan tool-call / reasoning parsers, and the MTP draft loader are not yet upstream. This PR only adds documentation; it assumes the corresponding model-code / parser PRs land separately before the Hy3-preview weights are public.
  • Docker tags in Section 2 (lmsysorg/sglang:hy3-preview{,-cu130}) are placeholders for the release-specific image naming.
  • License row in Section 1 is a TODO pending final HuggingFace model-card publication.

Test plan

  • python3 -c "import json; json.load(open('docs_new/docs.json'))" — nav JSON parses
  • Local build succeeded in the migrated cookbook layout (sgl-cookbook site, pre-migration)
  • mint dev preview on docs_new/ to visually verify the interactive generator and table rendering
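
A slightly stronger variant of the first check, which also confirms the new Tencent group landed in the navigation; it searches recursively rather than assuming a particular docs.json schema, since the nav layout is not reproduced in this PR description:

```python
# Hedged sanity check: parse docs_new/docs.json and confirm a "Tencent" group exists
# anywhere in the navigation tree. Deliberately schema-agnostic.
import json

def contains_group(node, name: str) -> bool:
    if isinstance(node, dict):
        if node.get("group") == name:
            return True
        return any(contains_group(v, name) for v in node.values())
    if isinstance(node, list):
        return any(contains_group(v, name) for v in node)
    return False

nav = json.load(open("docs_new/docs.json"))
assert contains_group(nav, "Tencent"), "Tencent group missing from navigation"
```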

- Section 1: MoE architecture (~276B / ~20B active), hybrid thinking
  (reasoning_effort high/medium/low/none), 256K context, MTP
- Section 2: Docker image table (lmsysorg/sglang:hy3-preview{,-cu130})
- Section 3: Interactive Hunyuan3PreviewDeployment jsx generator
  (NVIDIA A100/H100/H200/B200/B300/GB300, FP8/BF16,
  Blackwell auto-injects --attention-backend trtllm_mha,
  MTP toggle prepends SGLANG_ENABLE_SPEC_V2=1 and
  --speculative-algorithm EAGLE flags)
- Section 4: Real invocation outputs (simple completion, thinking
  high-effort, instant mode, non-stream tool call, streaming tool call;
  a streaming tool-call client sketch follows below)
- Section 5: GSM8K (95.0%), MMLU (82.5%), tool-call accuracy via
  MiniMax-Provider-Verifier, low/high-concurrency bench_serving
  results on 4x H200
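
For reference, a hedged sketch of what Section 4's streaming tool-call example amounts to on the client side; the tool schema and model name below are invented for illustration, and the actual outputs shown in the doc may differ:

```python
# Illustrative streaming tool-call request; tool definition and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

stream = client.chat.completions.create(
    model="hy3-preview",  # placeholder served model name
    messages=[{"role": "user", "content": "What's the weather in Shenzhen?"}],
    tools=tools,
    stream=True,
)
# With --tool-call-parser hunyuan, the <tool_call>/<arg_key>/<arg_value> grammar is parsed
# server-side and arrives as incremental tool_call deltas rather than raw tagged text.
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        tc = delta.tool_calls[0]
        if tc.function and tc.function.name:
            print("tool:", tc.function.name)
        if tc.function and tc.function.arguments:
            print(tc.function.arguments, end="", flush=True)
```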

- Remove Hy3-preview-FP8 from Available Models list
- Remove FP8 Hardware Requirements block (Section 3.2)
- JSX generator: drop A100/H100 hardware (BF16 won't fit single-node
  on 80GB GPUs) and drop the FP8 quantization option
- All deploy commands switch from '--tp 4' to '--tp 8' (H200 BF16 default;
  see the sketch after this list)
- Docker table: narrow to H200/B200 and B300/GB300
- Section 5 benchmarks: replace FP8-Testing numbers with TODO
  placeholders for BF16 re-measure
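
With those changes, the baseline deploy command (H200, BF16, --tp 8) should look roughly like the following; the model path is a placeholder until the weights are public, and the exact command is defined by the doc and the generator, not here:

```python
# Plausible post-review default launch for H200 BF16; assembled only to show the flags
# mentioned in this PR in one place. Model path is a placeholder.
import subprocess

cmd = [
    "python3", "-m", "sglang.launch_server",
    "--model-path", "tencent/Hy3-preview",  # placeholder until the weights are public
    "--tp", "8",
    "--tool-call-parser", "hunyuan",
    "--reasoning-parser", "hunyuan",
    "--host", "0.0.0.0",
    "--port", "30000",
]
subprocess.run(cmd, check=True)
```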
@Qiaolin-Yu merged commit 4868e36 into sgl-project:main on Apr 23, 2026
42 checks passed
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026
LucQueen pushed a commit to LucQueen/sglang that referenced this pull request May 12, 2026
