[UnifiedTree] Support deepseek v4 host pool layout #25282

Merged
hzh0425 merged 19 commits into sgl-project:main from antgroup:dsv4_layout on May 19, 2026
@huangtingwei9988
Collaborator

@huangtingwei9988 huangtingwei9988 commented May 14, 2026

Motivation

DeepSeek-V4-Flash-FP8 / test_gsm8k / hicache_ratio=8 / 200 examples / 10-shot

layout_backend                first_acc  second_acc_after_reset  first_latency  second_latency  first_throughput  second_throughput
layer_first + direct           0.980      0.980                   37.573s        36.515s         235.277 tok/s     243.079 tok/s
page_first_direct + direct     0.970      0.970                   36.960s        35.576s         240.068 tok/s     250.393 tok/s
page_first + kernel            0.980      0.970                   37.181s        36.445s         236.655 tok/s     245.739 tok/s

Notes:
- second_acc_after_reset = accuracy of a second test_gsm8k run, started after the first round finished and _reset_l1_only was called.
- All three configs logged:
  UnifiedRadixCache L1-only reset completed
  Cache L1-only reset successfully!
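
For readers unfamiliar with the reset-then-rerun procedure in the notes, here is a toy sketch of what an L1-only reset is presumably meant to exercise. It is not SGLang code: the `TwoTierCache` class, its methods, and the write-through policy are all hypothetical, and it assumes that L1 is the device tier and that host copies survive the reset.

```python
class TwoTierCache:
    """Toy stand-in for a device (L1) + host (L2) KV cache. Hypothetical, not SGLang code."""

    def __init__(self):
        self.l1 = {}  # prefix -> KV payload held on device
        self.l2 = {}  # prefix -> KV payload backed up on host
        self.host_loads = 0

    def insert(self, prefix, kv):
        # Assumed write-through policy: every device entry also gets a host backup.
        self.l1[prefix] = kv
        self.l2[prefix] = kv

    def reset_l1_only(self):
        # Drop device entries only; host backups are kept.
        self.l1.clear()

    def lookup(self, prefix):
        if prefix in self.l1:
            return self.l1[prefix]
        if prefix in self.l2:
            # Host hit: copy the payload back to the device tier.
            self.host_loads += 1
            self.l1[prefix] = self.l2[prefix]
            return self.l1[prefix]
        return None


cache = TwoTierCache()
cache.insert("few-shot-prompt", [1.0, 2.0, 3.0])
first = cache.lookup("few-shot-prompt")   # served from L1
cache.reset_l1_only()
second = cache.lookup("few-shot-prompt")  # reloaded from L2
```

Under these assumptions, the second round can only match the first if the data written to and read back from the host pool is intact, which is why matching accuracy after the reset is a useful signal for each layout.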

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

CI States

Latest PR Test (Base): ⏳ Run #26016810718
Latest PR Test (Extra): ⚠️ Not enabled -- add run-ci-extra label to opt in.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces support for multiple memory layouts (layer_first, page_first, and page_first_direct) and transfer backends (kernel, direct) within the DeepSeek V4 host memory pools. The changes include updating the pool initialization to allocate buffers based on the selected layout, implementing layout-aware data transfer logic for both device-to-host and host-to-device operations, and refining metadata retrieval for page buffers. Additionally, the test suite has been expanded to include smoke tests for various layout and backend combinations. I have no feedback to provide as there were no review comments to evaluate.
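
To make the layout names in the summary above concrete, here is a minimal sketch of how the three layouts could map a token to an offset in a flat host buffer. The axis orders in the comments are assumptions for illustration; this conversation does not spell them out, and `PoolShape`, `flat_offset`, and `page_is_contiguous` are hypothetical helpers, not functions from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolShape:
    num_layers: int
    num_pages: int
    page_size: int    # tokens per page
    token_elems: int  # elements stored per token, per layer


def flat_offset(layout: str, shape: PoolShape, layer: int, page: int, slot: int) -> int:
    """Element offset of one token's data in a flat host buffer (assumed axis orders)."""
    if layout == "layer_first":
        # Assumed order: [layer][page][slot]
        token_index = (layer * shape.num_pages + page) * shape.page_size + slot
    elif layout == "page_first":
        # Assumed order: [page][slot][layer]
        token_index = (page * shape.page_size + slot) * shape.num_layers + layer
    elif layout == "page_first_direct":
        # Assumed order: [page][layer][slot]
        token_index = (page * shape.num_layers + layer) * shape.page_size + slot
    else:
        raise ValueError(f"unknown layout: {layout}")
    return token_index * shape.token_elems


def page_is_contiguous(layout: str, shape: PoolShape, page: int) -> bool:
    """True if every layer's data for `page` forms one unbroken range."""
    offsets = sorted(
        flat_offset(layout, shape, layer, page, slot)
        for layer in range(shape.num_layers)
        for slot in range(shape.page_size)
    )
    return all(b - a == shape.token_elems for a, b in zip(offsets, offsets[1:]))


shape = PoolShape(num_layers=2, num_pages=3, page_size=4, token_elems=8)
for layout in ("layer_first", "page_first", "page_first_direct"):
    print(layout, page_is_contiguous(layout, shape, page=1))
```

With these assumed orders, both page-first variants keep all layers of a page in one contiguous region, while layer_first scatters a page across one region per layer. That difference is the usual reason a page-oriented layout is paired with whole-page transfers, though the actual trade-offs depend on the transfer backend in use.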

@hzh0425 hzh0425 self-assigned this May 14, 2026
@hzh0425 hzh0425 added the run-ci label May 14, 2026
@hzh0425 hzh0425 changed the base branch from hybrid_tree/deepseek_v4_hicache_integrate to main May 15, 2026 03:22
@hzh0425
Collaborator

hzh0425 commented May 15, 2026

/rerun-test test/registered/radix_cache/test_unified_radix_hicache_kl.py

@github-actions
Contributor

github-actions Bot commented May 15, 2026

🚀 8-gpu-h200 (1 test): ❌ View workflow run

cd test/ && python3 registered/radix_cache/test_unified_radix_hicache_kl.py

@huangtingwei9988
Collaborator Author

/rerun-test test/registered/radix_cache/test_unified_radix_hicache_kl.py

@github-actions
Contributor

/rerun-test is not available for fork PRs unless the commenter has write permission on the repo.

Please ask a maintainer to run this command, or use the normal CI flow.

@hzh0425
Collaborator

hzh0425 commented May 15, 2026

/rerun-test test/registered/radix_cache/test_unified_radix_hicache_kl.py

@github-actions
Contributor

github-actions Bot commented May 15, 2026

🚀 8-gpu-h200 (1 test): ❌ View workflow run

cd test/ && python3 registered/radix_cache/test_unified_radix_hicache_kl.py

@huangtingwei9988
Collaborator Author

/rerun-failed-ci

@github-actions
Contributor

github-actions Bot commented May 15, 2026

🚀 8-gpu-h200 (1 test): ❌ View workflow run

cd test/ && python3 registered/radix_cache/test_unified_radix_hicache_kl.py

@hzh0425
Collaborator

hzh0425 commented May 15, 2026

/rerun-test test/registered/radix_cache/test_unified_radix_hicache_kl.py

@github-actions
Contributor

github-actions Bot commented May 15, 2026

🚀 8-gpu-h200 (1 test): ❌ View workflow run

cd test/ && python3 registered/radix_cache/test_unified_radix_hicache_kl.py

@huangtingwei9988
Collaborator Author

/rerun-failed-ci

@hzh0425
Collaborator

hzh0425 commented May 16, 2026

/rerun-test test/registered/radix_cache/test_unified_radix_cache_kl_hicache.py

@github-actions
Contributor

github-actions Bot commented May 16, 2026

🚀 8-gpu-h200 (1 test): ❌ View workflow run

cd test/ && python3 registered/radix_cache/test_unified_radix_cache_kl_hicache.py

@hzh0425
Collaborator

hzh0425 commented May 17, 2026

/rerun-test test/registered/radix_cache/test_unified_radix_cache_kl_hicache.py

@github-actions
Contributor

github-actions Bot commented May 17, 2026

🚀 8-gpu-h200 (1 test): ❌ View workflow run

cd test/ && python3 registered/radix_cache/test_unified_radix_cache_kl_hicache.py

@huangtingwei9988
Collaborator Author

/rerun-test test/registered/radix_cache/test_unified_radix_cache_kl_hicache.py

@github-actions
Contributor

/rerun-test is not available for fork PRs unless the commenter has write permission on the repo.

Please ask a maintainer to run this command, or use the normal CI flow.

@hzh0425
Collaborator

hzh0425 commented May 18, 2026

/rerun-test test/registered/radix_cache/test_unified_radix_cache_kl_hicache.py

@github-actions
Contributor

github-actions Bot commented May 18, 2026

🚀 8-gpu-h200 (1 test): ✅ View workflow run

cd test/ && python3 registered/radix_cache/test_unified_radix_cache_kl_hicache.py

@huangtingwei9988
Collaborator Author

/rerun-test test/registered/radix_cache/test_unified_radix_cache_kl_hicache.py

@github-actions
Contributor

github-actions Bot commented May 18, 2026

🚀 8-gpu-h200 (1 test): ✅ View workflow run

cd test/ && python3 registered/radix_cache/test_unified_radix_cache_kl_hicache.py

@huangtingwei9988
Collaborator Author

/rerun-test test/registered/radix_cache/test_unified_radix_cache_kl_hicache.py

@github-actions
Contributor

github-actions Bot commented May 18, 2026

🚀 8-gpu-h200 (1 test): ✅ View workflow run

cd test/ && python3 registered/radix_cache/test_unified_radix_cache_kl_hicache.py

@huangtingwei9988
Collaborator Author

/rerun-failed-ci

@hzh0425 hzh0425 merged commit c2a212b into sgl-project:main May 19, 2026
202 of 215 checks passed
Shunkangz pushed a commit to Shunkangz/sglang that referenced this pull request May 27, 2026
alphabetc1 pushed a commit to alphabetc1/sglang that referenced this pull request Jun 4, 2026
Labels

hicache (Hierarchical Caching for SGLang), run-ci

2 participants