
[HiCache] Add CP support for HiCache #20977

Merged
ShangmingCai merged 4 commits into main from support_cp_hicache
Apr 10, 2026

Conversation

@ShangmingCai (Collaborator) commented Mar 20, 2026

Motivation

This PR mainly adds context-parallel (CP) support for Qwen3 + HiCache. MLA models + HiCache reuse CP rank 0's data for all ranks, so we don't need to distinguish the key.
CC: @whybeyoung

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

Signed-off-by: Shangming Cai <csmthu@gmail.com>

@github-actions github-actions Bot added the hicache Hierarchical Caching for SGLang label Mar 20, 2026
whybeyoung added a commit to whybeyoung/sglang that referenced this pull request Mar 20, 2026
@vladnosiv (Contributor)

Hi!
Don't you think that, for full support, you also need to synchronize the cache state, as is done in #20460?

@ShangmingCai (Collaborator, Author)


Yeah, you are right. I thought it was only a tagging issue, so I kept this PR minimal, but it turns out we need to handle control-plane coordination as well.

Comment on lines +409 to +414
self.enable_cp = self.attn_cp_size > 1
if self.enable_pp or self.enable_cp:
self.mha_suffix = (
f"{self.local_rank}_{self.pp_rank}_{self.attn_cp_rank}"
)
self.mla_suffix = f"{self.pp_rank}_{self.attn_cp_rank}"
(Collaborator)

Should we consider checking enable_pp and enable_cp separately? If only enable_pp is used, attn_cp_rank would then fall back to the default value (_0).

(Collaborator, Author)

That depends on our design. If we want to support heterogeneous setups, then maybe we should let the suffix always be f"{self.local_rank}_{self.pp_rank}_{self.attn_cp_rank}". They can all be 0 in most cases. But we might also need to put the pp size and cp size in the suffix.
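The suffix scheme under discussion can be sketched as follows. This is a minimal illustration only: the ParallelState fields and the inclusion of pp_size/attn_cp_size in the key are assumptions raised in the comment above, not the merged implementation.

```python
from dataclasses import dataclass


@dataclass
class ParallelState:
    # Hypothetical container for the ranks/sizes mentioned in the thread.
    local_rank: int = 0
    pp_rank: int = 0
    attn_cp_rank: int = 0
    pp_size: int = 1
    attn_cp_size: int = 1


def cache_key_suffix(state: ParallelState) -> str:
    """Build an unambiguous per-rank suffix for HiCache keys.

    Always including every rank (all 0 in the common single-node case)
    avoids ambiguity; also including the sizes would let heterogeneous
    deployments share one store without key collisions.
    """
    return (
        f"{state.local_rank}_{state.pp_rank}_{state.attn_cp_rank}"
        f"_{state.pp_size}_{state.attn_cp_size}"
    )


# A setup with no PP and no CP yields the all-default suffix:
print(cache_key_suffix(ParallelState()))  # 0_0_0_1_1
```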

@ShangmingCai (Collaborator, Author)

/tag-and-rerun-ci

@github-actions github-actions Bot added the run-ci label Apr 2, 2026
@ShangmingCai (Collaborator, Author) commented Apr 10, 2026

CI has passed. Let us make it more robust after the HiCache refactor.


@ShangmingCai ShangmingCai merged commit 1c76f32 into main Apr 10, 2026
299 of 358 checks passed
@ShangmingCai ShangmingCai deleted the support_cp_hicache branch April 10, 2026 09:52
Fridge003 pushed a commit that referenced this pull request Apr 11, 2026
Signed-off-by: Shangming Cai <csmthu@gmail.com>
whybeyoung added a commit to whybeyoung/sglang that referenced this pull request Apr 12, 2026
…md v2

Add [PPPrefillDiag] and [PPPrefillProblem] to _PPPrefillDebugFilter
so they are silenced by default (visible with SGLANG_DEBUG_HICACHE_VERBOSE=1).

Update PR_PLAN.md with:
- Remove PR 1 (CP support) — already merged as sgl-project#20977
- Further split PR 3 into 3a-3e sub-PRs
- Add dead code / ENV-gated code inventory
- Add debug log tiering strategy with minimal ENV combos for each problem type
pyc96 pushed a commit to pyc96/sglang that referenced this pull request Apr 14, 2026
Signed-off-by: Shangming Cai <csmthu@gmail.com>
@vladnosiv (Contributor)

Hi!
I've been testing in the context of my PR (#20460), and I have questions about these changes.

Why is the CP information included in the KV cache key?
CP uses a strategy of splitting Q and replicating KV. That is, before each attention layer, the KV caches on all CP ranks are guaranteed to be identical, and after the all-gather following the layer, this guarantee holds again.

This means that each CP rank must receive the same cache when it is retrieved from the cache, and when writing, it is enough for one rank to write to the cache.

In other words, consistency across CP ranks is automatically maintained and, in general, knowledge about CP in the keys is not required.

I may be missing something, but so far in my PR I have removed the CP information from the keys; please review these changes.
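The replication argument above can be sketched with a toy store: with identical KV on every CP rank, one rank's write suffices and all ranks read the same key. The store API here is purely hypothetical, not sglang's actual interface.

```python
class KVStore:
    """Toy key-value store standing in for a hierarchical cache backend."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def write_kv(store, key, kv_block, cp_rank):
    # KV is replicated across CP ranks, so a single rank writing is
    # sufficient; the other ranks skip the redundant write.
    if cp_rank == 0:
        store.put(key, kv_block)


def read_kv(store, key):
    # Every CP rank reads the same entry; no cp_rank in the key.
    return store.get(key)


store = KVStore()
for rank in range(4):  # 4 CP ranks holding identical KV
    write_kv(store, "prefix_hash", b"kv-bytes", rank)
print(read_kv(store, "prefix_hash"))  # b'kv-bytes'
```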

@ShangmingCai (Collaborator, Author)


@vladnosiv The current CP implementation makes each CP rank allocate the full KV cache length and use all-gather to fetch from peers. But we are considering a ring-based solution without a full local copy, so I keep the CP rank in the key for MHA/GQA for now, since MHA/GQA are more memory-intensive than MLA. Anyway, I think this can be changed and optimized later, when we support models like Qwen3.5 or others.
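The trade-off described above can be sketched as follows: under a ring/sharded CP design each rank holds a different KV shard, so the rank must appear in the cache key, while under replication a single shared key is enough. The shard layout and key scheme here are illustrative assumptions, not sglang code.

```python
def shard_kv(kv, cp_size):
    """Split a sequence of KV blocks evenly across CP ranks."""
    per = len(kv) // cp_size
    return [kv[r * per:(r + 1) * per] for r in range(cp_size)]


def keys_for(prefix_hash, cp_size, sharded):
    if sharded:
        # Each rank stores a different shard: the rank must be in the key.
        return [f"{prefix_hash}_{r}" for r in range(cp_size)]
    # Replicated KV: one shared key is enough.
    return [prefix_hash]


kv = list(range(8))
print(shard_kv(kv, 4))                 # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(keys_for("h", 4, sharded=True))  # ['h_0', 'h_1', 'h_2', 'h_3']
print(keys_for("h", 4, sharded=False)) # ['h']
```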

yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
Signed-off-by: Shangming Cai <csmthu@gmail.com>

Labels

hicache Hierarchical Caching for SGLang run-ci


3 participants