[HiCache] Add synchronization for context parallelism#20460

Merged
ShangmingCai merged 27 commits into sgl-project:main from vladnosiv:hicache-and-cp on Apr 27, 2026

@vladnosiv
Contributor

HiCache previously synchronized state only within tp_group, which is no longer sufficient after the CP split.
This could cause different CP ranks to make different decisions about prefetch completion/revoke, write-through ack handling, and host-cache updates.

This change passes attn_cp / attn_tp groups into HiCache and switches the relevant sync points to CP-aware reductions/barriers, including storage-prefetch synchronization.
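As a rough illustration of why a tp_group-only reduction stops being enough once attention is also split across CP ranks, here is a pure-Python simulation. It is a sketch, not SGLang's implementation. It assumes a hypothetical rank layout (global_rank = cp_rank * attn_tp_size + tp_rank) and a hypothetical "prefetch complete" rule: a rank proceeds only when the reduced count of prefetched tokens reaches the amount needed. The helpers make_groups, all_reduce_min and prefetch_done are illustrative names only. In a real deployment the reduction would be a collective such as torch.distributed.all_reduce with ReduceOp.MIN on the relevant process group.

```python
# Hypothetical simulation (not SGLang code): a TP-only reduction lets CP groups
# disagree, while reducing over TP and then CP makes every rank agree.
from typing import List, Tuple

Group = List[int]


def make_groups(attn_tp_size: int, attn_cp_size: int) -> Tuple[List[Group], List[Group]]:
    """Build TP and CP groups under an ASSUMED layout:
    global_rank = cp_rank * attn_tp_size + tp_rank."""
    tp_groups = [
        [cp * attn_tp_size + tp for tp in range(attn_tp_size)]
        for cp in range(attn_cp_size)
    ]
    cp_groups = [
        [cp * attn_tp_size + tp for cp in range(attn_cp_size)]
        for tp in range(attn_tp_size)
    ]
    return tp_groups, cp_groups


def all_reduce_min(values: List[int], groups: List[Group]) -> List[int]:
    """Simulate all_reduce(op=MIN) run independently inside each (disjoint) group."""
    out = list(values)
    for group in groups:
        group_min = min(values[r] for r in group)
        for r in group:
            out[r] = group_min
    return out


def prefetch_done(local_tokens: List[int], needed: int,
                  groups_seq: List[List[Group]]) -> List[bool]:
    """Each rank decides 'prefetch complete' from the reduced token count."""
    reduced = list(local_tokens)
    for groups in groups_seq:
        reduced = all_reduce_min(reduced, groups)
    return [n >= needed for n in reduced]


tp_groups, cp_groups = make_groups(attn_tp_size=2, attn_cp_size=2)
# Tokens each of the 4 ranks has finished prefetching (made-up numbers).
local = [64, 64, 64, 32]

tp_only = prefetch_done(local, needed=64, groups_seq=[tp_groups])
cp_aware = prefetch_done(local, needed=64, groups_seq=[tp_groups, cp_groups])
print(tp_only)   # [True, True, False, False]
print(cp_aware)  # [False, False, False, False]
```

With the TP-only reduction, the ranks in CP group 0 treat the prefetch as complete while the ranks in CP group 1 do not. Adding the reduction across the CP group yields the minimum over the whole TP x CP grid, so every rank reaches the same decision.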

Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>

whybeyoung added a commit to whybeyoung/sglang that referenced this pull request Mar 20, 2026
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
voipmonitor added a commit to voipmonitor/sglang that referenced this pull request Apr 1, 2026
@libratiger
Contributor

@vladnosiv does this feature improve the performance?

@vladnosiv
Contributor Author

vladnosiv commented Apr 2, 2026

Hi! No. In my observations, without this change CP + HiCache (L2) can fail after some time (up to 30-40 minutes).
So the change is for reliability, not performance.

@libratiger
Contributor

Thanks. Is there a script or command to reproduce this?

@vladnosiv
Contributor Author

I think a launch like the following, plus some cache-heavy traffic such as the mooncake dataset in bench_serving, should reproduce the crash:

python3 -m sglang.launch_server \
      --model-path deepseek-ai/DeepSeek-V3.2 \
      --trust-remote-code \
      --tp-size 8 \
      --attn-cp-size 8 \
      --enable-nsa-prefill-context-parallel \
      --chat-template examples/chat_template/tool_chat_template_deepseekv32.jinja \
      --mem-fraction-static 0.8 \
      --enable-hierarchical-cache \
      --hicache-ratio 2.0 &> sglang.out

In my case, the repro setup was P/D disaggregation with CP on the prefill instances, HiCache, and real production traffic. After 30-40 minutes, the prefill instances stopped responding to generation requests altogether, without any obvious failure.
I can confirm that with this commit I have not seen any problems for weeks.
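One plausible way that divergent per-rank decisions become a silent hang like this (an assumption; the thread does not spell out the exact mechanism): if some ranks decide to enter a collective and others do not, the ranks that entered wait for peers that never arrive. The sketch below simulates one such step with threading.Barrier standing in for a collective; run_ranks is a hypothetical helper, not SGLang code.

```python
# Hypothetical illustration (not SGLang code): ranks that disagree about
# whether to enter a collective leave the participating ranks waiting.
import threading
from typing import List


def run_ranks(enter_collective: List[bool], timeout: float = 0.5) -> List[str]:
    """Simulate one step where each rank enters a barrier only if its local decision says so."""
    barrier = threading.Barrier(len(enter_collective))
    results = [""] * len(enter_collective)

    def worker(rank: int) -> None:
        if not enter_collective[rank]:
            results[rank] = "skipped"
            return
        try:
            # A real NCCL/Gloo collective would typically block indefinitely
            # (or until a watchdog timeout fires); the timeout here only keeps
            # this demo finite.
            barrier.wait(timeout=timeout)
            results[rank] = "synced"
        except threading.BrokenBarrierError:
            results[rank] = "stuck"

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(len(enter_collective))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_ranks([True, True]))   # ['synced', 'synced']
print(run_ranks([True, False]))  # ['stuck', 'skipped']
```

When every rank makes the same decision the step completes; when one rank skips, the other is left waiting. A shared reduction over every rank that participates in the collective (as in this PR's CP-aware sync) is what keeps those decisions identical.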

I also saw that @whybeyoung worked on CP + PP + P/D + HiCache; maybe he has more information.

voipmonitor pushed a commit to voipmonitor/sglang that referenced this pull request Apr 12, 2026
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
@vladnosiv
Contributor Author

@hzh0425 done

hzh0425 added the run-ci label Apr 17, 2026
hzh0425 and others added 2 commits April 17, 2026 10:47
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
@vladnosiv
Contributor Author

Fixed the --moe-dp-size 2 flag that was lost when the test was moved.

@hzh0425
Collaborator

hzh0425 commented Apr 18, 2026

/rerun-test test_qwen35_hicache.py test_hicache_storage_mooncake_backend.py test_hicache_storage_file_backend.py

@github-actions
Contributor

4-gpu-h100 (1 test):

cd test/ && python3 registered/4-gpu-models/test_qwen35_hicache.py

2-gpu-h100 (2 tests):

cd test/ && python3 registered/hicache/test_hicache_storage_mooncake_backend.py
cd test/ && python3 registered/hicache/test_hicache_storage_file_backend.py

@vladnosiv
Contributor Author

/rerun-failed-ci

@vladnosiv
Contributor Author

The stage-b test crashed a second time due to a Hugging Face (HF) HTTP 429 rate-limit error.

@ShangmingCai
Collaborator

/rerun-failed-ci

@ShangmingCai
Collaborator

/rerun-test test_hicache_storage_mooncake_backend.py test_hicache_storage_file_backend.py

sgl-project deleted a comment from github-actions (bot) Apr 23, 2026
@hzh0425
Collaborator

hzh0425 commented Apr 23, 2026

HiCache's CI has been temporarily removed due to some incompatibilities with the CUDA 13 environment.
We might need to wait until the CI is restored before proceeding further.

@ShangmingCai
Collaborator

Please address the conflict.

# Conflicts:
#	python/sglang/srt/mem_cache/hi_mamba_radix_cache.py
@vladnosiv
Contributor Author

Screenshot 2026-04-27 at 13 39 45

@vladnosiv
Contributor Author

CI is done

@ShangmingCai ShangmingCai merged commit 28ee08c into sgl-project:main Apr 27, 2026
211 of 232 checks passed
vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request May 2, 2026
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>

Labels

hicache (Hierarchical Caching for SGLang), high priority, run-ci

8 participants