[HiCache] Add synchronization for context parallelism#20460
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
@vladnosiv does this feature improve the performance?
Hi! No. In my observations, without this change CP + HiCache (L2) may fail after some time (up to 30-40 minutes).
Thanks. Is there any script or command to reproduce this situation?
I think that such a launch plus some cache-heavy traffic, like the mooncake dataset in bench_serving, should reproduce the crash. In my case, the repro setup was P/D + CP on prefills + HiCache with real prod traffic. After 30-40 minutes, the prefills stopped responding to generation requests at all, without an obvious failure. I also saw that @whybeyoung worked on CP + PP + P/D + HiCache; maybe he has more information.
|
@hzh0425 done |
|
Fixed the --moe-dp-size 2 flag, which was lost when the test was moved.
|
/rerun-test test_qwen35_hicache.py test_hicache_storage_mooncake_backend.py test_hicache_storage_file_backend.py |
|
✅ ✅ |
|
/rerun-failed-ci |
|
The stage-b test crashed a second time due to an HF 429 error.
|
/rerun-failed-ci |
|
/rerun-test test_hicache_storage_mooncake_backend.py test_hicache_storage_file_backend.py |
|
HiCache's CI has been temporarily removed due to some incompatibilities with the CUDA 13 environment. |
|
Please address the conflict.
# Conflicts:
#   python/sglang/srt/mem_cache/hi_mamba_radix_cache.py
|
CI is done |

HiCache previously synchronized state only within tp_group, which is no longer sufficient after the CP split. This could cause different CP ranks to make different decisions about prefetch completion/revoke, write-through ack handling, and host-cache updates.
This change passes the attn_cp/attn_tp groups into HiCache and switches the relevant sync points to CP-aware reductions/barriers, including storage-prefetch synchronization.
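
To illustrate why a TP-only sync point can leave CP ranks disagreeing, here is a minimal single-process simulation. It does not use sglang or torch.distributed and is not the code from this PR: the (cp_rank, tp_rank) layout, the function names, the per-rank token counts, and the choice of MIN as the reduction are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical rank layout: each worker is identified by (cp_rank, tp_rank).
Rank = Tuple[int, int]


def all_reduce_min(values: List[int]) -> List[int]:
    """Simulate all-reduce(MIN): every participant receives the group minimum."""
    group_min = min(values)
    return [group_min] * len(values)


def sync_tp_only(local: Dict[Rank, int], cp_size: int, tp_size: int) -> Dict[Rank, int]:
    """Reduce only inside each TP group (one group per cp_rank)."""
    out: Dict[Rank, int] = {}
    for cp in range(cp_size):
        reduced = all_reduce_min([local[(cp, tp)] for tp in range(tp_size)])
        for tp in range(tp_size):
            out[(cp, tp)] = reduced[tp]
    return out


def sync_cp_aware(local: Dict[Rank, int], cp_size: int, tp_size: int) -> Dict[Rank, int]:
    """Reduce inside each TP group, then across CP ranks that share a tp_rank."""
    after_tp = sync_tp_only(local, cp_size, tp_size)
    out: Dict[Rank, int] = {}
    for tp in range(tp_size):
        reduced = all_reduce_min([after_tp[(cp, tp)] for cp in range(cp_size)])
        for cp in range(cp_size):
            out[(cp, tp)] = reduced[cp]
    return out


# Made-up example: number of prefetched tokens each rank has completed locally.
local = {(0, 0): 128, (0, 1): 96, (1, 0): 64, (1, 1): 80}

tp_only = sync_tp_only(local, cp_size=2, tp_size=2)
cp_aware = sync_cp_aware(local, cp_size=2, tp_size=2)

print("TP-only :", tp_only)   # CP rank 0 and CP rank 1 settle on different values
print("CP-aware:", cp_aware)  # every rank settles on the same value
```

In this sketch the TP-only reduction leaves the two CP ranks with different views of how much prefetch has completed, which is the kind of divergence the description above refers to; adding the reduction across CP ranks makes all ranks agree.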