[CP] Register KV cache allgather buffer with symmetric memory#24040
ShangmingCai merged 3 commits into sgl-project:main
Conversation
Signed-off-by: wangfakang <fakangwang@gmail.com>
Code Review
This pull request updates cp_utils.py to ensure that tensor allocations are performed within the use_symmetric_memory context manager. A review comment suggests moving a descriptive comment inside the context manager to improve logical grouping and code clarity.
python/sglang/srt/layers/utils/cp_utils.py (133)
The comment 'Create output tensor with proper shape for all dimensions' is placed outside the context manager, but it describes the logic inside the context manager. It should be moved inside to maintain logical grouping.
with use_symmetric_memory(
    get_attention_cp_group(), disabled=not is_allocation_symmetric()
):
    # Create output tensor with proper shape for all dimensions
ShangmingCai left a comment
Looks reasonable.
CC: @Shunkangz could you take a look?
LGTM. One more small question here. Does it mean that we might need to allocate the memory through |
Yes, that's correct. However, SGLang currently performs a check at startup: if symmetric memory (symm) is enabled, it pre-allocates 4GB of memory by default for warm-up. This avoids frequent later allocations caused by insufficient memory, which can lead to memory fragmentation. Additionally, once this PR for restructuring the symm pool is merged, the memory pool will be shared across the various communication operations.
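To make the pre-allocation idea concrete, here is a minimal sketch of a pool that reserves its capacity once and then hands out sub-ranges by bumping an offset. The class `PreallocatedPool` and its methods are hypothetical and only track bookkeeping; this is not SGLang's symm pool, whose internals are not described in this thread.

```python
class PreallocatedPool:
    """Toy bump allocator over a single up-front reservation (hypothetical)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.offset = 0
        # Exactly one backing reservation is made, at construction time.
        self.backing_allocations = 1

    def alloc(self, nbytes):
        """Return an (offset, size) range carved out of the reservation."""
        if self.offset + nbytes > self.capacity:
            raise MemoryError("pool exhausted")
        start = self.offset
        self.offset += nbytes
        return (start, nbytes)

    def reset(self):
        """Release every range at once so the pool can be reused."""
        self.offset = 0


GIB = 1024 ** 3
MIB = 1024 ** 2

# Reserve 4 GiB once, as in the warm-up described above.
pool = PreallocatedPool(4 * GIB)

# Later requests are served from the reservation without new backing allocations.
a = pool.alloc(256 * MIB)
b = pool.alloc(512 * MIB)
```

Because every request is a sub-range of one reservation, the allocator never has to grow in small steps, which is the fragmentation concern raised in the reply.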
Thank you for the detailed explanation.
/tag-and-rerun-ci
/rerun-failed-ci Trigger waiting test task.
Friendly ping @ShangmingCai @Shunkangz
/rerun-failed-ci
/rerun-stage stage-c-test-deepep-8-gpu-h200
/rerun-stage stage-c-test-4-gpu-h100
✅ Triggered
✅ Triggered
* main: (894 commits)
  [Bug Fix] Fix RunAI streamer: corrupted weights, missing quant init, and broken URIs for multimodal models (sgl-project#22715)
  [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (sgl-project#24268)
  propagate pytest exit code from test __main__ entries (sgl-project#24487)
  [R3] Avoid implicit CUDA sync in routed experts DP slicing (sgl-project#24550)
  Add ChatCompletionRequest-style support to /v1/tokenize (sgl-project#23981)
  Support Triton MLA FP8 KV cache (sgl-project#20479)
  [diffusion] chore: align LTX-2 with official (sgl-project#24313)
  Expand support matrix for pypi wheel release (sgl-project#24565)
  [codex] Optimize Z-Image packed QKV (sgl-project#24117)
  [Misc] Fix breaking weight checker test (sgl-project#24553)
  [LoRA] Fix qkv_proj LoRA buffer sizing when tp_size > num_key_value_heads (sgl-project#24420)
  ci: bump test_mimo_models.py est_time 330 → 610 (sgl-project#24551)
  [CI] Temporarily disable marco/mcdse-2b-v1 in test_embedding_models (sgl-project#24279)
  Improve metrics, observability, and PD deploy tooling (sgl-project#24521)
  Fix diffusion fallback guards and validation (sgl-project#23335)
  [PD] Prevent update_status to Failed from cleared entries (sgl-project#24539)
  [CP] Register KV cache allgather buffer with symmetric memory (sgl-project#24040)
  Support getting checksums in weight checker (sgl-project#24537)
  Refactor buffer patterns in weight checker (sgl-project#24538)
  Add unit and end-to-end tests for weight checker (sgl-project#24536)
  ...
# Conflicts:
#   python/sglang/srt/managers/scheduler.py
#   python/sglang/srt/model_executor/model_runner.py
…oject#24040) Signed-off-by: wangfakang <fakangwang@gmail.com>



cc @ShangmingCai @Fridge003 PTAL, thx.
Motivation
[CP] Fix missing symmetric memory registration in cp_all_gather_reorganized_into_tensor_kv_cache (#22914 follow-up)
When PR #22914 refactored and consolidated NSA utils.py into cp_utils.py, it missed wrapping the KV cache allgather buffer creation with use_symmetric_memory in cp_all_gather_reorganized_into_tensor_kv_cache. This change adds the missing wrapper so that the buffer is registered with symmetric memory, which improves communication efficiency when symmetric memory is available.
Original: Register cp-atten-allgather buffers with symm memory