[PD] support verbs transfer engine for prototyping purpose #5111
Notice
We do not recommend using this engine in production. The MoonCake TransferEngine is now ready.
It is intended solely for prototyping purposes, to demonstrate the KV cache transmission mechanism in sglang's Prefill-Decode separation design.
Pyverbs is the official Python binding of the rdma-core library, maintained by the Linux RDMA (Remote Direct Memory Access) subsystem community.
It was introduced to give Python developers direct, low-level access to RDMA verbs, which were previously available only through C APIs. These verbs enable high-performance, low-latency communication by reading from and writing to remote memory directly over InfiniBand or RoCE-capable networks.
Pyverbs makes it easier to experiment with and prototype RDMA applications without writing C code, while still offering access to most of the functionality provided by native verbs.
Building on the closed pull request #4917, I have further optimized the implementation so that PD (Prefill/Decode separation) works with both GQA models and DeepSeek MLA series models.
Changes
Reorganized the disaggregation structure to support multiple engine implementations (e.g., pyverbs, mooncake).
Engine modules can now be selected dynamically via config or CLI flag.
Simplified the Pyverbs transfer workflow by using ZeroMQ (zmq) as the single metadata exchange channel between clients and the registry server.
All registration and querying of QP and memory info is now handled by a centralized registry server based on zmq.ROUTER.
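To illustrate the registry idea, here is a minimal sketch of a zmq.ROUTER server that stores and returns connection metadata. It is not the code in this PR: the message format (`op`, `key`, `info`), the field names (`qp_num`, `rkey`, `addr`), the `inproc://` endpoint, and the helper names are all assumptions made for the example.

```python
import threading

import zmq


def run_registry(ctx, endpoint, stop_event, ready_event):
    """Hypothetical registry loop: keeps QP/memory info in a dict keyed by peer name."""
    sock = ctx.socket(zmq.ROUTER)
    sock.setsockopt(zmq.LINGER, 0)
    sock.bind(endpoint)
    ready_event.set()
    store = {}
    poller = zmq.Poller()
    poller.register(sock, zmq.POLLIN)
    while not stop_event.is_set():
        # Poll with a timeout so the loop can notice stop_event.
        if not poller.poll(timeout=100):
            continue
        # A DEALER client sending one frame arrives as [identity, payload].
        identity, payload = sock.recv_multipart()
        msg = zmq.utils.jsonapi.loads(payload)
        if msg.get("op") == "register":
            store[msg["key"]] = msg["info"]
            reply = {"ok": True}
        elif msg.get("op") == "query" and msg.get("key") in store:
            reply = {"ok": True, "info": store[msg["key"]]}
        else:
            reply = {"ok": False}
        # ROUTER uses the identity frame to route the reply to the right client.
        sock.send_multipart([identity, zmq.utils.jsonapi.dumps(reply)])
    sock.close()


def request(ctx, endpoint, message, timeout_ms=2000):
    """Send one JSON request to the registry and wait for its JSON reply."""
    sock = ctx.socket(zmq.DEALER)
    sock.setsockopt(zmq.LINGER, 0)
    sock.connect(endpoint)
    try:
        sock.send_json(message)
        if not sock.poll(timeout_ms):
            raise TimeoutError("registry did not reply")
        return sock.recv_json()
    finally:
        sock.close()


def demo():
    ctx = zmq.Context()
    endpoint = "inproc://registry"  # assumed endpoint; a real deployment would use tcp://
    stop, ready = threading.Event(), threading.Event()
    server = threading.Thread(target=run_registry, args=(ctx, endpoint, stop, ready))
    server.start()
    ready.wait(timeout=5)
    try:
        # Placeholder values standing in for real QP number, rkey, and buffer address.
        info = {"qp_num": 17, "rkey": 4660, "addr": 140000000000}
        ack = request(ctx, endpoint, {"op": "register", "key": "prefill-0", "info": info})
        found = request(ctx, endpoint, {"op": "query", "key": "prefill-0"})
        missing = request(ctx, endpoint, {"op": "query", "key": "decode-9"})
    finally:
        stop.set()
        server.join()
        ctx.term()
    return ack, found, missing
```

Calling `demo()` registers one entry and then queries it, along with a key that was never registered. A single ROUTER socket suits this role because it tags every incoming message with the sender's identity, so one server can answer many prefill and decode clients without a socket per peer.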
Design
See #4654