@whybeyoung
Collaborator

@whybeyoung whybeyoung commented Apr 7, 2025

Notice

We do not recommend using this engine in production; the Mooncake TransferEngine is now ready and should be used instead.

It is intended solely for prototyping purposes, to demonstrate the KV cache transmission mechanism in sglang's Prefill-Decode separation design.

Pyverbs is the official Python binding of the RDMA-core library, maintained by the Linux RDMA (Remote Direct Memory Access) subsystem community.

It was introduced to give Python developers direct, low-level access to RDMA verbs, which were previously available only through C APIs. These verbs enable high-performance, low-latency communication by reading from and writing to remote memory directly over InfiniBand or RoCE-capable networks.

Pyverbs makes it easier to experiment with and prototype RDMA applications without writing C code, while still offering access to most of the functionality provided by native verbs.

Building on the closed pull request #4917, I have further optimized the implementation so that PD (Prefill-Decode separation) works for both GQA models and the DeepSeek MLA series of models.

Changes

Reorganized disaggregation structure to support multiple engine implementations (e.g., pyverbs, mooncake, etc.).
Now the engine modules can be dynamically selected via config or CLI flag.
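As a rough illustration of what selecting an engine by name could look like, here is a minimal sketch. It is not the code in this PR: the `register_engine` decorator, the `create_engine` helper, the placeholder engine classes, and the `--transfer-engine` flag name are all hypothetical names chosen for the example.

```python
import argparse
from typing import Callable, Dict

# Hypothetical registry: maps a backend name to a factory for that engine.
_ENGINES: Dict[str, Callable[[], object]] = {}


def register_engine(name: str):
    """Class decorator that records an engine implementation under `name`."""
    def decorator(cls):
        _ENGINES[name] = cls
        return cls
    return decorator


@register_engine("pyverbs")
class PyverbsEngine:
    # Placeholder; a real engine would set up RDMA resources here.
    name = "pyverbs"


@register_engine("mooncake")
class MooncakeEngine:
    # Placeholder; a real engine would wrap the Mooncake TransferEngine.
    name = "mooncake"


def create_engine(name: str):
    """Instantiate the engine registered under `name`, or raise ValueError."""
    try:
        return _ENGINES[name]()
    except KeyError:
        raise ValueError(
            f"unknown transfer engine {name!r}; choose from {sorted(_ENGINES)}"
        ) from None


def parse_args(argv=None):
    """Parse the (hypothetical) CLI flag that picks the engine."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--transfer-engine",
        choices=sorted(_ENGINES),
        default="mooncake",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    engine = create_engine(args.transfer_engine)
    print(f"selected engine: {engine.name}")
```

Keeping the lookup in one table means a new backend only needs to register itself; the CLI choices follow automatically.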

Simplified the Pyverbs transfer workflow, using ZeroMQ (zmq) as the single metadata exchange channel between clients and the registry server.
All registration and query of QP and memory info is now handled via a centralized registry server based on zmq.ROUTER.
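The register/query flow through a `zmq.ROUTER` socket can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's implementation: the JSON message layout, the `op`/`peer`/`info` field names, the `Registry` class, and the default endpoint are all made up for the example. The `serve` function requires `pyzmq`; the `Registry` logic itself uses only the standard library.

```python
import json
from typing import Dict


class Registry:
    """In-memory store of per-peer QP and memory-region metadata (hypothetical schema)."""

    def __init__(self) -> None:
        self._peers: Dict[str, dict] = {}

    def handle(self, raw: bytes) -> bytes:
        """Process one JSON request and return the JSON reply as bytes."""
        msg = json.loads(raw)
        op = msg.get("op")
        if op == "register":
            # Store (or overwrite) the metadata a peer publishes about itself.
            self._peers[msg["peer"]] = msg["info"]
            return json.dumps({"ok": True}).encode()
        if op == "query":
            # Look up the metadata another peer registered earlier.
            info = self._peers.get(msg["peer"])
            return json.dumps({"ok": info is not None, "info": info}).encode()
        return json.dumps({"ok": False, "error": f"unknown op {op!r}"}).encode()


def serve(endpoint: str = "tcp://*:5555") -> None:
    """Run the registry on a ROUTER socket (endpoint is an assumed default)."""
    import zmq  # pyzmq; imported lazily so Registry works without it

    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.ROUTER)
    sock.bind(endpoint)
    registry = Registry()
    while True:
        # ROUTER prepends the client identity frame. A REQ client also adds
        # an empty delimiter frame; a DEALER client does not. The payload is
        # the last frame either way, so echo back the routing envelope.
        frames = sock.recv_multipart()
        reply = registry.handle(frames[-1])
        sock.send_multipart(frames[:-1] + [reply])
```

A prefill worker would send a `register` message carrying its QP number and memory keys, and a decode worker would send a `query` for that peer before posting RDMA operations.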

Design

See #4654.

@ByronHsu
Collaborator

ByronHsu commented May 4, 2025

Thanks for the great contribution! Let's keep verbs in a separate branch and use mooncake and nixl as the primary transfer backends in sglang for now.

@ByronHsu ByronHsu closed this May 4, 2025