
@casteryh commented Nov 1, 2025

Summary: Introduces an async RDMA API with background completion polling and work request tracking via `CompletionTracker`. Large operations are automatically chunked and tracked using oneshot channels. The new `read_into()` and `write_from()` methods use this async completion tracking to allow more concurrent operations. `request_queue_pair()` is preserved for backwards compatibility; however, the previous QP polling APIs won't work while a background task is actively polling the queue pair, because that task consumes the completions.
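A minimal sketch of mapping work request IDs to one-shot waiters, assuming a simulated completion queue. `Completion`, `CompletionTracker::register`/`complete`, and `spawn_poller` are hypothetical names; std `mpsc::sync_channel(1)` stands in for an async oneshot channel, and a plain thread stands in for the background polling task. It also shows why the legacy polling path breaks: once the poller owns the CQ receiver, nothing else can observe those completions.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, SyncSender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a work completion read from a completion queue (CQ).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Completion {
    pub wr_id: u64,
    pub ok: bool,
}

/// Maps in-flight work request IDs to single-use senders.
#[derive(Default)]
pub struct CompletionTracker {
    next_wr_id: u64,
    pending: HashMap<u64, SyncSender<Completion>>,
}

impl CompletionTracker {
    /// Allocates a fresh wr_id and returns a receiver that yields exactly one completion.
    pub fn register(&mut self) -> (u64, Receiver<Completion>) {
        let wr_id = self.next_wr_id;
        self.next_wr_id += 1;
        let (tx, rx) = mpsc::sync_channel(1); // capacity 1: used like a oneshot
        self.pending.insert(wr_id, tx);
        (wr_id, rx)
    }

    /// Routes a completion to its waiter; returns false if the wr_id is unknown.
    pub fn complete(&mut self, c: Completion) -> bool {
        match self.pending.remove(&c.wr_id) {
            Some(tx) => {
                let _ = tx.send(c); // the waiter may have dropped its receiver
                true
            }
            None => false,
        }
    }
}

/// Background poller: owns the CQ and drains it until every CQ sender is dropped.
pub fn spawn_poller(
    tracker: Arc<Mutex<CompletionTracker>>,
    cq: Receiver<Completion>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for c in cq {
            tracker.lock().unwrap().complete(c);
        }
    })
}
```

Because completions may arrive in any order, each caller blocks only on its own receiver rather than polling the shared queue.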

Differential Revision: D85962453
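The automatic chunking in the summary can be sketched as splitting one transfer into `(offset, len)` pieces, each of which would be posted as its own work request with its own `wr_id`. The `max_chunk` limit and the `chunk_ranges` name are hypothetical; the PR does not state the actual limit or helper.

```rust
/// Splits a transfer of `len` bytes starting at `offset` into pieces of at most
/// `max_chunk` bytes. Returns (offset, length) pairs in order.
pub fn chunk_ranges(offset: usize, len: usize, max_chunk: usize) -> Vec<(usize, usize)> {
    assert!(max_chunk > 0, "max_chunk must be non-zero");
    let mut chunks = Vec::new();
    let mut done = 0;
    while done < len {
        // The last chunk may be shorter than max_chunk.
        let size = (len - done).min(max_chunk);
        chunks.push((offset + done, size));
        done += size;
    }
    chunks
}
```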

Summary:
Allows us to test the bandwidth of concurrent RDMA operations.

Actual concurrency support will be added later in the stack.
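When measuring the bandwidth of concurrent operations, one reasonable approach (an assumption here, not necessarily what the benchmark does) is to divide the total bytes moved by the wall-clock time of the whole batch, since per-operation times overlap and cannot simply be summed. `aggregate_gbps` is a hypothetical helper.

```rust
use std::time::Duration;

/// Aggregate bandwidth in GB/s (10^9 bytes per second) for `ops` concurrent
/// operations of `bytes_per_op` bytes each, all finishing within `elapsed`
/// wall-clock time.
pub fn aggregate_gbps(ops: u64, bytes_per_op: u64, elapsed: Duration) -> f64 {
    let total_bytes = (ops * bytes_per_op) as f64;
    total_bytes / elapsed.as_secs_f64() / 1e9
}
```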

Differential Revision: D85724514

Summary:
This diff adds a FIFO waiting mechanism for acquiring the queue pair.

**Actual concurrency support will come in a follow-up diff**: it requires tracking `wr_id` and creating a polling task for each queue pair, which needs further refactoring.

**Not true concurrency support**: Only one request can be in flight at a time per queue pair connection. However, subsequent requests now wait fairly (FIFO) instead of panicking with "already checked out" errors.
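The "one in flight, FIFO waiting" behavior can be illustrated with a ticket lock. This is a stand-in, not the PR's code: the PR uses a semaphore (tokio's `Semaphore` is documented as fair, handing out permits in the order they were requested), while this sketch uses only std `Mutex`/`Condvar`, and `FifoGate`/`GatePermit` are hypothetical names. Dropping the permit passes the queue pair to the next waiter in arrival order.

```rust
use std::sync::{Condvar, Mutex};

/// FIFO gate allowing one holder at a time; waiters are served in arrival order.
pub struct FifoGate {
    state: Mutex<(u64, u64)>, // (next ticket to hand out, ticket currently served)
    cv: Condvar,
}

/// RAII permit; releasing it lets the next ticket holder proceed.
pub struct GatePermit<'a> {
    gate: &'a FifoGate,
}

impl FifoGate {
    pub fn new() -> Self {
        FifoGate { state: Mutex::new((0, 0)), cv: Condvar::new() }
    }

    /// Takes a ticket, then blocks until that ticket is being served.
    pub fn acquire(&self) -> GatePermit<'_> {
        let mut st = self.state.lock().unwrap();
        let my_ticket = st.0;
        st.0 += 1;
        while st.1 != my_ticket {
            st = self.cv.wait(st).unwrap();
        }
        GatePermit { gate: self }
    }
}

impl Drop for GatePermit<'_> {
    fn drop(&mut self) {
        let mut st = self.gate.state.lock().unwrap();
        st.1 += 1; // serve the next ticket
        self.gate.cv.notify_all();
    }
}
```

Compared with a "checked out" flag that panics on contention, a later request here simply waits its turn.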

Core changes:
1. **Fair Waiting via Semaphore**: Replaced the `Available`/`CheckedOut` states with `Connecting`/`Ready`/`ConnectionError`. Added `QueuePairEntry`, which wraps an `RdmaQueuePair` together with an `Arc<Semaphore>`. `request_queue_pair` uses a two-phase approach: first get or create the QP, then acquire a semaphore permit, which provides FIFO fairness.

2. **Refactor - Moved `read_into`/`write_from` to `RdmaManagerActor`**: This prevents a deadlock in the actor message queue. In the old design, `RdmaBuffer` called the `request_queue_pair` RPC, performed the operation, then called the `release_queue_pair` RPC. With waiting requests, the release message would queue behind requests that cannot proceed until that release is handled. Now the entire operation (request → use → release) happens within a single actor message handler.

3. **Refactored Connection Logic**: Extracted an `establish_connection` helper that handles both loopback and remote connections.

4. **Test**: Added a `create_buffer_pair` method and concurrency tests.
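The two-phase acquisition and the single-handler request → use → release pattern can be sketched as follows. All names are hypothetical stand-ins: `QpState` mirrors the `Connecting`/`Ready`/`ConnectionError` states, `QueuePairEntry` holds a placeholder string instead of an `RdmaQueuePair`, connection setup is simulated, and a std `Mutex<()>` replaces the semaphore (unlike the semaphore, it is not FIFO-fair). Because the permit is an RAII guard dropped at the end of `with_queue_pair`, release never depends on a separate message being processed.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for QueuePairEntry: a queue pair plus a single-permit gate.
#[derive(Debug)]
pub struct QueuePairEntry {
    pub qp_name: String,   // placeholder for an RdmaQueuePair
    pub permit: Mutex<()>, // placeholder for Arc<Semaphore>; NOT FIFO-fair
}

/// Mirrors the Connecting / Ready / ConnectionError states.
#[allow(dead_code)]
#[derive(Debug)]
pub enum QpState {
    Connecting,
    Ready(Arc<QueuePairEntry>),
    ConnectionError(String),
}

/// Hypothetical manager holding per-remote connection state.
pub struct Manager {
    pub qps: HashMap<String, QpState>,
}

impl Manager {
    /// Phase 1: get or create the entry for `remote` (connection setup is simulated).
    fn get_or_create(&mut self, remote: &str) -> Result<Arc<QueuePairEntry>, String> {
        let state = self.qps.entry(remote.to_string()).or_insert_with(|| {
            // A real implementation would record Connecting and establish the connection.
            QpState::Ready(Arc::new(QueuePairEntry {
                qp_name: format!("qp-{remote}"),
                permit: Mutex::new(()),
            }))
        });
        match state {
            QpState::Ready(entry) => Ok(Arc::clone(entry)),
            QpState::Connecting => Err("connection still in progress".to_string()),
            QpState::ConnectionError(e) => Err(e.clone()),
        }
    }

    /// Request, use, and release inside one call; no separate release message exists.
    pub fn with_queue_pair<T>(
        &mut self,
        remote: &str,
        op: impl FnOnce(&QueuePairEntry) -> T,
    ) -> Result<T, String> {
        let entry = self.get_or_create(remote)?; // phase 1: get or create
        let _permit = entry.permit.lock().unwrap(); // phase 2: one operation in flight
        Ok(op(&entry))
        // `_permit` is dropped here, releasing the queue pair for the next caller.
    }
}
```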

Differential Revision: D85627877
@meta-cla bot added the CLA Signed label Nov 1, 2025
meta-codesync bot commented Nov 1, 2025

@casteryh has exported this pull request. If you are a Meta employee, you can view the originating Diff in D85962453.


Labels: CLA Signed, fb-exported, meta-exported
