[DisaggEverything] `DisaggregatedRequestManager` aka `Coordinator` [1/N] by NickLucche · Pull Request #26178 · vllm-project/vllm

NickLucche · 2025-10-03T15:56:27Z

Second step in implementing the "Disaggregated Everything" proposal #22817.
Follows from #24261 (although that PR is not a strict prerequisite).
This PR focuses on the following component:

Which would

Note

As I feel the name Coordinator is quite overloaded, I have renamed the component presented in the original chart to DisaggregatedRequestManager to try and have a clearer identity that would be harder to confuse. The change is totally opinionated and I am very much open to better naming suggestions.

Overview

In very concrete terms, this PR introduces the following:

A DisaggregatedRequestManager interface
A DisaggregatedRequestManagerFactory, which builds managers and lets custom out-of-tree (OOT) managers be registered
A DisaggregatedServerMixin class, intended to be plugged into an API endpoint (e.g. the tokens-in/tokens-out endpoint from #24261, served on /v1/inference/generate), meant to add the disaggregation coordination capability.
PrefillLocalDecodeRemoteManager, a specialization of DisaggregatedRequestManager implementing a concrete coordination behavior.
Tests to showcase the functionality
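
To make the interface/factory split concrete, here is a minimal sketch of how a manager interface and a registration-based factory could fit together. It is not the code in this PR: the `handle` method, the `register`/`create` signatures, and the toy `EchoManager` are assumptions made for illustration only.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict


class DisaggregatedRequestManager(ABC):
    """Hypothetical interface shape; `handle` is an assumed method name."""

    @abstractmethod
    async def handle(self, request: Dict[str, Any]) -> Dict[str, Any]:
        """Coordinate one request across local and remote instances."""


class DisaggregatedRequestManagerFactory:
    """Name-to-constructor registry so out-of-tree managers can plug in."""

    _registry: Dict[str, Callable[..., DisaggregatedRequestManager]] = {}

    @classmethod
    def register(
        cls, name: str, ctor: Callable[..., DisaggregatedRequestManager]
    ) -> None:
        # Refuse silent overrides so two plugins cannot shadow each other.
        if name in cls._registry:
            raise ValueError(f"manager {name!r} is already registered")
        cls._registry[name] = ctor

    @classmethod
    def create(cls, name: str, **kwargs: Any) -> DisaggregatedRequestManager:
        if name not in cls._registry:
            raise ValueError(f"unknown manager {name!r}")
        return cls._registry[name](**kwargs)


class EchoManager(DisaggregatedRequestManager):
    """Toy out-of-tree manager used only for illustration."""

    async def handle(self, request: Dict[str, Any]) -> Dict[str, Any]:
        return {"handled_by": "echo", **request}


# An out-of-tree package would register its manager at import time.
DisaggregatedRequestManagerFactory.register("echo", EchoManager)
manager = DisaggregatedRequestManagerFactory.create("echo")
result = asyncio.run(manager.handle({"prompt": "hi"}))
```

Keeping the registry keyed by name means a startup config only has to carry strings, which is one plausible way for custom managers to be selected without touching in-tree code.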

What this PR does not do:

It does NOT plug the DisaggregatedRequestManager capabilities into any API endpoint. No changes at all are expected in vLLM's behavior. This lays the foundation for doing so once we figure out the right implementation.

Design

A DisaggregatedRequestManager subclass is a particular implementation of a disaggregated protocol: e.g. it defines whether, and which, requests are executed locally and which are sent for remote execution instead.
One concrete example, the PrefillLocalDecodeRemoteManager, expects a completion request from the LB/IGW, executes the prefill phase of the request locally (P instance), then sends the request to a remote D instance for decoding.
Note that a deferred decode selection logic could optionally be injected at this point, reaching back to the EndpointPicker (EPP) to get a remote address for D.
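
The prefill-local, decode-remote flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the `Request` fields, the three injected callables (`prefill_local`, `decode_remote`, `pick_decode`) and the class name are all hypothetical, and the actual transport to the remote D is left out.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List, Optional


@dataclass
class Request:
    request_id: str
    prompt_token_ids: List[int]
    # Assumed field: set by the LB when D is chosen eagerly, else None.
    decode_addr: Optional[str] = None


PrefillFn = Callable[[Request], Awaitable[Dict[str, Any]]]
DecodeFn = Callable[[str, Request, Dict[str, Any]], Awaitable[List[int]]]
PickFn = Callable[[Request], Awaitable[str]]


class PrefillLocalDecodeRemoteSketch:
    """Hypothetical coordinator: prefill on this instance, decode on a remote D."""

    def __init__(
        self,
        prefill_local: PrefillFn,
        decode_remote: DecodeFn,
        pick_decode: Optional[PickFn] = None,
    ) -> None:
        self.prefill_local = prefill_local
        self.decode_remote = decode_remote
        self.pick_decode = pick_decode

    async def handle(self, req: Request) -> List[int]:
        # 1. Run the prefill phase locally (this instance acts as P).
        kv_params = await self.prefill_local(req)
        # 2. Resolve the decode target: eager (already on the request) or
        #    deferred (ask a picker, e.g. the EPP, only after prefill is done).
        addr = req.decode_addr
        if addr is None:
            if self.pick_decode is None:
                raise ValueError("no decode address and no picker configured")
            addr = await self.pick_decode(req)
        # 3. Hand the request over to the remote D for decoding.
        return await self.decode_remote(addr, req, kv_params)
```

With eager selection the LB would fill in `decode_addr` up front; with deferred selection it stays `None` and the picker is consulted only once prefill has completed.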

A manager can be placed independently on both P and D, depending on some startup config or dynamically added at runtime (e.g. in response, TODO).
Each vLLM instance can have multiple policies active at once: this enables the LB to dynamically switch coordination policy in response to changes in deployment or traffic conditions (e.g. Eager vs Deferred Decode).

The DisaggregatedServerMixin class is introduced to handle multiple managers, maintain shareable state and decide dispatching priority (priority is currently fixed at startup; allowing it to change dynamically is left for later).
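
One way to picture priority-based dispatch across several active managers is the sketch below. It is a guess at the general shape, not the mixin from this PR: the `accepts` method, the `policy` request key, the `catch_all` flag and the `select_manager` name are all assumptions.

```python
from typing import Any, Dict, List, Optional


class PolicyManager:
    """Toy manager that accepts requests tagged with its policy name."""

    def __init__(self, name: str, catch_all: bool = False) -> None:
        self.name = name
        self.catch_all = catch_all

    def accepts(self, request: Dict[str, Any]) -> bool:
        return self.catch_all or request.get("policy") == self.name


class DisaggregatedServerMixinSketch:
    """Holds several managers plus state they can share (hypothetical)."""

    def __init__(self, managers_by_priority: List[PolicyManager]) -> None:
        # Order is the priority and is fixed at startup in this sketch.
        self._managers = list(managers_by_priority)
        self.shared_state: Dict[str, Any] = {}

    def select_manager(self, request: Dict[str, Any]) -> Optional[PolicyManager]:
        # First manager (highest priority) that accepts the request wins.
        for manager in self._managers:
            if manager.accepts(request):
                return manager
        return None  # no manager applies: serve the request as usual


server = DisaggregatedServerMixinSketch(
    [PolicyManager("deferred_decode"), PolicyManager("eager_decode", catch_all=True)]
)
chosen = server.select_manager({"policy": "deferred_decode"})
fallback = server.select_manager({})
```

Here a request tagged `deferred_decode` goes to the first manager, while an untagged request falls through to the catch-all; swapping the list order would send everything to the catch-all, which is why the ordering matters.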

The manager is also responsible for providing a single abstraction over the connection from the LB's point of view: when the connection is dropped client-side, it should enable cleanup routines to run on both the local and the remote instance.
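
As a rough illustration of the cleanup requirement, the sketch below assumes the server surfaces a client disconnect as cancellation of the handler task (an assumption, not something this PR states) and runs both abort routines before letting the cancellation propagate. `ConnectionScope`, `abort_local` and `abort_remote` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable

AbortFn = Callable[[str], Awaitable[None]]


class ConnectionScope:
    """Ties local and remote cleanup to the lifetime of one client connection."""

    def __init__(self, abort_local: AbortFn, abort_remote: AbortFn) -> None:
        self.abort_local = abort_local
        self.abort_remote = abort_remote

    async def run(self, request_id: str, work: Awaitable[Any]) -> Any:
        try:
            return await work
        except asyncio.CancelledError:
            # Assumed trigger: the client went away and the handler task was
            # cancelled. Abort both sides; one failing must not skip the other.
            await asyncio.gather(
                self.abort_local(request_id),
                self.abort_remote(request_id),
                return_exceptions=True,
            )
            raise  # keep normal cancellation semantics for the caller
```

Re-raising after cleanup keeps the task in its cancelled state, so the surrounding server code still sees the disconnect.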

PrefillLocalDecodeRemote streaming example:

Future work

I will iterate on this interface based on feedback, then move on to implement more core features like streaming, and finally plug the Mixin into the /generate endpoint.

Signed-off-by: NickLucche <nlucches@redhat.com>

NickLucche · 2025-10-03T15:56:47Z

cc @smarterclayton @robertgshaw2-redhat

NickLucche added 5 commits September 18, 2025 12:39

init

50405ef

Signed-off-by: NickLucche <nlucches@redhat.com>

update

4b96a2e

Signed-off-by: NickLucche <nlucches@redhat.com>

iterate

9d3322f

Signed-off-by: NickLucche <nlucches@redhat.com>

add tests

4f0d665

Signed-off-by: NickLucche <nlucches@redhat.com>

update

ada3563

Signed-off-by: NickLucche <nlucches@redhat.com>

mergify bot added the v1 label Oct 3, 2025

NickLucche wants to merge 5 commits into vllm-project:main from NickLucche:disaggev-coordinator
