Commit c823c29
Implementation of simple load balance routing proxy server (vllm-project#1953) (vllm-project#2124)
### What this PR does / why we need it?
This PR is a cherry-pick of vllm-project#1953 from v0.9.1.
This PR introduces a new load-balancing proxy server example for
disaggregated prefill-decode (PD). Compared with the original round-robin
toy_proxy, it supports a simple token- and kv_cache-aware load-balancing
routing strategy for the disaggregated PD system.
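
As a rough illustration of what token- and kv_cache-aware routing can look like, here is a minimal sketch. It is not the code from this PR: `ServerState`, `pick_least_loaded`, `route`, `release`, and the scoring formula are all hypothetical names and assumptions chosen for the example. The idea is to track the in-flight load of each instance and send each request to the least-loaded prefiller and decoder instead of cycling through them round-robin.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    """Hypothetical per-instance load counters (not the PR's actual class)."""
    name: str
    active_tokens: int = 0    # prompt tokens of in-flight requests
    active_kv_cache: int = 0  # KV cache held by in-flight requests, in tokens


def pick_least_loaded(servers, kv_weight=0.0):
    """Return the server with the lowest weighted load score.

    The score (tokens + kv_weight * kv_cache) is an assumed formula.
    On ties, min() returns the first server in the list.
    """
    return min(
        servers,
        key=lambda s: s.active_tokens + kv_weight * s.active_kv_cache,
    )


def route(prefillers, decoders, prompt_tokens):
    """Choose a prefiller and a decoder, then record the added load."""
    prefiller = pick_least_loaded(prefillers)
    decoder = pick_least_loaded(decoders, kv_weight=1.0)
    prefiller.active_tokens += prompt_tokens
    decoder.active_tokens += prompt_tokens
    decoder.active_kv_cache += prompt_tokens
    return prefiller, decoder


def release(prefiller, decoder, prompt_tokens):
    """Undo the bookkeeping done by route() once a request finishes."""
    prefiller.active_tokens -= prompt_tokens
    decoder.active_tokens -= prompt_tokens
    decoder.active_kv_cache -= prompt_tokens
```

With two idle prefillers and two idle decoders, a 100-token request goes to the first instance of each kind, and a following 10-token request goes to the second, since it now carries less load. A plain round-robin proxy would alternate regardless of request size.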
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tested on a real workload and with unit tests.
- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@ad57f23
---------
Signed-off-by: ganyi <[email protected]>
1 parent ec32b99 commit c823c29
2 files changed (+518, -275 lines) in examples/disaggregated_prefill_v1