Commit 4b3a210
### What this PR does / why we need it?
This PR is a cherry-pick of #1953 from v0.9.1.
This PR introduces a new load-balancing proxy server example for
disaggregated PD (prefill/decode). It supports a simple token- and
kv_cache-aware load-balancing routing strategy for the disaggregated PD
system, in contrast to the original round-robin toy_proxy.
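To illustrate the difference from round-robin routing, here is a minimal sketch of what token- and kv_cache-aware selection could look like. It is not the code from this PR: `ServerState`, `route`, `finish_prefill`, and `finish_decode` are hypothetical names, and the scoring rule (least in-flight prompt tokens for prefill, least resident KV cache for decode) is an assumption about how such a strategy might work.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    """Hypothetical per-instance load record kept by the proxy."""
    name: str
    active_tokens: int = 0    # prompt tokens currently being prefilled
    active_kv_cache: int = 0  # tokens whose KV cache is held by a decoder


def route(prefillers, decoders, prompt_tokens):
    """Pick the least-loaded prefiller and decoder, then record the new load.

    Assumed scoring: prefillers are compared by in-flight prompt tokens,
    decoders by resident KV cache. Ties go to the first server in the list.
    """
    p = min(prefillers, key=lambda s: s.active_tokens)
    d = min(decoders, key=lambda s: s.active_kv_cache)
    p.active_tokens += prompt_tokens
    d.active_kv_cache += prompt_tokens
    return p, d


def finish_prefill(prefiller, prompt_tokens):
    """Release prefill load once the prompt has been processed."""
    prefiller.active_tokens -= prompt_tokens


def finish_decode(decoder, prompt_tokens):
    """Release the KV cache accounting once generation completes."""
    decoder.active_kv_cache -= prompt_tokens


# Usage: one long request followed by two short ones.
prefillers = [ServerState("P0"), ServerState("P1")]
decoders = [ServerState("D0"), ServerState("D1")]

picks = []
for n_tokens in (1000, 10, 10):
    p, d = route(prefillers, decoders, n_tokens)
    picks.append((p.name, d.name))
```

In this sketch the third request goes to `P1`/`D1` because `P0`/`D0` are still busy with the long request, whereas a round-robin proxy would have cycled back to `P0`/`D0` regardless of load.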
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tested on a real workload and with unit tests.
- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@ad57f23
---------
Signed-off-by: ganyi <[email protected]>
1 parent af04ee9 commit 4b3a210
2 files changed (+518, -275 lines) in examples/disaggregated_prefill_v1