[V0] Support multiple kv connectors #18395
Closed
Motivation
Inspired by #17564, which makes v1 support multiple co-existing connectors.
This PR implements MultiConnectorV0 for v0.
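The delegation pattern behind a multi-connector can be sketched roughly as follows. This is an illustrative sketch only: the class name, method names, and signatures here are assumptions for exposition, not the actual vLLM v0 connector interface.

```python
# Illustrative sketch of a multi-connector wrapper (names are assumptions,
# not the real vLLM v0 KV connector API).
class MultiConnector:
    """Wraps several KV connectors and fans calls out to them."""

    def __init__(self, connectors):
        self._connectors = list(connectors)

    def send_kv_caches(self, request_id, kv_data):
        # Producer-side calls are broadcast to every child connector,
        # so e.g. an offload connector and a P/D transfer connector
        # both see the KV data.
        for c in self._connectors:
            c.send_kv_caches(request_id, kv_data)

    def recv_kv_caches(self, request_id):
        # Consumer-side lookups try each child in order and return
        # the first hit.
        for c in self._connectors:
            data = c.recv_kv_caches(request_id)
            if data is not None:
                return data
        return None
```

The broadcast-on-send / first-hit-on-receive split mirrors how one connector can serve offloading while another serves P/D transfer.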
How to test
Start a 1P 1D cluster. Use LMCacheConnector as an offloading connector in the P instance, and use FSConnector in both the P and D instances as the P/D transfer connector. (FSConnector will be introduced in another PR, but any other connector can be used instead as the P/D transfer connector.)

P instance:

```shell
CUDA_VISIBLE_DEVICES=0 \
VLLM_LOGGING_LEVEL=DEBUG \
LMCACHE_USE_EXPERIMENTAL=True LMCACHE_TRACK_USAGE=false LMCACHE_LOG_LEVEL=DEBUG \
LMCACHE_CONFIG_FILE=/disc/data1/baoloongmao/cpu/lmcache-cpu.yaml \
VLLM_MLA_DISABLE=1 VLLM_USE_V1=0 \
vllm serve /disc/data1/deepseek/DeepSeek-V2-Lite-Chat/ \
  --trust-remote-code \
  --served-model-name vllm_cpu_offload \
  --max-model-len 32768 \
  --max-seq-len-to-capture 10000 \
  --max-num-seqs 64 \
  --gpu-memory-utilization 0.9 \
  --host 0.0.0.0 \
  -tp 1 \
  --no-enable-prefix-caching \
  --max-num-batched-tokens 64000 \
  --kv-transfer-config '{"kv_connector":"MultiConnectorV0","kv_role":"kv_both","kv_connector_extra_config":{"connectors":[{"kv_connector":"FSConnector","kv_role":"kv_producer","kv_connector_extra_config":{"fs_storage_path":"fs_local_storage","transfer":true}},{"kv_connector":"LMCacheConnector","kv_role":"kv_both"}]}}'
```

D instance:

```shell
CUDA_VISIBLE_DEVICES=1 \
VLLM_LOGGING_LEVEL=DEBUG \
LMCACHE_USE_EXPERIMENTAL=True LMCACHE_TRACK_USAGE=false LMCACHE_LOG_LEVEL=DEBUG \
LMCACHE_CONFIG_FILE=/disc/data1/baoloongmao/cpu/lmcache-cpu.yaml \
VLLM_MLA_DISABLE=1 VLLM_USE_V1=0 \
vllm serve /disc/data1/deepseek/DeepSeek-V2-Lite-Chat/ \
  --trust-remote-code \
  --served-model-name vllm_cpu_offload \
  --max-model-len 32768 \
  --max-seq-len-to-capture 10000 \
  --max-num-seqs 64 \
  --gpu-memory-utilization 0.9 \
  --host 0.0.0.0 \
  -tp 1 \
  --no-enable-prefix-caching \
  --max-num-batched-tokens 64000 \
  --kv-transfer-config '{"kv_connector":"FSConnector","kv_role":"kv_consumer","kv_connector_extra_config":{"fs_storage_path":"fs_local_storage"}}' \
  --port 8001
```

Both instances use the same `fs_local_storage` folder.
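The value passed to `--kv-transfer-config` on the P instance is plain JSON, with the child connectors nested under `kv_connector_extra_config.connectors`. A quick sanity check of that structure:

```python
import json

# The exact JSON string passed to --kv-transfer-config on the P instance.
config = json.loads(
    '{"kv_connector":"MultiConnectorV0","kv_role":"kv_both",'
    '"kv_connector_extra_config":{"connectors":['
    '{"kv_connector":"FSConnector","kv_role":"kv_producer",'
    '"kv_connector_extra_config":{"fs_storage_path":"fs_local_storage","transfer":true}},'
    '{"kv_connector":"LMCacheConnector","kv_role":"kv_both"}]}}'
)

# The top-level connector is the multiplexer; the real connectors are nested.
children = config["kv_connector_extra_config"]["connectors"]
print([c["kv_connector"] for c in children])  # ['FSConnector', 'LMCacheConnector']
```

Note that each child carries its own `kv_role`, so the FSConnector can act as producer for P/D transfer while LMCacheConnector stays `kv_both` for offloading.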