Closed
Labels
bug (something that is supposed to be working, but isn't), llm, serve (Ray Serve related issue), stability, triage (needs triage, e.g. priority, bug/not-bug, and owning component)
Description
What happened + What you expected to happen
When Ray Serve LLM is deployed with multiple replicas (num_replicas ≥ 2) and tensor parallelism (tensor_parallel_size ≥ 2) using the NIXL KV transfer backend, TP workers from different replicas collide on the same ports.
What happened:
The second replica fails to start with NIXL_ERR_BACKEND or ZMQ port-binding errors.
Logs show workers trying to bind to ports that are already in use.
Autoscaling from 1 to 2 or more replicas breaks.
Expected:
Each TP worker should get a unique port
Autoscaling should work reliably
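To make the expected behavior concrete, here is a minimal sketch of one way to give every TP worker a distinct port. It is not the Ray Serve LLM or vLLM implementation. The helper name `worker_port`, the base port 20000, and the assumption that each replica knows a stable zero-based `replica_rank` are all hypothetical.

```python
def worker_port(base_port: int, replica_rank: int, tp_size: int, tp_rank: int) -> int:
    """Hypothetical helper: map (replica_rank, tp_rank) to a distinct port.

    Each replica reserves a contiguous block of tp_size ports, so workers
    from different replicas never share a port as long as replica_rank is
    unique per node.
    """
    if tp_size < 1:
        raise ValueError("tp_size must be at least 1")
    if replica_rank < 0:
        raise ValueError("replica_rank must be non-negative")
    if not 0 <= tp_rank < tp_size:
        raise ValueError("tp_rank must be in [0, tp_size)")
    return base_port + replica_rank * tp_size + tp_rank


# Two replicas, two TP workers each (base port is an arbitrary assumption).
ports = [
    worker_port(20000, replica_rank, 2, tp_rank)
    for replica_rank in range(2)
    for tp_rank in range(2)
]
```

With this scheme the four workers get four different ports. It only helps if the replica rank is unique among replicas that share a node, which is itself an assumption about the deployment.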
Related Issues
PR #57771
Issue #55775
Versions / Dependencies
Ray: 2.50+ (after PR #57771)
vLLM: 0.11+
Python: 3.11+
Reproduction script
# serve_llama_3dot1_8b_quantized_tp1_2p6d.yaml
applications:
- args:
    prefill_config:
      model_loading_config:
        model_id: neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16
      accelerator_type: L4
      engine_kwargs:
        max_model_len: 8192
        tensor_parallel_size: 1
        enforce_eager: true
        kv_transfer_config:
          kv_connector: NixlConnector
          kv_role: kv_both
      deployment_config:
        autoscaling_config:
          min_replicas: 2
          max_replicas: 2
    decode_config:
      model_loading_config:
        model_id: neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16
      accelerator_type: L4
      engine_kwargs:
        max_model_len: 8192
        tensor_parallel_size: 1
        enforce_eager: true
        kv_transfer_config:
          kv_connector: NixlConnector
          kv_role: kv_both
      deployment_config:
        autoscaling_config:
          min_replicas: 6
          max_replicas: 6
  import_path: ray.serve.llm:build_pd_openai_app
  name: llm-endpoint
  route_prefix: /
Error output
zmq.error.ZMQError: Address already in use
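The "Address already in use" message corresponds to the operating system's EADDRINUSE error, which is returned when a second socket binds to an address and port that another socket already holds. The sketch below reproduces that condition with the Python standard library only. It does not use ZMQ, NIXL, or Ray, and the function name `second_bind_errno` is hypothetical.

```python
import errno
import socket


def second_bind_errno():
    """Bind two TCP sockets to the same local port.

    Returns the errno raised by the second bind, or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the OS for any free port, so the first bind succeeds.
        first.bind(("127.0.0.1", 0))
        first.listen()
        port = first.getsockname()[1]
        try:
            # Same address and port as the first socket: expected to fail.
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    code = second_bind_errno()
    print(errno.errorcode.get(code, code))
```

Binding to port 0, as the first socket does here, is also a common way to let the OS choose a free port instead of computing one.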
Issue Severity
Medium: It is a significant difficulty but I can work around it.