[Bugfix] Adopt the new changes on disaggregated pd from vllm main branch #2122
Changes from all commits:

```diff
@@ -17,6 +17,7 @@
 # Adapted from vllm-project/vllm/vllm/worker/gpu_worker.py
 #
 
+import copy
 from typing import Optional
 
 import torch
@@ -27,15 +28,16 @@
 from vllm.config import VllmConfig
 from vllm.distributed import (ensure_model_parallel_initialized,
                               init_distributed_environment)
-from vllm.distributed.kv_transfer import ensure_kv_transfer_initialized
+from vllm.distributed.kv_transfer import (ensure_kv_transfer_initialized,
+                                          has_kv_transfer_group)
 from vllm.distributed.parallel_state import get_pp_group, get_tp_group
 from vllm.logger import logger
 from vllm.lora.request import LoRARequest
 from vllm.sequence import IntermediateTensors
 from vllm.utils import STR_DTYPE_TO_TORCH_DTYPE, GiB_bytes
 from vllm.v1.core.sched.output import SchedulerOutput
 from vllm.v1.kv_cache_interface import KVCacheConfig, KVCacheSpec
-from vllm.v1.outputs import ModelRunnerOutput
+from vllm.v1.outputs import EMPTY_MODEL_RUNNER_OUTPUT, ModelRunnerOutput
 from vllm.v1.worker.worker_base import WorkerBase
 
 from vllm_ascend.ascend_config import init_ascend_config
@@ -204,9 +206,18 @@ def execute_model(
             assert isinstance(output, IntermediateTensors)
             get_pp_group().send_tensor_dict(output.tensors,
                                             all_gather_group=get_tp_group())
-            return None
+            if not has_kv_transfer_group():
+                return None
+
+            new_output = EMPTY_MODEL_RUNNER_OUTPUT
+            if output.finished_sending or output.finished_recving:
+                new_output = copy.copy(new_output)
+                new_output.finished_sending = output.finished_sending
+                new_output.finished_recving = output.finished_recving
+            output = new_output
 
         assert isinstance(output, ModelRunnerOutput)
-        return output if self.is_driver_worker else None
+        return output
 
     def load_model(self) -> None:
         if self.vllm_config.model_config.enable_sleep_mode:
```
Collaborator: Can this be removed? I suppose it is useful in the SPMD case.

Author: vLLM's main branch removed this in https://github.com/vllm-project/vllm/pull/21473/files; now every TP rank returns its own result, so under TP our previous code would hit the None-input case.

Collaborator: Hmm, but I guess it may break the PP path. @MengqingCao can you take a check as well?

Collaborator: I think this is OK with PP, LGTM.

Collaborator: Double-checked with the contributors for the PP and RL cases. It's OK to make the change.
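To make the new control flow concrete, here is a minimal, self-contained sketch of what the tail of `execute_model` does after this change. The worker shell is omitted, and `ModelRunnerOutput`, `EMPTY_MODEL_RUNNER_OUTPUT`, and the two boolean parameters are simplified stand-ins for the real vLLM objects; only the branching mirrors the diff above.

```python
import copy
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ModelRunnerOutput:
    """Simplified stand-in for vllm.v1.outputs.ModelRunnerOutput."""
    finished_sending: Optional[Set[str]] = None
    finished_recving: Optional[Set[str]] = None


# Stand-in for vllm.v1.outputs.EMPTY_MODEL_RUNNER_OUTPUT: a shared,
# module-level sentinel that must never be mutated in place.
EMPTY_MODEL_RUNNER_OUTPUT = ModelRunnerOutput()


def execute_model_tail(output: ModelRunnerOutput,
                       is_last_pp_rank: bool,
                       has_kv_transfer_group: bool
                       ) -> Optional[ModelRunnerOutput]:
    """Mirrors the post-change tail of execute_model."""
    if not is_last_pp_rank:
        # (The real code first ships IntermediateTensors to the next PP
        # stage; that part is unchanged and omitted here.)
        if not has_kv_transfer_group:
            # No disaggregated PD: intermediate PP ranks still return None.
            return None

        # With a KV connector, intermediate ranks must still report which
        # requests finished sending/receiving KV blocks, so they return an
        # otherwise-empty output carrying only those two fields.
        new_output = EMPTY_MODEL_RUNNER_OUTPUT
        if output.finished_sending or output.finished_recving:
            new_output = copy.copy(new_output)  # keep the sentinel pristine
            new_output.finished_sending = output.finished_sending
            new_output.finished_recving = output.finished_recving
        output = new_output

    # Per vllm-project/vllm#21473, every rank returns its own result; the
    # old `if self.is_driver_worker` gate is gone.
    return output
```

Note that `copy.copy` is shallow, which is sufficient here: the two `finished_*` attributes are reassigned wholesale rather than mutated, so the shared `EMPTY_MODEL_RUNNER_OUTPUT` sentinel is never modified.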
Collaborator: vLLM now uses `kv_connector_metadata` to store `finished_sending` and `finished_recving`; @Potabk will fix it in a new PR.
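That follow-up is not part of this PR; the sketch below only illustrates the general shape such a refactor could take. `KVConnectorMetadata`, `attach_kv_transfer_status`, and the field layout are hypothetical placeholders based on the comment above, not vLLM's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class KVConnectorMetadata:
    """Hypothetical container, named after the comment above."""
    finished_sending: Optional[Set[str]] = None
    finished_recving: Optional[Set[str]] = None


@dataclass
class ModelRunnerOutput:
    """Simplified stand-in: the finished_* flags no longer live at the
    top level; KV-transfer status rides along in one optional field."""
    kv_connector_metadata: Optional[KVConnectorMetadata] = None


def attach_kv_transfer_status(output: ModelRunnerOutput,
                              sending: Optional[Set[str]],
                              recving: Optional[Set[str]]) -> ModelRunnerOutput:
    """Hypothetical helper: record finished KV transfers on the output."""
    if sending or recving:
        output.kv_connector_metadata = KVConnectorMetadata(
            finished_sending=sending, finished_recving=recving)
    return output
```

With that shape, the `copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)` step above would collapse to attaching a single metadata object to the output.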