Run omni with latest vllm commit 1b6cb920e6ebcac57154e6154578c39d4892a16c #2182
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -341,7 +341,7 @@ def _launch_llm_stage( | |
| log_stats=False, | ||
| addresses=addresses, | ||
| ) | ||
| - engine_manager, coordinator, addresses = launch_cm.__enter__() | ||
| + engine_manager, coordinator, addresses, _ = launch_cm.__enter__() | ||
|
Collaborator
Should this 4th value (`tensor_queue`) actually be wired into
||
| started_stage = StartedLlmStage( | ||
| stage_id=metadata.stage_id, | ||
| metadata=metadata, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -704,11 +704,13 @@ def _dummy_run( | |
| seq_lens = [1] * num_decode_tokens + [num_prefill_tokens + 1] # type: ignore[assignment] | ||
| else: | ||
| seq_lens = max_query_len # type: ignore[assignment] | ||
| - self.seq_lens.np[:num_reqs] = seq_lens | ||
| - self.seq_lens.np[num_reqs:] = 0 | ||
| - self.seq_lens.copy_to_gpu() | ||
| + self.optimistic_seq_lens_cpu[:num_reqs] = seq_lens | ||
| + self.optimistic_seq_lens_cpu[num_reqs:].fill_(0) | ||
|
Collaborator
Missing a vllm dependency bump —
||
| + self.seq_lens.copy_(self.optimistic_seq_lens_cpu, non_blocking=True) | ||
|
|
||
| - cum_num_tokens, _ = self._get_cumsum_and_arange(num_scheduled_tokens) | ||
| + cum_num_tokens = self._get_cumsum_and_arange( | ||
| +     num_scheduled_tokens, self.query_pos.np | ||
| + ) | ||
| self.query_start_loc.np[1 : num_reqs + 1] = cum_num_tokens | ||
| self.query_start_loc.copy_to_gpu() | ||
`launch_core_engines()` now yields a fourth value (`tensor_queue`), but this code discards it (`_`) and never propagates it into `StartedLlmStage`/`client_addresses` for `StageEngineCoreClient`. In the new vLLM path, that queue is what enables `AsyncMPClient` to set up out-of-band tensor IPC for multimodal payloads; dropping it forces fallback serialization for tensor data and can cause major latency/memory regressions for multimodal requests (especially when `mm_tensor_ipc=torch_shm` is configured).
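One possible shape for the fix is sketched below, using mocks only. `StartedStage`, `fake_launch_core_engines`, and the `tensor_queue` field are hypothetical stand-ins, not the real `StartedLlmStage` or vLLM's `launch_core_engines()`; whether the real dataclass should carry such a field is exactly the open question in this thread. The sketch only shows keeping the fourth value instead of binding it to `_`, while still tolerating a 3-tuple from an older vLLM.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StartedStage:
    """Hypothetical stand-in for StartedLlmStage; tensor_queue is an assumed field."""

    engine_manager: Any
    coordinator: Any
    addresses: Any
    tensor_queue: Optional[Any] = None


@contextmanager
def fake_launch_core_engines(with_queue: bool = True):
    """Mock launcher: yields 4 values (newer behaviour) or 3 (older behaviour)."""
    base = ("manager", "coordinator", {"input": "ipc://in"})
    yield (*base, "tensor-queue") if with_queue else base


def start_stage(launch_cm) -> StartedStage:
    # Star-unpacking keeps any trailing value instead of discarding it with `_`.
    engine_manager, coordinator, addresses, *rest = launch_cm.__enter__()
    tensor_queue = rest[0] if rest else None
    return StartedStage(engine_manager, coordinator, addresses, tensor_queue)
```

With this shape, whatever constructs the client can check `tensor_queue is not None` before enabling out-of-band tensor IPC, and use the existing serialization path otherwise.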