add DpCoordinator LoadBalancer #3

Open

chickeyton wants to merge 1 commit into main from data_parallel

Conversation

@chickeyton (Owner)

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

@chickeyton (Owner, Author)

OmniServer stopping...
(Worker pid=105749) INFO 04-09 03:11:59 [multiproc_executor.py:764] Parent process exited, terminating worker queues
(Worker pid=105749) INFO 04-09 03:11:59 [multiproc_executor.py:859] WorkerProc shutting down.
(APIServer pid=105156) WARNING 04-09 03:12:05 [stage_diffusion_client.py:257] StageDiffusionProc was killed by signal 15; treating as external shutdown.
(APIServer pid=105156) INFO 04-09 03:12:09 [omni_base.py:290] [AsyncOmni] Shutting down
(APIServer pid=105156) INFO 04-09 03:12:09 [async_omni_engine.py:1293] [AsyncOmniEngine] Shutting down Orchestrator
(APIServer pid=105156) INFO 04-09 03:12:09 [orchestrator.py:212] [Orchestrator] Received shutdown signal
(APIServer pid=105156) INFO 04-09 03:12:09 [orchestrator.py:885] [Orchestrator] Shutting down all stages
(APIServer pid=105156) INFO 04-09 03:12:09 [orchestrator.py:889] [Orchestrator] Stage 0 shut down
(APIServer pid=105156) INFO 04-09 03:12:09 [orchestrator.py:889] [Orchestrator] Stage 1 shut down
(APIServer pid=105156) INFO 04-09 03:12:09 [launcher.py:137] Shutting down FastAPI HTTP server.
(APIServer pid=105156) INFO 04-09 03:12:09 [omni_base.py:290] [AsyncOmni] Shutting down
(APIServer pid=105156) INFO: Shutting down
(APIServer pid=105156) INFO: Waiting for application shutdown.
(APIServer pid=105156) INFO: Application shutdown complete.
(APIServer pid=105156) Traceback (most recent call last):
(APIServer pid=105156) File "<frozen runpy>", line 198, in _run_module_as_main
(APIServer pid=105156) File "<frozen runpy>", line 88, in _run_code
(APIServer pid=105156) File "/nvme4n1/n00645750/repos/github/chickeyton/vllm-omni_refine_2006/vllm_omni/entrypoints/cli/main.py", line 63, in <module>
(APIServer pid=105156) main()
(APIServer pid=105156) File "/nvme4n1/n00645750/repos/github/chickeyton/vllm-omni_refine_2006/vllm_omni/entrypoints/cli/main.py", line 57, in main
(APIServer pid=105156) args.dispatch_function(args)
(APIServer pid=105156) File "/nvme4n1/n00645750/repos/github/chickeyton/vllm-omni_refine_2006/vllm_omni/entrypoints/cli/serve.py", line 94, in cmd
(APIServer pid=105156) uvloop.run(omni_run_server(args))
(APIServer pid=105156) File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=105156) return __asyncio.run(
(APIServer pid=105156) ^^^^^^^^^^^^^^
(APIServer pid=105156) File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=105156) return runner.run(main)
(APIServer pid=105156) ^^^^^^^^^^^^^^^^
(APIServer pid=105156) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=105156) return self._loop.run_until_complete(task)
(APIServer pid=105156) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=105156) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=105156) File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=105156) return await main
(APIServer pid=105156) ^^^^^^^^^^
(APIServer pid=105156) File "/nvme4n1/n00645750/repos/github/chickeyton/vllm-omni_refine_2006/vllm_omni/entrypoints/openai/api_server.py", line 268, in omni_run_server
(APIServer pid=105156) await omni_run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=105156) File "/nvme4n1/n00645750/repos/github/chickeyton/vllm-omni_refine_2006/vllm_omni/entrypoints/openai/api_server.py", line 356, in omni_run_server_worker
(APIServer pid=105156) app.state.openai_serving_speech.shutdown()
(APIServer pid=105156) ^^^^^^^^^
(APIServer pid=105156) AttributeError: 'FastAPI' object has no attribute 'state'
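The traceback above dies inside the shutdown path itself because `app.state.openai_serving_speech` is looked up on an object that never had it attached. A defensive lookup avoids turning a clean shutdown into a crash. This is a hypothetical sketch, not the actual fix in `api_server.py`; `safe_shutdown` and the dummy objects are illustrative names, only `app.state.openai_serving_speech` comes from the traceback.

```python
from types import SimpleNamespace


def safe_shutdown(app) -> bool:
    """Call shutdown() on app.state.openai_serving_speech only if it exists.

    Returns True if a shutdown was actually performed. Guarding both the
    `state` attribute and the serving object makes the teardown idempotent
    and safe to run before startup has fully populated app.state.
    """
    state = getattr(app, "state", None)
    serving = getattr(state, "openai_serving_speech", None) if state else None
    if serving is not None:
        serving.shutdown()
        return True
    return False


# An app object whose state was never populated (mirrors the traceback).
bare_app = SimpleNamespace()
print(safe_shutdown(bare_app))  # prints False instead of raising AttributeError
```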

@chickeyton (Owner, Author)

(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] StageEngineCoreProc failed to start.
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] Traceback (most recent call last):
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/repos/github/chickeyton/vllm-omni_refine_2006/vllm_omni/engine/stage_engine_core_proc.py", line 86, in run_stage_core
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] engine_core = StageEngineCoreProc(
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] ^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] return func(*args, **kwargs)
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] ^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 848, in __init__
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] super().__init__(
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 114, in __init__
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] self.model_executor = executor_class(vllm_config)
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] return func(*args, **kwargs)
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] ^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] self._init_executor()
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/v1/executor/uniproc_executor.py", line 47, in _init_executor
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] self.driver_worker.init_device()
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 312, in init_device
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] self.worker.init_device() # type: ignore
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] ^^^^^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] return func(*args, **kwargs)
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] ^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/repos/github/chickeyton/vllm-omni_refine_2006/vllm_omni/worker/gpu_ar_worker.py", line 102, in init_device
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] self.model_runner = GPUARModelRunner(self.vllm_config, self.device)
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/repos/github/chickeyton/vllm-omni_refine_2006/vllm_omni/worker/gpu_ar_model_runner.py", line 73, in __init__
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] super().__init__(*args, **kwargs)
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/repos/github/chickeyton/vllm-omni_refine_2006/vllm_omni/worker/gpu_model_runner.py", line 42, in __init__
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] super().__init__(*args, **kwargs)
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 781, in __init__
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] MultiModalBudget(self.vllm_config, self.mm_registry)
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/multimodal/encoder_budget.py", line 117, in __init__
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] encoder_compute_budget, encoder_cache_size = compute_mm_encoder_budget(
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/v1/core/encoder_cache_manager.py", line 302, in compute_mm_encoder_budget
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] raise ValueError(
(StageEngineCoreProc pid=165894) ERROR 04-10 02:31:48 [stage_engine_core_proc.py:98] ValueError: Chunked MM input disabled but max_tokens_per_mm_item (8625) is larger than max_num_batched_tokens (8192). Please increase max_num_batched_tokens.
(StageEngineCoreProc pid=165894) Process StageEngineCoreProc:
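The ValueError above comes from a budget consistency check: when chunked multimodal input is disabled, a single multimodal item must fit inside one batch. A simplified sketch of that constraint (illustrative only, not the actual `compute_mm_encoder_budget` implementation; the parameter names follow the log message):

```python
def check_mm_budget(max_tokens_per_mm_item: int,
                    max_num_batched_tokens: int,
                    chunked_mm_input: bool) -> None:
    """Raise if a single multimodal item cannot fit in one batch.

    With chunked MM input disabled, the largest item (8625 tokens in the
    log above) must not exceed the batch token budget (8192 in the log),
    so the fix is raising max_num_batched_tokens or enabling chunking.
    """
    if not chunked_mm_input and max_tokens_per_mm_item > max_num_batched_tokens:
        raise ValueError(
            f"Chunked MM input disabled but max_tokens_per_mm_item "
            f"({max_tokens_per_mm_item}) is larger than max_num_batched_tokens "
            f"({max_num_batched_tokens}). Please increase max_num_batched_tokens."
        )
```

With the values from the log, `check_mm_budget(8625, 8192, chunked_mm_input=False)` raises, while raising the budget to 16384 or enabling chunking passes.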

@chickeyton (Owner, Author)

(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] WorkerProc failed to start.
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] Traceback (most recent call last):
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 826, in worker_main
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] worker = WorkerProc(*args, **kwargs)
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] return func(*args, **kwargs)
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 605, in __init__
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] self.worker.init_device()
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 312, in init_device
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] self.worker.init_device() # type: ignore
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] return func(*args, **kwargs)
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] File "/nvme4n1/n00645750/repos/github/chickeyton/vllm-omni_refine_2006/vllm_omni/worker/gpu_ar_worker.py", line 55, in init_device
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] assert self.parallel_config.local_world_size <= visible_device_count, (
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=184725) ERROR 04-10 07:38:22 [multiproc_executor.py:857] AssertionError: local_world_size (2) must be less than or equal to the number of visible devices (1).
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] WorkerProc failed to start.
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] Traceback (most recent call last):
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 826, in worker_main
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] worker = WorkerProc(*args, **kwargs)
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] return func(*args, **kwargs)
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 605, in __init__
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] self.worker.init_device()
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 312, in init_device
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] self.worker.init_device() # type: ignore
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] File "/nvme4n1/n00645750/env_b/.venv/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] return func(*args, **kwargs)
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] File "/nvme4n1/n00645750/repos/github/chickeyton/vllm-omni_refine_2006/vllm_omni/worker/gpu_ar_worker.py", line 51, in init_device
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] assert self.local_rank < torch.accelerator.device_count(), (
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=184726) ERROR 04-10 07:38:22 [multiproc_executor.py:857] AssertionError: DP adjusted local rank 1 is out of bounds.
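Both worker failures above are device-visibility checks in `gpu_ar_worker.py`: the data-parallel launch asked for two local workers while only one accelerator was visible. A minimal sketch of the two assertions, with `visible_device_count` passed in as a plain integer so the check is testable without a GPU (the real code queries `torch.accelerator.device_count()`):

```python
def validate_local_ranks(local_world_size: int,
                         visible_device_count: int,
                         local_rank: int) -> None:
    """Simplified stand-in for the two assertions seen in the log above.

    Every local worker needs its own visible device, and each worker's
    DP-adjusted local rank must index a visible device.
    """
    assert local_world_size <= visible_device_count, (
        f"local_world_size ({local_world_size}) must be less than or equal "
        f"to the number of visible devices ({visible_device_count})."
    )
    assert local_rank < visible_device_count, (
        f"DP adjusted local rank {local_rank} is out of bounds."
    )
```

The log corresponds to `validate_local_ranks(2, 1, ...)`: with one visible device (e.g. a restrictive `CUDA_VISIBLE_DEVICES`), worker 0 trips the world-size assertion and worker 1 trips the rank-bounds assertion.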

@chickeyton (Owner, Author)

Here's a summary of all resolved conflicts:

  1. tests/engine/test_orchestrator.py (line 74)
  • Resolution: Merged both parameters into signature: process_engine_inputs(self, stage_list, prompt=None, streaming_context=None, source_client=None)
  2. vllm_omni/engine/async_omni_engine.py (2 conflicts)
  • Imports (line 86-87): Kept both StagePool and PDDisaggregationMixin imports
  • Orchestrator constructor (line 1031-1038): Passes pd_config=pd_config (from HEAD's PD disaggregation feature) but drops stage_clients, output_processors,
    stage_vllm_configs since those are now accessed through stage_pools (from zwg's StagePool architecture)
  3. vllm_omni/engine/orchestrator.py (6 conflicts + 2 fixes)
  • Dataclass fix (line 109-140): Moved chosen_replica: dict[int, StageReplica] from StreamingInputState to OrchestratorRequestState -- it's a per-request field, and
    StagePool.select_replica() accesses it as req_state.chosen_replica, not req_state.streaming.chosen_replica
  • _route_output (line 464-481): Uses stage_replica (StagePool) while keeping HEAD's streaming session logic (two calls for non-final + final update)
  • _forward_to_next_stage body (line 707-712): Uses StagePool-based next_pool.select_replica() and keeps next_stage_resumable for streaming
  • PD disaggregation block (line 750-796): Preserved full PD prefill-decode routing logic, converted all self.stage_clients[i] / self.stage_vllm_configs[i] /
    self.output_processors[i] references to use stage_replica.client / next_replica.vllm_config / next_replica.output_processor
  • process_engine_inputs call (line 801-806): Passes both streaming_context and source_client
  • build_engine_core_request_from_tokens (line 826-828): Uses next_replica.vllm_config.model_config (StagePool) while keeping mm_features and resumable args (HEAD)
  • _handle_add_request (line 924-934): Sets req_state.streaming.enabled = is_streaming (HEAD) and uses stage_pools[stage_id] with select_replica() (zwg)
  • Non-conflict fix (line 820): Changed next_client to next_replica.client -- next_client was only defined in HEAD's code path and would be a NameError after resolution
  4. vllm_omni/engine/stage_engine_core_client.py (line 305-306)
  • Resolution: Merged both parameters: streaming_context: Any | None = None, source_client: Any | None = None. The method body already uses both -- streaming_context for
    the custom processor streaming path, source_client for multi-replica upstream client selection.
