[Bugfix] Hunyuan image3 ar batch sampler #3589
Signed-off-by: chickeyton <ngton2014@gmail.com>
LeastQueueLengthBalancer relies on heartbeats as the only periodic source of live load, because StageEngineCoreProc/StageDiffusionProc refresh ``queue_length`` just-in-time via the ``_on_heartbeat`` hook before each heartbeat send. The coordinator's heartbeat handler was only updating ``last_heartbeat`` though, so it kept publishing the initial ``queue_length`` (usually 0) and the least-queue policy could pick busy replicas as if they were idle.

Copy ``event.queue_length`` into ``info.queue_length`` on heartbeat events and request a broadcast when it changes so subscribers see fresh load promptly. Coalescing in the periodic loop keeps the wire traffic bounded.

Also corrects the now-outdated docstring on ``_send_event`` that claimed heartbeats sent ``queue_length=null``.

Signed-off-by: chickeyton <ngton2014@gmail.com>
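The heartbeat fix described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `Coordinator`, `HeartbeatEvent`, `on_heartbeat`, `flush`, and `pick_least_loaded` are hypothetical names, and `ReplicaInfo` here is a simplified stand-in for the coordinator's real record.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class ReplicaInfo:
    """Simplified stand-in for the coordinator's per-replica record."""
    replica_id: str
    queue_length: int = 0
    last_heartbeat: float = 0.0


@dataclass
class HeartbeatEvent:
    """Hypothetical heartbeat payload carrying the sender's live load."""
    replica_id: str
    queue_length: int | None = None


@dataclass
class Coordinator:
    """Hypothetical coordinator that tracks replicas and coalesces broadcasts."""
    replicas: dict[str, ReplicaInfo] = field(default_factory=dict)
    broadcast_pending: bool = False

    def on_heartbeat(self, event: HeartbeatEvent) -> None:
        info = self.replicas[event.replica_id]
        info.last_heartbeat = time.monotonic()
        # Copy the reported load and only request a broadcast when it changed.
        if event.queue_length is not None and event.queue_length != info.queue_length:
            info.queue_length = event.queue_length
            self.broadcast_pending = True

    def flush(self) -> list[tuple[str, int]] | None:
        """Body of the periodic loop: at most one snapshot per tick."""
        if not self.broadcast_pending:
            return None
        self.broadcast_pending = False
        return [(r.replica_id, r.queue_length) for r in self.replicas.values()]


def pick_least_loaded(snapshot: list[tuple[str, int]]) -> str:
    """Least-queue policy over a published snapshot."""
    return min(snapshot, key=lambda item: item[1])[0]
```

In this sketch, several heartbeats between two ticks of the periodic loop produce a single published snapshot, which is the coalescing behaviour the commit message relies on to bound wire traffic.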
- Drop unused ``vllm_config`` local in ``StageEngineCoreProc.run_stage_core`` (F841); the comment about the removed hardcoded data_parallel_size is retained.
- Wrap the long ``[Headless] Launching ... OmniMasterServer`` log line in serve.py to keep it under the 120-char limit (E501).
- Reflow multi-line ``raise`` / ``logger`` calls that fit on one line per ``ruff format`` rules in stage_diffusion_proc, async_omni_engine, omni_coord_client_for_hub, omni_core_engine_proc_manager, orchestrator, stage_engine_core_proc and serve.

Signed-off-by: chickeyton <ngton2014@gmail.com>
Signed-off-by: chickeyton <ngton2014@gmail.com>
Signed-off-by: bjf-frz <frz123db@gmail.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3e687ca1ff
def compute_replica_layout(
    stage_configs: Sequence[Any],
    *,
    allow_zero: bool = False,
Honor zero-replica stages in head mode
This adds an `allow_zero` mode for the documented head-distributed case where non-self stages are filled by dynamic registrations, but the new parameter defaults to `False` and the head initialization still calls `compute_replica_layout(self.stage_configs)` without overriding it. As a result, a stage configured with `runtime.num_replicas: 0` is still clamped to one remote replica, so the head pre-allocates a slot and waits for a registration instead of starting with an empty pool and attaching replicas later.
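The clamping behaviour described in this comment can be illustrated with a small sketch. It assumes stage configs are plain dicts with a `runtime.num_replicas` key and a default of one replica; the real `compute_replica_layout` may read its inputs and shape its return value differently, so `compute_replica_layout_sketch` and `_num_replicas` are hypothetical names.

```python
from typing import Any, Sequence


def _num_replicas(stage_config: Any) -> int:
    # Assumption: configs are dicts shaped like {"runtime": {"num_replicas": N}},
    # and a missing value means one replica.
    runtime = stage_config.get("runtime", {})
    return int(runtime.get("num_replicas", 1))


def compute_replica_layout_sketch(
    stage_configs: Sequence[Any],
    *,
    allow_zero: bool = False,
) -> list[int]:
    """Return the replica count per stage, clamped to a minimum.

    With allow_zero=False every stage gets at least one replica; with
    allow_zero=True a stage may start with an empty pool.
    """
    floor = 0 if allow_zero else 1
    return [max(floor, _num_replicas(cfg)) for cfg in stage_configs]
```

Under these assumptions, a caller that leaves `allow_zero` at its default never sees a zero-replica stage, which is the situation the review comment flags for the head initialization path.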
    Task,
)
-from .messages import InstanceEvent, InstanceInfo, InstanceList, StageStatus
+from .messages import ReplicaEvent, ReplicaInfo, ReplicaList, StageStatus
Preserve the coordinator Instance aliases
Removing the exported `InstanceEvent` / `InstanceInfo` / `InstanceList` names breaks existing coordinator imports in this repo (for example, the tests/distributed/omni_coordinator tests still import `InstanceInfo` and `InstanceList`) as well as any downstream code that uses the public package export. Unless all call sites are migrated in the same change, keep aliases to the new `Replica*` classes so those imports continue to work.
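One way to keep the old names importable is plain module-level aliasing. The sketch below is self-contained, so the `Replica*` classes are simplified placeholders with hypothetical fields; in the real package the aliases would sit next to the actual class definitions or in the package's exports.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaInfo:
    """Placeholder for the renamed class; fields are illustrative only."""
    replica_id: str
    queue_length: int = 0


@dataclass
class ReplicaList:
    """Placeholder container for a set of replicas."""
    replicas: list[ReplicaInfo] = field(default_factory=list)


@dataclass
class ReplicaEvent:
    """Placeholder event type; fields are illustrative only."""
    replica_id: str
    event_type: str


# Backward-compatible aliases so existing `Instance*` imports keep working.
InstanceInfo = ReplicaInfo
InstanceList = ReplicaList
InstanceEvent = ReplicaEvent
```

Because each alias is the same class object, `isinstance` checks and type comparisons behave identically under either name, so call sites can be migrated incrementally.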
Purpose
This PR removes the `batch_size` assertion in the Hunyuan-Image-3.0 sampler. This enables a deployment with a single AR stage and multiple DiTs.
Test Plan
After PR #3569 is merged, set the deploy YAML like:
I then sent concurrent requests three times in a row.
Test Result
The output shows: