[Core]Add GPU Diffusion Runner#822

Merged
hsliuustc0106 merged 3 commits into
vllm-project:mainfrom
princepride:add-gpu-diffusion-runner
Jan 17, 2026
@princepride
Collaborator

@princepride princepride commented Jan 16, 2026

Purpose

Related: #800

This PR refactors the GPU diffusion worker architecture to improve code organization and maintainability:

  • Separated model runner logic: Extracted GPUDiffusionModelRunner from GPUDiffusionWorker to follow the separation of concerns principle
  • Improved naming consistency: Renamed gpu_worker.py → gpu_diffusion_worker.py and test_gpu_worker.py → test_gpu_diffusion_worker.py for better clarity
  • Adjusted NPU worker: Updated npu_worker.py to align with the new architecture and add missing functionality
  • Added comprehensive unit tests: Implemented detailed tests for load_weights, sleep, and wake_up methods with proper mocking
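
The worker/runner split described above can be pictured with a minimal sketch. This is an illustration only, not the code in this PR: ToyPipeline, ToyModelRunner, and ToyWorker, along with all of their methods, are hypothetical stand-ins, and the real GPUDiffusionWorker and GPUDiffusionModelRunner may divide responsibilities differently.

```python
from typing import Any, Iterable


class ToyPipeline:
    """Hypothetical stand-in for a diffusion pipeline (not a vllm-omni class)."""

    def __init__(self) -> None:
        self.weights: dict[str, Any] = {}

    def load_weights(self, weights: Iterable[tuple[str, Any]]) -> set[str]:
        # Store each (name, tensor) pair and report which names were loaded.
        loaded = set()
        for name, tensor in weights:
            self.weights[name] = tensor
            loaded.add(name)
        return loaded

    def __call__(self, prompt: str) -> str:
        return f"image for {prompt!r}"


class ToyModelRunner:
    """Assumed role of the runner: own the model and execute it."""

    def __init__(self) -> None:
        self.pipeline: ToyPipeline | None = None

    def load_model(self) -> None:
        self.pipeline = ToyPipeline()

    def load_weights(self, weights: Iterable[tuple[str, Any]]) -> set[str]:
        assert self.pipeline is not None, "call load_model() first"
        return self.pipeline.load_weights(weights)

    def execute_model(self, prompt: str) -> str:
        assert self.pipeline is not None, "call load_model() first"
        return self.pipeline(prompt)


class ToyWorker:
    """Assumed role of the worker: process-level setup, then delegation."""

    def __init__(self, rank: int) -> None:
        self.rank = rank
        self.model_runner = ToyModelRunner()

    def init_device(self) -> None:
        # A real worker would pick a device and set up the distributed
        # environment here; this sketch only loads the model.
        self.model_runner.load_model()

    def load_weights(self, weights: Iterable[tuple[str, Any]]) -> set[str]:
        return self.model_runner.load_weights(weights)

    def generate(self, prompt: str) -> str:
        return self.model_runner.execute_model(prompt)
```

With this shape, the worker never touches the pipeline directly, so the runner can be tested or swapped without starting a worker process.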

Test Plan

Unit Test

pytest tests/diffusion/test_gpu_diffusion_worker.py -v

Result:

============================================ test session starts ============================================
platform linux -- Python 3.13.11, pytest-9.0.2, pluggy-1.6.0 -- /proj-tango-pvc/users/zhipeng.wang/workspace/vllm-omni/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /proj-tango-pvc/users/zhipeng.wang/workspace/vllm-omni
configfile: pyproject.toml
plugins: cov-7.0.0, anyio-4.12.1
collected 8 items                                                                                           

tests/diffusion/test_gpu_diffusion_worker.py::TestGPUDiffusionWorkerLoadWeights::test_load_weights_calls_pipeline PASSED [ 12%]
tests/diffusion/test_gpu_diffusion_worker.py::TestGPUDiffusionWorkerLoadWeights::test_load_weights_empty_iterable PASSED [ 25%]
tests/diffusion/test_gpu_diffusion_worker.py::TestGPUDiffusionWorkerSleep::test_sleep_level_1 PASSED  [ 37%]
tests/diffusion/test_gpu_diffusion_worker.py::TestGPUDiffusionWorkerSleep::test_sleep_level_2 PASSED  [ 50%]
tests/diffusion/test_gpu_diffusion_worker.py::TestGPUDiffusionWorkerSleep::test_sleep_memory_freed_validation PASSED [ 62%]
tests/diffusion/test_gpu_diffusion_worker.py::TestGPUDiffusionWorkerWakeUp::test_wake_up_without_buffers PASSED [ 75%]
tests/diffusion/test_gpu_diffusion_worker.py::TestGPUDiffusionWorkerWakeUp::test_wake_up_with_buffers PASSED [ 87%]
tests/diffusion/test_gpu_diffusion_worker.py::TestGPUDiffusionWorkerWakeUp::test_wake_up_partial_buffer_restore PASSED [100%]
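
As a rough idea of what mock-based tests such as test_load_weights_calls_pipeline might look like, here is a self-contained sketch using unittest.mock. The ToyWorker class is a hypothetical stand-in that simply forwards load_weights to its pipeline; the actual tests in tests/diffusion/test_gpu_diffusion_worker.py are not reproduced here and may be structured differently.

```python
from unittest.mock import MagicMock


class ToyWorker:
    """Hypothetical worker that forwards load_weights to its pipeline."""

    def __init__(self, pipeline):
        self.pipeline = pipeline

    def load_weights(self, weights):
        return self.pipeline.load_weights(weights)


def test_load_weights_calls_pipeline():
    # The pipeline is mocked, so no model or GPU is needed.
    pipeline = MagicMock()
    pipeline.load_weights.return_value = {"layer.weight"}
    worker = ToyWorker(pipeline)

    weights = [("layer.weight", object())]
    result = worker.load_weights(weights)

    # The worker should delegate exactly once and pass the result through.
    pipeline.load_weights.assert_called_once_with(weights)
    assert result == {"layer.weight"}


def test_load_weights_empty_iterable():
    pipeline = MagicMock()
    pipeline.load_weights.return_value = set()
    worker = ToyWorker(pipeline)

    assert worker.load_weights([]) == set()
    pipeline.load_weights.assert_called_once_with([])
```

Both functions follow pytest naming, so they would be collected by a command like the one in the test plan if saved in a test file.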

Test Run Diffusion Model

python examples/offline_inference/text_to_image/text_to_image.py

Result:

image

Signed-off-by: princepride <wangzhipeng628@gmail.com>
@princepride
Collaborator Author

@ZJY0516 @hsliuustc0106

@gcanlin
Collaborator

gcanlin commented Jan 16, 2026

Hi! Could we please wait for #774, which micro-refactors diffusion_worker to be hardware-agnostic? Then we won't need to modify platform_utils and npu_worker, and the word "gpu" will be removed. I'd like to get #774 merged first, but it's also okay to merge this one first. I'm a bit concerned that #774 is growing larger and larger.

@hsliuustc0106
Collaborator

Hi! Could we please wait for #774, which micro-refactors diffusion_worker to be hardware-agnostic? Then we won't need to modify platform_utils and npu_worker, and the word "gpu" will be removed. I'd like to get #774 merged first, but it's also okay to merge this one first. I'm a bit concerned that #774 is growing larger and larger.

I think #774 may need more discussion.

Member

@ZJY0516 ZJY0516 left a comment

LGTM

Comment thread vllm_omni/diffusion/worker/gpu_diffusion_worker.py
@ZJY0516 ZJY0516 requested a review from SamitHuang January 17, 2026 02:38
@ZJY0516 ZJY0516 added the ready label to trigger buildkite CI label Jan 17, 2026
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Comment thread vllm_omni/diffusion/worker/gpu_diffusion_worker.py
destroy_distributed_env()


class WorkerProc:
Collaborator

Is its function similar to an executor? Not now, but do we have a plan to refactor it into an executor in the future?

Collaborator Author

@ZJY0516 What do you think?

@hsliuustc0106
Collaborator

any speed difference before and after this PR?

@princepride
Collaborator Author

any speed difference before and after this PR?

I am testing it.

@princepride
Collaborator Author

I used the script python examples/offline_inference/text_to_image/text_to_image.py to compare the speed on an H200. The average e2e time of the original version is 15236 ms, and that of the current version is 15233 ms.
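
For readers who want to run a similar before/after comparison, the following is a minimal sketch of averaging end-to-end latency over several runs and expressing the difference as a percentage. The helper names average_e2e_ms and relative_change_percent, and the warmup and runs defaults, are assumptions made for illustration; this is not the measurement code used for the numbers above. For GPU workloads the timed callable is assumed to block until its result is ready.

```python
import statistics
import time
from typing import Callable


def average_e2e_ms(fn: Callable[[], object], warmup: int = 1, runs: int = 5) -> float:
    """Return the mean wall-clock time of fn() in milliseconds."""
    # Warm-up runs are discarded so one-time setup costs do not skew the mean.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


def relative_change_percent(before_ms: float, after_ms: float) -> float:
    """Signed change from before to after, as a percentage of before."""
    return (after_ms - before_ms) / before_ms * 100.0


if __name__ == "__main__":
    # Figures reported in this thread: 15236 ms before, 15233 ms after.
    change = relative_change_percent(15236, 15233)
    print(f"relative change: {change:.3f}%")
```

Applied to the reported averages, the change is about -0.02%, which is consistent with the refactor having no measurable effect on speed.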

Collaborator

@Gaohan123 Gaohan123 left a comment

LGTM. Thanks

@hsliuustc0106 hsliuustc0106 merged commit 36c2876 into vllm-project:main Jan 17, 2026
7 checks passed
erfgss pushed a commit to erfgss/vllm-omni that referenced this pull request Jan 19, 2026
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: Chen Yang <2082464740@qq.com>
with1015 pushed a commit to with1015/vllm-omni that referenced this pull request Jan 20, 2026
Signed-off-by: princepride <wangzhipeng628@gmail.com>