[feat]: General diffusers adapter backend to run diffusion models by fhfuih · Pull Request #2724 · vllm-project/vllm-omni

fhfuih · 2026-04-13T06:44:07Z

Purpose

Fulfill #2403

Test Plan

L1 Unit test (no GPU):

DiffusersAdapterPipeline can forward the underlying DiffusionPipeline's output
DiffusersAdapterPipeline raises when the user requests diffusion features that are not (yet) supported
When the underlying DiffusionPipeline returns an arbitrary data structure (typically when `__call__` is invoked with return_dict=True), the output is wrapped as-is in our DiffusionOutput
DiffusersAdapterPipeline can correctly forward call arguments
L2 test with GPU:
Running a randomly initialized Qwen-Image model with the diffusers pipeline
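The unit-test behaviors listed above can be illustrated with a minimal, self-contained sketch. It is not the PR's DiffusersAdapterPipeline: the names `AdapterSketch`, `WrappedOutput`, and `UnsupportedFeatureError`, and the rejected request keys, are hypothetical, and a plain function stands in for the diffusers pipeline so the sketch runs without a GPU.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class WrappedOutput:
    """Hypothetical stand-in for vllm-omni's diffusion output container."""

    output: Any  # whatever the diffusers pipeline returned, untouched
    metadata: Dict[str, Any] = field(default_factory=dict)


class UnsupportedFeatureError(NotImplementedError):
    """Raised for features the black-box adapter cannot honor (yet)."""


class AdapterSketch:
    """Black-box adapter that forwards requests to a DiffusionPipeline-like callable."""

    # Hypothetical request keys; the real adapter checks its own config fields.
    UNSUPPORTED = ("cfg_parallel", "sequence_parallel", "cache_backend")

    def __init__(self, pipe: Callable[..., Any]):
        self.pipe = pipe

    def forward(self, request: Dict[str, Any], **call_kwargs: Any) -> WrappedOutput:
        # 1) Reject features the underlying pipeline would otherwise silently ignore.
        for name in self.UNSUPPORTED:
            if request.get(name):
                raise UnsupportedFeatureError(f"{name!r} is not supported by the diffusers adapter")
        # 2) Forward every remaining argument, plus pass-through call kwargs, unchanged.
        kwargs = {k: v for k, v in request.items() if k not in self.UNSUPPORTED}
        kwargs.update(call_kwargs)
        # 3) Wrap the return value as-is, whatever its type.
        return WrappedOutput(output=self.pipe(**kwargs))

    def prepare_encode(self, *args: Any, **kwargs: Any) -> Any:
        # Step-wise execution is explicitly rejected.
        raise UnsupportedFeatureError("step-wise execution is not supported by the adapter")


# Usage with a fake pipeline that echoes its kwargs (no GPU or diffusers needed).
def fake_pipe(**kwargs: Any) -> Dict[str, Any]:
    return {"images": ["<image>"], "kwargs": kwargs}


adapter = AdapterSketch(fake_pipe)
result = adapter.forward({"prompt": "a cat", "num_inference_steps": 20}, return_dict=True)
```

Wrapping the return value without inspecting it is what keeps the adapter model-agnostic; the cost is that downstream code must cope with whatever structure a given pipeline returns.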

Test Result

Passed on my side

Release Note

Support running diffusion models with diffusers backend. Turn this feature on with --diffusion-load-format diffusers. Check out the documentation here

hsliuustc0106 · 2026-04-13T09:23:01Z

🔴 Pre-commit gate failing. Fix before requesting review.

This is a substantial PR (1445 LOC, 12 files). After fixing gates, please run L3 tests locally and paste results in PR description:

https://docs.vllm.ai/projects/vllm-omni/en/latest/contributing/ci/test_guide/#l3-level--l4-level

Test Plan/Test Result sections show "Drafting" - need concrete validation evidence for this feature.

sayakpaul

Thanks a lot for the RFC. @DN6 please take a look as well.

Do we think http://hf.co/blog/modular-diffusers would be a better base candidate for the integration?

sayakpaul · 2026-04-15T12:26:13Z

+
+Any model loadable via `DiffusionPipeline.from_pretrained()` is supported, including:
+
+- **Text-to-Image:** SD 1.5, SD 2.1, SDXL, PixArt-Σ, Kandinsky, DeepFloyd IF


We could probably mention more recent variants such as Flux, QwenImage, etc.?

sayakpaul · 2026-04-15T12:27:00Z

+The diffusers backend is a black-box adapter. The following features are NOT supported:
+
+- CFG parallel execution
+- Sequence parallel execution


We should be able to depend on Diffusers' extensive CP support for this, no?
https://huggingface.co/docs/diffusers/main/en/training/distributed_inference#context-parallelism

Thanks for the info! I thought it was done only externally by xDiT. For these parallelism features, though, I will also need to confirm whether they play well with our architecture.

We do support CP natively :)

sayakpaul · 2026-04-15T12:28:43Z

+
+- CFG parallel execution
+- Sequence parallel execution
+- TeaCache / Cache-DiT acceleration


https://huggingface.co/docs/diffusers/main/en/optimization/cache

CacheDiT is supported too: https://github.com/vipshop/cache-dit?tab=readme-ov-file#quick-start-cache-parallelism-and-quantization

TeaCache is incoming: huggingface/diffusers#12652

Cc: @DN6 we should probably prioritize that PR?

Thanks for the clarification. I also learned that these features can be turned on. Apart from Cache-DiT, there also seem to be:

dtype & quantization

CPU offloading

Attention backend

VAE slicing and tiling

Torch compile (eagerness)

Yup.

Then there's this concept of regional compilation which provides a trade-off:
https://pytorch.org/blog/torch-compile-and-diffusers-a-hands-on-guide-to-peak-performance/

TeaCache is incoming: huggingface/diffusers#12652

Cc: @DN6 we should probably prioritize that PR?

You can take your time on your TeaCache support :)

After carefully studying both codebases, I think supporting caching in the adapter layer is non-trivial, so it can be deferred to a later PR. I put some notes in #2403 (comment)

sayakpaul · 2026-04-15T12:49:08Z

+    # Step-wise execution — explicitly rejected
+    # ------------------------------------------------------------------
+
+    def prepare_encode(self, state: Any, **kwargs: Any) -> Any:


Well, our pipelines are implemented in a way where we can compute the text encodings first and then use the precomputed text encodings for denoising + decoding.

Thanks for the info. "Step-wise execution" is a new experimental feature on our side. Glad to know that you also have this. Do you mean the Modular Blocks https://huggingface.co/docs/diffusers/main/en/modular_diffusers/quickstart ?

fhfuih · 2026-04-17T01:36:56Z

Thanks @DN6 for adding to the review! Based on both of your reviews, I have added some discussion of feature support and adaptation to the companion issue #2403.

Maybe I can also paste it here:

Doable in the first PR

To avoid unnecessary complication, only support an optimization feature if

It can be turned on with a simple toggle/config of the pipeline object
Its main business logic (if any) lies within the boundary of a pipeline object, both for diffusers and for vllm-omni

Specifically, these optimization toggles can be added:

Setting model dtype: trivial pipeline load-time configuration
VAE slicing and tiling: trivial pipeline load-time configuration
Attention backend: vllm-omni defines its own attention classes, then creates and calls them inside the pipeline's transformer modules. Since the adapter swaps in an entirely different (diffusers) pipeline, vllm-omni's attention utilities are skipped, so we can forward the attention backend configuration to the underlying diffusers pipeline classes.
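A minimal sketch of how such load-time toggles could be applied to a loaded diffusers pipeline. The option names and helper functions are hypothetical; the methods it calls (`enable_slicing()` / `enable_tiling()` on the VAE and `set_attention_backend()` on the transformer, the latter also used in the benchmark script later in this thread) are diffusers model methods, but not every model class implements them, hence the `hasattr` guards. Fake objects stand in for a real pipeline in the usage lines.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict, List, Optional


@dataclass
class AdapterLoadOptions:
    """Hypothetical option names, not vllm-omni's real config fields."""

    torch_dtype: Any = None  # e.g. torch.bfloat16; applied at load time
    vae_slicing: bool = False
    vae_tiling: bool = False
    attention_backend: Optional[str] = None  # e.g. "_native_flash"


def from_pretrained_kwargs(opts: AdapterLoadOptions) -> Dict[str, Any]:
    """dtype is a load-time kwarg (torch_dtype) of DiffusionPipeline.from_pretrained()."""
    return {} if opts.torch_dtype is None else {"torch_dtype": opts.torch_dtype}


def apply_post_load_toggles(pipe: Any, opts: AdapterLoadOptions) -> List[str]:
    """Apply toggles on an already-loaded pipeline; return what was applied."""
    applied = []
    vae = getattr(pipe, "vae", None)
    # Not every VAE class implements slicing/tiling, so probe before calling.
    if opts.vae_slicing and hasattr(vae, "enable_slicing"):
        vae.enable_slicing()
        applied.append("vae_slicing")
    if opts.vae_tiling and hasattr(vae, "enable_tiling"):
        vae.enable_tiling()
        applied.append("vae_tiling")
    if opts.attention_backend is not None:
        # Forward the backend name to each denoiser the pipeline exposes.
        for name in ("transformer", "transformer_2"):
            model = getattr(pipe, name, None)
            if hasattr(model, "set_attention_backend"):
                model.set_attention_backend(opts.attention_backend)
                applied.append(f"{name}.attention_backend")
    return applied


# Usage with fake objects standing in for a real diffusers pipeline.
calls: List[str] = []
fake_pipe = SimpleNamespace(
    vae=SimpleNamespace(
        enable_slicing=lambda: calls.append("slicing"),
        enable_tiling=lambda: calls.append("tiling"),
    ),
    transformer=SimpleNamespace(set_attention_backend=calls.append),
)
opts = AdapterLoadOptions(vae_tiling=True, attention_backend="_native_flash")
applied = apply_post_load_toggles(fake_pipe, opts)
```

Splitting load-time kwargs from post-load toggles mirrors the distinction drawn in the list above: dtype must be known before weights are loaded, while slicing, tiling, and attention backend can be flipped on the loaded pipeline object.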

Optionally, I can figure out how to integrate the Modular Pipeline to check input/output modalities.

Deferred to sequel PRs

torch.compile: find and configure the transformer blocks. vllm-omni enables it in the model runner (a direct wrapper around pipelines). The logic is straightforward: look for the transformer and transformer_2 blocks and torch.compile them. We need to test whether the model data structures are the same and whether these blocks are discoverable with the current implementation.
CacheDiT: vllm-omni enables caching (calling cache_dit.enable_cache(pipe)) in the model runner (a wrapper around pipelines), and our CacheDiTBackend wraps the cache_dit library with extra validation routines. Theoretically, once DiffusionModelRunner::pipe is loaded as a diffusers pipeline, the logic to enable CacheDiT is already there. But testing and validation are definitely required.
Quant: The load-time quantization logic differs. vllm-omni loads the model weights into our customized XPipeline classes and then applies the quantization configuration as a post-processing step. For diffusers, it is a config kwarg bundled into from_pretrained. Enabling it would require modifications to the DiffusersPipelineLoader.
Context Parallel ("Sequence Parallel" in vllm-omni): vllm-omni borrows the hook system and CP implementation from diffusers. The hooks are applied at model weight-loading time. To enable it, we need to either (1) check whether the current SP is compatible with the diffusers pipeline format, or (2) route around the vllm-omni SP implementation, enable diffusers' CP, and see whether the parallelism plays well with vllm-omni's higher-level orchestration layers.
"Block"-wise inference ("Step"-wise execution in vllm-omni): The new diffusers Modular Pipeline also supports this. But since step-wise execution in vllm-omni is even more experimental, this can be deferred to a future PR.
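The torch.compile discovery step described above could look roughly like this. It is a hypothetical sketch, not the vllm-omni model runner's code: it probes the `transformer` and `transformer_2` attributes (some diffusers pipelines expose both, others only the first) and replaces each with its compiled version, the `pipe.<module> = torch.compile(pipe.<module>)` pattern shown in the diffusers docs. The compile function is injectable so the discovery logic can be exercised without torch.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Optional

# Attribute names probed on the pipeline (assumption: diffusers' usual naming).
DENOISER_ATTRS = ("transformer", "transformer_2")


def find_denoisers(pipe: Any) -> Dict[str, Any]:
    """Return the denoiser modules the pipeline exposes, skipping missing/None ones."""
    found = {}
    for name in DENOISER_ATTRS:
        module = getattr(pipe, name, None)
        if module is not None:
            found[name] = module
    return found


def compile_denoisers(pipe: Any, compile_fn: Optional[Callable[[Any], Any]] = None) -> List[str]:
    """Replace each discovered denoiser with compile_fn(module); return compiled names."""
    if compile_fn is None:
        import torch  # imported lazily so discovery can be tested without torch

        compile_fn = torch.compile
    compiled = []
    for name, module in find_denoisers(pipe).items():
        setattr(pipe, name, compile_fn(module))
        compiled.append(name)
    return compiled


# Usage: a fake pipeline with one denoiser and a fake "compiler".
pipe = SimpleNamespace(transformer="dit", transformer_2=None, vae="vae")
compiled = compile_denoisers(pipe, compile_fn=lambda m: f"compiled({m})")
```

This compiles whole denoisers; regional compilation (compiling only the repeated transformer blocks), mentioned earlier in the thread as a compile-time trade-off, would need a different traversal.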

fhfuih · 2026-04-20T00:57:45Z

TODO list for this draft

Adapt GPU device settings
Remove irrelevant load-time and run-time settings
Add basic profiling timing
Go over the AI-generated tests
Attach image output, profiling info, and memory traces to this PR

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a general Hugging Face Diffusers adapter backend so vLLM-Omni can serve arbitrary DiffusionPipeline.from_pretrained() models via a new diffusers diffusion load format.

Changes:

Added DiffusersAdapterPipeline plus loader/registry/config wiring to enable diffusion_load_format=diffusers.
Exposed CLI + stage-config knobs to pass through from_pretrained() and pipeline.__call__() kwargs.
Added unit + e2e coverage and an online serving example for the adapter workflow.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

File	Description
vllm_omni/entrypoints/cli/serve.py	Adds CLI flags for selecting the diffusers load format + passing JSON kwargs.
vllm_omni/engine/async_omni_engine.py	Plumbs diffusers kwargs into the default diffusion stage config.
vllm_omni/diffusion/worker/diffusion_model_runner.py	Changes default `load_format` handling for diffusion runner.
vllm_omni/diffusion/registry.py	Registers `DiffusersAdapterPipeline` in the diffusion model registry.
vllm_omni/diffusion/models/diffusers_adapter/pipeline_diffusers_adapter.py	Implements the black-box adapter around `DiffusionPipeline`.
vllm_omni/diffusion/models/diffusers_adapter/__init__.py	Exposes adapter pipeline symbol for imports.
vllm_omni/diffusion/model_loader/diffusers_loader.py	Adds loader branch to construct/load the adapter pipeline.
vllm_omni/diffusion/data.py	Adds config fields + validation + enrich behavior for diffusers adapter.
tests/e2e/online_serving/test_diffusers_adapter.py	E2E coverage for serving and calling a diffusers-backed model.
tests/diffusion/test_diffusers_adapter.py	Unit tests for adapter guards, kwargs mapping, and output wrapping.
examples/online_serving/diffusers_pipeline_adapter/stage_config.yaml	Example stage config enabling the diffusers adapter.
examples/online_serving/diffusers_pipeline_adapter/README.md	Usage docs and limitations for the adapter workflow.

fhfuih · 2026-04-21T02:43:34Z

Notes

Feature (Optimization+Parallelism) Support

This PR only enables basic backend adaptation and the simplest feature toggles. As listed in the RFC, forwarding of some features may be deferred to later PRs. The comments above on these deferred features are deliberately left unresolved for future reference.

YAML Config

#2383 and #2887 were working on refactoring the YAML config system. The "new" config system still seems to have some problems passing diffusion-specific configurations. Since that work is still ongoing, and the old config system works well, this PR still uses the old config system (i.e., "stage config" instead of "deploy config"). Once the config-system work is finished, the relevant content here can be updated.

Perf

Running Qwen-Image With vllm-omni + diffusers backend:

vllm serve /data/models/Qwen/Qwen-Image --stage-configs-path examples/online_serving/diffusers_pipeline_adapter/stage_config.yaml --omni --port 12345 --enable-diffusion-pipeline-profiler

Note that the diffusers backend is a black box, so we can only get one total time: everything is counted in forward.

INFO 04-21 10:07:55 [diffusion_pipeline_profiler.py:31] [DiffusionPipelineProfiler] DiffusersAdapterPipeline.forward took 2.684139s
INFO 04-21 10:07:55 [diffusion_model_runner.py:213] Peak GPU memory (this request): 55.25 GB reserved, 54.84 GB allocated, 0.41 GB pool overhead (0.7%)
(APIServer pid=741892) INFO 04-21 10:07:55 [diffusion_engine.py:126] Generation completed successfully.
(APIServer pid=741892) INFO 04-21 10:07:55 [diffusion_engine.py:173] Post-processing completed in 0.0000 seconds
(APIServer pid=741892) INFO 04-21 10:07:55 [diffusion_engine.py:176] DiffusionEngine.step breakdown: preprocess=0.00 ms, add_req_and_wait=2689.99 ms, postprocess=0.00 ms, total=2690.32 ms
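Because the adapter makes one opaque call into the diffusers pipeline, a profiler can only wrap that outer call, whereas the native pipeline exposes `text_encoder.forward`, `diffuse`, and `vae.decode` as separately timed stages (see the native log that follows). A minimal sketch of such a coarse wall-clock wrapper, using a hypothetical `timed` decorator and `BlackBoxAdapter` class rather than vllm-omni's DiffusionPipelineProfiler:

```python
import functools
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("black_box_profiler_sketch")


def timed(label: str) -> Callable:
    """Decorator that logs one wall-clock duration per call, like a coarse profiler."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                wrapper.last_elapsed = elapsed  # kept for inspection in this sketch
                logger.info("%s took %.6fs", label, elapsed)

        wrapper.last_elapsed = None
        return wrapper

    return decorator


class BlackBoxAdapter:
    """Hypothetical adapter: one opaque call covers encode, denoise, and decode."""

    def __init__(self, pipe: Callable[..., Any]):
        self.pipe = pipe

    @timed("BlackBoxAdapter.forward")
    def forward(self, **kwargs: Any) -> Any:
        # No stage boundaries are visible from here, so only the total is measurable.
        return self.pipe(**kwargs)


adapter = BlackBoxAdapter(lambda **kw: sum(range(1000)))
out = adapter.forward()
```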

Running with native backend:

vllm serve Qwen/Qwen-Image --omni --port 12345 --enable-diffusion-pipeline-profiler

INFO 04-21 10:31:43 [diffusion_pipeline_profiler.py:31] [DiffusionPipelineProfiler] QwenImagePipeline.text_encoder.forward took 0.333929s
INFO 04-21 10:31:45 [diffusion_pipeline_profiler.py:31] [DiffusionPipelineProfiler] QwenImagePipeline.diffuse took 1.965977s
INFO 04-21 10:31:45 [diffusion_pipeline_profiler.py:31] [DiffusionPipelineProfiler] QwenImagePipeline.vae.decode took 0.034542s
INFO 04-21 10:31:45 [diffusion_pipeline_profiler.py:31] [DiffusionPipelineProfiler] QwenImagePipeline.forward took 2.341323s
INFO 04-21 10:31:45 [diffusion_model_runner.py:213] Peak GPU memory (this request): 55.11 GB reserved, 54.79 GB allocated, 0.32 GB pool overhead (0.6%)
(APIServer pid=748505) INFO 04-21 10:31:45 [diffusion_engine.py:126] Generation completed successfully.
(APIServer pid=748505) INFO 04-21 10:31:45 [diffusion_engine.py:173] Post-processing completed in 0.0176 seconds
(APIServer pid=748505) INFO 04-21 10:31:45 [diffusion_engine.py:176] DiffusionEngine.step breakdown: preprocess=0.00 ms, add_req_and_wait=2369.65 ms, postprocess=17.61 ms, total=2387.85 ms
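The two single-request logs can be compared with a little arithmetic. The sketch below uses only the numbers printed above; with one request per backend it is a rough indicator, not a benchmark. The native total includes 17.61 ms of separate postprocessing that the adapter performs inside forward, which is why the end-to-end gap comes out smaller than the forward gap.

```python
# Numbers copied from the two profiler logs above (one request each).
adapter = {"forward_s": 2.684139, "total_ms": 2690.32}
native = {"forward_s": 2.341323, "total_ms": 2387.85}


def relative_overhead(new: float, baseline: float) -> float:
    """Fractional slowdown of `new` relative to `baseline`."""
    return (new - baseline) / baseline


forward_overhead = relative_overhead(adapter["forward_s"], native["forward_s"])
total_overhead = relative_overhead(adapter["total_ms"], native["total_ms"])
print(f"forward: +{forward_overhead:.1%}, end-to-end step: +{total_overhead:.1%}")
# prints: forward: +14.6%, end-to-end step: +12.7%
```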

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

Additional commits, each signed off by Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>:
- unit test for diffusers pipeline argument passing
- L2 e2e test (random weight model, only e2e infer)
- BUGFIX
- bugfix
- ensure pipeline device is correctly set
- fix generator not set if seed not present
- adjust doc
- change test model
- improve type check
- fix wrong function call
- attn backend not read
- revert irrelevant changes
- typo
- add hardware mark
- optimize CLI default arg per AI comment
- revert irrelevant changes

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

fhfuih · 2026-04-21T06:39:38Z

@hsliuustc0106 @Gaohan123 This PR is ready, PTAL and add a ready tag. Thanks

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

SamitHuang

LGTM

hsliuustc0106 · 2026-04-22T17:13:04Z

@@ -0,0 +1,31 @@
+# Example stage config for diffusers backend


When are we going to remove this YAML?

I previously could not forward some diffusion engine_args under the new config system (from the deploy YAML to OmniDiffusionConfig). I planned to wait for #2987, but saw it was closed just yesterday. I can look further into this and see whether I can get the new config system working.

hsliuustc0106 · 2026-04-22T17:16:10Z

I didn't see any test results here: how does vllm-omni successfully serve a model using the diffusers backend? Have you compared the accuracy/performance against diffusers? How many models are supported now? Are there any doc updates with this information? Correct me if I missed something; otherwise, I suggest reverting this PR.

hsliuustc0106 · 2026-04-22T17:16:22Z

cc @Gaohan123 @SamitHuang

hsliuustc0106 · 2026-04-22T17:18:26Z

+vllm serve "stable-diffusion-v1-5/stable-diffusion-v1-5" \
+    --omni \
+    --diffusion-load-format diffusers \
+    --diffusers-load-kwargs '{"use_safetensors": true}' \


Can we reuse the kwargs from the vllm serve CLI args instead of introducing 3 more args? I suggest keeping only one.

--diffusion-load-format is already there; I reuse it and add a new value. --diffusers-load-kwargs and --diffusers-call-kwargs are pass-throughs, so that when a specific model has niche parameters, users have a fallback way to set them.
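The pass-through flags take a JSON object on the command line. A hedged sketch of how such a value could be parsed and merged with adapter defaults; the function names and the default shown are hypothetical, not the PR's code. JSON booleans are lowercase (`true`/`false`), which `json.loads` maps to Python's `True`/`False`.

```python
import json
from typing import Any, Dict, Optional


def parse_json_kwargs(raw: Optional[str], flag: str) -> Dict[str, Any]:
    """Parse a JSON object given on the command line; empty or None means no kwargs."""
    if not raw:
        return {}
    value = json.loads(raw)  # raises json.JSONDecodeError on malformed input
    if not isinstance(value, dict):
        raise ValueError(f"{flag} must be a JSON object, got {type(value).__name__}")
    return value


def merge_kwargs(defaults: Dict[str, Any], user: Dict[str, Any]) -> Dict[str, Any]:
    """User-supplied pass-through kwargs take precedence over adapter defaults."""
    return {**defaults, **user}


# Hypothetical adapter default merged with a user-supplied --diffusers-load-kwargs value.
load_kwargs = merge_kwargs(
    {"use_safetensors": True},
    parse_json_kwargs('{"use_safetensors": false, "variant": "fp16"}', "--diffusers-load-kwargs"),
)
```

Letting user keys win keeps the flag a genuine escape hatch; the trade-off is that a misspelled key is forwarded unchecked to diffusers rather than rejected at the CLI.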

fhfuih · 2026-04-23T01:29:15Z

I didn't see any test results here: how does vllm-omni successfully serve a model using the diffusers backend? Have you compared the accuracy/performance against diffusers? How many models are supported now? Are there any doc updates with this information? Correct me if I missed something; otherwise, I suggest reverting this PR.

I posted a perf comparison against native vllm-omni above. Indeed, there is no accuracy/perf comparison against diffusers, nor a model-coverage report. I will work on them now and attach them later.

…lm-project#2724) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

fhfuih · 2026-04-23T03:32:22Z

@hsliuustc0106 Quick update with a test on Qwen-Image:

Accuracy

(Output images not preserved: Diffusers backend vs. Diffusers lib)

>>> a = Image.open('diffusers-lib-output.png')
>>> b = Image.open('diffusers-backend-output.png')
>>> compute_image_ssim_psnr(prediction=a, reference=b)
(1.0, inf)
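`compute_image_ssim_psnr` is a helper whose implementation is not shown in this thread. A hypothetical numpy stand-in follows, simplified to a single global SSIM window (standard SSIM, e.g. scikit-image's `structural_similarity`, averages over local windows, so values will differ for non-identical images). It also shows why identical outputs give `(1.0, inf)`: PSNR is 10 * log10(MAX^2 / MSE), which diverges when the MSE is 0. `np.asarray` also accepts PIL images.

```python
import numpy as np


def compute_image_ssim_psnr_sketch(prediction, reference, data_range: float = 255.0):
    """Global (single-window) SSIM and PSNR between two same-sized images."""
    x = np.asarray(prediction, dtype=np.float64)
    y = np.asarray(reference, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError(f"shape mismatch: {x.shape} vs {y.shape}")

    # PSNR: infinite for pixel-identical images (MSE == 0).
    mse = np.mean((x - y) ** 2)
    psnr = float("inf") if mse == 0 else float(10 * np.log10(data_range**2 / mse))

    # SSIM with the usual stabilizing constants, computed over the whole image at once.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx = np.mean((x - mx) ** 2)
    vy = np.mean((y - my) ** 2)
    cov = np.mean((x - mx) * (y - my))
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))
    return float(ssim), psnr


# Usage on synthetic 8-bit RGB images.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)
same = compute_image_ssim_psnr_sketch(img, img.copy())
noisy = np.clip(img.astype(np.int16) + rng.integers(-10, 11, size=img.shape), 0, 255).astype(np.uint8)
diff = compute_image_ssim_psnr_sketch(noisy, img)
```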

Perf

	Load time (s)	Generation time (s)
Diffusers backend	177.89	5.09
Diffusers lib	203.10	14.63

Somehow, our backend is even faster than the bare-bones library 🤔

The setup is as follows:

Ours (preprocessing and postprocessing are all included in the forward run):

> vllm serve /data/models/Qwen/Qwen-Image --omni --port 12345 --enable-diffusion-pipeline-profiler --diffusion-load-format diffusers --diffusers-load-kwargs '{"use_safetensors": true}'

...Logging shows that it uses the `_native_flash` attention backend in my environment
...
INFO 04-23 10:15:27 [diffusion_model_runner.py:142] Model loading took 53.7914 GiB and 177.892670 seconds

> python examples/online_serving/text_to_image/openai_chat_client.py \
    --server http://127.0.0.1:12345 \
    --negative 'angry facial expression' \
    --steps 20 \
    --height 512 \
    --width 512  \
    --seed 40 \
    --prompt 'a cat wearing furry bee costume and enjoying a cup of honey water' \
    --output 'diffusers-output.png'

INFO 04-23 10:15:28 [diffusion_worker.py:183] Worker 0: Process-scoped GPU memory after model loading: 54.49 GiB.
INFO 04-23 10:18:58 [diffusion_pipeline_profiler.py:31] [DiffusionPipelineProfiler] DiffusersAdapterPipeline.forward took 5.080019s
INFO 04-23 10:18:58 [diffusion_model_runner.py:213] Peak GPU memory (this request): 55.25 GB reserved, 54.84 GB allocated, 0.41 GB pool overhead (0.7%)
...
(APIServer pid=875076) INFO 04-23 10:18:58 [diffusion_engine.py:176] DiffusionEngine.step breakdown: preprocess=0.00 ms, add_req_and_wait=5086.49 ms, postprocess=0.00 ms, total=5086.76 ms

Diffusers (following our defaults of bfloat16 dtype and use_safetensors)

import time

from diffusers import QwenImagePipeline
import torch

start_time = time.perf_counter()
pipe = QwenImagePipeline.from_pretrained("/data/models/Qwen/Qwen-Image", torch_dtype=torch.bfloat16, use_safetensors=True)
pipe.to("cuda")
pipe.transformer.set_attention_backend("_native_flash")
end_time = time.perf_counter()
print(f"Diffusers pipeline loading time: {end_time - start_time:.2f} seconds")

with torch.inference_mode():
    start_time = time.perf_counter()
    image = pipe(
        'a cat wearing furry bee costume and enjoying a cup of honey water',
        num_inference_steps=20,
        negative_prompt='angry facial expression',
        height=512,
        width=512,
        generator=torch.Generator("cuda").manual_seed(40),
    ).images[0]
    end_time = time.perf_counter()
image.save("diffusers-lib-output.png")
print(f"Diffusers pipeline execution time: {end_time - start_time:.2f} seconds")
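One possible contributor to the gap, not verified in this thread, is that the library script above times the very first pipeline call, which can include one-off warmup costs, whereas the server had already loaded and possibly warmed up the model before the measured request. A minimal, hypothetical harness that keeps warmup calls out of the timing:

```python
import statistics
import time
from typing import Any, Callable, Dict, List


def benchmark(fn: Callable[[], Any], warmup: int = 1, iters: int = 3) -> Dict[str, Any]:
    """Run fn `warmup` times untimed, then time `iters` calls with perf_counter."""
    for _ in range(warmup):
        fn()  # first calls may pay one-off costs (e.g. kernel selection, allocator growth)
    samples: List[float] = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()  # for GPU work, fn must block until results are ready
        samples.append(time.perf_counter() - start)
    return {"samples": samples, "median_s": statistics.median(samples)}


# Usage with a cheap CPU stand-in for the pipeline call.
calls: List[int] = []
stats = benchmark(lambda: calls.append(sum(range(10_000))), warmup=2, iters=5)
```

To apply it to the script above, one would wrap the `pipe(...)` call with the same arguments in a lambda. Because that call returns PIL images, each iteration already waits for the GPU to finish; for calls returning GPU tensors, `torch.cuda.synchronize()` would be needed before reading the clock.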

I'll

…lm-project#2724) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

fhfuih mentioned this pull request Apr 13, 2026

[RFC]: Diffusers Backend Integration for Extended Model Coverage #2403 JiusiServe/vllm-omni#196

Closed

fhfuih force-pushed the diffusers-backend branch from 3ff4eb4 to c307b39 April 13, 2026 08:02

fhfuih force-pushed the diffusers-backend branch from 358340e to 2005b0f April 15, 2026 03:12

sayakpaul reviewed Apr 15, 2026

Gaohan123 added this to the v0.20.0 milestone Apr 15, 2026

fhfuih mentioned this pull request Apr 16, 2026

[RFC]: Diffusers Backend Integration for Extended Model Coverage #2403

Open

1 task

DN6 reviewed Apr 16, 2026

Comment thread vllm_omni/diffusion/models/diffusers_adapter/pipeline_diffusers_adapter.py Outdated

Comment thread vllm_omni/diffusion/models/diffusers_adapter/pipeline_diffusers_adapter.py Outdated

fhfuih force-pushed the diffusers-backend branch from 2005b0f to 292d5fa April 17, 2026 02:34

fhfuih force-pushed the diffusers-backend branch from 5678c56 to 85371a3 April 20, 2026 02:52

BBuf mentioned this pull request Apr 20, 2026

SGLang Diffusion external impact survey: kernels, features, and platform adoption BBuf/how-to-optim-algorithm-in-cuda#14

Open

fhfuih force-pushed the diffusers-backend branch 2 times, most recently from 2012d67 to a36cf33 April 20, 2026 13:11

fhfuih marked this pull request as ready for review April 21, 2026 02:06

fhfuih requested a review from hsliuustc0106 as a code owner April 21, 2026 02:06

Copilot AI review requested due to automatic review settings April 21, 2026 02:06

Copilot AI reviewed Apr 21, 2026

Copilot started reviewing on behalf of fhfuih April 21, 2026 02:16

yenuo26 reviewed Apr 21, 2026

Comment thread tests/diffusion/test_diffusers_adapter.py Outdated

yenuo26 reviewed Apr 21, 2026

Comment thread tests/e2e/online_serving/test_diffusers_adapter.py

fhfuih force-pushed the diffusers-backend branch from 57d37b4 to d69ecd8 April 21, 2026 03:26

fhfuih added 2 commits April 21, 2026 11:32

replace unittest mock with pytest mock

93098dc

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

impl the same attn backend default as registry.py

792cfd2

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

sync doc

af9a8be

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

add seed/generator assertion

ce3f90d

Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

yenuo26 added the ready label to trigger buildkite CI label Apr 21, 2026

SamitHuang approved these changes Apr 22, 2026

Gaohan123 merged commit d8cc7a0 into vllm-project:main Apr 22, 2026
8 checks passed

hsliuustc0106 reviewed Apr 22, 2026

qinganrice pushed a commit to qinganrice/vllm-omni that referenced this pull request Apr 23, 2026

[feat]: General diffusers adapter backend to run diffusion models (vl…

8517403

…lm-project#2724) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

fhfuih mentioned this pull request Apr 24, 2026

[bugfix][CI] Diffusers backend update #3096

Merged

5 tasks

fhfuih deleted the diffusers-backend branch April 28, 2026 02:38

lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026

[feat]: General diffusers adapter backend to run diffusion models (vl…

4d75b5c

…lm-project#2724) Signed-off-by: Huang, Zeyu <11222265+fhfuih@users.noreply.github.com>

wtomin mentioned this pull request May 7, 2026

[RFC]: vLLM-Omni Diffusion Module — Q2 2026 Roadmap #2226

Open

25 tasks

SamitHuang mentioned this pull request May 12, 2026

[RFC] v0.1 Release Tracker verl-project/verl-omni#47

Open

44 tasks


		Any model loadable via `DiffusionPipeline.from_pretrained()` is supported, including:

		- Text-to-Image: SD 1.5, SD 2.1, SDXL, PixArt-Σ, Kandinsky, DeepFloyd IF

		@@ -0,0 +1,31 @@
		# Example stage config for diffusers backend

Conversation

fhfuih commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Release Note

Uh oh!

hsliuustc0106 commented Apr 13, 2026

Uh oh!

sayakpaul left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fhfuih Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

fhfuih commented Apr 17, 2026

Doable in the first PR

Deferred to sequel PRs

Uh oh!

fhfuih commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector Bot commented Apr 21, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

fhfuih commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Notes

Feature (Optimization+Parallelism) Support

YAML Config

Perf

Uh oh!

fhfuih commented Apr 21, 2026

Uh oh!

SamitHuang left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

fhfuih commented Apr 13, 2026 •

edited

Loading

fhfuih Apr 16, 2026 •

edited

Loading

fhfuih commented Apr 20, 2026 •

edited

Loading

fhfuih commented Apr 21, 2026 •

edited

Loading

SamitHuang left a comment •

edited

Loading

fhfuih commented Apr 23, 2026 •

edited

Loading