[SKILL] add-diffusion-model by SamitHuang · Pull Request #15 · hsliuustc0106/vllm-omni-skills

SamitHuang · 2026-03-16T11:09:42Z

Summary

Add a new skill vllm-omni-add-diffusion-model that guides developers through adding a new diffusion model to vLLM-Omni.

What it covers

- Two migration paths: diffusers-based models (Path A) and custom/private repo models (Path B)
- Step-by-step workflow with checklist: analyze → adapt transformer → adapt pipeline → register → test → docs
- Custom model patterns (Path B) based on real implementations:
  - Weight loading bypass (DreamID-Omni style: eager load in `__init__`, no-op `load_weights`)
  - Standard loader with custom name remapping (BAGEL style: `weights_sources` + custom `load_weights`)
  - Code placement rules (what goes in `vllm_omni/diffusion/models/<name>/` vs external deps vs download scripts)
- `model_index.json` generation for multi-repo weights
- Multi-modal I/O protocol classes (`SupportImageInput`, `SupportAudioInput`)

Reference files

| File | Content |
| --- | --- |
| `SKILL.md` (307 lines) | Main workflow: Path A (diffusers) and Path B (custom) |
| `references/custom-model-patterns.md` | Weight loading, directory layout, external deps, download scripts |
| `references/transformer-adaptation.md` | Attention replacement, mixin removal, QKV fusion, `load_weights` |
| `references/parallelism-patterns.md` | TP, SP, CFG parallel, HSDP |
| `references/troubleshooting.md` | Common errors and debugging workflow |

Tested by

Applied the skill to add Wan-AI/Wan2.1-T2V-1.3B-Diffusers and Stable Diffusion 1.5 support to vLLM-Omni; see the companion PR at vllm-project/vllm-omni.

Covers both diffusers-based and custom/private repo models with step-by-step workflow, code patterns, and troubleshooting guide.

Reference files:
- transformer-adaptation.md: porting diffusers transformers
- custom-model-patterns.md: patterns for non-diffusers models (DreamID-Omni, BAGEL, HunyuanImage3)
- parallelism-patterns.md: TP, SP, CFG parallel
- troubleshooting.md: common errors and fixes

Made-with: Cursor

gemini-code-assist · 2026-03-16T11:09:58Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new 'skill' designed to standardize and streamline the integration of diffusion models into the vLLM-Omni framework. It provides a structured, step-by-step guide for developers, covering both standard Diffusers-based models and custom implementations. The added documentation aims to simplify complex tasks such as model adaptation, weight management, and the application of various parallelism techniques, thereby enhancing the framework's capability to support a wider range of diffusion models efficiently.

Highlights

- New Skill Introduction: A new skill, `vllm-omni-add-diffusion-model`, has been added to guide developers through the process of integrating new diffusion models into vLLM-Omni.
- Dual Migration Paths: The skill provides detailed workflows for two distinct migration paths: Diffusers-based models (Path A) and custom/private repository models (Path B), accommodating various model sources.
- Comprehensive Reference Documentation: Extensive reference files have been included, covering critical aspects such as custom model patterns, transformer adaptation, parallelism strategies (TP, SP, CFG parallel, HSDP), and a dedicated troubleshooting guide.
- Practical Implementation Guidance: The documentation offers practical patterns for weight loading (including bypass and custom remapping), handling external dependencies, generating `model_index.json` for custom models, and implementing multi-modal input protocols.

Changelog

skills/vllm-omni-add-diffusion-model/SKILL.md
- Added a comprehensive guide for integrating new diffusion models into vLLM-Omni, detailing two migration paths (Diffusers-based and Custom/Private Repo) and common steps.
skills/vllm-omni-add-diffusion-model/references/custom-model-patterns.md
- Added a reference document outlining patterns for integrating custom diffusion models, including directory structure, weight loading, model_index.json creation, external dependencies, and multi-modal input protocols.
skills/vllm-omni-add-diffusion-model/references/parallelism-patterns.md
- Added a reference document detailing various parallelism patterns (Tensor Parallelism, CFG Parallelism, Sequence Parallelism, VAE Patch Parallelism, HSDP) and a recommended incremental approach for adding them.
skills/vllm-omni-add-diffusion-model/references/transformer-adaptation.md
- Added a reference document providing a step-by-step guide for adapting Diffusers transformers to vLLM-Omni, covering mixin removal, attention replacement, logger replacement, config support, and weight loading.
skills/vllm-omni-add-diffusion-model/references/troubleshooting.md
- Added a reference document listing common errors encountered when adding diffusion models, along with their causes, fixes, and a general debugging workflow.

Activity

No specific human activity (comments, reviews, progress updates) has been recorded for this pull request yet.

gemini-code-assist

Code Review

This pull request introduces a comprehensive and well-structured guide (a "skill") for adding new diffusion models to vLLM-Omni. This is a valuable addition that will significantly help developers extend the library's capabilities. The documentation clearly outlines the process for both standard diffusers-based models and custom models, covering important topics like weight loading, parallelism, and troubleshooting. My review includes several suggestions to enhance the clarity and completeness of the code examples provided, which should make the guide even more user-friendly.

gemini-code-assist · 2026-03-16T11:11:33Z

+        self.vae = custom_init_vae(model, device=self.device)
+        self.text_encoder = custom_init_text_encoder(model, device=self.device)


In this code snippet, self.device is used without being defined, which could be confusing for developers following the guide. Please clarify how the device should be obtained. It might be retrieved from od_config or passed as an argument to __init__.
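One way to act on this comment, sketched under assumptions: resolve the device once at the top of `__init__`, before any component reads `self.device`. The `device` parameter, the `od_config.device` attribute, the `CustomPipeline` class name, and the two stub loaders below are hypothetical stand-ins so the sketch runs on its own; they are not taken from the vLLM-Omni API.

```python
from types import SimpleNamespace


def custom_init_vae(model, device):
    # Stand-in for the guide's loader: records what it was asked to build.
    return {"component": "vae", "model": model, "device": device}


def custom_init_text_encoder(model, device):
    # Stand-in for the guide's loader: records what it was asked to build.
    return {"component": "text_encoder", "model": model, "device": device}


class CustomPipeline:
    def __init__(self, model, od_config=None, device=None):
        # Resolve the device before any component uses self.device.
        # Priority: explicit argument, then a config attribute (assumed name),
        # then a CPU default.
        self.device = device or getattr(od_config, "device", None) or "cpu"
        self.vae = custom_init_vae(model, device=self.device)
        self.text_encoder = custom_init_text_encoder(model, device=self.device)


pipeline = CustomPipeline("my-model", od_config=SimpleNamespace(device="cuda:0"))
```

Whichever source the real pipeline uses, the point is that `self.device` is assigned before the first line that reads it.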

gemini-code-assist · 2026-03-16T11:11:33Z

+            # Remap original weight names to vllm-omni module names
+            name = self._remap_weight_name(name)
+            if name in params:
+                default_weight_loader(params[name], tensor)


The function default_weight_loader is used here, but its import is not shown in the example. For clarity and consistency with other code snippets in this guide, please mention that it should be imported from vllm.model_executor.model_loader.weight_utils.
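A minimal sketch of the remap-then-load loop with the import made explicit. To keep it runnable without vLLM installed, `default_weight_loader` is replaced by a stand-in that copies into a plain list; the real import path is the one named in the comment above. `PREFIX_MAP`, the module-level `remap_weight_name` and `load_weights` functions, and the sample weight names are hypothetical.

```python
# In the real module, import the loader instead of defining it:
#   from vllm.model_executor.model_loader.weight_utils import default_weight_loader
def default_weight_loader(param, tensor):
    # Stand-in so the sketch runs without vLLM: copy values in place.
    param[:] = tensor


# Hypothetical mapping from checkpoint name prefixes to module name prefixes.
PREFIX_MAP = {
    "model.diffusion_model.": "transformer.",
    "first_stage_model.": "vae.",
}


def remap_weight_name(name):
    # Rewrite the first matching prefix; leave unknown names untouched.
    for old, new in PREFIX_MAP.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name


def load_weights(params, weights):
    # params: mapping of module parameter names to parameters.
    # weights: iterable of (checkpoint_name, tensor) pairs.
    loaded = set()
    for name, tensor in weights:
        name = remap_weight_name(name)
        if name in params:
            default_weight_loader(params[name], tensor)
            loaded.add(name)
    return loaded


params = {"transformer.proj.weight": [0.0, 0.0]}
weights = [
    ("model.diffusion_model.proj.weight", [1.0, 2.0]),
    ("unused.bias", [3.0]),
]
loaded = load_weights(params, weights)
```

Returning the set of loaded names makes it easy to compare against the expected parameter names and spot weights that were silently skipped.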

gemini-code-assist · 2026-03-16T11:11:33Z

+def download_dependency():
+    CACHE_DIR.mkdir(parents=True, exist_ok=True)
+    with open(LOCK_FILE, "w") as f:
+        fcntl.flock(f, fcntl.LOCK_EX)
+        if not DEPENDENCY_DIR.exists():
+            subprocess.run([
+                "git", "clone", "--depth", "1",
+                REPO_URL, "--branch", BRANCH,
+                str(DEPENDENCY_DIR)
+            ], check=True)
+        fcntl.flock(f, fcntl.LOCK_UN)
+
+    # Add to Python path via .pth file
+    site_packages = Path(site.getsitepackages()[0])
+    pth_file = site_packages / "vllm_omni_dependency.pth"
+    pth_file.write_text(str(DEPENDENCY_DIR))


The download_dependency function example uses several undefined variables (e.g., CACHE_DIR, LOCK_FILE, DEPENDENCY_DIR, REPO_URL, BRANCH). To make this code snippet more practical and easier for a developer to adapt, consider adding placeholder definitions for these variables as module-level constants.
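The suggestion could look roughly like the following, with the constants defined at module level. Every value here is a placeholder chosen for illustration (the URL is intentionally left as `<org>/<repo>`, and the cache location is an assumption), and `fcntl` restricts the sketch to Unix-like systems. The only behavioral change from the snippet above is a `try`/`finally` so the lock is released even if the clone fails.

```python
import fcntl
import site
import subprocess
from pathlib import Path

# Placeholder values: replace with the real repository, branch, and cache path.
REPO_URL = "https://github.com/<org>/<repo>.git"
BRANCH = "main"
CACHE_DIR = Path.home() / ".cache" / "vllm_omni"
DEPENDENCY_DIR = CACHE_DIR / "dependency"
LOCK_FILE = CACHE_DIR / ".dependency.lock"


def download_dependency():
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    # Hold an exclusive file lock so concurrent processes do not clone twice.
    with open(LOCK_FILE, "w") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            if not DEPENDENCY_DIR.exists():
                subprocess.run([
                    "git", "clone", "--depth", "1",
                    REPO_URL, "--branch", BRANCH,
                    str(DEPENDENCY_DIR)
                ], check=True)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

    # Add to Python path via .pth file
    site_packages = Path(site.getsitepackages()[0])
    pth_file = site_packages / "vllm_omni_dependency.pth"
    pth_file.write_text(str(DEPENDENCY_DIR))
```

Deriving `DEPENDENCY_DIR` and `LOCK_FILE` from `CACHE_DIR` means a developer adapting the template only has to change one path.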

gemini-code-assist · 2026-03-16T11:11:33Z

+```python
+# examples/offline_inference/<name>/download_<name>.py
+from huggingface_hub import snapshot_download
+import json, os


For better readability and to follow common Python style guides (like PEP 8), it's recommended to have one import per line.

Suggested change

-import json, os
+import json
+import os

gemini-code-assist · 2026-03-16T11:11:33Z

+        json.dump(config, f, indent=2)
+
+    # Install external code dependency (if needed)
+    download_dependency()


The Download Script Template includes a call to download_dependency(), but its definition is provided much earlier in this document. This might be confusing for someone who is just using the template. To improve clarity, you could add a comment indicating where to find or how to implement this function.

For example:

    # Install external code dependency (if needed)
    # See "External Dependency Management" section for implementation.
    download_dependency()

gemini-code-assist · 2026-03-16T11:11:34Z

+        else:
+            # Normal loading
+            ...


The ... placeholder under the # Normal loading comment is a bit vague. To make this advanced example for QKV fusion more complete and easier to understand, please replace the placeholder with the actual code for handling non-fused parameters. Also, ensure default_weight_loader is imported if it's not in scope.

Suggested change

-        else:
-            # Normal loading
-            ...
+        else:
+            # Normal loading
+            if name in params:
+                param = params[name]
+                if hasattr(param, "weight_loader"):
+                    param.weight_loader(param, tensor)
+                else:
+                    default_weight_loader(param, tensor)
+                loaded.add(name)

SamitHuang · 2026-03-16T11:12:54Z

@wtomin PTAL

Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>

wtomin · 2026-03-17T02:05:01Z

+### Step 8: Add E2E Tests (Recommended)
+
+Create `tests/e2e/online_serving/test_your_model_expansion.py`.
+


Suggested change

- For diffusion models e2e test, take `tests/e2e/online_serving/test_qwen3_omni_expansion.py` as reference.

- All the features (acceleration, quantization) supported for this model should be tested.

`test_qwen3_omni_expansion` covers Qwen 3 Omni, which is not a diffusion model.

For diffusion, Qwen Image Edit is okay for now, but once the Qwen Image PR (vllm-project/vllm-omni#1869) is merged, that will be a better reference because it covers more features.

Alternatively, a canonical ruleset is as follows:

1 GPU: TeaCache & GGUF (or fallback to FP8, or disable it) & Layer-wise CPU offloading (or fallback to Module-wise)

2 GPUs: Cache-DiT & FP8 (or fallback to GGUF, or disable it) & Ulysses = 2

2 GPUs: Cache-DiT & GGUF (or fallback to FP8, or disable it) & Ring = 2

2 GPUs: TeaCache & FP8 (or fallback to GGUF, or disable it) & CFG Parallel = 2

2 GPUs: Cache-DiT & FP8 (or fallback to GGUF, or disable it) & Tensor Parallel = 2 & VAE Patch Parallel = 2

2 GPUs: Cache-DiT & GGUF (or fallback to FP8, or disable it) & HSDP = 2 & VAE Patch Parallel = 2
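The ruleset above can be written down as data so a test file can iterate over it. This is only an illustrative encoding: the `E2ECase` class, its field names, and the string labels are hypothetical and do not correspond to vLLM-Omni options or test fixtures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class E2ECase:
    gpus: int
    cache: str            # caching method to enable
    quantization: tuple   # preference order; disable if none is supported
    extras: dict          # offloading or parallelism settings


CANONICAL_CASES = [
    E2ECase(1, "TeaCache", ("GGUF", "FP8"),
            {"cpu_offload": ("layer-wise", "module-wise")}),
    E2ECase(2, "Cache-DiT", ("FP8", "GGUF"), {"ulysses": 2}),
    E2ECase(2, "Cache-DiT", ("GGUF", "FP8"), {"ring": 2}),
    E2ECase(2, "TeaCache", ("FP8", "GGUF"), {"cfg_parallel": 2}),
    E2ECase(2, "Cache-DiT", ("FP8", "GGUF"),
            {"tensor_parallel": 2, "vae_patch_parallel": 2}),
    E2ECase(2, "Cache-DiT", ("GGUF", "FP8"),
            {"hsdp": 2, "vae_patch_parallel": 2}),
]


def pick_quantization(case, supported):
    """Return the first preferred scheme the model supports, or None to disable."""
    for scheme in case.quantization:
        if scheme in supported:
            return scheme
    return None


# Example: a model that supports only FP8 falls back from GGUF in the first case.
choice = pick_quantization(CANONICAL_CASES[0], {"FP8"})
```

Keeping the preference order in a tuple makes the "or fallback to ..., or disable it" part of each rule explicit rather than leaving it to each test author.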

wtomin · 2026-03-17T02:07:14Z

@fhfuih, @david6666666 PTAL

hsliuustc0106

I think we may also add an option for benchmarking serving of the new model and comparing it with diffusers.

hsliuustc0106 · 2026-03-17T04:08:06Z

> I think we may also add an option for benchmarking serving of the new model and comparing it with diffusers.

Also, specify the hardware.

SamitHuang · 2026-03-17T04:25:39Z

> I think we may also add an option for benchmarking serving of the new model and comparing it with diffusers.

Should we add another benchmark serving skill?

hsliuustc0106 · 2026-03-17T04:42:15Z

🔍 PR #15 Critical Review

Changes: +1016 lines (new skill)

🔴 Critical Issues

| # | Issue | Severity |
| --- | --- | --- |
| 1 | Missing ARCHITECTURE.md update - New skill not added to project structure docs | HIGH |
| 2 | Missing README.md update - Skills index doesn't list new skill | HIGH |
| 3 | Inconsistent skill name - Frontmatter uses `add-diffusion-model` but project convention is `vllm-omni-<topic>` | MEDIUM |

⚠️ Technical Concerns

| Issue | Details |
| --- | --- |
| Unverifiable import paths | Many imports like `vllm_omni.diffusion.attention.layer.Attention`, `CFGParallelMixin` cannot be verified against this skills repo |
| Test file references unverified | References `examples/offline_inference/` paths that don't exist in this repo |
| Duplicate coverage? | `vllm-omni-contrib` skill already exists; scope overlap unclear |

✅ Content Quality

Strengths:

- Clear two-path structure (diffusers vs custom) matches real-world scenarios
- Concrete code examples with proper patterns
- Troubleshooting section addresses real pain points
- Weight loading patterns well-documented

Weaknesses:

- SKILL.md at 308 lines is pushing the soft limit; could move more to references
- No version indicator for which vLLM-Omni version this targets

📋 Structure Compliance

| Check | Status |
| --- | --- |
| Skill directory naming (`vllm-omni-*`) | ✅ |
| SKILL.md with YAML frontmatter | ✅ |
| References in subfolder | ✅ |
| ARCHITECTURE.md updated | ❌ MISSING |
| README.md skills index updated | ❌ MISSING |

🚦 Recommendation

Block until:

1. Add entry to docs/ARCHITECTURE.md under skills tree
2. Add row to README.md skills index table

Consider:

3. Change frontmatter `name:` to `vllm-omni-add-diffusion-model` for consistency
4. Clarify relationship with existing `vllm-omni-contrib` skill (is this a specialized subset?)

Review generated with Claude Code

hsliuustc0106 · 2026-03-18T12:18:37Z

> > I think we may also add an option for benchmarking serving of the new model and comparing it with diffusers.
>
> Should we add another benchmark serving skill?

Sounds good. We can trigger that benchmark serving skill from this skill.

hsliuustc0106 · 2026-03-18T12:19:20Z

Please update the repo README.md.

fhfuih · 2026-03-19T03:24:55Z

+
+### Step 8: Add E2E Tests (Recommended)
+
+Create `tests/e2e/online_serving/test_your_model_expansion.py`.


Maybe add a short description of how the tests should be written:

- Pick a common or suggested combination of diffusion features (parallelism, quantization, caching, CPU offloading, etc.) and write one test case with these feature(s) turned on.
- The test case should be named `def test_{your_model_name}`.
- Refer to `tests/e2e/online_serving/test_qwen_image_edit_expansion.py` for the available helper functions, constants, and fixtures to reuse in your test. (There is no need to add multiple test cases covering the complete diffusion feature set in this file; only add the one test case described above.)
- Set `num_inference_steps` to 2 and the image dimensions to 512*512. Handle any other inputs and parameters the same way as `tests/e2e/online_serving/test_qwen_image_edit_expansion.py`.

SamitHuang mentioned this pull request Mar 16, 2026

[Docs] Add Wan2.1-T2V as supported video generation models vllm-project/vllm-omni#1920

Merged

gemini-code-assist Bot reviewed Mar 16, 2026


SamitHuang changed the title from "Add skill: add-diffusion-model" to "[SKILL] add-diffusion-model" Mar 16, 2026

wtomin reviewed Mar 16, 2026


SamitHuang mentioned this pull request Mar 16, 2026

[New Model][Skill Eval] Add Stable Diffusion v1.5 as first UNet-based diffusion model vllm-project/vllm-omni#1924

Open

Update skills/vllm-omni-add-diffusion-model/SKILL.md

c376cdc

Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>

wtomin reviewed Mar 17, 2026


hsliuustc0106 reviewed Mar 17, 2026


wtomin mentioned this pull request Mar 17, 2026

[Skill] Adding A Diffusion Model PR review #16

Merged

10 tasks

fhfuih reviewed Mar 19, 2026


hsliuustc0106 merged commit ab59462 into hsliuustc0106:main Mar 21, 2026
