[Feature] Support multi-LoRA composition for diffusion models by ultism · Pull Request #2309 · vllm-project/vllm-omni

ultism · 2026-03-29T13:02:01Z

Summary

Implements multi-LoRA composition for diffusion models, enabling multiple LoRA adapters to be active simultaneously on a single request with independent per-adapter scales. This is a core workflow for the diffusion community (CivitAI, ComfyUI, etc.) and was tracked in RFC #2149.

Key changes:

Slot-based buffer management: DiffusionLoRAManager now manages N GPU buffer slots (max_loras) instead of a single slot. Each adapter occupies one slot; unused slots are reset.
Additive composition in apply(): DiffusionBaseLinearLayerWithLoRA.apply() iterates all active adapter slots and accumulates LoRA deltas: y += Σ (x @ Aᵢᵀ) @ Bᵢᵀ
API surface updated to plural: lora_request/lora_scale → lora_requests: list/lora_scales: list across sampling params, entrypoints, and OpenAI-compatible protocol
OpenAI endpoint accepts single dict or list of dicts for the lora field in /v1/images/generations, /v1/images/edits, /v1/chat/completions, and /v1/videos/generations
LRU cache manages adapter registration independently of activation — max_cpu_loras controls cache size, max_loras controls concurrent GPU slots

New config parameter:

max_loras: int = 1   # max adapters composed per request (GPU slot count)
max_cpu_loras: int    # max adapters cached on CPU (LRU), defaults to max_loras

API Usage

Python (offline inference)

from vllm_omni.inputs.data import OmniDiffusionSamplingParams
from vllm_omni.lora.request import LoRARequest

params = OmniDiffusionSamplingParams(
    lora_requests=[lora_a, lora_b],
    lora_scales=[0.8, 0.6],
    # ... other params
)

OpenAI-compatible endpoint

{
  "prompt": "1girl, portrait",
  "lora": [
    {"name": "style_a", "path": "/path/to/lora_a", "scale": 0.8},
    {"name": "style_b", "path": "/path/to/lora_b", "scale": 0.6}
  ]
}

E2E Validation

Tested on RTX 5090 with Z-Image Turbo base model and two CivitAI LoRAs:

Jinx Arcane (character style)
Character Design Sheet Helper (layout style)

XY grid (Jinx scale × Chardesign scale, values: 0, 0.25, 0.5, 1.0):

The grid demonstrates:

Each axis independently controls its respective adapter's influence
Composition is additive — both styles blend when both scales are non-zero
Scale 0 correctly disables the adapter without affecting the other

Files Changed (15 files, +775/-381)

Area	Files	Description
Core	`diffusion/lora/manager.py`	Multi-slot activation, extracted `_set_lora_for_layer()`, `_activate_adapters()`
Core	`diffusion/lora/layers/base_linear.py`	Multi-adapter `apply()` loop, per-slot active-slice tracking
Config	`diffusion/data.py`	Added `max_loras` param, validation
Data	`inputs/data.py`	`lora_request` → `lora_requests`, `lora_scale` → `lora_scales`
Worker	`diffusion/worker/diffusion_worker.py`	Plumbed `max_loras`, updated `set_active_adapter` calls
Engine	`engine/async_omni_engine.py`	Pass `max_loras` through engine config
Entrypoints	`entrypoints/async_omni_diffusion.py`	Plural `lora_requests` parameter
Entrypoints	`entrypoints/openai/utils.py`	New `parse_lora_requests()` for `dict \| list[dict] \| None`
Entrypoints	`entrypoints/openai/api_server.py`	Updated image gen/edit endpoints
Entrypoints	`entrypoints/openai/serving_chat.py`	Updated chat-based image generation
Entrypoints	`entrypoints/openai/serving_video.py`	Updated video generation
Protocol	`entrypoints/openai/protocol/images.py`	`lora` field accepts `list[dict]`
Protocol	`entrypoints/openai/protocol/videos.py`	`lora` field accepts `list[dict]`
Tests	`tests/diffusion/lora/test_lora_manager.py`	7 new multi-LoRA unit tests + updated existing tests
Tests	`tests/e2e/offline_inference/test_diffusion_lora.py`	New `test_diffusion_multi_lora_composition` E2E test

Test Plan

Updated existing LoRA unit tests to use the new plural API (set_active_adapters, slot-indexed set_calls/reset_calls)
Added 7 new multi-LoRA unit tests: multi-slot activation, unused slot reset, max_loras exceeded, mismatched lengths, empty deactivation, zero-scale filtering, scale comparison — all passed
Added E2E test test_diffusion_multi_lora_composition: creates two synthetic LoRA adapters perturbing different QKV slices, verifies composed output differs from each adapter alone and from baseline — passed
Manual E2E: XY grid with real CivitAI LoRAs on RTX 5090 (Z-Image Turbo, 16 image combinations) — passed

Breaking Changes

OmniDiffusionSamplingParams.lora_request → lora_requests (list)
OmniDiffusionSamplingParams.lora_scale → lora_scales (list)
DiffusionLoRAManager.set_active_adapter() → set_active_adapters()

These are internal APIs not yet part of a stable release.

Closes #2149

🤖 Generated with Claude Code

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d04ada9f74

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-29T13:06:02Z

+        if lora_requests:
+            sampling_params.lora_requests = lora_requests


Set default scales when lora_requests override sampling params

When generate_batch(..., lora_requests=...) (and similarly generate) is used, only sampling_params.lora_requests is populated while sampling_params.lora_scales is left at its default empty list. The worker then calls set_active_adapters() and hits the length check (lora_requests vs lora_scales), so requests fail with ValueError whenever callers rely on this new convenience argument without manually setting scales in sampling_params.

Useful? React with 👍 / 👎.

Thanks for catching this! Fixed in commit ca4d418 — when lora_requests is passed via generate() / generate_batch() without an explicit lora_scales, lora_scales now defaults to [1.0] * len(lora_requests) so the length check in set_active_adapters no longer trips. Callers using the convenience argument without specifying scales will get uniform 1.0 scaling, which matches the most common use case.

chatgpt-codex-connector · 2026-03-29T13:06:02Z

+        if len(lora_requests) > self.max_loras:
+            raise ValueError(
+                f"Requested {len(lora_requests)} adapters but max_loras={self.max_loras}"


Filter zero-scale adapters before enforcing max_loras

This limit check runs before zero-scale adapters are removed, so requests like [adapter_a, adapter_b] with scales [1.0, 0.0] are rejected when max_loras=1 even though only one adapter would actually be active. Since zero scales are explicitly treated as disabled later in the method, enforcing max_loras on the unfiltered list causes avoidable request failures.

Useful? React with 👍 / 👎.

Thanks for flagging — this is actually intentional. In this design, scale=0.0 is semantically distinct from omitting the adapter from lora_requests:

0.0 → adapter stays registered and occupies a LoRA cache slot, but contributes nothing to this forward pass.

Absent from the list → slot is released.

So max_loras is enforced against the full lora_requests list on purpose. If we filtered zero-scale adapters first, the same list [A, B] would pass under max_loras=1 with scales [1.0, 0.0] but fail with [1.0, 1.0] — a state-dependent capacity error that's harder to reason about than rejecting up front.

To release a slot, callers should remove the adapter from the request rather than pass 0.0. I've added an inline comment on the check to make this contract explicit.

Enable composing multiple LoRA adapters per request (e.g. pose + style + control), which is standard practice in CivitAI/ComfyUI workflows. Each adapter occupies its own slot in the pre-allocated lora_a_stacked/lora_b_stacked buffers, and apply() accumulates deltas from all active slots. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

…erate() When callers pass lora_requests through the generate/generate_batch convenience parameter without explicitly setting lora_scales in sampling_params, the worker would hit a length mismatch ValueError. Default to scale 1.0 for each adapter when scales are not provided. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

Remove extra blank line and collapse single-expression ValueError raises to satisfy the ruff formatter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

Expand multi-argument LoRALayerWeights() calls and long type annotations to satisfy the ruff formatter line-length rules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

yenuo26 · 2026-03-30T02:45:24Z


+
+@pytest.mark.parametrize("model_name", models)
+def test_diffusion_model(model_name: str, tmp_path: Path):


please add @pytest.mark.diffusion @pytest.mark.advanced_model
For more information about tags, please refer to docs/contributing/ci/tests_markers.md

Both markers are in place — @pytest.mark.diffusion and @pytest.mark.advanced_model are applied to test_diffusion_model at test_diffusion_lora.py:93-94, matching docs/contributing/ci/tests_markers.md. Thanks for flagging.

SamitHuang

For the test plan, please give the whole test script or add an example

SamitHuang · 2026-03-30T09:44:39Z

please also update the lora feature docs.

Done — docs/user_guide/diffusion/lora.md has been reworked to cover the new flows:

355ecea5 — rewrote the guide around init-time vs per-request LoRA.

5be2773f — clarified that sampling_params_list is stage-indexed (not per-prompt) and noted CLI auto-sizes max_loras.

7f5f7fd8 — documented the --axis XYZ system for Cartesian-product grids across lora_scale[i]/prompt/guidance_scale/num_inference_steps/seed.

The guide now has separate "Init-time LoRA" and "Per-request LoRA" sections with zero-scale slot semantics, max_loras sizing, multi-LoRA composition examples, and CLI snippets. Let me know if anything's still missing.

lishunyang12

left a few comments. the zero-scale filtering ordering bug is the main one.

lishunyang12 · 2026-04-02T15:10:26Z

+        if len(lora_requests) > self.max_loras:
+            raise ValueError(f"Requested {len(lora_requests)} adapters but max_loras={self.max_loras}")
+
+        # Filter out zero-scale adapters


+1 on the codex bot's comment — max_loras is enforced before zero-scale adapters are filtered out a few lines below, so [a, b] with scales [1.0, 0.0] blows up when max_loras=1 even though only one adapter would actually be active.

Move the len(lora_requests) > self.max_loras check to after the zero-scale filter loop, operating on active_requests instead.

Good catch, but this ordering is intentional and I'd like to keep it. The key semantic distinction is that scale=0.0 is not equivalent to omitting the adapter — a zero-scale adapter is still registered and occupies a LoRA cache slot, it just contributes nothing to this forward pass. Releasing the slot requires removing the adapter from the request entirely.

If we moved the max_loras check after the zero-scale filter, [A, B] with scales [1.0, 0.0] would pass under max_loras=1, but the same list with scales [1.0, 1.0] (or even [1.0, 0.5]) would fail. That state-dependent capacity behavior is more surprising than the current upfront rejection.

I've pushed a commit that adds an inline comment on the check, so the intent is obvious at the call site. Happy to revisit if you see a concrete use case where zero-scale "soft disable" without a slot is required — but that feels like a distinct feature (e.g. an explicit "reserved but inactive" flag) rather than overloading 0.0.

Will address the _are_active_at_scales ordering TODO and the dead _parse_lora_request cleanup in a follow-up commit.

lishunyang12 · 2026-04-02T15:10:26Z

+                return False
+            if self._get_rounded_scale(scale) != self._get_rounded_scale(active_scale):
+                return False
+        return True


_are_active_at_scales is order-sensitive — requesting [A, B] then [B, A] with the same scales will trigger a full re-activation even though the composed result is identical (addition is commutative). Probably fine for now but worth a TODO.

Good call — added the TODO at manager.py:508-511 inline above the length check, noting the multiset comparison as the follow-up. Keeping the order-sensitive behavior for now since it's correct, just suboptimal.

lishunyang12 · 2026-04-02T15:10:26Z


+def _parse_lora_requests(lora_body: dict[str, Any] | list[dict[str, Any]] | None):
+    try:
+        return parse_lora_requests(lora_body)


Nit: is _parse_lora_request (singular) still called anywhere? Both call sites now use _parse_lora_requests. If it's dead code, remove it.

Cleaned up in the multi-LoRA refactor — the singular _parse_lora_request no longer exists anywhere in the tree (rg _parse_lora_request returns nothing). What's at api_server.py:1672 now is the plural _parse_lora_requests, which is a thin wrapper that translates ValueError from parse_lora_requests (in entrypoints/openai/utils.py) into HTTPException(400) — kept because the util is also called from non-HTTP paths (serving_video.py, serving_chat.py) that shouldn't raise HTTP errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

Resolve conflicts from upstream refactor (PR vllm-project#2006 "Separate AR/Generation/Diffusion stage processes", RFC vllm-project#967): - Accept deletion of vllm_omni/entrypoints/async_omni_diffusion.py; AsyncOmniDiffusion is gone, replaced by StageDiffusionProc subprocess. - Migrate lora_scales defaulting from AsyncOmniDiffusion.generate() to OmniDiffusionSamplingParams.__post_init__ so any caller setting lora_requests without lora_scales still gets 1.0 per adapter. - Merge stage_engine_args refactor in async_omni_engine.py; add max_loras=kwargs.get("max_loras", 1) into the unified dict. - Accept upstream's test conftest (FakeLinearBase / DummyBaseLayerWithLoRA public names); upgrade conftest fixtures to multi-slot signatures so both upstream and multi-LoRA tests pass. - Migrate tests/e2e/offline_inference/test_diffusion_lora.py to OmniRunner context manager. - Keep parse_lora_requests (plural) in serving_chat/serving_video/utils; drop upstream's isinstance(lora_body, dict) guard so list-of-dicts still works. - Fix stale self._active_adapter_id (singular) ref in DiffusionLoRAManager._deactivate_all_adapters uncovered during merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

…nearLayerWithLoRA Commit 86553dc (multi-LoRA composition) inadvertently replaced the original Punica explanation — including the Shrink/Expand semantics matching PunicaWrapperGPU.add_lora_linear() — with a multi-LoRA blurb. Restore the original text from vllm-project#758 and append the multi-LoRA note as a separate paragraph so both pieces of context coexist. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

- Add @pytest.mark.diffusion @pytest.mark.advanced_model to test_diffusion_model and test_diffusion_multi_lora_composition (yenuo26). - Add TODO on _are_active_at_scales flagging order-sensitivity — LoRA delta composition is commutative so same-set-different-order re-requests needlessly trigger full re-activation (lishunyang12). - Remove dead singular _parse_lora_request wrapper and its import from api_server.py; only _parse_lora_requests (plural) is called (lishunyang12). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

…s API PR vllm-project#2309 renamed DiffusionLoRAManager.set_active_adapter (singular) to set_active_adapters (plural) with list signatures. LTX2 distilled stage was added to upstream in vllm-project#2260 after vllm-project#2309 branched, so its two call sites were written against the old singular API and broke when this branch merged upstream/main. Wrap the single LoRARequest / scale in one-element lists to match the new signature; behavior is identical. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

… plot The existing CLI only exposed init-time LoRA (--lora-path / --lora-scale) which pre-loads a single adapter in the engine. Add per-request flags and a grid-plot mode to surface the multi-LoRA composition added in PR vllm-project#2309 and to give reviewers an end-to-end example script (SamitHuang asked for either a full test script or an example). New CLI args: - --prompts (nargs='+') batched generation in one omni.generate() call - --lora-paths (nargs='+') per-request adapters; mutex with --lora-path - --lora-scales matching scales (default 1.0 each) - --max-loras sizes the adapter cache; auto-derived from len(--lora-paths) when omitted - --xyz render {baseline, each adapter alone, composed} x prompts matrix, stitch to grid.png - --output-dir required whenever the run produces >1 image; cells saved as p{p}_c{c}_n{n}.png Behavior: - No --lora-paths: one implicit combo, same path as before. - --xyz: resolves to baseline + each-adapter + composed (when >=2 adapters). - Per combo, a fresh torch.Generator(seed) is constructed so seed is the only source of noise variance across combos (matches the assertion pattern in tests/e2e/offline_inference/test_diffusion_lora.py). README.md updated with Init-time / Per-request / XYZ subsections under Advanced Features > LoRA and new rows in the Key Arguments table. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

The previous LoRA guide documented an Omni.generate(lora_request=..., lora_scale=...) signature that never existed on the unified AsyncOmni.generate API — callers following the docs hit "unexpected keyword argument". It also pre-dated multi-LoRA composition (PR vllm-project#2309) and treated LoRA as a single feature. Restructure into two top-level sections that match the cpu_offload_diffusion style: - Init-time LoRA: Omni(lora_path=..., lora_scale=...) pre-loads a single adapter and applies it to every request. Lowest overhead, no runtime swap. - Per-request LoRA: sampling_params.lora_requests=[...] and .lora_scales attach 0..N adapters per request, enabling multi-LoRA composition and per-request adapter switching. Includes the zero-scale semantics (scale=0.0 occupies a slot but contributes nothing, distinct from omitting the adapter) and the max_loras cache-size constraint. Multi-LoRA is folded into the Per-request section as the natural list-length-greater-than-one case rather than a separate chapter. All code examples reference real APIs and are runnable. The Wan2.2 LightX2V appendix and See Also links are preserved. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

Previously when the user passed --max-loras smaller than the number of --lora-paths together with --xyz, the composed combo would fail at request time with a confusing cache-full rejection. Add a pre-flight check so the failure mode is obvious. Also tighten the --xyz help text to call out that the composed column is skipped for a single-adapter run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

The previous lora.md claim that different requests in one omni.generate batch can carry distinct adapter sets via sampling_params_list= is incorrect. sampling_params_list on omni.generate is stage-indexed — one entry per pipeline stage — and is shared across all prompts in the batch (see vllm_omni/entrypoints/omni.py:131). Per-prompt adapter variance must be done via separate generate() calls. Also annotate the max_loras paragraph with the example CLI's auto-sizing behavior so readers don't think they must always set it explicitly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

_reconstruct_sampling_params only reconverted the legacy singular lora_request key; the new plural lora_requests list was left as raw msgspec-serialized items after IPC, so worker-side iteration surfaced each element as list and crashed at manager.set_active_adapters with 'list' object has no attribute lora_int_id. Iterate and convert every element; drop the dead singular branch since the field is gone from OmniDiffusionSamplingParams. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

cell_images was keyed as (combo_idx, p_idx) but _compose_grid and the num_rows/num_cols call site both treat the key as (row=prompt, col=combo). With 4 combos × 2 prompts the combo index overflowed the canvas height and the right half of grid.png came out blank. Key as (p_idx, combo_idx) and unpack accordingly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

--xy-sweep SCALE... runs an N×N grid for exactly two --lora-paths, with the first adapter's scale varying along X (columns) and the second along Y (rows). Cells are saved as xy_x{x}_y{y}_n{n}.png and stitched into a labeled grid.png. _compose_grid gains optional row_labels/col_labels, so --xyz mode's grid now shows combo labels (adapter name + scale) as column headers and prompts as row headers instead of an unlabeled 2D paste. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

Any parameter can go on any axis via NAME=TYPE:v1|v2|... — types are prompt, lora_scale[i], guidance_scale, num_inference_steps, seed. X/Y compose a 2D grid; Z produces one grid_z{k}.png per Z value. Cells emit as cell_x{x}_y{y}_z{z}_n{n}.png. Grid composer gains an optional title banner (used for Z slice); row labels honor explicit newlines from axis labels so LoRA-scale headers like "lora_chardesign\n1.00" wrap cleanly in the margin. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

Replace --xyz / --xy-sweep references with the unified --axis syntax. Update output filename convention (cell_x{x}_y{y}_z{z}.png and grid_z{k}.png) and add an example showing two LoRA scales on X/Y plus prompt on Z to demonstrate the Cartesian-product semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

Annotate the four weight-shape cases (missing / packed / fused multi-slice / fused single-slice) and the non-obvious sub-suffix lookup for packed layers, so the branch structure is readable without tracing call sites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

ultism · 2026-04-17T18:24:59Z

Re: test plan — added both:

E2E test: tests/e2e/offline_inference/test_diffusion_lora.py covers init-time LoRA, per-request single/multi-LoRA, zero-scale slot semantics, and max_loras rejection, using a synthetic Z-Image PEFT adapter (tagged @pytest.mark.diffusion @pytest.mark.advanced_model).
Runnable example: examples/offline_inference/text_to_image/text_to_image.py now exposes --lora-path (init-time), --lora-paths + --lora-scales (per-request composition), and --axis x=…/y=…/z=… for Cartesian-product XYZ grids over lora_scale[i] / prompt / guidance_scale / num_inference_steps / seed.

Full walkthroughs with CLI snippets are in docs/user_guide/diffusion/lora.md. cc @SamitHuang

Signed-off-by: ultranationalism <www913363043@gmail.com>

Upstream main added new call sites that reference the singular `_parse_lora_request` / `parse_lora_request` helpers and the corresponding singular `.lora_request` / `.lora_scale` fields, both of which were removed by the multi-LoRA refactor. Update the three new callers in `api_server.py` and `serving_chat.py` to use the plural `_parse_lora_requests` / `parse_lora_requests` helpers and assign the plural `.lora_requests` / `.lora_scales` fields, widening the type guard to accept `list` so multi-LoRA bodies are not silently dropped. Also apply `ruff format` touch-ups to `text_to_image.py` and `stage_diffusion_proc.py` that became due after the merge. Signed-off-by: ultranationalism <www913363043@gmail.com>

Signed-off-by: ultranationalism <www913363043@gmail.com> # Conflicts: # examples/offline_inference/text_to_image/text_to_image.py

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

ultism · 2026-04-25T14:24:22Z

@hsliuustc0106 Could you take a look when you have time? The branch has been rebased onto latest upstream/main and pre-commit is now passing. Ready for review.

ultism requested a review from hsliuustc0106 as a code owner March 29, 2026 13:02

chatgpt-codex-connector Bot reviewed Mar 29, 2026

View reviewed changes

ultism and others added 2 commits March 29, 2026 22:34

ultism force-pushed the multi-lora-v2 branch from 315b55a to ca4d418 Compare March 29, 2026 14:35

ultism and others added 2 commits March 29, 2026 23:38

[CI] Fix ruff formatting for pre-commit checks

fcfd6ed

Remove extra blank line and collapse single-expression ValueError raises to satisfy the ruff formatter. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

yenuo26 reviewed Mar 30, 2026

View reviewed changes

wtomin mentioned this pull request Mar 30, 2026

[RFC]: Diffusion Models Features Supports Plan #814

Open

54 tasks

SamitHuang reviewed Mar 30, 2026

View reviewed changes

lishunyang12 reviewed Apr 2, 2026

View reviewed changes

wtomin mentioned this pull request Apr 12, 2026

[RFC]: vLLM-Omni Diffusion Module — Q2 2026 Roadmap #2226

Open

25 tasks

ultism and others added 16 commits April 14, 2026 23:43

[Docs] Clarify zero-scale semantics in set_active_adapters

eee6318

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

[Docs] Drop stale reference to removed --xyz mode in README

035283a

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

Merge remote-tracking branch 'upstream/main' into multi-lora-v2

58154a6

Signed-off-by: ultranationalism <www913363043@gmail.com>

ultism mentioned this pull request Apr 20, 2026

[vllm_omni, rollout] fix: forward-compat with vllm-omni multi-LoRA API rename verl-project/verl#6077

Closed

6 tasks

ultism and others added 2 commits April 25, 2026 22:12

Merge remote-tracking branch 'upstream/main' into multi-lora-v2

c6c83a4

Signed-off-by: ultranationalism <www913363043@gmail.com> # Conflicts: # examples/offline_inference/text_to_image/text_to_image.py

[Bugfix] Fix leftover singular _parse_lora_request call after merge

f01722e

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: ultranationalism <www913363043@gmail.com>

		if lora_requests:
		sampling_params.lora_requests = lora_requests



		@pytest.mark.parametrize("model_name", models)
		def test_diffusion_model(model_name: str, tmp_path: Path):

Conversation

ultism commented Mar 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key changes:

New config parameter:

API Usage

Python (offline inference)

OpenAI-compatible endpoint

E2E Validation

Files Changed (15 files, +775/-381)

Test Plan

Breaking Changes

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Mar 29, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SamitHuang left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lishunyang12 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ultism commented Apr 17, 2026

Uh oh!

ultism commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ultism commented Mar 29, 2026 •

edited

Loading