[feat] Add cache-dit support for helios #2473
JasonJ2021 wants to merge 6 commits into vllm-project:main
Conversation
Signed-off-by: JasonJ2021 <jasonj@zju.edu.cn>
Pull request overview
Adds cache-dit acceleration support to the Helios diffusion pipeline, and exposes a minimal CLI surface in the Helios offline example to enable it.
Changes:
- Introduce a Helios-specific cache-dit enabler (enable_cache_for_helios) using BlockAdapter and register it in CUSTOM_DIT_ENABLERS.
- Extend examples/offline_inference/helios/end2end.py with --cache-backend cache_dit and --enable-cache-dit-summary, wiring them into Omni(...).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| vllm_omni/diffusion/cache/cache_dit_backend.py | Adds and registers a Helios cache-dit enabler + refresh logic. |
| examples/offline_inference/helios/end2end.py | Adds CLI flags and passes cache-dit configuration into the Omni diffusion run. |
```python
def enable_cache_for_helios(pipeline: Any, cache_config: Any) -> Callable[[int], None]:
    """Enable cache-dit for Helios pipeline.

    Args:
        pipeline: The Helios pipeline instance.
        cache_config: DiffusionCacheConfig instance with cache configuration.

    Returns:
        A refresh function that can be called to update cache context with new num_inference_steps.
    """
```
The return type annotation and docstring for enable_cache_for_helios don’t match what is actually returned/used. The returned refresh_cache_context expects (pipeline, num_inference_steps, verbose) (as required by CacheDiTBackend.refresh), but the function is annotated as Callable[[int], None] and the doc says it can be called with just num_inference_steps. Update the annotation/docstring (or wrap the function) so the signature is accurate and type-checkable.
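A minimal sketch of one way to make the signature type-checkable, assuming the refresh callback is consumed as (pipeline, num_inference_steps, verbose); the parameter types and the elided bodies are illustrative, not copied from the diff:

```python
from typing import Any, Callable

# Sketch only: the refresh callback shape follows the signature described in this
# comment; Callable[..., None] is a simpler alternative if exact typing is not needed.
RefreshFn = Callable[[Any, int, bool], None]


def enable_cache_for_helios(pipeline: Any, cache_config: Any) -> RefreshFn:
    """Enable cache-dit for the Helios pipeline and return a refresh callback."""

    def refresh_cache_context(pipeline: Any, num_inference_steps: int, verbose: bool) -> None:
        # Invoked by CacheDiTBackend.refresh with (pipeline, num_inference_steps, verbose).
        ...

    return refresh_cache_context
```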
| "BagelPipeline": enable_cache_for_bagel, | ||
| "GlmImagePipeline": enable_cache_for_glm_image, | ||
| "Flux2Pipeline": enable_cache_for_flux2, | ||
| "HeliosPipeline": enable_cache_for_helios, | ||
| } | ||
| ) |
HeliosPipeline was added to CUSTOM_DIT_ENABLERS, but there’s no unit test covering this new custom enabler (unlike the existing HunyuanImage3Pipeline registration test). Add a test in tests/diffusion/cache/test_cache_backends.py to assert the registry entry exists and that enabling on a mocked HeliosPipeline calls cache_dit.enable_cache with a BlockAdapter configured against pipeline.transformer/pipeline.transformer.blocks and that backend.refresh() targets pipeline.transformer.
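A minimal test sketch along these lines, assuming cache_dit is imported as a module object inside cache_dit_backend.py (the patch target and import paths are assumptions; adjust them to however BlockAdapter/enable_cache are actually imported):

```python
from unittest import mock

from vllm_omni.diffusion.cache import cache_dit_backend
from vllm_omni.diffusion.cache.cache_dit_backend import (
    CUSTOM_DIT_ENABLERS,
    enable_cache_for_helios,
)


def test_helios_enabler_is_registered():
    # The registry should route HeliosPipeline to the Helios-specific enabler.
    assert CUSTOM_DIT_ENABLERS["HeliosPipeline"] is enable_cache_for_helios


def test_helios_enabler_calls_cache_dit_enable_cache():
    pipeline = mock.MagicMock()  # stands in for a HeliosPipeline with .transformer.blocks
    cache_config = mock.MagicMock()
    # Patch the cache_dit module as seen by the backend so no real caching is installed.
    with mock.patch.object(cache_dit_backend, "cache_dit") as cache_dit_mock:
        refresh = enable_cache_for_helios(pipeline, cache_config)
    cache_dit_mock.enable_cache.assert_called_once()
    assert callable(refresh)
```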
@wtomin @hsliuustc0106 PTAL, thx~
Can you compare with the original paper and check how the performance can reach up to 20 fps on an H100 device? Is there still any gap between vllm-omni and the recommended inference setup?
Sadly, I don't have access to an H100 machine. However, I can compare the performance of vllm-omni with the official inference framework on an H20 later.
SamitHuang left a comment
Missing test for this new feature. Please add tests in tests/diffusion/cache/test_cache_backends.py for the new HeliosPipeline custom enabler, similar to the existing HunyuanImage3Pipeline tests, to verify the registry and adapter configuration.
```python
def enable_cache_for_helios(pipeline: Any, cache_config: Any) -> Callable[[int], None]:
    """Enable cache-dit for Helios pipeline.
```
The return type annotation Callable[[int], None] does not match the actual returned function signature (pipeline, num_inference_steps, verbose). Please update the annotation to Callable[..., None] or correct the type hint to match.
Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>
Hi, I want to know whether vllm-omni can currently achieve better inference speed compared with the official repo. Can you help the community make the comparison and check the profiling details?
Yeah, I will work on this.
Here is the official repo: https://github.com/PKU-YuanGroup/Helios
Performance Comparison: Helios Base T2V Model

I ran performance tests comparing the official Helios repo with vllm-omni (both with and without cache-dit enabled) for generating a time-lapse video on H20. The results seem to be significantly slower than the official data, even when considering that the H20 has only 1/13th the FP16 TFLOPS of the H100 (148 TFLOPS vs 1,979 TFLOPS).

Test Setup:
Results
Commands:
lishunyang12 left a comment
Review: [feat] Add cache-dit support for helios
Overall this is a clean addition that follows the existing Flux2 enabler pattern well. The 2.57x speedup result is impressive. A few items to address:
Issues
1. Tautological assertion in test (test_cache_backends.py, line ~206)
```python
assert adapter_kwargs["forward_pattern"] == adapter_kwargs["forward_pattern"].__class__.Pattern_2
```

This compares adapter_kwargs["forward_pattern"] with a value derived from itself -- it will always pass regardless of what the value actually is. It should be:

```python
assert adapter_kwargs["forward_pattern"] == ForwardPattern.Pattern_2
```

You need to import ForwardPattern in the test (or reference it from the mock) and assert against the expected enum value directly.
2. Unrelated test change (test_cache_backends.py, line 60)
The change from Fn_compute_blocks: 2 to Fn_compute_blocks: 1 in test_enable_single_transformer is unrelated to Helios support. If this was needed to fix a pre-existing test failure, it should be called out in the PR description. If not, please revert it to keep the PR focused.
Suggestions (non-blocking)
3. Hardcoded cache config in end2end.py
The cache config dict in end2end.py (lines 219-227) hardcodes values like residual_diff_threshold: 0.24, max_continuous_cached_steps: 3, etc. Consider either:
- Exposing key tunables (at least residual_diff_threshold) as CLI arguments (a rough sketch follows below), or
- Adding a comment noting these are tuned defaults for Helios-Base so future users know they may need adjustment for other Helios variants.
This is fine for an initial PR but worth a follow-up.
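For example, a possible follow-up could expose the threshold roughly like this; the flag name and wiring are hypothetical and not part of this PR:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--cache-residual-diff-threshold",
    type=float,
    default=0.24,  # the tuned Helios-Base default currently hardcoded in end2end.py
    help="cache-dit residual_diff_threshold; other Helios variants may need a different value",
)
args = parser.parse_args()

# Merge the CLI value into the otherwise-hardcoded cache config dict.
cache_config = {
    "residual_diff_threshold": args.cache_residual_diff_threshold,
    "max_continuous_cached_steps": 3,
}
```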
4. Trailing comma in log string
In enable_cache_for_helios (around line 1251 of cache_dit_backend.py):
f"W={db_cache_config.max_warmup_steps}, "The trailing comma+space at the end of the f-string is a minor cosmetic issue (also present in some existing enablers). Not blocking.
Looks Good
- The enabler correctly uses BlockAdapter with transformer.blocks, ForwardPattern.Pattern_2, has_separate_cfg=True, and check_forward_pattern=True -- appropriate for a Helios-style architecture (see the sketch after this list).
- The refresh function properly handles both plain refresh and SCM mask policy refresh, consistent with other enablers.
- The pipeline.transformer is None guard in both enable and refresh paths is good defensive coding.
- Tests cover the happy path, missing-transformer-on-enable, and missing-transformer-on-refresh scenarios -- good coverage.
- Registration in CUSTOM_DIT_ENABLERS is correct.
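A rough sketch of that enabler shape, assuming cache_dit exposes BlockAdapter/ForwardPattern at the top level and that BlockAdapter accepts the keyword arguments named above; the None-transformer handling and the refresh body are illustrative, not copied from the diff:

```python
from typing import Any, Callable

import cache_dit
from cache_dit import BlockAdapter, ForwardPattern


def enable_cache_for_helios_sketch(pipeline: Any, cache_config: Any) -> Callable[..., None]:
    if pipeline.transformer is None:
        # Guard: nothing to cache without a transformer (actual handling may differ).
        raise ValueError("Helios pipeline has no transformer; cannot enable cache-dit")

    # cache_config values (residual_diff_threshold, warmup steps, ...) would be
    # translated into cache-dit settings here; omitted in this sketch.
    cache_dit.enable_cache(
        BlockAdapter(
            pipe=pipeline,
            transformer=pipeline.transformer,
            blocks=pipeline.transformer.blocks,
            forward_pattern=ForwardPattern.Pattern_2,
            has_separate_cfg=True,
            check_forward_pattern=True,
        )
    )

    def refresh_cache_context(pipeline: Any, num_inference_steps: int, verbose: bool) -> None:
        # Re-arm the cache context on pipeline.transformer for the new step count.
        ...

    return refresh_cache_context
```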
Please fix issue #1 (tautological assertion). Issue #2 needs clarification or revert.
Blocking issues fixed. cc @lishunyang12
Signed-off-by: JasonJ2021 <jasonj@zju.edu.cn>
Can you help add a recipe for this model?
bb0f2d6 to e659b06
Signed-off-by: Jason <72191212+JasonJ2021@users.noreply.github.com>
Yes. Should I add the recipe to the vllm-project/recipes repo?
Please refer to #2645 and submit your PRs.
Purpose
Accelerate Helios model with cache-dit
Test Plan
```bash
python end2end.py \
    --cache-backend cache_dit \
    --enable-cache-dit-summary \
    --model BestWishYsh/Helios-Base \
    --sample-type t2v \
    --prompt "A dynamic time-lapse video showing the rapidly moving scenery from the window of a speeding train." \
    --guidance-scale 5.0 \
    --output helios_t2v_base.mp4
```

Test Result
On a 1xH20 server, E2E time is reduced from 801s to 311s, a 2.57x acceleration.
Final output video:
helios_t2v_base.mp4