[feat] Add cache-dit support for helios #2473

Open

JasonJ2021 wants to merge 6 commits into vllm-project:main from JasonJ2021:dev-cachedit

Conversation

@JasonJ2021
Contributor

Purpose

Accelerate Helios model with cache-dit

Test Plan

python end2end.py \
    --cache-backend cache_dit --enable-cache-dit-summary --model BestWishYsh/Helios-Base \
    --sample-type t2v \
    --prompt "A dynamic time-lapse video showing the rapidly moving scenery from the window of a speeding train." \
    --guidance-scale 5.0 \
    --output helios_t2v_base.mp4

Test Result

On a single H20 server, E2E time is reduced from 801s to 311s, a 2.57x speedup.

Total generation time: 311.7618 seconds (311761.76 ms)

Final output video:

helios_t2v_base.mp4

Signed-off-by: JasonJ2021 <jasonj@zju.edu.cn>
Copilot AI review requested due to automatic review settings April 3, 2026 10:22
Signed-off-by: JasonJ2021 <jasonj@zju.edu.cn>
Contributor

Copilot AI left a comment


Pull request overview

Adds cache-dit acceleration support to the Helios diffusion pipeline, and exposes a minimal CLI surface in the Helios offline example to enable it.

Changes:

  • Introduce a Helios-specific cache-dit enabler (enable_cache_for_helios) using BlockAdapter and register it in CUSTOM_DIT_ENABLERS.
  • Extend examples/offline_inference/helios/end2end.py with --cache-backend cache_dit and --enable-cache-dit-summary, wiring them into Omni(...).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File                                           | Description
vllm_omni/diffusion/cache/cache_dit_backend.py | Adds and registers a Helios cache-dit enabler + refresh logic.
examples/offline_inference/helios/end2end.py   | Adds CLI flags and passes cache-dit configuration into the Omni diffusion run.


Comment on lines +1200 to +1209
def enable_cache_for_helios(pipeline: Any, cache_config: Any) -> Callable[[int], None]:
    """Enable cache-dit for Helios pipeline.

    Args:
        pipeline: The Helios pipeline instance.
        cache_config: DiffusionCacheConfig instance with cache configuration.

    Returns:
        A refresh function that can be called to update cache context with new num_inference_steps.
    """

Copilot AI Apr 3, 2026


The return type annotation and docstring for enable_cache_for_helios don’t match what is actually returned/used. The returned refresh_cache_context expects (pipeline, num_inference_steps, verbose) (as required by CacheDiTBackend.refresh), but the function is annotated as Callable[[int], None] and the doc says it can be called with just num_inference_steps. Update the annotation/docstring (or wrap the function) so the signature is accurate and type-checkable.
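
One way to reconcile the two (a minimal sketch only; the concrete parameter types below are inferred from this comment, not taken from cache_dit_backend.py):

from typing import Any, Callable

# Option A: widen the annotation so it matches what CacheDiTBackend.refresh actually calls.
def enable_cache_for_helios(pipeline: Any, cache_config: Any) -> Callable[[Any, int, bool], None]:
    ...

# Option B: keep Callable[[int], None] and return a closure that only needs
# num_inference_steps, capturing the pipeline internally (sketch; refresh_cache_context
# is the inner function this comment describes).
def _make_refresh(pipeline: Any) -> Callable[[int], None]:
    def refresh(num_inference_steps: int) -> None:
        refresh_cache_context(pipeline, num_inference_steps, False)
    return refresh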

Comment on lines 1288 to 1293
"BagelPipeline": enable_cache_for_bagel,
"GlmImagePipeline": enable_cache_for_glm_image,
"Flux2Pipeline": enable_cache_for_flux2,
"HeliosPipeline": enable_cache_for_helios,
}
)

Copilot AI Apr 3, 2026


HeliosPipeline was added to CUSTOM_DIT_ENABLERS, but there’s no unit test covering this new custom enabler (unlike the existing HunyuanImage3Pipeline registration test). Add a test in tests/diffusion/cache/test_cache_backends.py to assert the registry entry exists and that enabling on a mocked HeliosPipeline calls cache_dit.enable_cache with a BlockAdapter configured against pipeline.transformer/pipeline.transformer.blocks and that backend.refresh() targets pipeline.transformer.
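
A minimal sketch of such a test (illustrative only; the patch target assumes cache_dit is imported as a module object inside cache_dit_backend.py, and the mock setup is simplified compared to the existing HunyuanImage3Pipeline tests):

from unittest.mock import MagicMock, patch

from vllm_omni.diffusion.cache.cache_dit_backend import (
    CUSTOM_DIT_ENABLERS,
    enable_cache_for_helios,
)


def test_helios_enabler_is_registered():
    # The registry should map the pipeline class name to the new enabler.
    assert CUSTOM_DIT_ENABLERS["HeliosPipeline"] is enable_cache_for_helios


def test_helios_enabler_calls_enable_cache():
    pipeline = MagicMock()
    cache_config = MagicMock()
    # Patch target is an assumption about how cache_dit is referenced in the backend module.
    with patch("vllm_omni.diffusion.cache.cache_dit_backend.cache_dit") as mock_cache_dit:
        enable_cache_for_helios(pipeline, cache_config)
    mock_cache_dit.enable_cache.assert_called_once()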

@JasonJ2021
Contributor Author

@wtomin @hsliuustc0106 PTAL, thx~

@hsliuustc0106
Collaborator

Can you compare with the original paper and check how the performance can reach up to 20 fps on an H100 device? Is there still any gap between vllm-omni and the recommended inference setup?

@JasonJ2021
Contributor Author

JasonJ2021 commented Apr 3, 2026

Can you compare with the original paper and check how the performance can reach up to 20 fps on an H100 device? Is there still any gap between vllm-omni and the recommended inference setup?

Sadly, I don’t have access to an H100 machine. However, I can compare the performance of vllm-omni with the official inference framework on H20 later.

Collaborator

@SamitHuang left a comment


Missing test for this new feature. Please add tests in tests/diffusion/cache/test_cache_backends.py for the new HeliosPipeline custom enabler, similar to the existing HunyuanImage3Pipeline tests, to verify the registry and adapter configuration.



def enable_cache_for_helios(pipeline: Any, cache_config: Any) -> Callable[[int], None]:
    """Enable cache-dit for Helios pipeline.
Collaborator


The return type annotation Callable[[int], None] does not match the actual returned function signature (pipeline, num_inference_steps, verbose). Please update the annotation to Callable[..., None] or correct the type hint to match.

Contributor Author

@JasonJ2021 Apr 9, 2026


fixed, cc @SamitHuang

Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>
@hsliuustc0106
Collaborator

Hi, I want to know whether vllm-omni can currently achieve a better inference speed than the official repo. Can you help the community make a comparison and check the profiling details?

@hsliuustc0106 requested a review from SamitHuang April 9, 2026 06:14
@JasonJ2021
Contributor Author

@hsliuustc0106 requested a review from SamitHuang

Yeah, I will work on this.

@hsliuustc0106
Collaborator

here is the official repo: https://github.com/PKU-YuanGroup/Helios

@JasonJ2021
Contributor Author

JasonJ2021 commented Apr 13, 2026

Performance Comparison: Helios Base T2V Model

I ran performance tests comparing the official Helios repo with vllm-omni (both with and without cache-dit enabled) for generating a time-lapse video on an H20. Even accounting for the H20 having only about 1/13th of the H100's FP16 TFLOPS (148 vs. 1,979), the results are significantly slower than the officially reported numbers.

Test Setup:

  • Model: BestWishYsh/Helios-Base
  • Task: Text-to-Video (T2V)
  • Prompt: "A dynamic time-lapse video showing the rapidly moving scenery from the window of a speeding train."
  • Guidance Scale: 5.0
  • Frames: 99
  • FPS: 16

Results

Model                         | Time Taken (seconds)
Helios Official Repo          | 814.51
vllm-omni (without cache-dit) | 796.60
vllm-omni (with cache-dit)    | 310.05

Commands:

  1. Helios Official Repo
    python infer_helios.py \
        --base_model_path "BestWishYsh/Helios-Base" \
        --transformer_path "BestWishYsh/Helios-Base" \
        --sample_type "t2v" \
        --num_frames 99 \
        --fps 16 \
        --prompt "A dynamic time-lapse video showing the rapidly moving scenery from the window of a speeding train." \
        --guidance_scale 5.0 \
        --enable_compile \
        --output_folder "./output_helios/helios-base"
  2. vllm-omni without cache-dit
    python end2end.py \
        --model BestWishYsh/Helios-Base \
        --sample-type t2v \
        --prompt "A dynamic time-lapse video showing the rapidly moving scenery from the window of a speeding train." \
        --guidance-scale 5.0 \
        --output helios_t2v_base.mp4
  3. vllm-omni with cache-dit
    python end2end.py \
        --cache-backend cache_dit --enable-cache-dit-summary --model BestWishYsh/Helios-Base \
        --sample-type t2v \
        --prompt "A dynamic time-lapse video showing the rapidly moving scenery from the window of a speeding train." \
        --guidance-scale 5.0 \
        --output helios_t2v_base.mp4

@hsliuustc0106 @SamitHuang

Collaborator

@lishunyang12 left a comment


Review: [feat] Add cache-dit support for helios

Overall this is a clean addition that follows the existing Flux2 enabler pattern well. The 2.57x speedup result is impressive. A few items to address:

Issues

1. Tautological assertion in test (test_cache_backends.py, line ~206)

assert adapter_kwargs["forward_pattern"] == adapter_kwargs["forward_pattern"].__class__.Pattern_2

This compares adapter_kwargs["forward_pattern"] with a value derived from itself -- it will always pass regardless of what the value actually is. It should be:

assert adapter_kwargs["forward_pattern"] == ForwardPattern.Pattern_2

You need to import ForwardPattern in the test (or reference it from the mock) and assert against the expected enum value directly.
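
Two ways that assertion could be written, depending on whether the test imports the enum or patches the whole module (the import path and the mock variable name are assumptions, not taken from the existing test file):

# Option A: import the enum directly and compare against it.
from cache_dit import ForwardPattern  # import path is an assumption about cache-dit's layout

assert adapter_kwargs["forward_pattern"] == ForwardPattern.Pattern_2

# Option B: if the test replaces the cache_dit module with a mock, compare against
# the same attribute the enabler read from that mock (mock_cache_dit is hypothetical).
assert adapter_kwargs["forward_pattern"] == mock_cache_dit.ForwardPattern.Pattern_2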

2. Unrelated test change (test_cache_backends.py, line 60)

The change from Fn_compute_blocks: 2 to Fn_compute_blocks: 1 in test_enable_single_transformer is unrelated to Helios support. If this was needed to fix a pre-existing test failure, it should be called out in the PR description. If not, please revert it to keep the PR focused.

Suggestions (non-blocking)

3. Hardcoded cache config in end2end.py

The cache config dict in end2end.py (lines 219-227) hardcodes values like residual_diff_threshold: 0.24, max_continuous_cached_steps: 3, etc. Consider either:

  • Exposing key tunables (at least residual_diff_threshold) as CLI arguments, or
  • Adding a comment noting these are tuned defaults for Helios-Base so future users know they may need adjustment for other Helios variants.

This is fine for an initial PR but worth a follow-up.
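
For the first option, a minimal sketch of what exposing the main tunable could look like (the flag name is illustrative and not part of this PR):

import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag; 0.24 mirrors the tuned Helios-Base default mentioned above.
parser.add_argument(
    "--cache-residual-diff-threshold",
    type=float,
    default=0.24,
    help="residual_diff_threshold for the cache-dit config (tuned for Helios-Base; "
         "other Helios variants may need a different value)",
)
args = parser.parse_args()

cache_config = {
    "residual_diff_threshold": args.cache_residual_diff_threshold,
    # ... keep the remaining tuned defaults from end2end.py ...
}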

4. Trailing comma in log string

In enable_cache_for_helios (around line 1251 of cache_dit_backend.py):

f"W={db_cache_config.max_warmup_steps}, "

The trailing comma+space at the end of the f-string is a minor cosmetic issue (also present in some existing enablers). Not blocking.

Looks Good

  • The enabler correctly uses BlockAdapter with transformer.blocks, ForwardPattern.Pattern_2, has_separate_cfg=True, and check_forward_pattern=True -- appropriate for a Helios-style architecture.
  • The refresh function properly handles both plain refresh and SCM mask policy refresh, consistent with other enablers.
  • The pipeline.transformer is None guard in both enable and refresh paths is good defensive coding.
  • Tests cover the happy path, missing-transformer-on-enable, and missing-transformer-on-refresh scenarios -- good coverage.
  • Registration in CUSTOM_DIT_ENABLERS is correct.

Please fix issue #1 (tautological assertion). Issue #2 needs clarification or a revert.

Signed-off-by: JasonJ2021 <jasonj@zju.edu.cn>
@JasonJ2021
Contributor Author

Blocking issues fixed. cc @lishunyang12

@JasonJ2021 requested a review from lishunyang12 April 19, 2026 07:25
Signed-off-by: JasonJ2021 <jasonj@zju.edu.cn>
@hsliuustc0106
Collaborator

Can you help add a recipe for this model?

Signed-off-by: Jason <72191212+JasonJ2021@users.noreply.github.com>
@JasonJ2021
Contributor Author

Can you help add a recipe for this model?

Yes. Should I add the recipe to the vllm-project/recipes repo?

@lishunyang12
Collaborator

Can you help add a recipe for this model?

Yes. Should I add the recipe to the vllm-project/recipes repo?

Please refer to #2645 and submit your PRs.
