
[feature] : add cache-dit for stable-audio-open-1.0#1341

Merged
SamitHuang merged 15 commits into vllm-project:main from akshatvishu:cache-dit-sao
Apr 11, 2026

Conversation

@akshatvishu (Contributor) commented Feb 11, 2026

Part of #1217

Purpose

Add cache-dit support for Stable Audio Open 1.0.

Test Plan

omni = Omni(
    model=MODEL_PATH,
    dtype="float16",
    num_workers=1,
    cache_backend=cache_backend,
    cache_config=cache_config,
)

sampling_params = OmniDiffusionSamplingParams(
    num_inference_steps=100,
    guidance_scale=7.0,
    seed=42,
    extra_args={"audio_end_in_s": 10.0}
)

outputs = omni.generate(
    {"prompt": "The sound of a hammer hitting a wooden surface", "negative_prompt": "Low quality, noisy"},
    sampling_params
)
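For completeness, here is a hedged sketch of the two names the snippet above leaves undefined. The `"cache_dit"` backend identifier string and the dict shape of `cache_config` are assumptions on my part; the key names and values mirror `configA_balanced` from the results below.

```python
# Hypothetical definitions for the undefined names in the test-plan snippet.
# "cache_dit" as the backend identifier is an assumption; the config keys
# mirror the configA_balanced values reported in the results.
cache_backend = "cache_dit"
cache_config = {
    "Fn_compute_blocks": 2,            # blocks always computed at the front of the net
    "Bn_compute_blocks": 0,            # blocks always computed at the back
    "max_warmup_steps": 4,             # steps before caching may kick in
    "residual_diff_threshold": 0.22,   # cache hit if relative residual drift is below this
    "max_continuous_cached_steps": 3,  # cap on consecutive cached steps
    "enable_taylorseer": True,
    "taylorseer_order": 1,
}
```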

Full comprehensive testing can be found in this kaggle_notebook.

Test Result

  • Device: cuda
  • GPU: NVIDIA Tesla T4
  • Prompt: "The sound of a hammer hitting a wooden surface"
  • num_inference_steps=100
  • guidance_scale=7.0
  • max_audio_length = 10 seconds

Baseline:

| Configuration | Time | Speed-up (vs baseline) | File (mp3) |
|---|---|---|---|
| Baseline (OMNI) | 25.91 s | – | baseline.mp3 |
| Baseline (HF Diffusers) | 27.78 s | – | baseline_hf.mp3 |

Config1:

    "configA_balanced": {
        "Fn_compute_blocks": 2,
        "Bn_compute_blocks": 0,
        "max_warmup_steps": 4,
        "residual_diff_threshold": 0.22,
        "max_continuous_cached_steps": 3,
        "enable_taylorseer": True,
        "taylorseer_order": 1,
    },
| Configuration | Time | Speed-up (vs baseline) | File (mp3) |
|---|---|---|---|
| OMNI | 22.08 s | 1.17x | configA_balanced.mp3 |
| HF Diffusers + cache-dit | 24.69 s | 1.13x | configA_balanced_hf.mp3 |

Config2:

    "configB_aggressive": {
        "Fn_compute_blocks": 1,
        "Bn_compute_blocks": 0,
        "max_warmup_steps": 3,
        "residual_diff_threshold": 0.30,
        "max_continuous_cached_steps": 5,
        "enable_taylorseer": True,
        "taylorseer_order": 1,
    },
| Configuration | Time | Speed-up (vs baseline) | File (mp3) |
|---|---|---|---|
| OMNI | 20.15 s | 1.29x | configB_aggressive.mp3 |
| HF Diffusers + cache-dit | 24.05 s | 1.16x | configB_aggressive_hf.mp3 |

Config3:

    "configC_ultra": {
        "Fn_compute_blocks": 1,
        "Bn_compute_blocks": 0,
        "max_warmup_steps": 2,
        "residual_diff_threshold": 0.35,
        "max_continuous_cached_steps": 6,
        "enable_taylorseer": True,
        "taylorseer_order": 2,
    }
}
| Configuration | Time | Speed-up (vs baseline) | File (mp3) |
|---|---|---|---|
| OMNI | 19.16 s | 1.35x | configC_ultra.mp3 |
| HF Diffusers + cache-dit | 20.90 s | 1.33x | configC_ultra_hf.mp3 |
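As a quick sanity check on the tables above, the speed-up column is simply the baseline time divided by the config time (OMNI timings shown):

```python
# Recompute the reported OMNI speed-ups from the raw timings in the tables.
baseline_s = 25.91
config_times_s = {
    "configA_balanced": 22.08,
    "configB_aggressive": 20.15,
    "configC_ultra": 19.16,
}
speedups = {name: round(baseline_s / t, 2) for name, t in config_times_s.items()}
# speedups == {"configA_balanced": 1.17, "configB_aggressive": 1.29, "configC_ultra": 1.35}
```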

Files are in .mp3 format because GitHub doesn't support .wav attachments in comments.

Note :

  • Stable Audio Open 1.0 exhibits high natural step-to-step drift (median residual ≈ 0.34, as reported by cache-dit.summary() when running the same config as vllm-omni in the HF Diffusers + cache-dit setup). To achieve significant speedups on T4 hardware, residual_diff_threshold must sit near or above this drift value: a conservative threshold such as 0.12 yielded a 1.00x speedup (or even a slowdown) because the cache missed on nearly every step, leaving only the management overhead without any compute savings.

  • The vllm-omni orchestrator performs a 1-step dummy warmup run during server initialization. If a user provides an SCM (Step Computation Masking) policy, the engine crashes with the following error:

AssertionError: Only total_steps=4 or 6 is supported for predefined masks while total_steps < 8. Got total_steps=1.

Thus, I am wondering whether we should add a guard condition like the one below, or whether this is acceptable behavior.

def refresh_cache_context(pipeline: Any, num_inference_steps: int, verbose: bool = True) -> None:
    """
    Refresh cache context.
    Guards against the 1-step dummy warmup causing SCM mask generation errors.
    """
    # Disable the SCM policy for the 1-step dummy warmup to prevent the AssertionError
    effective_mask_policy = cache_config.scm_steps_mask_policy if num_inference_steps > 1 else None
  • Also added the missing _repeated_blocks = ["StableAudioDiTBlock"] to StableAudioDiTModel to enable regional compilation and backend patching.
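To illustrate the first note above: a minimal, framework-free sketch of the drift metric being described, i.e. the mean absolute change between consecutive step residuals relative to the previous step. This approximates cache-dit's relative-residual check for intuition only; it is not the library's actual implementation.

```python
def median_residual_drift(residual_steps):
    """Median relative L1 drift between consecutive per-step residuals.

    residual_steps: list of flat lists of floats, one entry per diffusion step.
    Approximates the metric that residual_diff_threshold is compared against:
    mean|r_t - r_{t-1}| / mean|r_{t-1}|.
    """
    drifts = []
    for prev, cur in zip(residual_steps, residual_steps[1:]):
        num = sum(abs(c - p) for c, p in zip(cur, prev)) / len(prev)
        den = sum(abs(p) for p in prev) / len(prev)
        drifts.append(num / den)
    drifts.sort()
    mid = len(drifts) // 2
    return drifts[mid] if len(drifts) % 2 else (drifts[mid - 1] + drifts[mid]) / 2
```

If this median comes out near 0.34, as the note reports for Stable Audio Open 1.0, a threshold of 0.12 will almost never trigger a cache hit.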
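On the last point, the `_repeated_blocks` change is just a class attribute naming the transformer block class that repeats through the network, so compilation/patching backends can treat those instances uniformly. A simplified stand-in (not the real diffusers class):

```python
class StableAudioDiTBlock:  # stand-in for the repeated transformer block
    pass


class StableAudioDiTModel:  # simplified stand-in for the diffusers model class
    # Names of block classes that repeat through the network; backends use this
    # to apply regional compilation / patching to each repeated instance.
    _repeated_blocks = ["StableAudioDiTBlock"]
```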


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf50517d5d


Comment thread vllm_omni/diffusion/cache/cache_dit_backend.py
@SamitHuang added the ready label (triggers buildkite CI) on Feb 12, 2026
@hsliuustc0106 (Collaborator)

fix DCO please

@hsliuustc0106 removed the ready label (triggers buildkite CI) on Feb 12, 2026
@akshatvishu (Contributor, Author)

@hsliuustc0106 Sorry! I've updated it!

@lishunyang12 (Collaborator) left a comment

Good addition -- the backend code looks correct after the Pattern_3 + cache_config fixes, but the docs table has a column count mismatch that will render broken.

Comment thread docs/user_guide/diffusion_acceleration.md Outdated
Comment thread docs/user_guide/diffusion_acceleration.md Outdated
Comment thread vllm_omni/diffusion/cache/cache_dit_backend.py Outdated
Comment thread vllm_omni/diffusion/cache/cache_dit_backend.py
@hsliuustc0106 (Collaborator)

@vllm-omni-reviewer

@Gaohan123 (Collaborator)

@akshatvishu Please resolve reviews and conflicts. Thanks!

@Gaohan123 added this to the v0.18.0 milestone on Mar 17, 2026
@linyueqian (Collaborator)

@akshatvishu any updates?

@akshatvishu (Contributor, Author)

@linyueqian Ready to go from my side! Happy to run more tests if needed!

@akshatvishu (Contributor, Author)

The mkdocs CI was failing due to:

ERROR - mkdocstrings: Couldn't load inventory https://psutil.readthedocs.io/en/stable/objects.inv through handler 'python': HTTP Error 404: Not Found

Since main already has the fix, I pulled the latest changes.

@linyueqian added the ready label (triggers buildkite CI) on Mar 24, 2026
@hsliuustc0106 (Collaborator)

Please resolve conflicts.

@akshatvishu (Contributor, Author)

@hsliuustc0106 done!

@SamitHuang (Collaborator) left a comment

LGTM

@SamitHuang merged commit c9e8411 into vllm-project:main on Apr 11, 2026
7 of 8 checks passed
@akshatvishu (Contributor, Author)

thanks for all the reviews @linyueqian !

daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
