
[feature] stable_audio_open_1 teacache support #1314

Merged
linyueqian merged 14 commits into vllm-project:main from akshatvishu:sao_teacache
Mar 24, 2026
Conversation

@akshatvishu
Contributor

@akshatvishu akshatvishu commented Feb 10, 2026

Part of #1217

Purpose

Add TeaCache support for Stable Audio Open 1.0.

Test Plan

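# Assumed import paths (not shown in the original snippet); adjust to your vllm-omni version.
from vllm_omni.entrypoints.omni import Omni
from vllm_omni.inputs.data import OmniDiffusionSamplingParams

omni = Omni(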
    model="stabilityai/stable-audio-open-1.0",
    dtype="float16",
    num_workers=1,
    cache_backend="tea_cache",
    cache_config={
        "rel_l1_thresh": 0.2,
        "coefficients": [0.0, 0.0, 0.0, 1.0, 0.0],  # identity placeholder
    },
)

sampling_params = OmniDiffusionSamplingParams(
    num_inference_steps=50,
    guidance_scale=7.0,
    seed=42,
    extra_args={"audio_end_in_s": 5.0}
)

outputs = omni.generate(
    {"prompt": "Glass bottle shattering", "negative_prompt": "Low quality"},
    sampling_params
)

Comprehensive testing can be found in this kaggle_notebook.

Test Result

  • Device: cuda
  • GPU: NVIDIA Tesla T4
  • Coefficients: [0.0, 0.0, 0.0, 1.0, 0.0] (identity)
  • Prompt: Glass bottle shattering
  • num_inference_steps = 50
  • guidance_scale = 7.0
  • max_audio_length = 5 seconds
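To make the roles of `rel_l1_thresh` and `coefficients` concrete, here is a minimal sketch of the general TeaCache skip decision. It is not the vllm-omni implementation: the class `TeaCacheGate` and the helper `poly_eval` are hypothetical names, and plain Python lists stand in for tensors. It assumes the coefficients are ordered highest degree first, which is why `[0.0, 0.0, 0.0, 1.0, 0.0]` acts as the identity.

```python
def poly_eval(coefficients, x):
    """Evaluate a polynomial with Horner's rule (highest-degree coefficient first)."""
    result = 0.0
    for c in coefficients:
        result = result * x + c
    return result


class TeaCacheGate:
    """Hypothetical helper: decides per step whether to run the transformer or reuse the cache."""

    def __init__(self, rel_l1_thresh, coefficients):
        self.rel_l1_thresh = rel_l1_thresh
        self.coefficients = coefficients
        self.accumulated = 0.0   # rescaled distance accumulated since the last full compute
        self.prev_signal = None  # cache signal seen at the previous step

    def should_compute(self, signal):
        """Return True to run the full forward pass, False to reuse the cached residual."""
        if self.prev_signal is None:
            compute = True  # the first step always computes
        else:
            diff = sum(abs(a - b) for a, b in zip(signal, self.prev_signal))
            norm = sum(abs(b) for b in self.prev_signal)
            if norm == 0.0:
                compute = True  # no meaningful relative distance; play safe
            else:
                rel_l1 = diff / norm
                self.accumulated += poly_eval(self.coefficients, rel_l1)
                compute = self.accumulated >= self.rel_l1_thresh
        if compute:
            self.accumulated = 0.0  # reset after every full compute
        self.prev_signal = list(signal)
        return compute
```

Under this scheme a larger `rel_l1_thresh` lets more consecutive steps be skipped, which is consistent with the larger speed-ups reported for higher thresholds.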

| Configuration | Time | Speed-up | File (mp3) |
| --- | --- | --- | --- |
| Baseline (no TeaCache) | 15.32s | - | baseline_audio.mp3 |
| rel_l1_thresh = 0.2 | 14.77s | 1.04x | teacache_audio_identity_0_2.mp3 |
| rel_l1_thresh = 0.4 | 13.64s | 1.12x | teacache_audio_identity_0_4.mp3 |
| rel_l1_thresh = 0.6 | 8.67s | 1.77x | teacache_audio_identity_0_6.mp3 |
| rel_l1_thresh = 0.8 | 8.43s | 1.82x | teacache_audio_identity_0_8.mp3 |

Files are in .mp3 format because GitHub doesn't support .wav attachments in comments.
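The speed-up figures are the baseline time divided by each TeaCache time. A short check using only the timings reported above (the variable names are illustrative):

```python
# Wall-clock timings reported in the test results, in seconds.
baseline_s = 15.32
timings_s = {0.2: 14.77, 0.4: 13.64, 0.6: 8.67, 0.8: 8.43}  # keyed by rel_l1_thresh

# Speed-up = baseline time / accelerated time, rounded to two decimals.
speedups = {thresh: round(baseline_s / t, 2) for thresh, t in timings_s.items()}

for thresh, s in speedups.items():
    print(f"rel_l1_thresh = {thresh}: {s:.2f}x")
```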



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0eca24da5c


Comment thread vllm_omni/diffusion/cache/teacache/coefficient_estimator.py Outdated
@wtomin
Collaborator

wtomin commented Feb 11, 2026

The kaggle_notebook is not valid. @akshatvishu

These commits look good to me.

Can you also run the online serving example with stable_audio_open with teacache enabled? Just to verify it works correctly.

@akshatvishu
Contributor Author

akshatvishu commented Feb 11, 2026

> Can you also run the online serving example with stable_audio_open with teacache enabled? Just to verify it works correctly.

Do you mean this: https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_audio ? I ask because I can't find any text_to_audio examples in https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving.

@akshatvishu
Contributor Author

> The kaggle_notebook is not valid. @akshatvishu

Can you please elaborate further? Do you mean it's not opening?

@wtomin
Collaborator

wtomin commented Feb 11, 2026

> Do you mean this: https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_audio ? Because I can't find any text_to_audio examples in https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving.

Oh, I forgot that text-to-audio does not support online serving. My bad.

Here is the message I got after clicking kaggle_notebook: "We can't find that page. You can search Kaggle above or visit our homepage."

@akshatvishu
Contributor Author

akshatvishu commented Feb 11, 2026

> Here is the message I got after clicking kaggle_notebook: "We can't find that page. You can search Kaggle above or visit our homepage."

My bad! I forgot to turn on the global sharing setting for the notebook on Kaggle.

Here is the notebook; it should work now: https://www.kaggle.com/code/akshatnayak/vllm-omni-sao-teacache

If you scroll down to the cell with the TeaCache config in the same notebook, the logs show this message:

[Stage-0] INFO 02-10 13:18:16 [backend.py:117] TeaCache applied with rel_l1_thresh=0.2, transformer_class=StableAudioDiTModel

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
…tialization during estimation

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Comment thread vllm_omni/diffusion/cache/teacache/config.py Outdated
Collaborator

@lishunyang12 lishunyang12 left a comment


test

Collaborator

@lishunyang12 lishunyang12 left a comment


Solid first pass for audio TeaCache integration; a few things worth addressing before merge.

Comment thread vllm_omni/diffusion/cache/teacache/extractors.py Outdated
Comment thread vllm_omni/diffusion/cache/teacache/extractors.py
Comment thread vllm_omni/diffusion/cache/teacache/extractors.py
Comment thread vllm_omni/diffusion/cache/teacache/extractors.py
Comment thread vllm_omni/diffusion/cache/teacache/coefficient_estimator.py
Comment thread vllm_omni/diffusion/cache/teacache/config.py Outdated
Comment thread docs/user_guide/diffusion_acceleration.md
Comment thread vllm_omni/diffusion/cache/teacache/extractors.py
Extractor:
- Trim docstring to architecture notes only
- Drop hardcoded C=64, use C=in_channels instead
- Replace attention_mask patching with an assertion
- Add comment explaining why plain LayerNorm is sufficient as cache signal
- Slice postprocess output with original_seq_len instead of hardcoded 1:

Coefficient estimator:
- Fix DiffusersPipelineLoader instantiation to take LoadConfig, not pipeline
- Move StableAudioPipeline import to module level to match BagelAdapter style

Docs:
- Mark CFG-Parallel as blank for Stable Audio Open instead of unsupported

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
…ompts list

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
…ompt

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Replaces the placeholder identity coefficients for Stable Audio Open 1.0 with
empirically estimated coefficients.

Data collected over 60 stress-test prompts (2940 transitions) using a 4th-order
polynomial fit.
Input Diffs: min=0.12, max=0.53, mean=0.47
Output Diffs: min=0.13, max=1.69, mean=0.37

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
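As an illustration of what a 4th-order polynomial fit over collected (input diff, output diff) pairs involves, here is a minimal least-squares sketch in plain Python. It is not the PR's coefficient estimator: `fit_polynomial` is a hypothetical helper, the sample data is synthetic, and in practice a library routine such as `numpy.polyfit` would typically be used. Coefficients are returned highest degree first.

```python
def fit_polynomial(xs, ys, degree=4):
    """Least-squares polynomial fit via the normal equations (highest degree first)."""
    n = degree + 1
    powers = [degree - j for j in range(n)]
    # Normal equations A c = b, where A[i][j] = sum(x^(p_i + p_j)) and b[i] = sum(y * x^p_i).
    a = [[sum(x ** (pi + pj) for x in xs) for pj in powers] for pi in powers]
    b = [sum(y * x ** pi for x, y in zip(xs, ys)) for pi in powers]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


# Synthetic example only: sample a known quartic and recover its coefficients.
true_coeffs = [0.5, -1.0, 2.0, 0.25, 0.1]
sample_x = [i / 10 - 1 for i in range(21)]
sample_y = [sum(c * x ** (4 - j) for j, c in enumerate(true_coeffs)) for x in sample_x]
fitted = fit_polynomial(sample_x, sample_y, degree=4)
```

The fitted coefficients would then take the place of the identity placeholder in the TeaCache configuration, so that the rescaled input difference tracks the observed output difference more closely.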
@hsliuustc0106
Collaborator

@vllm-omni-reviewer

@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 17, 2026
@Gaohan123
Collaborator

@akshatvishu Hello, any updates?

@akshatvishu
Contributor Author

> @akshatvishu Hello, any updates?

This is ready from my side!

Collaborator

@wtomin wtomin left a comment


LGTM

@wtomin wtomin added the ready label to trigger buildkite CI label Mar 20, 2026
@wtomin
Collaborator

wtomin commented Mar 23, 2026

Please resolve the conflicts, @akshatvishu. Then we can run CI again.

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
@akshatvishu
Contributor Author

I've resolved the conflicts. Thank you for the reviews and guidance, @wtomin and @lishunyang12.

@akshatvishu
Contributor Author

I see that the AMD CI jobs are currently failing due to a missing IAM permission on the AWS account:

AccessDeniedException: User: arn:aws:iam::936637512419:user/ecr-public-read
is not authorized to perform: ecr-public:GetAuthorizationToken on resource: *

It looks like the IAM identity arn:aws:iam::936637512419:user/ecr-public-read may be missing permission for:

  • ecr-public:GetAuthorizationToken
  • sts:GetServiceBearerToken

From AWS documentation and similar cases, this error typically occurs when those permissions are not granted (usually with "Resource": "*"). In case it helps, this is the minimal policy commonly used:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr-public:GetAuthorizationToken",
        "sts:GetServiceBearerToken"
      ],
      "Resource": "*"
    }
  ]
}


@linyueqian linyueqian merged commit ad5e4e5 into vllm-project:main Mar 24, 2026
7 of 8 checks passed
zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request Mar 26, 2026
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>