[feature] stable_audio_open_1 teacache support by akshatvishu · Pull Request #1314 · vllm-project/vllm-omni

akshatvishu · 2026-02-10T15:08:21Z

Part of #1217

Purpose

Add TeaCache support for Stable Audio Open 1.0.

Test Plan

omni = Omni(
    model="stabilityai/stable-audio-open-1.0",
    dtype="float16",
    num_workers=1,
    cache_backend="tea_cache",
    cache_config={"rel_l1_thresh": 0.2, "coefficients": [0.0, 0.0, 0.0, 1.0, 0.0]},
)

sampling_params = OmniDiffusionSamplingParams(
    num_inference_steps=50,
    guidance_scale=7.0,
    seed=42,
    extra_args={"audio_end_in_s": 5.0}
)

outputs = omni.generate(
    {"prompt": "Glass bottle shattering", "negative_prompt": "Low quality"},
    sampling_params
)

Full, comprehensive testing can be found in this kaggle_notebook.

Test Result

Device: cuda
GPU: NVIDIA Tesla T4
Coefficients: [0.0, 0.0, 0.0, 1.0, 0.0] (identity)
Prompt: Glass bottle shattering
num_inference_steps: 50
guidance_scale: 7.0
Max audio length: 5 seconds

Configuration	Time (s)	Speedup	File (.mp3)
Baseline (no TeaCache)	15.32	-	baseline_audio.mp3
rel_l1_thresh = 0.2	14.77	1.04x	teacache_audio_identity_0_2.mp3
rel_l1_thresh = 0.4	13.64	1.12x	teacache_audio_identity_0_4.mp3
rel_l1_thresh = 0.6	8.67	1.77x	teacache_audio_identity_0_6.mp3
rel_l1_thresh = 0.8	8.43	1.82x	teacache_audio_identity_0_8.mp3

The files are in .mp3 format because GitHub doesn't support .wav attachments in comments.
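The Speedup column is the baseline wall-clock time divided by each configuration's time. The short sketch below recomputes it from the timings in the results table; the `timings_s` dictionary and the `speedup` helper are illustrative only and are not part of vllm-omni.

```python
# Wall-clock timings (seconds) copied from the results table.
BASELINE_S = 15.32
timings_s = {
    "rel_l1_thresh = 0.2": 14.77,
    "rel_l1_thresh = 0.4": 13.64,
    "rel_l1_thresh = 0.6": 8.67,
    "rel_l1_thresh = 0.8": 8.43,
}


def speedup(baseline_s: float, run_s: float) -> float:
    """Speedup of a run relative to the baseline, rounded to two decimals."""
    return round(baseline_s / run_s, 2)


speedups = {cfg: speedup(BASELINE_S, t) for cfg, t in timings_s.items()}
for cfg, s in speedups.items():
    print(f"{cfg}: {s:.2f}x")
```

Rounding to two decimals matches the precision used in the table.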

Note :

Using the coefficients for tango_flux (a text-to-audio (TTA) generative model) from https://github.com/ali-vilab/TeaCache/blob/main/TeaCache4TangoFlux/teacache_tango_flux.py did not lead to any significant performance gain over the identity coefficients, because the architectures don't match (details can be found in the Kaggle notebook attached above).
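For context on why the coefficients matter: in the upstream TeaCache reference code, the relative L1 change of the timestep-modulated input between consecutive steps is passed through a polynomial built from `coefficients` (highest degree first, as `numpy.poly1d` expects), accumulated across steps, and the transformer is re-run only once the accumulated value reaches `rel_l1_thresh`. Under that ordering, [0.0, 0.0, 0.0, 1.0, 0.0] is simply f(x) = x, hence "identity". The sketch below mirrors that rule in plain NumPy, assuming vllm-omni follows the same logic; the `TeaCacheGate` class and its interface are hypothetical and are not the vllm-omni API.

```python
import numpy as np


class TeaCacheGate:
    """Hypothetical helper deciding whether to recompute the transformer at a step.

    Modelled on the skip rule of the upstream TeaCache reference code;
    this is an illustration, not the vllm-omni implementation.
    """

    def __init__(self, rel_l1_thresh: float, coefficients: list[float], num_steps: int):
        self.rel_l1_thresh = rel_l1_thresh
        # numpy.poly1d expects coefficients ordered from highest degree to lowest.
        self.rescale = np.poly1d(coefficients)
        self.num_steps = num_steps
        self.step = 0
        self.accumulated = 0.0
        self.prev_input = None

    def should_compute(self, modulated_input: np.ndarray) -> bool:
        if self.step == 0 or self.step == self.num_steps - 1:
            # First and last steps are always computed; the accumulator resets.
            compute = True
            self.accumulated = 0.0
        else:
            # Relative L1 change of the modulated input since the previous step.
            rel_l1 = np.abs(modulated_input - self.prev_input).mean() / np.abs(self.prev_input).mean()
            self.accumulated += float(self.rescale(rel_l1))
            if self.accumulated < self.rel_l1_thresh:
                compute = False  # reuse the cached residual instead of running the transformer
            else:
                compute = True
                self.accumulated = 0.0
        self.prev_input = modulated_input
        self.step += 1
        return compute


# Usage with identity coefficients and a toy input that grows by 0.1 per step.
gate = TeaCacheGate(rel_l1_thresh=0.2, coefficients=[0.0, 0.0, 0.0, 1.0, 0.0], num_steps=5)
decisions = [gate.should_compute(np.full(4, 1.0 + 0.1 * k)) for k in range(5)]
print(decisions)
```

In this sketch a larger `rel_l1_thresh` lets more change accumulate before a recompute, so more steps are skipped; that is consistent with the larger speedups reported for 0.6 and 0.8, at a potential cost in audio quality.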

ToDo:

Calculate custom coefficients for Stable Audio Open 1.0 as detailed in https://github.com/vllm-project/vllm-omni/blob/main/docs/contributing/features/teacache.md#customization
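As a hedged sketch of what calculating custom coefficients involves, assuming the linked guide follows the upstream TeaCache recipe: for every pair of consecutive denoising steps, record the relative L1 change of the modulated transformer input (x) and of the transformer output (y), then fit a low-order polynomial y ≈ f(x) by least squares. The helper names `relative_l1` and `fit_teacache_coefficients` are hypothetical, and the synthetic data in the usage lines is for illustration only (it is not the PR's measurement data). The degree defaults to 4, matching the 4th-order fit mentioned in a later commit of this PR.

```python
import numpy as np


def relative_l1(curr: np.ndarray, prev: np.ndarray) -> float:
    """Mean absolute change between consecutive steps, relative to the previous step."""
    return float(np.abs(curr - prev).mean() / np.abs(prev).mean())


def fit_teacache_coefficients(input_diffs, output_diffs, degree: int = 4) -> list[float]:
    """Least-squares polynomial mapping input diffs (x) to output diffs (y).

    Coefficients are returned highest degree first, the order numpy.poly1d expects.
    """
    x = np.asarray(input_diffs, dtype=np.float64)
    y = np.asarray(output_diffs, dtype=np.float64)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("input_diffs and output_diffs must be 1-D and the same length")
    if x.size <= degree:
        raise ValueError("need more samples than the polynomial degree")
    return np.polyfit(x, y, degree).tolist()


# Synthetic (x, y) pairs for illustration: y = 0.5 * x**2 + 2 * x.
xs = np.linspace(0.12, 0.53, 49)
ys = 0.5 * xs**2 + 2.0 * xs
coeffs = fit_teacache_coefficients(xs, ys)
rescale = np.poly1d(coeffs)
print([round(c, 6) for c in coeffs])
```

The fitted list can then be passed as `"coefficients"` in `cache_config`, in the same position the identity placeholder occupies in the test plan.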

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0eca24da5c


wtomin · 2026-02-11T07:14:35Z

The kaggle_notebook is not valid. @akshatvishu

These commits look good to me.

Can you also run the online serving example with stable_audio_open with teacache enabled? Just to verify it works correctly.

akshatvishu · 2026-02-11T10:49:21Z


Do you mean this: https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_audio? I ask because I can't find any text_to_audio examples in https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving.

akshatvishu · 2026-02-11T11:42:58Z


Can you please elaborate further? Do you mean it's not opening?

wtomin · 2026-02-11T12:12:20Z


Oh, I forgot that the text-to-audio does not support online serving. My bad.

Here is the message I got after clicking kaggle_notebook: "We can't find that page. You can search Kaggle above or visit our homepage."

akshatvishu · 2026-02-11T12:54:12Z


My bad! I forgot to turn on the global sharing setting for the notebook on Kaggle!

Here is the notebook; it should work now: https://www.kaggle.com/code/akshatnayak/vllm-omni-sao-teacache

If you scroll down to the cell with the TeaCache config in the same notebook, it shows this message in the logs:

[Stage-0] INFO 02-10 13:18:16 [backend.py:117] TeaCache applied with rel_l1_thresh=0.2, transformer_class=StableAudioDiTModel

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

…tialization during estimation Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

lishunyang12

test

lishunyang12

Solid first pass for audio TeaCache integration; a few things worth addressing before merge.

Extractor:
- Trim docstring to architecture notes only
- Drop hardcoded C=64, use C=in_channels instead
- Replace attention_mask patching with an assertion
- Add comment explaining why plain LayerNorm is sufficient as cache signal
- Slice postprocess output with original_seq_len instead of hardcoded 1:
Coefficient estimator:
- Fix DiffusersPipelineLoader instantiation to take LoadConfig, not pipeline
- Move StableAudioPipeline import to module level to match BagelAdapter style
Docs:
- Mark CFG-Parallel as blank for Stable Audio Open instead of unsupported
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

…ompts list Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

…ompt Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

Replaces the placeholder identity coefficients for Stable Audio Open 1.0 with empirically estimated coefficients, obtained with a 4th-order polynomial fit on data collected over 60 stress-test prompts (2940 transitions).
Input diffs: min=0.12, max=0.53, mean=0.47
Output diffs: min=0.13, max=1.69, mean=0.37
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
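The transition count in this commit is consistent with 50 denoising steps per prompt, assuming every stress-test prompt used the test plan's num_inference_steps=50: each run has one fewer step-to-step transition than steps, so 60 prompts give 60 × 49 transitions. The constant names below are illustrative.

```python
NUM_PROMPTS = 60
NUM_INFERENCE_STEPS = 50  # assumption: same step count as the test plan

# Each run of N steps yields N - 1 consecutive-step (input diff, output diff) pairs.
transitions_per_prompt = NUM_INFERENCE_STEPS - 1
total_transitions = NUM_PROMPTS * transitions_per_prompt
print(total_transitions)
```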

hsliuustc0106 · 2026-02-24T07:08:01Z

@vllm-omni-reviewer

Gaohan123 · 2026-03-17T15:35:16Z

@akshatvishu Hello, any updates?

akshatvishu · 2026-03-17T23:43:30Z


This is ready from my side!

wtomin

LGTM

wtomin · 2026-03-23T12:32:45Z

Please resolve the conflicts, @akshatvishu, and then we can run CI again.

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

akshatvishu · 2026-03-23T13:46:37Z

I've resolved the conflicts. Thank you for the reviews and guidance, @wtomin and @lishunyang12.

akshatvishu · 2026-03-23T16:40:08Z

I see that the AMD CI jobs are currently failing due to a missing IAM permission on the AWS account:

AccessDeniedException: User: arn:aws:iam::936637512419:user/ecr-public-read
is not authorized to perform: ecr-public:GetAuthorizationToken on resource: *

It looks like the IAM identity arn:aws:iam::936637512419:user/ecr-public-read may be missing permissions for:

ecr-public:GetAuthorizationToken
sts:GetServiceBearerToken

From AWS documentation and similar cases, this error typically occurs when those permissions are not granted (usually with "Resource": "*"). In case it helps, this is the minimal policy commonly used:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr-public:GetAuthorizationToken",
        "sts:GetServiceBearerToken"
      ],
      "Resource": "*"
    }
  ]
}

ref:

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

akshatvishu requested a review from hsliuustc0106 as a code owner February 10, 2026 15:08

chatgpt-codex-connector Bot reviewed Feb 10, 2026


Comment thread vllm_omni/diffusion/cache/teacache/coefficient_estimator.py Outdated

akshatvishu mentioned this pull request Feb 10, 2026

[RFC]: Continuous Diffusion Model Acceleration Support #1217

Open

1 task

akshatvishu added 4 commits February 11, 2026 19:29

wip: add teacache to stable_audio_open

91eab7d

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

teacache: add placeholder coefficients for StableAudioDiTModel

280ca9e

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

fix(teacache): load weights in StableAudioAdapter to avoid random ini…

c9824a3

…tialization during estimation Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

docs: update model support table to include teacache for AudioGen

a168624

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

akshatvishu force-pushed the sao_teacache branch from e93a950 to a168624 on February 11, 2026 13:59

wtomin reviewed Feb 12, 2026


Comment thread vllm_omni/diffusion/cache/teacache/config.py Outdated

lishunyang12 reviewed Feb 21, 2026


akshatvishu force-pushed the sao_teacache branch from 4ebf05e to e24cb2c on February 23, 2026 17:54

akshatvishu added 8 commits February 23, 2026 23:50

Update collect_from_prompt to use OmniDiffusionSamplingParams with pr…

0938a03

…ompts list Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

fix import

2f68ce5

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

Cast extractor inputs to module dtype to fix float32/float16 mismatch

9991e3d

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

fix(teacache): fix FP32 weight initialization in Stable Audio estimator

3c87202

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

fix syntax err

eef2450

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

fix new_forward not supporting bfloat16

b1c1075

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

add torch.cuda.empty_cache() in collect_from_prompt to not OOM b/w pr…

9034550

…ompt Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

Gaohan123 added this to the v0.18.0 milestone Mar 17, 2026

wtomin approved these changes Mar 20, 2026


wtomin added the ready label to trigger buildkite CI label Mar 20, 2026

Merge branch 'main' into sao_teacache

9874423

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

linyueqian merged commit ad5e4e5 into vllm-project:main Mar 24, 2026
7 of 8 checks passed

zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request Mar 26, 2026

[feature] stable_audio_open_1 teacache support (vllm-project#1314)

d08a943

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

zhangj1an pushed a commit to zhangj1an/vllm-omni that referenced this pull request Mar 26, 2026

[feature] stable_audio_open_1 teacache support (vllm-project#1314)

3fcb61a

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>

zhangj1an mentioned this pull request Mar 31, 2026

[Test] Add Stable Audio offline e2e TeaCache Test #2377

Merged

5 tasks

linyueqian mentioned this pull request May 4, 2026

[Bug] [CI failure]: CUDA illegal memory access during dummy run for stabilityai/stable-audio-open-1.0 #3334

Closed

1 task

clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

[feature] stable_audio_open_1 teacache support (vllm-project#1314)

aceaed0

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>


6 participants
