[feature] stable_audio_open_1 teacache support #1314
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0eca24da5c
|
The kaggle_notebook is not valid. @akshatvishu These commits look good to me. Can you also run the online serving example with stable_audio_open with teacache enabled? Just to verify it works correctly. |
Do you mean this : https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_audio ? Because I can't find any |
Can you please elaborate further? Do you mean it's not opening? |
Oh, I forgot that the text-to-audio does not support online serving. My bad. Here is the message I got after clicking kaggle_notebook: "We can't find that page. You can search Kaggle above or visit our homepage." |
My bad! I forgot to turn on the global sharing setting for the notebook on Kaggle. Here is the notebook, and it should work now: https://www.kaggle.com/code/akshatnayak/vllm-omni-sao-teacache If you scroll down to the cell with the TeaCache config in the same notebook, the logs show this message: [Stage-0] INFO 02-10 13:18:16 [backend.py:117] TeaCache applied with rel_l1_thresh=0.2, transformer_class=StableAudioDiTModel |
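For readers unfamiliar with what `rel_l1_thresh=0.2` in that log line controls, here is a minimal sketch of a TeaCache-style skip decision. It is not the vllm-omni implementation: the `SkipDecider` class and `rel_l1` helper are hypothetical names, and the sketch assumes the usual TeaCache rule of accumulating a polynomial-rescaled relative L1 distance between consecutive cache signals and recomputing the transformer only once the accumulator reaches the threshold. The default coefficients are the identity set quoted in this PR's test result.

```python
import numpy as np


def rel_l1(curr: np.ndarray, prev: np.ndarray) -> float:
    """Relative L1 distance between the current and previous cache signal."""
    return float(np.abs(curr - prev).mean() / np.abs(prev).mean())


class SkipDecider:
    """Hypothetical helper: decides per step whether to run the transformer."""

    def __init__(self, rel_l1_thresh=0.2, coefficients=(0.0, 0.0, 0.0, 1.0, 0.0)):
        self.thresh = rel_l1_thresh
        # Highest power first; the default is the identity mapping p(x) = x.
        self.rescale = np.poly1d(coefficients)
        self.accumulated = 0.0
        self.prev = None

    def should_compute(self, signal: np.ndarray) -> bool:
        if self.prev is None:
            # Nothing is cached on the first step, so always compute.
            compute = True
        else:
            self.accumulated += float(self.rescale(rel_l1(signal, self.prev)))
            compute = self.accumulated >= self.thresh
        if compute:
            # A fresh forward pass resets the accumulated distance.
            self.accumulated = 0.0
        self.prev = signal
        return compute


decider = SkipDecider(rel_l1_thresh=0.2)
base = np.ones(4)
steps = [base, base * 1.05, base * 1.05 * 1.05, base * 1.05 * 1.05 * 1.5]
decisions = [decider.should_compute(s) for s in steps]
```

With these synthetic signals, the two small 5% changes are skipped and the large 50% change triggers a recompute. A higher threshold skips more steps at the cost of fidelity.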
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
…tialization during estimation Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
e93a950 to a168624
lishunyang12 left a comment:
Solid first pass for audio TeaCache integration; a few things worth addressing before merge.
Extractor:
- Trim docstring to architecture notes only
- Drop hardcoded C=64, use C=in_channels instead
- Replace attention_mask patching with an assertion
- Add comment explaining why plain LayerNorm is sufficient as cache signal
- Slice postprocess output with original_seq_len instead of hardcoded 1:

Coefficient estimator:
- Fix DiffusersPipelineLoader instantiation to take LoadConfig, not pipeline
- Move StableAudioPipeline import to module level to match BagelAdapter style

Docs:
- Mark CFG-Parallel as blank for Stable Audio Open instead of unsupported

Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
4ebf05e to e24cb2c
…ompts list Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
…ompt Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Replaces the placeholder identity coefficients for Stable Audio Open 1.0 with empirically estimated coefficients. Data collected over 60 stress-test prompts (2940 transitions) using a 4th-order polynomial fit.
Input diffs: min=0.12, max=0.53, mean=0.47
Output diffs: min=0.13, max=1.69, mean=0.37
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
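As a rough illustration of what a 4th-order polynomial fit over (input diff, output diff) transitions looks like, here is a minimal sketch using `numpy.polyfit`. It is not the estimator script from this PR: the function name `fit_rescale_coefficients` is hypothetical, and the data below is synthetic (not the 2940 measured transitions), chosen so that the fit should recover the identity coefficients.

```python
import numpy as np


def fit_rescale_coefficients(input_diffs, output_diffs, degree=4):
    """Fit output_diff ~ p(input_diff); returns coefficients, highest power first."""
    x = np.asarray(input_diffs, dtype=np.float64)
    y = np.asarray(output_diffs, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("input_diffs and output_diffs must have the same shape")
    return np.polyfit(x, y, degree).tolist()


# Synthetic stand-in data: inputs drawn from the reported input-diff range,
# with outputs set equal to inputs so the true mapping is p(x) = x.
rng = np.random.default_rng(0)
x = rng.uniform(0.12, 0.53, size=2940)
y = x.copy()

coeffs = fit_rescale_coefficients(x, y)
```

A degree-4 fit yields five coefficients ordered from the x^4 term down to the constant, which matches the layout of the identity placeholder `[0.0, 0.0, 0.0, 1.0, 0.0]`.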
|
@vllm-omni-reviewer |
|
@akshatvishu Hello, any updates? |
This is ready from my side! |
|
@akshatvishu, please resolve the conflicts. Then we can run CI again. |
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
|
I've resolved the conflicts. Thank you for the reviews and guidance, @wtomin and @lishunyang12. |
|
I see that the AMD CI jobs are currently failing due to a missing IAM permission on the AWS account:
AccessDeniedException: User: arn:aws:iam::936637512419:user/ecr-public-read is not authorized to perform: ecr-public:GetAuthorizationToken on resource: *
It looks like the IAM identity
From AWS documentation and similar cases, this error typically occurs when those permissions are not granted (usually with "Resource": "*"). In case it helps, this is the minimal policy commonly used:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr-public:GetAuthorizationToken",
        "sts:GetServiceBearerToken"
      ],
      "Resource": "*"
    }
  ]
}
ref: |
Signed-off-by: akshatvishu <akshatnayak197@gmail.com>
Part of #1217
Purpose
Add TeaCache support for Stable Audio Open 1.0.
Test Plan
Full, comprehensive testing can be found in this kaggle_notebook.
Test Result
- Device: cuda
- GPU: NVIDIA Tesla T4
- Coefficient: [0.0, 0.0, 0.0, 1.0, 0.0] (identity)
- Prompt: Glass bottle shattering
- num_inference_steps = 50
- guidance_scale = 7.0
- max_audio_length = 5 seconds
Files are in .mp3 format as GitHub doesn't support .wav in comments.
Note: Using the tango_flux (Text-to-Audio (TTA) generative model) coefficients from https://github.com/ali-vilab/TeaCache/blob/main/TeaCache4TangoFlux/teacache_tango_flux.py didn't lead to any significant performance gain over the identity coefficients, as the architectures don't match (details can be found in the Kaggle notebook linked above).
ToDo: