[Bugfix][StableAudio] Pass model_class_name to Omni() and declare audio class attrs#3406
Open
linyueqian wants to merge 5 commits into
Open
[Bugfix][StableAudio] Pass model_class_name to Omni() and declare audio class attrs#3406linyueqian wants to merge 5 commits into
linyueqian wants to merge 5 commits into
Conversation
…io class attrs The L4 nightly test_stable_audio_quantization_and_teacache fails with 'image' != 'audio'. PR vllm-project#2077 added an engine branch in async_omni_engine._create_default_diffusion_stage_cfg that sets final_output_type='audio' when kwargs has model_class_name pointing at a pipeline whose support_audio_output is True, and tightened the test assertion. The model_class_name auto-resolution from model_index.json runs later (in OmniDiffusionConfig.enrich_config); by the time it runs, the default stage cfg's final_output_type is already locked to 'image'. Mirror the AudioX offline test, which already passes model_class_name='AudioXPipeline' explicitly. Also align StableAudioPipeline with AudioXPipeline by declaring support_audio_output and audio_sample_rate as class attributes (the latter is read by diffusion_engine._audio_mm to populate multimodal_output['audio_sample_rate']). Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Signed-off-by: Yueqian Lin <linyueqian@outlook.com>
Collaborator
Author
|
Closing+reopening to retrigger RTD with the now-exposed pull/3406/head ref (RTD's earlier attempts raced GitHub's async ref propagation). |
Collaborator
hsliuustc0106
left a comment
There was a problem hiding this comment.
LGTM. The root cause analysis is clear (auto-resolution of model_class_name happens after default stage cfg is built), and the fix is minimal and targeted. Adding explicit class attributes for support_audio_output and audio_sample_rate is a nice cleanup that aligns with AudioXPipeline pattern.
Collaborator
|
Hi @linyueqian, friendly reminder — this PR hasn't had any activity (commits or reviews) in the past 9 days. 🕐 Could you please provide an update?
Thanks for your contribution! 🙏 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose
The L4 nightly run for
test_stable_audio_quantization_and_teacache(build 9093) fails with:#2077 added a branch in
async_omni_engine._create_default_diffusion_stage_cfgthat sets the default stage'sfinal_output_type="audio"whenkwargs["model_class_name"]resolves to a pipeline whosesupport_audio_outputflag isTrue, and tightened the stable-audio assertion to== "audio". The catch:OmniDiffusionConfig.enrich_config()is what auto-resolvesmodel_class_namefrommodel_index.json, and it runs after the default stage cfg is built. So at the time the engine branches onkwargs.get("model_class_name", None)it's stillNone, theelsearm fires, and the outer stage carriesfinal_output_type="image".The companion
tests/e2e/offline_inference/test_audiox_model.pyalready side-steps this by passingmodel_class_name="AudioXPipeline"explicitly intoOmni(). Mirror the same pattern in the stable-audio test.While I was there, also align
StableAudioPipeline's class header withAudioXPipeline's by declaring the audio-output contract explicitly:support_audio_output: ClassVar[bool] = True— currently inherited from theSupportAudioOutputProtocol, which works because Protocol class attributes carry through subclasses, but making it explicit matches the AudioX/OmniVoice pattern and removes the dependency on Protocol-default-attribute semantics.audio_sample_rate: ClassVar[int] = 44100— picked up bydiffusion_engine._audio_mmsomultimodal_output[\"audio_sample_rate\"]is populated; downstream consumers no longer need to hardcode 44.1 kHz for Stable Audio Open.Verification
On h20-server-0 against
vllm-project/vllm-omni:main(3c85ca55):supports_audio_output(\"StableAudioPipeline\")True(Protocol inheritance already provided the flag, so the class-attr addition is defensive, not load-bearing for the pass/fail)_create_default_diffusion_stage_cfg(kwargs)withkwargs={\"model\": \"...\"}final_output_type=\"image\"becausekwargs[\"model_class_name\"]isNone(auto-resolution hasn't run) → matches the failing assertionkwargs={\"model\": \"...\", \"model_class_name\": \"StableAudioPipeline\"}final_output_type=\"audio\"✓Test Plan
tests/e2e/offline_inference/test_stable_audio_expansion.py::test_stable_audio_quantization_and_teacacheshould now go green on the next L4 nightly. The companiontest_audiox_modelis unchanged and should still pass.The full L4 stable-audio inference run was not exercised on h20 (the 16 GB FP8 weights + tea_cache combination is L4-shaped), but the prompt-shape mismatch that produced the assertion is fully reproducible with a Python-only check of
_create_default_diffusion_stage_cfg's output.Essential Elements of an Effective PR Description Checklist