[Feat]add cpu-offload/layerwise-offload for stable-audio-open & fix output inconsistency with same seed by sphinxkkkbc · Pull Request #2909 · vllm-project/vllm-omni

sphinxkkkbc · 2026-04-19T03:44:45Z

Purpose

1. Add CPU offloading (model-wise and layerwise) for stable-audio-open
2. Fix output inconsistency with the same seed

Test Plan

python vllm-omni/examples/offline_inference/text_to_audio/text_to_audio.py \
  --model stabilityai/stable-audio-open-1.0 \
  --prompt "The sound of a dog barking" \
  --enable-cpu-offload \
  --audio-length 10.0 \
  --num-inference-steps 100 \
  --guidance-scale 7.0 \
  --seed 42 \
  --output dog_barking_cpu_offload.wav

python vllm-omni/examples/offline_inference/text_to_audio/text_to_audio.py \
  --model stabilityai/stable-audio-open-1.0 \
  --prompt "The sound of a dog barking" \
  --enable-layerwise-offload \
  --audio-length 10.0 \
  --num-inference-steps 100 \
  --guidance-scale 7.0 \
  --seed 42 \
  --output dog_barking_layerwise_offload.wav

python vllm-omni/examples/offline_inference/text_to_audio/text_to_audio.py \
  --model stabilityai/stable-audio-open-1.0 \
  --prompt "The sound of a dog barking" \
  --audio-length 10.0 \
  --num-inference-steps 100 \
  --guidance-scale 7.0 \
  --seed 42 \
  --output dog_barking.wav

Test Result

| Offload Strategy | Peak Memory | Generation Time | Output Wav |
|---|---|---|---|
| LayerWise Offload | 11.00 GB reserved, 5.70 GB allocated | 18.59 s | dog_barking_layerwise_offload.wav |
| CPU Offload (ModelWise) | 11.70 GB reserved, 7.39 GB allocated | 12.20 s | dog_barking_cpu_offload.wav |
| No Offload | 12.81 GB reserved, 7.60 GB allocated | 9.30 s | dog_barking.wav |
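
To make the trade-off in the results above easier to read, here is a small sketch that expresses each strategy relative to the no-offload run. The numbers are copied from the reported results; the dictionary layout and the helper name `relative_to_baseline` are illustrative assumptions, not part of the PR.

```python
# Figures copied from the Test Result section; the structure is illustrative only.
RUNS = {
    "No Offload": {"reserved_gb": 12.81, "allocated_gb": 7.60, "time_s": 9.30},
    "CPU Offload (ModelWise)": {"reserved_gb": 11.70, "allocated_gb": 7.39, "time_s": 12.20},
    "LayerWise Offload": {"reserved_gb": 11.00, "allocated_gb": 5.70, "time_s": 18.59},
}


def relative_to_baseline(runs, baseline="No Offload"):
    """Return memory savings (percent) and slowdown (ratio) versus the baseline run."""
    base = runs[baseline]
    summary = {}
    for name, run in runs.items():
        summary[name] = {
            "allocated_saving_pct": round(100 * (1 - run["allocated_gb"] / base["allocated_gb"]), 1),
            "reserved_saving_pct": round(100 * (1 - run["reserved_gb"] / base["reserved_gb"]), 1),
            "slowdown_x": round(run["time_s"] / base["time_s"], 2),
        }
    return summary


if __name__ == "__main__":
    for name, stats in relative_to_baseline(RUNS).items():
        print(name, stats)
```

On these figures, layerwise offload cuts allocated memory by about a quarter at roughly twice the generation time, while model-wise CPU offload saves only a few percent of allocated memory at about 1.3x the time.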

Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

sphinxkkkbc · 2026-04-20T08:27:47Z

@hsliuustc0106 PTAL, any other test to do?

sphinxkkkbc · 2026-04-21T03:31:35Z

The previous implementation of stable-audio-open doesn't generate the same output with the same seed. I added eval() and a generator in the denoising loop, but the output still differs between runs and regularly yields two outputs with subtle differences, like this. @linyueqian has this happened before? These are the results of my three experiments.

sphinxkkkbc · 2026-04-22T11:49:13Z

@hsliuustc0106 @linyueqian I've implemented CPU offloading for stable-audio-open. During testing, I noticed that even without offloading, the output with the same seed can be inconsistent across runs (see screenshots above). I tried adding

self.transformer.eval()

and a generator

 latents = self.scheduler.step(noise_pred, t, latents, generator).prev_sample

in the denoising loop, but the issue persists. Any advice would be appreciated. Thanks!

linyueqian · 2026-04-23T13:55:46Z

Can you check against the original HF implementation? A side-by-side comparison of the embeddings at each step may help.

sphinxkkkbc · 2026-04-23T14:50:12Z

thanks, I'll check it

sphinxkkkbc · 2026-04-25T03:29:43Z

@linyueqian I've checked the HF implementation. The difference is that our code does not pass the generator to scheduler.step.
before:

latents = self.scheduler.step(noise_pred, t, latents).prev_sample

after:

latents = self.scheduler.step(noise_pred, t, latents, generator).prev_sample

Should I include it in this PR? Also, the CPU offloading code is ready, could you help review it? Thanks
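
As a rough illustration of why forwarding the generator matters, the toy below assumes the scheduler step is stochastic, that is, it injects fresh noise at every step. It uses Python's `random` module as a stand-in for `torch.Generator`; `toy_stochastic_step` and `toy_denoise` are hypothetical names and do not reproduce the diffusers API or the actual Stable Audio scheduler.

```python
import random


def toy_stochastic_step(noise_pred, latents, sigma, generator=None):
    """Toy stand-in for a stochastic scheduler step (not the diffusers API).

    Without an explicit generator, the noise is drawn from the module-level
    RNG, whose state is shared process-wide and is not tied to the user's seed.
    """
    rng = generator if generator is not None else random
    return [
        x - sigma * n + sigma * rng.gauss(0.0, 1.0)
        for x, n in zip(latents, noise_pred)
    ]


def toy_denoise(seed, pass_generator, num_steps=5, size=4):
    """Run a tiny denoising loop; initial latents are always seeded."""
    generator = random.Random(seed)
    latents = [generator.gauss(0.0, 1.0) for _ in range(size)]
    for step in range(num_steps):
        noise_pred = [0.1 * x for x in latents]  # placeholder for the model output
        sigma = 1.0 / (step + 1)
        latents = toy_stochastic_step(
            noise_pred, latents, sigma, generator if pass_generator else None
        )
    return latents


if __name__ == "__main__":
    # Seeded initial latents alone are not enough if the step draws unseeded noise.
    print(toy_denoise(42, pass_generator=True) == toy_denoise(42, pass_generator=True))
    print(toy_denoise(42, pass_generator=False) == toy_denoise(42, pass_generator=False))
```

In this toy, both runs start from identical seeded latents, yet they only end at identical outputs when the same generator also feeds the per-step noise.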

linyueqian · 2026-04-25T04:01:12Z

Yes, please include it in this PR. You can revise the description and title a bit. Thanks!

sphinxkkkbc · 2026-04-25T04:35:30Z

Done, CI has passed.

sphinxkkkbc · 2026-05-03T02:28:33Z

Can this PR move forward? If there are any remaining issues, please let me know. Thanks!

linyueqian · 2026-05-03T02:34:41Z


            # Scheduler step
-            latents = self.scheduler.step(noise_pred, t, latents).prev_sample
+            latents = self.scheduler.step(noise_pred, t, latents, generator).prev_sample


[suggestion] Worth adding a small regression test that pins this fix. Run the pipeline twice with the same torch.Generator(...).manual_seed(42) and assert the audio tensors are bitwise equal (or torch.allclose with tight tolerance). Without it, a future contributor could drop generator again and we'd silently regress to non-deterministic outputs.

The existing tests/e2e/offline_inference/test_diffusion_layerwise_offload.py and test_diffusion_cpu_offload.py are good neighbors for this; they only parametrize riverclouds/qwen_image_random today. Adding stable-audio-open there with a determinism assertion would cover both this fix and the new offload paths in one shot.
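
The shape of such a regression check could look like the sketch below. It is only a pattern under stated assumptions: `toy_generate` is a hypothetical stand-in for a seeded pipeline call, plain Python lists stand in for audio tensors, and `max_abs_diff` plays the role that `torch.equal` or `torch.allclose` would play in a real test.

```python
import random


def toy_generate(seed, num_steps=8, num_samples=16):
    """Hypothetical stand-in for one seeded text-to-audio pipeline run."""
    gen = random.Random(seed)
    audio = [gen.gauss(0.0, 1.0) for _ in range(num_samples)]
    for _ in range(num_steps):
        # Every step consumes noise from the same seeded generator.
        audio = [0.9 * x + 0.1 * gen.gauss(0.0, 1.0) for x in audio]
    return audio


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length outputs."""
    assert len(a) == len(b), "outputs must have the same length"
    return max(abs(x - y) for x, y in zip(a, b))


def assert_deterministic(generate, seed, atol=0.0):
    """Run `generate` twice with the same seed and compare the outputs.

    atol=0.0 demands bitwise-equal values; a small positive atol gives a
    tolerance-based check instead.
    """
    diff = max_abs_diff(generate(seed), generate(seed))
    assert diff <= atol, f"same-seed outputs differ by {diff} (atol={atol})"
    return diff


if __name__ == "__main__":
    assert_deterministic(toy_generate, seed=42)
    print("same-seed runs match")
```

A real test would replace `toy_generate` with the pipeline call under each offload mode and choose `atol` per configuration.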

linyueqian · 2026-05-03T02:37:57Z

please add a test as suggested thanks

linyueqian · 2026-05-04T14:25:44Z

please fix ci and dco.

sphinxkkkbc · 2026-05-04T16:52:05Z

The new CI run fails at the weight-size assertion in the weight-loading stage, which looks like a recently introduced bug. I'll fix it later.

sphinxkkkbc · 2026-05-05T11:58:04Z

@linyueqian The latest version works as expected. The AMD CI failure seems unrelated to this PR. The changes are listed below; if there's a better way to fix these, please let me know. Thanks.

[Diffusion] [Model] Support AudioX #2077 added a GaussianFourierProjection module to the Stable Audio transformer, but its parameter shape did not match the checkpoint weight shape. I added a narrow preprocessing helper to restore the trailing singleton dimension when needed.
The official Stable Audio scheduler uses final_sigmas_type="zero", so the final CosineDPM step asks torchsde for Brownian noise over the sigma_min -> 0 interval. On CUDA, this out-of-range interval only emits a warning and produces zero noise, while on ROCm it can raise a RecursionError. I added a scheduler wrapper that keeps the official schedule unchanged and intercepts only this final sigma_min -> 0 step, substituting zero noise to match CUDA behavior.
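
The final-step interception described above can be pictured with the toy below. It wraps a noise sampler rather than a scheduler, and every name in it (`ToyBrownianSampler`, `FinalStepZeroNoise`, the sigma values) is a hypothetical illustration, not the wrapper added in this PR and not the torchsde API. The toy sampler raises on an out-of-range interval simply to stand in for the platform-dependent failure.

```python
import random


class ToyBrownianSampler:
    """Toy noise sampler that is only defined on [sigma_min, sigma_max]."""

    def __init__(self, sigma_min, sigma_max, seed):
        self.sigma_min = sigma_min
        self.sigma_max = sigma_max
        self.rng = random.Random(seed)

    def __call__(self, sigma, sigma_next, size):
        lo, hi = sorted((sigma, sigma_next))
        if lo < self.sigma_min or hi > self.sigma_max:
            # Stand-in for the out-of-range behaviour of a real Brownian sampler.
            raise ValueError("interval outside [sigma_min, sigma_max]")
        scale = (hi - lo) ** 0.5
        return [self.rng.gauss(0.0, scale) for _ in range(size)]


class FinalStepZeroNoise:
    """Forward every call unchanged except the final sigma -> 0 step."""

    def __init__(self, inner):
        self.inner = inner

    def __call__(self, sigma, sigma_next, size):
        if sigma_next == 0.0:
            # Final step: skip the inner sampler and return zero noise.
            return [0.0] * size
        return self.inner(sigma, sigma_next, size)


if __name__ == "__main__":
    sampler = FinalStepZeroNoise(ToyBrownianSampler(sigma_min=0.3, sigma_max=500.0, seed=0))
    print(sampler(1.0, 0.3, 4))  # regular step, forwarded to the inner sampler
    print(sampler(0.3, 0.0, 4))  # final step, zero noise
```

Because only the `sigma_next == 0.0` call is special-cased, every earlier step sees exactly the noise the unwrapped sampler would have produced.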

One remaining question: in AMD CI, reserved memory appears to be an outlier, so I temporarily set its threshold to None, while allocated memory matches the expected CPU-offload savings. Should we use allocated memory instead of reserved memory for the CPU-offload memory assertion? Or should we re-examine memory activity during model-wise CPU offloading?

linyueqian · 2026-05-05T14:50:11Z

thanks! i have merged it.

linyueqian · 2026-05-06T02:16:18Z

@sphinxkkkbc can you help fix ci issue here https://buildkite.com/vllm/vllm-omni/builds/8939/canvas?sid=019df8f8-fd12-4f20-abd4-666554b40e4d&tab=output

…utput inconsistency with same seed (vllm-project#2909) Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com> Co-authored-by: sphinxkkkbc <binchengkang8@gmail.com>

sphinxkkkbc requested a review from hsliuustc0106 as a code owner April 19, 2026 03:44

sphinxkkkbc changed the title ~~[Fear]add cpu-offload/layerwise-offload for stable-audio-open~~ [Feat]add cpu-offload/layerwise-offload for stable-audio-open Apr 19, 2026

sphinxkkkbc mentioned this pull request Apr 19, 2026

[RFC]: Continuous Diffusion Model Acceleration Support #1217

Open

1 task

add cpu-offload/layerwise-offload for stable-audio-open

791aedf

Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

sphinxkkkbc force-pushed the feature/add-cpu-offloading branch from b88e550 to 791aedf Compare April 19, 2026 06:44

BBuf mentioned this pull request Apr 20, 2026

SGLang Diffusion external impact survey: kernel, feature, and platform adoption BBuf/how-to-optim-algorithm-in-cuda#14

Open

sphinxkkkbc force-pushed the feature/add-cpu-offloading branch from 9b09453 to 831e96e Compare April 21, 2026 03:17

sphinxkkkbc force-pushed the feature/add-cpu-offloading branch 3 times, most recently from 831e96e to 791aedf Compare April 23, 2026 02:07

Merge upstream/main into feature/add-cpu-offloading

36525c4

linyueqian added the ready label to trigger buildkite CI label Apr 25, 2026

fix output mismatch with same seed

702309c

Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

sphinxkkkbc changed the title ~~[Feat]add cpu-offload/layerwise-offload for stable-audio-open~~ [Feat]add cpu-offload/layerwise-offload for stable-audio-open & fix output mismatch with same seed Apr 25, 2026

sphinxkkkbc changed the title ~~[Feat]add cpu-offload/layerwise-offload for stable-audio-open & fix output mismatch with same seed~~ [Feat]add cpu-offload/layerwise-offload for stable-audio-open & fix output inconsistency with same seed Apr 25, 2026

hsliuustc0106 removed the ready label to trigger buildkite CI label Apr 29, 2026

linyueqian added the ready label to trigger buildkite CI label May 3, 2026

linyueqian reviewed May 3, 2026

add e2e test

4882d6a

Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

Gaohan123 removed this from the v0.20.0 milestone May 4, 2026

sphinxkkkbc added 2 commits May 4, 2026 23:25

fix e2e offline inference tests for cpu/layerwise offload

ddd14e0

Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

fix pre-commit

9d2c592

Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

sphinxkkkbc force-pushed the feature/add-cpu-offloading branch 3 times, most recently from 3235e53 to 9d2c592 Compare May 4, 2026 15:49

add device_type for memory move

64e0aa1

Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

sphinxkkkbc force-pushed the feature/add-cpu-offloading branch from 1010b82 to 64e0aa1 Compare May 4, 2026 15:55

sphinxkkkbc and others added 2 commits May 4, 2026 23:57

fix pre-commit

e9b71c4

Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

Merge branch 'main' into feature/add-cpu-offloading

d9bbb2d

sphinxkkkbc and others added 10 commits May 5, 2026 11:52

fix param and weight shape mismatch

a45a021

Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

fix pre-commit

53a8cf0

Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

Merge branch 'main' into feature/add-cpu-offloading

fdb1cdc

add schedulerwrapper for numeric overflow in dummy run and normal inf…

f9d3b33

…erence Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

Merge branch 'feature/add-cpu-offloading' of github.com:sphinxkkkbc/v…

855b792

…llm-omni into feature/add-cpu-offloading

pre-commit

cbee1a5

Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

add/change offload threshold

5453c66

Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

change monitor start after runner init

2ca0592

Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

reset change and set amd cpu offload threshold to none

5339f5b

Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>

Merge branch 'main' into feature/add-cpu-offloading

1cc8bc9

linyueqian approved these changes May 5, 2026

linyueqian merged commit a0918ce into vllm-project:main May 5, 2026
7 of 8 checks passed

xiaohajiayou mentioned this pull request May 6, 2026

[BugFix] Fix Whitelist optimization CI failure #3290

Merged

5 tasks

linyueqian mentioned this pull request May 6, 2026

[CI][Bugfix] Relax stable-audio layerwise offload determinism tolerance to 1e-2 #3371

Merged

4 participants
