[Feat] Add cpu-offload/layerwise-offload for stable-audio-open & fix output inconsistency with same seed #2909
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
b88e550 to
791aedf
Compare
|
@hsliuustc0106 PTAL. Are there any other tests I should run? |
9b09453 to
831e96e
Compare
|
The previous implementation of stable-audio-open doesn't generate the same output with the same seed. I added eval() and a generator in the denoising loop, but it still produces different outputs, regularly showing two variants with subtle differences, like this. @linyueqian has this happened before? These are the results of my three experiments:
|
|
@hsliuustc0106 @linyueqian I've implemented CPU offloading for stable-audio-open. During testing, I noticed that even without offloading, the output with the same seed can be inconsistent across runs (see screenshots above). I tried adding eval() and a generator in the denoising loop, but the issue persists. Any advice would be appreciated. Thanks! |
831e96e to
791aedf
Compare
|
Can you check against the original HF implementation? A side-by-side comparison of the embeddings at each step may help. |
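A step-by-step comparison like the one suggested here could be sketched as follows. This is a minimal illustration, assuming each run records its latents as one flattened list of floats per denoising step; the helper name first_divergent_step, the tolerances, and the sample traces are hypothetical and not taken from this PR or from the HF code.

```python
import math
from typing import Optional, Sequence


def first_divergent_step(
    trace_a: Sequence[Sequence[float]],
    trace_b: Sequence[Sequence[float]],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-8,
) -> Optional[int]:
    """Return the index of the first step whose values differ, or None if all match."""
    if len(trace_a) != len(trace_b):
        raise ValueError("traces must cover the same number of steps")
    for step, (a, b) in enumerate(zip(trace_a, trace_b)):
        if len(a) != len(b):
            return step
        # Compare element by element so a single drifting value is enough to flag the step.
        if any(
            not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
            for x, y in zip(a, b)
        ):
            return step
    return None


# Hypothetical traces: each inner list is the flattened latent after one step.
reference = [[0.10, 0.20], [0.30, 0.40], [0.50, 0.60]]
candidate = [[0.10, 0.20], [0.30, 0.41], [0.55, 0.60]]
print(first_divergent_step(reference, candidate))
```

Knowing the first step that diverges narrows the search to whatever runs at that step (for example the scheduler call), rather than comparing only the final audio.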
Thanks, I'll check it. |
@linyueqian I've checked the HF implementation. The difference is that the generator is not passed to scheduler.step in our implementation. Should I include the fix in this PR? Also, the CPU offloading code is ready. Could you help review it? Thanks |
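Why a missing generator argument breaks reproducibility can be shown with a toy stochastic step. This is only a sketch: it uses Python's random module as a stand-in for torch.Generator, and toy_step and denoise are hypothetical names, not the scheduler or pipeline from this PR.

```python
import random
from typing import List, Optional


def toy_step(
    sample: List[float],
    noise_pred: List[float],
    generator: Optional[random.Random] = None,
) -> List[float]:
    """One stochastic update; falls back to the global RNG when no generator is given."""
    rng = generator if generator is not None else random
    return [x - 0.1 * n + 0.01 * rng.gauss(0.0, 1.0) for x, n in zip(sample, noise_pred)]


def denoise(seed: int, pass_generator: bool, steps: int = 4) -> List[float]:
    generator = random.Random(seed)
    # The initial noise is drawn from the seeded generator in both variants.
    latents = [generator.gauss(0.0, 1.0) for _ in range(3)]
    for _ in range(steps):
        noise_pred = [0.5 * x for x in latents]
        if pass_generator:
            latents = toy_step(latents, noise_pred, generator)
        else:
            # Generator dropped: the per-step noise comes from unseeded global state.
            latents = toy_step(latents, noise_pred)
    return latents


print(denoise(42, pass_generator=True) == denoise(42, pass_generator=True))
print(denoise(42, pass_generator=False) == denoise(42, pass_generator=False))
```

In the toy, seeding the initial noise is not enough: any step that draws fresh noise must receive the same generator, otherwise two runs with the same seed drift apart.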
|
Yes, please include it in this PR. You can revise the description and title a bit. Thanks! |
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
Done, CI has passed. |
|
Can this PR move forward? If there are any remaining issues, please let me know. Thanks! |
|
|
# Scheduler step
- latents = self.scheduler.step(noise_pred, t, latents).prev_sample
+ latents = self.scheduler.step(noise_pred, t, latents, generator).prev_sample
[suggestion] Worth adding a small regression test that pins this fix. Run the pipeline twice with the same torch.Generator(...).manual_seed(42) and assert the audio tensors are bitwise equal (or torch.allclose with tight tolerance). Without it, a future contributor could drop generator again and we'd silently regress to non-deterministic outputs.
The existing tests/e2e/offline_inference/test_diffusion_layerwise_offload.py and test_diffusion_cpu_offload.py are good neighbors for this; they only parametrize riverclouds/qwen_image_random today. Adding stable-audio-open there with a determinism assertion would cover both this fix and the new offload paths in one shot.
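The shape of such a determinism test might look like the sketch below. It is not the test added in this PR: run_stub_pipeline is a hypothetical stand-in for the real pipeline call, the offload mode labels are assumptions, and Python's random module replaces torch.Generator so the example runs without a GPU or model weights. A real test would compare audio tensors with torch.equal or torch.allclose, as suggested above.

```python
import random
from typing import List, Optional

# Hypothetical labels for the code paths the test should cover.
OFFLOAD_MODES = (None, "cpu", "layerwise")


def run_stub_pipeline(seed: int, offload_mode: Optional[str] = None, num_steps: int = 8) -> List[float]:
    """Stand-in for the real pipeline; offload_mode is accepted but does not change the math."""
    generator = random.Random(seed)
    audio = [generator.gauss(0.0, 1.0) for _ in range(16)]
    for _ in range(num_steps):
        # Every stochastic step draws from the same seeded generator.
        audio = [0.9 * x + 0.05 * generator.gauss(0.0, 1.0) for x in audio]
    return audio


def test_same_seed_is_deterministic() -> None:
    for mode in OFFLOAD_MODES:
        first = run_stub_pipeline(42, offload_mode=mode)
        second = run_stub_pipeline(42, offload_mode=mode)
        assert first == second, f"non-deterministic output with offload_mode={mode!r}"


def test_different_seed_changes_output() -> None:
    # Guards against a pipeline that ignores the seed entirely.
    assert run_stub_pipeline(42) != run_stub_pipeline(43)


test_same_seed_is_deterministic()
test_different_seed_changes_output()
```

The second test is an extra sanity check that is not part of the suggestion: without it, a pipeline that ignored the seed and always returned the same output would still pass the determinism assertion.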
|
Please add a test as suggested. Thanks! |
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
|
Please fix CI and DCO. |
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
3235e53 to
9d2c592
Compare
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
1010b82 to
64e0aa1
Compare
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
|
The new CI run failed at the weight-size assertion in the weight-loading stage, which looks like a recently introduced bug. I'll fix it later. |
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
…erence Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
…llm-omni into feature/add-cpu-offloading
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com>
|
@linyueqian The latest version works as expected. The AMD CI failure seems unrelated to this PR. The changes are listed below; if there's a better way to fix these, please let me know. Thanks.
One remaining question: In AMD CI, reserved memory appears to be an outlier, I temporarily set its threshold to |
|
Thanks! I have merged it. |
|
@sphinxkkkbc Can you help fix the CI issue here: https://buildkite.com/vllm/vllm-omni/builds/8939/canvas?sid=019df8f8-fd12-4f20-abd4-666554b40e4d&tab=output |
…utput inconsistency with same seed (vllm-project#2909) Signed-off-by: sphinxkkkbc <binchengkang8@gmail.com> Co-authored-by: sphinxkkkbc <binchengkang8@gmail.com>



Purpose
1. Add CPU offloading (layerwise offload) for stable-audio-open
2. Fix output inconsistency with the same seed
Test Plan
Test Result