[CI/Build] Add Buildkite step for diffusion quantization tests#9
Open
Conversation
Force-pushed from b33677f to 662eb54
…arkers

The unified quantization framework (vllm-project#1764) consolidated source code at vllm_omni/quantization/, but tests were still under tests/diffusion/quantization/ and had no Buildkite CI coverage. This PR:

- Moves tests/diffusion/quantization/ to tests/quantization/ to mirror the source layout.
- Aligns pytest markers with the actual test type:
  - test_int8_config.py: core_model + cuda + L4 (GPU smoke test)
  - test_inc_config.py: core_model + cpu (pure config builder)
  - test_fp8_config.py: core_model + cpu (drop redundant diffusion marker)
  - test_gguf_config.py: core_model + cpu (drop redundant diffusion marker)
- Updates the test docstring and contributing doc to reference the new path.

After this change, the existing "CUDA Unit Test with single card" step (pytest -m 'core_model and cuda and L4 and not distributed_cuda') will automatically pick up the GPU quantization tests, and the "Simple Unit Test" step will pick up the CPU ones, so no dedicated Buildkite step is needed.

Fixes vllm-project#2614

Signed-off-by: pjh4993 <pjh4993@naver.com>
Split quantization quality tests by model group in test-nightly-diffusion.yml:

- Other group: Z-Image and FLUX FP8 quality tests
- Qwen-Image group: Qwen-Image FP8 quality test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: pjh4993 <pjh4993@naver.com>
Force-pushed from 662eb54 to 9d3885f
…smoke tests

Separate test_int8_config.py into two files aligned with codebase conventions:

- test_int8_config.py (core_model, cpu): pure config/factory unit tests using mocks
- test_int8_smoke.py (core_model, cuda, L4): real hardware smoke tests with @cuda_available and @npu_available skipif guards

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: pjh4993 <pjh4993@naver.com>
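The skipif-guard pattern described in this commit can be sketched as follows. The real @cuda_available and @npu_available helpers live in the repo's test utilities and may detect hardware differently; the nvidia-smi probe below is a hypothetical stand-in for illustration only.

```python
import shutil

import pytest

# Hedged sketch: detect a GPU by checking for nvidia-smi on PATH.
# The project's actual cuda_available guard may inspect torch or
# driver state instead; this is an assumption, not the real helper.
cuda_available = pytest.mark.skipif(
    shutil.which("nvidia-smi") is None,
    reason="requires a CUDA-capable GPU",
)


@pytest.mark.core_model
@cuda_available
def test_int8_smoke():
    # A real smoke test would construct a quantized layer on device;
    # elided here since this is only a marker-pattern sketch.
    pass
```

With this split, `pytest -m 'core_model and cpu'` runs only the mock-based config tests, while the guarded smoke tests are skipped automatically on machines without a GPU.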
Set VLLM_TEST_CLEAN_GPU_MEMORY=1 on the qwen-image quantization quality test step so the autouse conftest fixture reclaims the runner's GPU memory before each test. Without it, a failed first attempt can leave a StageDiffusionProc child holding tens of GiB, and the in-session retry then hits a spurious CUDA OOM during weight loading (observed in build #6405 as a 59 GiB leaked sibling process on an A100 runner).

Signed-off-by: pjh4993 <pjh4993@naver.com>
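The env-var gating described above can be sketched as a minimal autouse fixture. The fixture name and the cleanup body here are hypothetical; the project's real conftest fixture does the actual work (reaping leaked child processes, freeing CUDA memory), which is elided.

```python
import os

import pytest


def gpu_cleanup_enabled() -> bool:
    """True when the step opts in via VLLM_TEST_CLEAN_GPU_MEMORY=1."""
    return os.environ.get("VLLM_TEST_CLEAN_GPU_MEMORY") == "1"


@pytest.fixture(autouse=True)
def clean_gpu_memory():
    # Runs before every test; only acts when the CI step opts in,
    # so local runs are unaffected.
    if gpu_cleanup_enabled():
        pass  # here the real fixture would reclaim leaked GPU memory
    yield
```

Because the fixture is autouse, no test needs to request it explicitly; setting the variable on one Buildkite step is enough to enable cleanup for that step only.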
Force-pushed from 9d3885f to 16db77c
Purpose
Add a Buildkite pipeline step for tests/diffusion/quantization/, which was missing from test-ready.yml. These tests (added in vllm-project#1470, refactored in vllm-project#1764) have core_model and diffusion markers but were never wired into CI, so breakages went undetected.

Fixes #8
(upstream: Fixes vllm-project#2614)
Test Plan
The change is a CI config addition — no local test needed. Validation will happen when Buildkite runs the new step on a PR with the ready label. The new step runs:

timeout 15m pytest -s -v tests/diffusion/quantization/ -m 'core_model' --run-level core_model

Test Result
N/A — CI-only change. The step uses gpu_1_queue (L4 GPU), matching the pattern of other diffusion test steps.
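A step of the shape described above might look like the following Buildkite YAML sketch. The label and key names are illustrative assumptions; the actual test-ready.yml entry may differ in naming and surrounding plugin configuration.

```yaml
- label: "Diffusion Quantization Test"
  agents:
    queue: gpu_1_queue  # single L4 GPU, as noted above
  command: |
    timeout 15m pytest -s -v tests/diffusion/quantization/ \
      -m 'core_model' --run-level core_model
```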