[CI/Build] Move quantization tests to tests/quantization/ and align markers#2615
[CI/Build] Move quantization tests to tests/quantization/ and align markers#2615pjh4993 wants to merge 4 commits into
Conversation
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
Thank you for your catch. Maybe we should move tests/diffusion/quantization/ to tests/quantization/ because of #1764 |
74637ad to
fdfb4e6
Compare
|
Thanks for the review @david6666666! You're right — since #1764 unified the quantization framework, the test directory should mirror the source layout. I've updated the PR to:
PTAL when you have a chance. |
|
Added ready label, you can check each test been run. |
| @@ -263,7 +263,7 @@ outputs = omni.generate( | |||
|
|
|||
There was a problem hiding this comment.
Could you also modify the following script?
tests/e2e/offline_inference/test_quantization_fp8.py
tests/e2e/offline_inference/run_quantization_e2e.sh
There was a problem hiding this comment.
Thanks for pointing this out @yenuo26!
I looked into both files:
tests/e2e/offline_inference/test_quantization_fp8.py:
- E2E tests for the unified quantization framework ([Core] Unified quantization framework #1764)
- Individual test functions use
@hardware_test(res={"cuda": "L4"})/@hardware_test(res={"cuda": "H100"})— so they have proper hardware markers at the function level - However, the module-level marker is
pytestmark = [pytest.mark.core_model, pytest.mark.diffusion]— still uses the olddiffusionmarker - Not referenced in any Buildkite pipeline step (same CI gap as the unit tests)
- The generic
CUDA Unit Teststep explicitly--ignore=tests/e2e, so these won't be picked up automatically
tests/e2e/offline_inference/run_quantization_e2e.sh:
- Shell script wrapper for manually running the above tests
- Not referenced by Buildkite
Both files have the same CI coverage gap, but they're e2e tests (L3/L4 tier) rather than unit tests — so the fix approach is different. They'd need a dedicated Buildkite step in test-merge.yml or test-nightly.yml rather than relying on marker-based collection.
Could you clarify what you'd like changed? Some options:
- Marker cleanup only — update
pytestmarkto dropdiffusionand align with the individual@hardware_testmarkers (minimal, fits this PR) - Move + marker + CI step — move to
tests/quantization/, update markers, and add a Buildkite step (bigger scope, maybe a follow-up PR) - Something else?
7809058 to
627880a
Compare
| _marks = hardware_marks(res={"cuda": "H100"}) | ||
|
|
||
|
|
||
| @pytest.mark.advanced_model |
There was a problem hiding this comment.
you should add this test to vllm-omni/.buildkite/test-nightly.yml
There was a problem hiding this comment.
Added the quality test to test-nightly-diffusion.yml (not test-nightly.yml directly, since diffusion tests are dynamically uploaded via test-nightly-diffusion.yml since #2582).
Split it by model group to match the existing nightly structure:
- Diffusion · Other · Quantization Quality Test — runs
-k "z_image or flux" - Diffusion · Qwen-Image · Quantization Quality Test — runs
-k "qwen_image"
| from vllm_omni.quantization.factory import SUPPORTED_QUANTIZATION_METHODS | ||
|
|
||
| pytestmark = [pytest.mark.core_model, pytest.mark.diffusion] | ||
| pytestmark = [pytest.mark.core_model, pytest.mark.cuda, pytest.mark.L4] |
There was a problem hiding this comment.
| pytestmark = [pytest.mark.core_model, pytest.mark.cuda, pytest.mark.L4] | |
| pytestmark = [pytest.mark.core_model, pytest.mark.cpu] |
I think it is a cpu test
There was a problem hiding this comment.
Agreed — most tests here are pure config/factory tests that only need CPU. However, some of the tests added with hardware specific decorators (e.g. @cuda_availble, @npu_available)
Split the file into two:
test_int8_config.py(core_model, cpu) — config builder and mock-based unit teststest_int8_smoke.py(core_model, cuda, L4) — real hardware smoke tests with@cuda_available/@npu_availableskipif guards
This follows the codebase pattern where files are either fully CPU or fully CUDA (e.g. test_cuda_graph_decoder.py), rather than mixing tiers in one file.
627880a to
fc75294
Compare
|
Thx for the review @david6666666! Addressed both comments:
Here's the full CI mapping for the
PTAL when you have a chance. |
fc75294 to
516378e
Compare
|
https://buildkite.com/vllm/vllm-omni/builds/6405/steps/canvas 2 Level 4 Quantization test case failed @lishunyang12 @pjh4993 |
|
Hi @hsliuustc0106 👋 While debugging build #6405 for this PR, I traced the fp8_flux failure to a gated-repo access issue and wanted to check how it should be handled before proposing a fix. What's happening The new
What I checked
Questions
Happy to take whichever direction you prefer. Thanks! CC: @david6666666 |
516378e to
b33677f
Compare
|
Hi @david6666666 👋. I would like to share some analysis for the test case failures. Update on build #6405 failures Quick status after investigating the two failing steps in build #6405. ✅ Resolved — Qwen-Image quantization step GPU leak The A sibling
Fix: commit b33677f ( Post-test cleanup prints and orchestrator shuts down cleanly — no leaked processes. The retry scenario that caused the 59 GiB leak on CI is now guarded. After the cleanup fix, the Qwen-Image run progresses end-to-end and fails on a clean LPIPS assertion, not OOM: Reproduced locally for all three fp8 cases in the suite:
All three models hit the same class of failure (LPIPS 0.80–1.03, catastrophically off the BF16 baseline), matching CI's Next steps for this PR
|
b33677f to
662eb54
Compare
…arkers The unified quantization framework (vllm-project#1764) consolidated source code at vllm_omni/quantization/, but tests were still under tests/diffusion/quantization/, and they had no Buildkite CI coverage. This PR: - Moves tests/diffusion/quantization/ to tests/quantization/ to mirror the source layout. - Aligns pytest markers with the actual test type: * test_int8_config.py: core_model + cuda + L4 (GPU smoke test) * test_inc_config.py: core_model + cpu (pure config builder) * test_fp8_config.py: core_model + cpu (drop redundant diffusion marker) * test_gguf_config.py: core_model + cpu (drop redundant diffusion marker) - Updates the test docstring and contributing doc to reference the new path. After this change, the existing CUDA Unit Test with single card step (pytest -m 'core_model and cuda and L4 and not distributed_cuda') will automatically pick up the GPU quantization tests, and the Simple Unit Test step will pick up the CPU ones — so no dedicated Buildkite step is needed. Fixes vllm-project#2614 Signed-off-by: pjh4993 <pjh4993@naver.com>
Split quantization quality tests by model group in test-nightly-diffusion.yml: - Other group: Z-Image and FLUX FP8 quality tests - Qwen-Image group: Qwen-Image FP8 quality test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: pjh4993 <pjh4993@naver.com>
662eb54 to
9d3885f
Compare
…smoke tests Separate test_int8_config.py into two files aligned with codebase conventions: - test_int8_config.py (core_model, cpu): pure config/factory unit tests using mocks - test_int8_smoke.py (core_model, cuda, L4): real hardware smoke tests with @cuda_available and @npu_available skipif guards Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: pjh4993 <pjh4993@naver.com>
Set VLLM_TEST_CLEAN_GPU_MEMORY=1 on the qwen-image quantization quality test step so the autouse conftest fixture reclaims the runner GPU before each test. Without it, a failed first attempt can leave a StageDiffusionProc child holding tens of GiB, and the in-session retry then hits a spurious CUDA OOM during weight loading (observed in build #6405 as a 59 GiB leaked sibling process on an A100 runner). Signed-off-by: pjh4993 <pjh4993@naver.com>
9d3885f to
16db77c
Compare
|
#2791 has been merged |
lishunyang12
left a comment
There was a problem hiding this comment.
Review: [CI/Build] Move quantization tests to tests/quantization/ and align markers
Overall this is a well-structured PR that correctly addresses the directory mismatch noted after #1764. The separation of CPU config tests from GPU smoke tests is a good design choice, and the marker alignment means these tests get picked up by existing CI steps without needing dedicated pipeline entries.
What looks good
- Directory restructure mirrors the source layout (
vllm_omni/quantization/->tests/quantization/), making the project easier to navigate. - Marker cleanup is correct: dropping the
diffusionmarker from pure config tests (test_fp8_config.py,test_gguf_config.py), addingcputotest_inc_config.pywhich previously had no environment marker, and tagging GPU smoke tests withcuda + L4. - Splitting
test_int8_config.pyinto config-only tests +test_int8_smoke.pyfor real-hardware tests is a clean separation of concerns. - Nightly pipeline additions for quantization quality tests are properly structured with the correct node selector, HF token injection, and lpips dependency installation.
- Doc updates in
adding_quantization_model.mdcorrectly reference the new test paths.
Minor observations (non-blocking)
-
NPU test classes under CUDA markers:
test_int8_smoke.pyhas module-levelpytestmark = [pytest.mark.core_model, pytest.mark.cuda, pytest.mark.L4], butTestNPUInt8LinearMethodandTestNPUInt8Smokeare decorated with@npu_available(skipif not NPU). This means CUDA CI will collect these NPU test classes and then immediately skip every test in them. It works correctly but adds noise to test reports. Consider either (a) moving NPU tests to a separate file with NPU-appropriate markers, or (b) adding a class-levelpytestmarkoverride on the NPU classes. Low priority since it does not affect correctness. -
Buildkite failure: Build #6514 is currently failing. Based on the discussion in PR comments, this appears to be the pre-existing fp8 numerical regression tracked in #2728, not something introduced by this PR. The suggestion to land with
pytest.mark.xfailon the fp8 quality cases (linking to the tracking issue) seems like the right approach to avoid blocking this infrastructure improvement.
No blocking issues found. The changes are clean and well-motivated.
Purpose
The unified quantization framework (#1764) consolidated source code at
vllm_omni/quantization/, but tests were still undertests/diffusion/quantization/and had no Buildkite CI coverage.This PR addresses both the CI coverage gap from #2614 and the directory mismatch noted by @david6666666 in the PR review.
Fixes #2614
Changes
tests/diffusion/quantization/→tests/quantization/to mirror the source layouttest_int8_config.py:core_model + cuda + L4(GPU smoke test)test_inc_config.py:core_model + cpu(pure config builder, no GPU needed)test_fp8_config.py:core_model + cpu(drop redundantdiffusionmarker)test_gguf_config.py:core_model + cpu(drop redundantdiffusionmarker)After this change, the existing CUDA Unit Test with single card step (
pytest -m 'core_model and cuda and L4 and not distributed_cuda') automatically picks up the GPU quantization tests, and Simple Unit Test picks up the CPU ones — so no dedicated Buildkite step is needed.Test Plan
Test Result
Buildkite CI will provide the authoritative L4 result.