
[CI/Build] Move quantization tests to tests/quantization/ and align markers #2615

Open
pjh4993 wants to merge 4 commits into vllm-project:main from pjh4993:chore/ghi-2614-ci-add-diffusion-quantization-tests


@pjh4993
Contributor

pjh4993 commented Apr 9, 2026

Purpose

The unified quantization framework (#1764) consolidated source code at vllm_omni/quantization/, but tests were still under tests/diffusion/quantization/ and had no Buildkite CI coverage.

This PR addresses both the CI coverage gap from #2614 and the directory mismatch noted by @david6666666 in the PR review.

Fixes #2614

Changes

  • Move tests/diffusion/quantization/ to tests/quantization/ to mirror the source layout
  • Align pytest markers with actual test type:
    • test_int8_config.py: core_model + cuda + L4 (GPU smoke test)
    • test_inc_config.py: core_model + cpu (pure config builder, no GPU needed)
    • test_fp8_config.py: core_model + cpu (drop redundant diffusion marker)
    • test_gguf_config.py: core_model + cpu (drop redundant diffusion marker)
  • Update docstrings and contributing doc to reference the new path

After this change, the existing CUDA Unit Test with single card step (pytest -m 'core_model and cuda and L4 and not distributed_cuda') automatically picks up the GPU quantization tests, and Simple Unit Test picks up the CPU ones — so no dedicated Buildkite step is needed.
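To make the marker-based collection concrete, here is a minimal sketch that evaluates the two step expressions against each file's module-level markers as listed above. It is a simplified stand-in for pytest's own `-m` parser (each bare name is true when that marker is present), not pytest's implementation; `MODULE_MARKERS` is copied from this description and the helper names are hypothetical.

```python
import re

# Module-level markers per test file, as listed in this PR description.
MODULE_MARKERS = {
    "test_int8_config.py": {"core_model", "cuda", "L4"},
    "test_inc_config.py": {"core_model", "cpu"},
    "test_fp8_config.py": {"core_model", "cpu"},
    "test_gguf_config.py": {"core_model", "cpu"},
}

CPU_STEP = "core_model and cpu"  # Simple Unit Test
CUDA_STEP = "core_model and cuda and L4 and not distributed_cuda"  # CUDA Unit Test with single card

_KEYWORDS = {"and", "or", "not"}


def marker_expr_matches(expr: str, markers: set) -> bool:
    """Simplified -m semantics: each bare name is True iff that marker is present."""
    names = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expr)) - _KEYWORDS
    env = {name: name in markers for name in names}
    # Evaluated with builtins disabled; expressions are assumed to be trusted CI strings.
    return bool(eval(expr, {"__builtins__": {}}, env))


def collected_by(expr: str) -> list:
    """Files whose module-level markers satisfy the step's -m expression."""
    return sorted(f for f, marks in MODULE_MARKERS.items() if marker_expr_matches(expr, marks))


if __name__ == "__main__":
    print("Simple Unit Test:", collected_by(CPU_STEP))
    print("CUDA Unit Test with single card:", collected_by(CUDA_STEP))
```

Each file lands in exactly one of the two existing steps, which is why no dedicated step is needed.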

Test Plan

# CPU tests collected by Simple Unit Test
pytest --collect-only -q -m "core_model and cpu" tests/quantization/

# GPU tests collected by CUDA Unit Test with single card
pytest --collect-only -q -m "core_model and cuda and L4" tests/quantization/

Test Result

# CPU collection
43/69 tests collected (26 deselected) in 0.03s

# GPU collection
23/69 tests collected (46 deselected) in 0.08s

Buildkite CI will provide the authoritative L4 result.

pjh4993 requested a review from hsliuustc0106 as a code owner April 9, 2026 01:06
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@david6666666
Collaborator

Thanks for the catch. Maybe we should move tests/diffusion/quantization/ to tests/quantization/ because of #1764.

pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch from 74637ad to fdfb4e6 April 10, 2026 01:17
pjh4993 changed the title from [CI/Build] Add Buildkite step for diffusion quantization tests to [CI/Build] Move quantization tests to tests/quantization/ and align markers Apr 10, 2026
@pjh4993
Contributor Author

pjh4993 commented Apr 10, 2026

Thanks for the review @david6666666! You're right — since #1764 unified the quantization framework, the test directory should mirror the source layout.

I've updated the PR to:

  1. Move tests/diffusion/quantization/ to tests/quantization/
  2. Align pytest markers with each test's actual scope:
    • test_int8_config.py: core_model + cuda + L4 (real GPU smoke test)
    • test_inc_config.py: core_model + cpu (pure config builder)
    • test_fp8_config.py / test_gguf_config.py: drop the redundant diffusion marker, keep cpu
  3. Update the test docstring and docs/contributing/model/adding_quantization_model.md to reference the new path
  4. Drop the dedicated Diffusion Quantization Test Buildkite step — the existing CUDA Unit Test with single card step (added in [CI][Bugfix] Update environment variables for test configurations in Buildkite YAML files to resolve HF timeout #2628) and Simple Unit Test step now pick these up automatically via marker-based collection

PTAL when you have a chance.

david6666666 added the ready label (label to trigger buildkite CI) Apr 10, 2026
@david6666666
Collaborator

Added the ready label; you can check that each test has been run.

hsliuustc0106 requested a review from yenuo26 April 10, 2026 02:40
@@ -263,7 +263,7 @@ outputs = omni.generate(

Collaborator


Could you also modify the following scripts?
tests/e2e/offline_inference/test_quantization_fp8.py
tests/e2e/offline_inference/run_quantization_e2e.sh

Contributor Author


Thanks for pointing this out @yenuo26!

I looked into both files:

tests/e2e/offline_inference/test_quantization_fp8.py:

  • E2E tests for the unified quantization framework ([Core] Unified quantization framework #1764)
  • Individual test functions use @hardware_test(res={"cuda": "L4"}) / @hardware_test(res={"cuda": "H100"}) — so they have proper hardware markers at the function level
  • However, the module-level marker is pytestmark = [pytest.mark.core_model, pytest.mark.diffusion] — still uses the old diffusion marker
  • Not referenced in any Buildkite pipeline step (same CI gap as the unit tests)
  • The generic CUDA Unit Test step explicitly --ignore=tests/e2e, so these won't be picked up automatically

tests/e2e/offline_inference/run_quantization_e2e.sh:

  • Shell script wrapper for manually running the above tests
  • Not referenced by Buildkite

Both files have the same CI coverage gap, but they're e2e tests (L3/L4 tier) rather than unit tests — so the fix approach is different. They'd need a dedicated Buildkite step in test-merge.yml or test-nightly.yml rather than relying on marker-based collection.

Could you clarify what you'd like changed? Some options:

  1. Marker cleanup only — update pytestmark to drop diffusion and align with the individual @hardware_test markers (minimal, fits this PR)
  2. Move + marker + CI step — move to tests/quantization/, update markers, and add a Buildkite step (bigger scope, maybe a follow-up PR)
  3. Something else?

pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch 2 times, most recently from 7809058 to 627880a April 10, 2026 05:57
_marks = hardware_marks(res={"cuda": "H100"})


@pytest.mark.advanced_model
Collaborator


You should add this test to vllm-omni/.buildkite/test-nightly.yml.

Contributor Author


Added the quality test to test-nightly-diffusion.yml (not test-nightly.yml directly, because diffusion tests have been dynamically uploaded via test-nightly-diffusion.yml since #2582).

Split it by model group to match the existing nightly structure:

  • Diffusion · Other · Quantization Quality Test — runs -k "z_image or flux"
  • Diffusion · Qwen-Image · Quantization Quality Test — runs -k "qwen_image"
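As a quick check that the two `-k` expressions split the parametrized cases without overlap, here is a sketch that models `-k` as case-insensitive substring matching on each test ID. This simplifies pytest's keyword matching (which also matches parent node names), and it assumes the file's only cases are the fp8_z_image, fp8_flux, and fp8_qwen_image IDs mentioned later in this thread; the helper names are hypothetical.

```python
import re

# Assumed parametrize IDs of test_quantization_quality.
TEST_IDS = [
    "test_quantization_quality[fp8_z_image]",
    "test_quantization_quality[fp8_flux]",
    "test_quantization_quality[fp8_qwen_image]",
]

OTHER_GROUP = "z_image or flux"  # Diffusion · Other · Quantization Quality Test
QWEN_GROUP = "qwen_image"        # Diffusion · Qwen-Image · Quantization Quality Test


def keyword_expr_matches(expr: str, test_id: str) -> bool:
    """Simplified -k: a bare word is True iff it is a case-insensitive substring of test_id."""
    lowered = test_id.lower()

    def substitute(match):
        word = match.group(0)
        if word in ("and", "or", "not"):
            return word
        return str(word.lower() in lowered)

    python_expr = re.sub(r"[A-Za-z_][A-Za-z0-9_]*", substitute, expr)
    return bool(eval(python_expr, {"__builtins__": {}}, {}))


def select(expr: str) -> list:
    return [t for t in TEST_IDS if keyword_expr_matches(expr, t)]


if __name__ == "__main__":
    other, qwen = select(OTHER_GROUP), select(QWEN_GROUP)
    assert not set(other) & set(qwen), "a case would run in both nightly steps"
    assert set(other) | set(qwen) == set(TEST_IDS), "a case would run in neither step"
    print(other, qwen)
```

Note that "z_image" does not match "qwen_image" as a substring, so the Qwen-Image case runs only in its own step.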

Review thread on tests/quantization/test_int8_config.py (outdated)
from vllm_omni.quantization.factory import SUPPORTED_QUANTIZATION_METHODS

- pytestmark = [pytest.mark.core_model, pytest.mark.diffusion]
+ pytestmark = [pytest.mark.core_model, pytest.mark.cuda, pytest.mark.L4]
Collaborator


Suggested change
- pytestmark = [pytest.mark.core_model, pytest.mark.cuda, pytest.mark.L4]
+ pytestmark = [pytest.mark.core_model, pytest.mark.cpu]

I think this is a CPU test.

Contributor Author

pjh4993 Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed: most tests here are pure config/factory tests that only need CPU. However, some of the tests were added with hardware-specific decorators (e.g. @cuda_available, @npu_available) and need real hardware.

Split the file into two:

  • test_int8_config.py (core_model, cpu) — config builder and mock-based unit tests
  • test_int8_smoke.py (core_model, cuda, L4) — real hardware smoke tests with @cuda_available / @npu_available skipif guards

This follows the codebase pattern where files are either fully CPU or fully CUDA (e.g. test_cuda_graph_decoder.py), rather than mixing tiers in one file.

pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch from 627880a to fc75294 April 11, 2026 07:25
@pjh4993
Contributor Author

pjh4993 commented Apr 11, 2026

Thanks for the review @david6666666! I addressed both comments:

  • Added test_quantization_quality.py to test-nightly-diffusion.yml, split by model group
  • Split test_int8_config.py into CPU config tests + CUDA/NPU smoke tests (test_int8_smoke.py)

Here's the full CI mapping for tests/quantization/:

| Test File | Markers | Buildkite Step | Pipeline |
|---|---|---|---|
| test_fp8_config.py | core_model, cpu | Simple Unit Test | test-ready / test-merge |
| test_gguf_config.py | core_model, cpu | Simple Unit Test | test-ready / test-merge |
| test_inc_config.py | core_model, cpu | Simple Unit Test | test-ready / test-merge |
| test_int8_config.py | core_model, cpu | Simple Unit Test | test-ready / test-merge |
| test_int8_smoke.py | core_model, cuda, L4 | CUDA Unit Test with single card | test-ready |
| test_quantization_quality.py | advanced_model, diffusion, H100 | Diffusion · Other · Quantization Quality Test | test-nightly-diffusion |
| test_quantization_quality.py | advanced_model, diffusion, H100 | Diffusion · Qwen-Image · Quantization Quality Test | test-nightly-diffusion |

PTAL when you have a chance.

hsliuustc0106 added the nightly-test label (label to trigger buildkite nightly test CI) Apr 11, 2026
pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch from fc75294 to 516378e April 12, 2026 02:22
@david6666666
Collaborator

Two Level 4 quantization test cases failed in https://buildkite.com/vllm/vllm-omni/builds/6405/steps/canvas. @lishunyang12 @pjh4993

@pjh4993
Contributor Author

pjh4993 commented Apr 13, 2026

Hi @hsliuustc0106 👋

While debugging build #6405 for this PR, I traced the fp8_flux failure to a gated-repo access issue and wanted to check how it should be handled before proposing a fix.

What's happening

The new tests/quantization/test_quantization_quality.py::test_quantization_quality[fp8_flux] uses black-forest-labs/FLUX.1-dev, which is a gated Hugging Face repo. In the failed CI run I see:

  WARNING [omni_base.py:55] Repository not found for 'black-forest-labs/FLUX.1-dev'.
  ...
  ValueError: Could not detect config format for no config file found.

RepositoryNotFoundError on a known-to-exist repo means HF returned 404, which Hugging Face does deliberately for gated repos when the caller isn't authorized (security by obscurity — they don't reveal that a gated repo exists). So the CI's HF_TOKEN is being injected fine (I see it wired into every buildkite step via hf-token-secret), but the underlying HF account hasn't been granted access to FLUX.1-dev's gated license.

What I checked

  • HF_TOKEN secret injection is set up in .buildkite/test-nightly-diffusion.yml for every pod — ✅
  • /mnt/hf-cache hostPath mount is in place — ✅
  • I grepped .buildkite/ for any other test using a gated repo (FLUX, Llama, Mistral, ...): none found. All currently wired model tests use ungated repos (Z-Image, Qwen-Image, Wan, Bagel, etc.). The new fp8_flux test case appears to be the first gated-repo test in the CI pipeline, so there's no existing pattern to copy.
  • Locally (with an authorized token) the test runs end-to-end — this also led to finding a separate fp8 numerical regression (tracked separately).

Questions

  1. Is the CI's HF account (the one behind hf-token-secret) able to request, or be granted, FLUX.1-dev access? If so, no code changes are required: the test will just work on the next run via the usual lazy-download → hostPath cache flow.
  2. Alternatively, is there a project-wide preference to avoid gated repos in CI? If so, I'll swap the test to an ungated model (e.g., FLUX.2-klein-4B if that's ungated, or some other FLUX variant you recommend), understanding that we'd lose FLUX.1-dev-specific quality coverage.
  3. Independently, would you like me to add a defensive pytest.mark.skipif that skips with a clear message when the repo isn't accessible? That way the test wouldn't spuriously fail on forks, dev machines, or CI nodes without access, and would start running automatically once access is granted.
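Option 3 could look roughly like the sketch below. It assumes a huggingface_hub release that provides `auth_check` (which raises when the current token cannot read a repo, including gated repos whose license has not been accepted). The helper names `hf_repo_accessible` and `requires_hf_repo` are hypothetical, and any failure to confirm access (missing package, gated repo, network error) is treated as "skip". The probe is injectable so the logic can be checked offline.

```python
from typing import Callable, Optional


def _hf_auth_probe(repo_id: str) -> None:
    # Assumption: huggingface_hub exposes auth_check(), which raises
    # GatedRepoError / RepositoryNotFoundError when the current token
    # (e.g. HF_TOKEN) cannot read the repo.
    from huggingface_hub import auth_check

    auth_check(repo_id)


def hf_repo_accessible(repo_id: str, probe: Optional[Callable[[str], None]] = None) -> bool:
    """Return True only if the probe confirms read access to repo_id."""
    check = probe if probe is not None else _hf_auth_probe
    try:
        check(repo_id)
    except Exception:
        # ImportError, gated/not-found errors and network failures all mean
        # "cannot confirm access", so the caller should skip rather than fail.
        return False
    return True


def requires_hf_repo(repo_id: str, probe: Optional[Callable[[str], None]] = None):
    """Hypothetical skipif marker factory for tests that need a (possibly gated) repo."""
    import pytest  # imported lazily so the helpers above work without pytest

    return pytest.mark.skipif(
        not hf_repo_accessible(repo_id, probe),
        reason=f"{repo_id} is not readable with the current HF token (gated or missing)",
    )
```

In the quality test this would attach only to the FLUX case, e.g. `pytest.param(..., marks=requires_hf_repo("black-forest-labs/FLUX.1-dev"))`, so the Z-Image and Qwen-Image cases keep running. The check runs at collection time and makes a network call.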

Happy to take whichever direction you prefer. Thanks!

CC: @david6666666

pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch from 516378e to b33677f April 13, 2026 07:31
@pjh4993
Contributor Author

pjh4993 commented Apr 13, 2026

Hi @david6666666 👋. I would like to share some analysis for the test case failures.

Update on build #6405 failures

Quick status after investigating the two failing steps in build #6405.

Resolved — Qwen-Image quantization step GPU leak

The fp8_qwen_image failure was a GPU cleanup / test isolation issue, not a test bug. Root cause from the build log:

torch.OutOfMemoryError: CUDA out of memory.
Process 633 has 59.29 GiB memory in use.

A sibling StageDiffusionProc from an earlier test in the session was holding 59 GiB when the fp8 worker tried to load Qwen-Image weights. This was an in-session retry after a failed first attempt, and the leftover child process had never been reaped.

tests/conftest.py already has an autouse clean_gpu_memory_between_tests fixture that waits for GPU memory to clear before each test — but it's gated behind VLLM_TEST_CLEAN_GPU_MEMORY=1 and defaults off.

Fix: commit b33677f ([CI/Build] Enable GPU cleanup for qwen-image quantization quality step) sets VLLM_TEST_CLEAN_GPU_MEMORY=1 for the qwen-image step only. Validated on my local machine — log shows the hook firing as expected:

Pre-test GPU status:
[GPU Memory Monitor] Waiting for GPU 2 to free memory, Condition: Memory usage ratio ≤ 5.0%
[GPU Memory Freed] Devices 2 meet memory condition
   Wait time: 0.0 seconds (0.0 minutes)

Post-test cleanup prints and orchestrator shuts down cleanly — no leaked processes. The retry scenario that caused the 59 GiB leak on CI is now guarded.
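For readers unfamiliar with the fixture, the opt-in gate and wait loop it performs can be sketched as follows. This is not the code in tests/conftest.py: the function names, the 5% default (taken from the log above), the timeout, and the poll interval are illustrative assumptions, and the memory reading is passed in as a callable so the loop can be exercised without a GPU.

```python
import os
import time
from typing import Callable, Mapping


def gpu_cleanup_enabled(env: Mapping[str, str] = os.environ) -> bool:
    # Mirrors the opt-in described above: off unless VLLM_TEST_CLEAN_GPU_MEMORY=1.
    return env.get("VLLM_TEST_CLEAN_GPU_MEMORY", "0") == "1"


def wait_for_gpu_memory(
    get_usage_ratio: Callable[[], float],
    threshold: float = 0.05,
    timeout_s: float = 300.0,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll until used/total memory <= threshold; return the seconds waited."""
    start = clock()
    while True:
        if get_usage_ratio() <= threshold:
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"GPU memory still above {threshold:.0%} after {timeout_s}s")
        sleep(poll_s)
```

On a real runner the reading could come from `torch.cuda.mem_get_info()`, which returns `(free, total)` bytes for the device, giving a ratio of `1 - free / total`. Because it queries the driver, it also counts memory held by other processes, such as a leaked StageDiffusionProc.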

⚠️ Unresolved — real fp8 quality regression

After the cleanup fix, the Qwen-Image run progresses end-to-end and fails on a clean LPIPS assertion, not OOM:

LPIPS:   1.0344  (threshold: 0.35)
Result:  FAIL

Reproduced locally for all three fp8 cases in the suite:

| Test | Model | LPIPS | Threshold |
|---|---|---|---|
| fp8_z_image | Tongyi-MAI/Z-Image-Turbo | 0.8826 | 0.10 |
| fp8_flux | black-forest-labs/FLUX.1-dev | 0.8014 | 0.20 |
| fp8_qwen_image | Qwen/Qwen-Image | 1.0344 | 0.35 |

All three models hit the same class of failure (LPIPS 0.80–1.03, catastrophically off the BF16 baseline), matching CI's fp8_z_image LPIPS 0.8987 from build #6405. This is a real pre-existing fp8 numerical regression, not something this PR introduced. I've filed #2728 with the full investigation (determinism probe, cutlass micro-benchmark, CI↔local comparison) — it's not a Blackwell-specific kernel bug (H100 CI and B200 local hit identical numbers), and the raw cutlass_scaled_mm op is fine on synthetic inputs, so the bug lives somewhere in the online-quant → layer-dispatch → weight-loading path rather than in the kernel math.

Next steps for this PR

  • For the fp8 quality regression: I'd propose landing this CI infrastructure PR as-is (the tests are correctly catching a real pre-existing bug) and adding pytest.mark.xfail on the three fp8 cases with a link to the tracking issue. That way the quality gate stays in-tree and automatically flips to xpass once the fp8 numerics are fixed in a follow-up PR. Let me know if you'd prefer a different disposition.
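A minimal sketch of that disposition, assuming the quality test is parametrized over case objects. The `QualityCase` dataclass, `CASES`, and the helper names are hypothetical; models and thresholds come from the table above, and "pass" is assumed to mean LPIPS at or below the threshold. `strict=False` is what lets a fixed case show up as XPASS instead of failing the run.

```python
from dataclasses import dataclass
from typing import Optional

FP8_REGRESSION = "fp8 numerical regression, tracked in #2728"


@dataclass(frozen=True)
class QualityCase:
    case_id: str
    model: str
    lpips_threshold: float
    known_issue: Optional[str] = None  # when set, the case is marked xfail


CASES = [
    QualityCase("fp8_z_image", "Tongyi-MAI/Z-Image-Turbo", 0.10, FP8_REGRESSION),
    QualityCase("fp8_flux", "black-forest-labs/FLUX.1-dev", 0.20, FP8_REGRESSION),
    QualityCase("fp8_qwen_image", "Qwen/Qwen-Image", 0.35, FP8_REGRESSION),
]


def lpips_within_threshold(case: QualityCase, lpips_value: float) -> bool:
    # Assumption: lower LPIPS is better and the threshold is inclusive.
    return lpips_value <= case.lpips_threshold


def to_pytest_params(cases):
    """Wrap cases in pytest.param, adding a non-strict xfail for known issues."""
    import pytest  # imported lazily so the case table is usable without pytest

    params = []
    for case in cases:
        marks = [pytest.mark.xfail(reason=case.known_issue, strict=False)] if case.known_issue else []
        params.append(pytest.param(case, id=case.case_id, marks=marks))
    return params
```

The test would then use `@pytest.mark.parametrize("case", to_pytest_params(CASES))`. Switching to `strict=True` would instead turn an unexpected pass into a failure, forcing the xfail to be removed once #2728 is fixed.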

pjh4993 and others added 2 commits April 13, 2026 13:36
…arkers

The unified quantization framework (vllm-project#1764) consolidated source code at
vllm_omni/quantization/, but tests were still under tests/diffusion/quantization/,
and they had no Buildkite CI coverage.

This PR:

- Moves tests/diffusion/quantization/ to tests/quantization/ to mirror the
  source layout.
- Aligns pytest markers with the actual test type:
  * test_int8_config.py: core_model + cuda + L4 (GPU smoke test)
  * test_inc_config.py:  core_model + cpu (pure config builder)
  * test_fp8_config.py:  core_model + cpu (drop redundant diffusion marker)
  * test_gguf_config.py: core_model + cpu (drop redundant diffusion marker)
- Updates the test docstring and contributing doc to reference the new path.

After this change, the existing CUDA Unit Test with single card step
(pytest -m 'core_model and cuda and L4 and not distributed_cuda') will
automatically pick up the GPU quantization tests, and the Simple Unit
Test step will pick up the CPU ones — so no dedicated Buildkite step
is needed.

Fixes vllm-project#2614

Signed-off-by: pjh4993 <pjh4993@naver.com>
Split quantization quality tests by model group in test-nightly-diffusion.yml:
- Other group: Z-Image and FLUX FP8 quality tests
- Qwen-Image group: Qwen-Image FP8 quality test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: pjh4993 <pjh4993@naver.com>
pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch from 662eb54 to 9d3885f April 13, 2026 13:56
pjh4993 and others added 2 commits April 13, 2026 14:04
…smoke tests

Separate test_int8_config.py into two files aligned with codebase conventions:
- test_int8_config.py (core_model, cpu): pure config/factory unit tests using mocks
- test_int8_smoke.py (core_model, cuda, L4): real hardware smoke tests with
  @cuda_available and @npu_available skipif guards

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: pjh4993 <pjh4993@naver.com>
Set VLLM_TEST_CLEAN_GPU_MEMORY=1 on the qwen-image quantization quality
test step so the autouse conftest fixture reclaims the runner GPU before
each test. Without it, a failed first attempt can leave a StageDiffusionProc
child holding tens of GiB, and the in-session retry then hits a spurious
CUDA OOM during weight loading (observed in build #6405 as a 59 GiB leaked
sibling process on an A100 runner).

Signed-off-by: pjh4993 <pjh4993@naver.com>
pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch from 9d3885f to 16db77c April 13, 2026 14:05
@david6666666
Collaborator

#2791 has been merged

Collaborator

lishunyang12 left a comment


Review: [CI/Build] Move quantization tests to tests/quantization/ and align markers

Overall this is a well-structured PR that correctly addresses the directory mismatch noted after #1764. The separation of CPU config tests from GPU smoke tests is a good design choice, and the marker alignment means these tests get picked up by existing CI steps without needing dedicated pipeline entries.

What looks good

  • Directory restructure mirrors the source layout (vllm_omni/quantization/ -> tests/quantization/), making the project easier to navigate.
  • Marker cleanup is correct: dropping the diffusion marker from pure config tests (test_fp8_config.py, test_gguf_config.py), adding cpu to test_inc_config.py which previously had no environment marker, and tagging GPU smoke tests with cuda + L4.
  • Splitting test_int8_config.py into config-only tests + test_int8_smoke.py for real-hardware tests is a clean separation of concerns.
  • Nightly pipeline additions for quantization quality tests are properly structured with the correct node selector, HF token injection, and lpips dependency installation.
  • Doc updates in adding_quantization_model.md correctly reference the new test paths.

Minor observations (non-blocking)

  1. NPU test classes under CUDA markers: test_int8_smoke.py has module-level pytestmark = [pytest.mark.core_model, pytest.mark.cuda, pytest.mark.L4], but TestNPUInt8LinearMethod and TestNPUInt8Smoke are decorated with @npu_available (skipif not NPU). This means CUDA CI will collect these NPU test classes and then immediately skip every test in them. It works correctly but adds noise to test reports. Consider either (a) moving NPU tests to a separate file with NPU-appropriate markers, or (b) restructuring the markers at class level; note, though, that a class-level pytestmark is added to the module-level marks rather than replacing them, so the cuda/L4 marks would have to move off the module for this to work. Low priority since it does not affect correctness.

  2. Buildkite failure: Build #6514 is currently failing. Based on the discussion in PR comments, this appears to be the pre-existing fp8 numerical regression tracked in #2728, not something introduced by this PR. The suggestion to land with pytest.mark.xfail on the fp8 quality cases (linking to the tracking issue) seems like the right approach to avoid blocking this infrastructure improvement.

No blocking issues found. The changes are clean and well-motivated.

hsliuustc0106 removed the ready (label to trigger buildkite CI) and nightly-test (label to trigger buildkite nightly test CI) labels Apr 29, 2026

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Diffusion quantization tests missing from Buildkite CI pipeline

5 participants