[CI/Build] Move quantization tests to tests/quantization/ and align markers by pjh4993 · Pull Request #2615 · vllm-project/vllm-omni

pjh4993 · 2026-04-09T01:06:21Z

Purpose

The unified quantization framework (#1764) consolidated source code at vllm_omni/quantization/, but tests were still under tests/diffusion/quantization/ and had no Buildkite CI coverage.

This PR addresses both the CI coverage gap from #2614 and the directory mismatch noted by @david6666666 in the PR review.

Fixes #2614

Changes

Move tests/diffusion/quantization/ → tests/quantization/ to mirror the source layout
Align pytest markers with actual test type:
- test_int8_config.py: core_model + cuda + L4 (GPU smoke test)
- test_inc_config.py: core_model + cpu (pure config builder, no GPU needed)
- test_fp8_config.py: core_model + cpu (drop redundant diffusion marker)
- test_gguf_config.py: core_model + cpu (drop redundant diffusion marker)
Update docstrings and contributing doc to reference the new path

After this change, the existing CUDA Unit Test with single card step (pytest -m 'core_model and cuda and L4 and not distributed_cuda') automatically picks up the GPU quantization tests, and Simple Unit Test picks up the CPU ones — so no dedicated Buildkite step is needed.
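As a rough illustration of how marker-based collection routes these files, the sketch below evaluates the two marker expressions used in this description against the per-file markers listed under Changes. It is a simplified stand-in for pytest's `-m` matching, not pytest's implementation; `FILE_MARKERS`, `matches`, and `collected` are hypothetical names introduced only for this example.

```python
import re

# Markers per file, as listed under "Changes" in this description.
FILE_MARKERS = {
    "test_int8_config.py": {"core_model", "cuda", "L4"},
    "test_inc_config.py": {"core_model", "cpu"},
    "test_fp8_config.py": {"core_model", "cpu"},
    "test_gguf_config.py": {"core_model", "cpu"},
}


def matches(marker_expr, markers):
    """Evaluate an and/or/not marker expression against a set of marker names."""

    def to_bool(m):
        word = m.group(0)
        if word in ("and", "or", "not"):
            return word  # keep boolean operators untouched
        return str(word in markers)  # marker name -> "True" / "False"

    py_expr = re.sub(r"[A-Za-z_]\w*", to_bool, marker_expr)
    # The rewritten expression contains only True/False and boolean operators.
    return eval(py_expr, {"__builtins__": {}}, {})


def collected(marker_expr):
    """Return the files whose markers satisfy the expression."""
    return sorted(
        name for name, marks in FILE_MARKERS.items() if matches(marker_expr, marks)
    )


if __name__ == "__main__":
    print(collected("core_model and cuda and L4 and not distributed_cuda"))
    print(collected("core_model and cpu"))
```

Under these assumptions, each file lands in exactly one of the two existing steps, which is why no dedicated step is required.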

Test Plan

# CPU tests collected by Simple Unit Test
pytest --collect-only -q -m "core_model and cpu" tests/quantization/

# GPU tests collected by CUDA Unit Test with single card
pytest --collect-only -q -m "core_model and cuda and L4" tests/quantization/

Test Result

# CPU collection
43/69 tests collected (26 deselected) in 0.03s

# GPU collection
23/69 tests collected (46 deselected) in 0.08s

Buildkite CI will provide the authoritative L4 result.

chatgpt-codex-connector · 2026-04-09T01:06:27Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

david6666666 · 2026-04-09T06:59:04Z

Thanks for the catch. Maybe we should move tests/diffusion/quantization/ to tests/quantization/ because of #1764.

pjh4993 · 2026-04-10T01:26:48Z

Thanks for the review @david6666666! You're right — since #1764 unified the quantization framework, the test directory should mirror the source layout.

I've updated the PR to:

Move tests/diffusion/quantization/ → tests/quantization/
Align pytest markers with each test's actual scope:
- test_int8_config.py: core_model + cuda + L4 (real GPU smoke test)
- test_inc_config.py: core_model + cpu (pure config builder)
- test_fp8_config.py / test_gguf_config.py: drop the redundant diffusion marker, keep cpu
Update the test docstring and docs/contributing/model/adding_quantization_model.md to reference the new path
Drop the dedicated Diffusion Quantization Test Buildkite step — the existing CUDA Unit Test with single card step (added in [CI][Bugfix] Update environment variables for test configurations in Buildkite YAML files to resolve HF timeout #2628) and Simple Unit Test step now pick these up automatically via marker-based collection

PTAL when you have a chance.

david6666666 · 2026-04-10T01:54:23Z

Added the ready label; you can check that each test has been run.

yenuo26 · 2026-04-10T02:49:20Z

@@ -263,7 +263,7 @@ outputs = omni.generate(



Could you also modify the following scripts?
tests/e2e/offline_inference/test_quantization_fp8.py
tests/e2e/offline_inference/run_quantization_e2e.sh

Thanks for pointing this out @yenuo26!

I looked into both files:

tests/e2e/offline_inference/test_quantization_fp8.py:

E2E tests for the unified quantization framework ([Core] Unified quantization framework #1764)

Individual test functions use @hardware_test(res={"cuda": "L4"}) / @hardware_test(res={"cuda": "H100"}) — so they have proper hardware markers at the function level

However, the module-level marker is pytestmark = [pytest.mark.core_model, pytest.mark.diffusion] — still uses the old diffusion marker

Not referenced in any Buildkite pipeline step (same CI gap as the unit tests)

The generic CUDA Unit Test step explicitly passes --ignore=tests/e2e, so these won't be picked up automatically

tests/e2e/offline_inference/run_quantization_e2e.sh:

Shell script wrapper for manually running the above tests

Not referenced by Buildkite

Both files have the same CI coverage gap, but they're e2e tests (L3/L4 tier) rather than unit tests — so the fix approach is different. They'd need a dedicated Buildkite step in test-merge.yml or test-nightly.yml rather than relying on marker-based collection.

Could you clarify what you'd like changed? Some options:

Marker cleanup only — update pytestmark to drop diffusion and align with the individual @hardware_test markers (minimal, fits this PR)

Move + marker + CI step — move to tests/quantization/, update markers, and add a Buildkite step (bigger scope, maybe a follow-up PR)

Something else?

david6666666 · 2026-04-10T07:04:34Z

 _marks = hardware_marks(res={"cuda": "H100"})


 @pytest.mark.advanced_model


You should add this test to vllm-omni/.buildkite/test-nightly.yml

Added the quality test to test-nightly-diffusion.yml (not test-nightly.yml directly, because diffusion tests have been dynamically uploaded via test-nightly-diffusion.yml since #2582).

Split it by model group to match the existing nightly structure:

Diffusion · Other · Quantization Quality Test — runs -k "z_image or flux"

Diffusion · Qwen-Image · Quantization Quality Test — runs -k "qwen_image"
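To sanity-check that the two `-k` filters cover every quality case exactly once, here is a small sketch. It approximates pytest's `-k` matching as a case-insensitive substring test over `or`-separated terms, which is a simplification; the case IDs are the three fp8 parametrizations named elsewhere in this thread, and `selected_by` and `assignment` are hypothetical helpers.

```python
# Parametrized test IDs for the fp8 quality cases discussed in this PR.
CASE_IDS = [
    "test_quantization_quality[fp8_z_image]",
    "test_quantization_quality[fp8_flux]",
    "test_quantization_quality[fp8_qwen_image]",
]

# Nightly step name -> the -k expression it runs with.
STEP_FILTERS = {
    "Diffusion · Other · Quantization Quality Test": "z_image or flux",
    "Diffusion · Qwen-Image · Quantization Quality Test": "qwen_image",
}


def selected_by(k_expr, test_id):
    """Simplified -k: true if any 'or'-separated term is a substring of the ID."""
    terms = [term.strip().lower() for term in k_expr.split(" or ")]
    return any(term in test_id.lower() for term in terms)


def assignment():
    """Map each case ID to the list of steps whose filter selects it."""
    return {
        case_id: [step for step, expr in STEP_FILTERS.items() if selected_by(expr, case_id)]
        for case_id in CASE_IDS
    }


if __name__ == "__main__":
    for case_id, steps in assignment().items():
        print(case_id, "->", steps)
```

A case that mapped to zero or two steps would indicate a gap or a double run in the nightly split.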

david6666666 · 2026-04-10T07:08:12Z

 from vllm_omni.quantization.factory import SUPPORTED_QUANTIZATION_METHODS

-pytestmark = [pytest.mark.core_model, pytest.mark.diffusion]
+pytestmark = [pytest.mark.core_model, pytest.mark.cuda, pytest.mark.L4]


Suggested change

-pytestmark = [pytest.mark.core_model, pytest.mark.cuda, pytest.mark.L4]
+pytestmark = [pytest.mark.core_model, pytest.mark.cpu]

I think it is a cpu test

Agreed: most tests here are pure config/factory tests that only need CPU. However, some of the tests use hardware-specific decorators (e.g. @cuda_available, @npu_available).

Split the file into two:

test_int8_config.py (core_model, cpu) — config builder and mock-based unit tests

test_int8_smoke.py (core_model, cuda, L4) — real hardware smoke tests with @cuda_available / @npu_available skipif guards

This follows the codebase pattern where files are either fully CPU or fully CUDA (e.g. test_cuda_graph_decoder.py), rather than mixing tiers in one file.

pjh4993 · 2026-04-11T07:35:58Z

Thx for the review @david6666666! Addressed both comments:

Added test_quantization_quality.py to test-nightly-diffusion.yml, split by model group
Split test_int8_config.py into CPU config tests + CUDA/NPU smoke tests (test_int8_smoke.py)

Here's the full CI mapping for tests/quantization:

| Test File | Markers | Buildkite Step | Pipeline |
|---|---|---|---|
| `test_fp8_config.py` | `core_model`, `cpu` | Simple Unit Test | test-ready / test-merge |
| `test_gguf_config.py` | `core_model`, `cpu` | Simple Unit Test | test-ready / test-merge |
| `test_inc_config.py` | `core_model`, `cpu` | Simple Unit Test | test-ready / test-merge |
| `test_int8_config.py` | `core_model`, `cpu` | Simple Unit Test | test-ready / test-merge |
| `test_int8_smoke.py` | `core_model`, `cuda`, `L4` | CUDA Unit Test with single card | test-ready |
| `test_quantization_quality.py` | `advanced_model`, `diffusion`, `H100` | Diffusion · Other · Quantization Quality Test | test-nightly-diffusion |
| `test_quantization_quality.py` | `advanced_model`, `diffusion`, `H100` | Diffusion · Qwen-Image · Quantization Quality Test | test-nightly-diffusion |

PTAL when you have a chance.

david6666666 · 2026-04-13T01:29:07Z

Two Level 4 quantization test cases failed in https://buildkite.com/vllm/vllm-omni/builds/6405/steps/canvas. @lishunyang12 @pjh4993

pjh4993 · 2026-04-13T07:29:33Z

Hi @hsliuustc0106 👋

While debugging build #6405 for this PR, I traced the fp8_flux failure to a gated-repo access issue and wanted to check how it should be handled before proposing a fix.

What's happening

The new tests/quantization/test_quantization_quality.py::test_quantization_quality[fp8_flux] uses black-forest-labs/FLUX.1-dev, which is a gated Hugging Face repo. In the failed CI run I see:

  WARNING [omni_base.py:55] Repository not found for 'black-forest-labs/FLUX.1-dev'.
  ...
  ValueError: Could not detect config format for no config file found.

RepositoryNotFoundError on a known-to-exist repo means HF returned 404, which Hugging Face does deliberately for gated repos when the caller isn't authorized (security by obscurity — they don't reveal that a gated repo exists). So the CI's HF_TOKEN is being injected fine (I see it wired into every buildkite step via hf-token-secret), but the underlying HF account hasn't been granted access to FLUX.1-dev's gated license.

What I checked

HF_TOKEN secret injection is set up in .buildkite/test-nightly-diffusion.yml for every pod — ✅
/mnt/hf-cache hostPath mount is in place — ✅
I grepped .buildkite/ for any other test using a gated repo (FLUX, Llama, Mistral, ...): none found. All currently-wired model tests use ungated repos (Z-Image, Qwen-Image, Wan, Bagel, etc.). The new fp8_flux test case appears to be the first gated-repo test in the CI pipeline, so there's no existing pattern to copy.
Locally (with an authorized token) the test runs end-to-end — this also led to finding a separate fp8 numerical regression (tracked separately).

Questions

Is the CI's HF account (the one behind hf-token-secret) able to request/be granted FLUX.1-dev access? If so, no code changes required, the test will Just Work on next run via the usual lazy-download → hostPath cache flow.
Alternatively, is there a project-wide preference to avoid gated repos in CI? If that's the case I'll swap the test to an ungated model (e.g., FLUX.2-klein-4B if that's ungated, or some other flux variant you recommend), understanding that we'd lose FLUX.1-dev-specific quality coverage.
Independently, would you like me to add a defensive pytest.mark.skipif that skips with a clear message when the repo isn't accessible? That way the test wouldn't spuriously fail on forks / dev machines / CI nodes without access, and would just start running automatically once access is granted.
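For the third option, a minimal sketch of what the defensive check could look like. `repo_skip_reason` is a hypothetical helper, not something in the repo; it assumes `huggingface_hub` is installed and that `HfApi().model_info` raises when the caller cannot see the repo. The probe is injectable so the logic can be exercised without network access.

```python
from typing import Callable, Optional


def repo_skip_reason(
    repo_id: str, probe: Optional[Callable[[str], object]] = None
) -> Optional[str]:
    """Return a skip message if repo_id cannot be read, otherwise None."""
    if probe is None:
        # Assumption: huggingface_hub is available; imported lazily so the
        # helper itself has no hard dependency on it.
        from huggingface_hub import HfApi

        probe = HfApi().model_info
    try:
        probe(repo_id)
    except Exception as exc:  # deliberately broad: auth, gating, or network errors
        return (
            f"{repo_id} is not accessible ({type(exc).__name__}); "
            "request access to the gated repo or check HF_TOKEN"
        )
    return None


if __name__ == "__main__":
    def _denied(repo_id):
        raise PermissionError("gated")

    print(repo_skip_reason("black-forest-labs/FLUX.1-dev", probe=_denied))
```

In a test module this could feed a guard such as `pytest.mark.skipif(reason is not None, reason=reason or "")`, with the caveat that the probe then runs at collection time.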

Happy to take whichever direction you prefer. Thanks!

CC: @david6666666

pjh4993 · 2026-04-13T07:36:27Z

Hi @david6666666 👋. I would like to share some analysis for the test case failures.

Update on build #6405 failures

Quick status after investigating the two failing steps in build #6405.

✅ Resolved — Qwen-Image quantization step GPU leak

The fp8_qwen_image failure was a GPU cleanup / test isolation issue, not a test bug. Root cause from the build log:

torch.OutOfMemoryError: CUDA out of memory.
Process 633 has 59.29 GiB memory in use.

A sibling StageDiffusionProc from an earlier test in the session was holding 59 GiB when the fp8 worker tried to load Qwen-Image weights. This was an in-session retry after a failed first attempt, and the leftover child process had never been reaped.

tests/conftest.py already has an autouse clean_gpu_memory_between_tests fixture that waits for GPU memory to clear before each test — but it's gated behind VLLM_TEST_CLEAN_GPU_MEMORY=1 and defaults off.

Fix: commit b33677f ([CI/Build] Enable GPU cleanup for qwen-image quantization quality step) sets VLLM_TEST_CLEAN_GPU_MEMORY=1 for the qwen-image step only. Validated on my local machine — log shows the hook firing as expected:

Pre-test GPU status:
[GPU Memory Monitor] Waiting for GPU 2 to free memory, Condition: Memory usage ratio ≤ 5.0%
[GPU Memory Freed] Devices 2 meet memory condition
   Wait time: 0.0 seconds (0.0 minutes)

Post-test cleanup prints and orchestrator shuts down cleanly — no leaked processes. The retry scenario that caused the 59 GiB leak on CI is now guarded.
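The opt-in behaviour described above can be sketched roughly as follows. This is not the code in tests/conftest.py; `wait_for_free_gpu` and its parameters are hypothetical, the 5% threshold mirrors the log line above, and the memory reader is passed in so the example runs without a GPU.

```python
import os
import time
from typing import Callable, Mapping, Optional


def wait_for_free_gpu(
    read_usage_ratio: Callable[[], float],
    *,
    threshold: float = 0.05,
    timeout_s: float = 600.0,
    poll_s: float = 1.0,
    environ: Mapping[str, str] = os.environ,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Block until GPU memory usage drops to the threshold.

    Returns None when the hook is disabled, otherwise the seconds waited.
    """
    if environ.get("VLLM_TEST_CLEAN_GPU_MEMORY") != "1":
        return None  # opt-in: the hook is off by default
    waited = 0.0
    while read_usage_ratio() > threshold:
        if waited >= timeout_s:
            raise TimeoutError(f"GPU memory still above {threshold:.0%} after {waited}s")
        sleep(poll_s)
        waited += poll_s
    return waited


if __name__ == "__main__":
    readings = iter([0.90, 0.50, 0.01])  # fake usage ratios: leak, draining, free
    waited = wait_for_free_gpu(
        lambda: next(readings),
        environ={"VLLM_TEST_CLEAN_GPU_MEMORY": "1"},
        sleep=lambda _: None,
    )
    print(f"waited {waited} seconds")
```

Enabling the variable on a single step keeps the extra wait out of steps that do not spawn large child processes.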

⚠️ Unresolved — real fp8 quality regression

After the cleanup fix, the Qwen-Image run progresses end-to-end and fails on a clean LPIPS assertion, not OOM:

LPIPS:   1.0344  (threshold: 0.35)
Result:  FAIL

Reproduced locally for all three fp8 cases in the suite:

| Test | Model | LPIPS | Threshold |
|---|---|---|---|
| `fp8_z_image` | Tongyi-MAI/Z-Image-Turbo | 0.8826 | 0.10 |
| `fp8_flux` | black-forest-labs/FLUX.1-dev | 0.8014 | 0.20 |
| `fp8_qwen_image` | Qwen/Qwen-Image | 1.0344 | 0.35 |

All three models hit the same class of failure (LPIPS 0.80–1.03, catastrophically off the BF16 baseline), matching CI's fp8_z_image LPIPS 0.8987 from build #6405. This is a real pre-existing fp8 numerical regression, not something this PR introduced. I've filed #2728 with the full investigation (determinism probe, cutlass micro-benchmark, CI↔local comparison) — it's not a Blackwell-specific kernel bug (H100 CI and B200 local hit identical numbers), and the raw cutlass_scaled_mm op is fine on synthetic inputs, so the bug lives somewhere in the online-quant → layer-dispatch → weight-loading path rather than in the kernel math.
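To make the size of the miss concrete, the sketch below replays the locally reproduced numbers from the table against their thresholds. It assumes the quality gate passes when LPIPS is less than or equal to the threshold (the exact comparison in the test is not shown in this thread); `FP8_CASES`, `verdict`, and `report` are hypothetical names.

```python
# (LPIPS, threshold) per case, copied from the local reproduction table.
FP8_CASES = {
    "fp8_z_image": (0.8826, 0.10),
    "fp8_flux": (0.8014, 0.20),
    "fp8_qwen_image": (1.0344, 0.35),
}


def verdict(lpips, threshold):
    """Assumed gate: lower LPIPS is better; pass at or below the threshold."""
    return "PASS" if lpips <= threshold else "FAIL"


def report():
    """Return verdict and overshoot factor (LPIPS / threshold) for each case."""
    return {
        name: (verdict(lpips, threshold), lpips / threshold)
        for name, (lpips, threshold) in FP8_CASES.items()
    }


if __name__ == "__main__":
    for name, (result, factor) in report().items():
        print(f"{name}: {result} ({factor:.1f}x threshold)")
```

Every case overshoots its threshold by a wide factor, which is consistent with a systematic numerical problem rather than a threshold that is slightly too tight.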

Next steps for this PR

For the fp8 quality regression: I'd propose landing this CI infrastructure PR as-is (the tests are correctly catching a real pre-existing bug) and adding pytest.mark.xfail on the three fp8 cases with a link to the tracking issue. That way the quality gate stays in-tree and automatically flips to xpass once the fp8 numerics are fixed in a follow-up PR. Let me know if you'd prefer a different disposition.

…arkers

The unified quantization framework (vllm-project#1764) consolidated source code at vllm_omni/quantization/, but tests were still under tests/diffusion/quantization/, and they had no Buildkite CI coverage.

This PR:
- Moves tests/diffusion/quantization/ to tests/quantization/ to mirror the source layout.
- Aligns pytest markers with the actual test type:
  * test_int8_config.py: core_model + cuda + L4 (GPU smoke test)
  * test_inc_config.py: core_model + cpu (pure config builder)
  * test_fp8_config.py: core_model + cpu (drop redundant diffusion marker)
  * test_gguf_config.py: core_model + cpu (drop redundant diffusion marker)
- Updates the test docstring and contributing doc to reference the new path.

After this change, the existing CUDA Unit Test with single card step (pytest -m 'core_model and cuda and L4 and not distributed_cuda') will automatically pick up the GPU quantization tests, and the Simple Unit Test step will pick up the CPU ones, so no dedicated Buildkite step is needed.

Fixes vllm-project#2614

Signed-off-by: pjh4993 <pjh4993@naver.com>

Split quantization quality tests by model group in test-nightly-diffusion.yml:
- Other group: Z-Image and FLUX FP8 quality tests
- Qwen-Image group: Qwen-Image FP8 quality test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: pjh4993 <pjh4993@naver.com>

…smoke tests

Separate test_int8_config.py into two files aligned with codebase conventions:
- test_int8_config.py (core_model, cpu): pure config/factory unit tests using mocks
- test_int8_smoke.py (core_model, cuda, L4): real hardware smoke tests with @cuda_available and @npu_available skipif guards

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: pjh4993 <pjh4993@naver.com>

Set VLLM_TEST_CLEAN_GPU_MEMORY=1 on the qwen-image quantization quality test step so the autouse conftest fixture reclaims the runner GPU before each test. Without it, a failed first attempt can leave a StageDiffusionProc child holding tens of GiB, and the in-session retry then hits a spurious CUDA OOM during weight loading (observed in build #6405 as a 59 GiB leaked sibling process on an A100 runner).

Signed-off-by: pjh4993 <pjh4993@naver.com>

david6666666 · 2026-04-15T14:35:24Z

#2791 has been merged

lishunyang12

Review: [CI/Build] Move quantization tests to tests/quantization/ and align markers

Overall this is a well-structured PR that correctly addresses the directory mismatch noted after #1764. The separation of CPU config tests from GPU smoke tests is a good design choice, and the marker alignment means these tests get picked up by existing CI steps without needing dedicated pipeline entries.

What looks good

Directory restructure mirrors the source layout (vllm_omni/quantization/ -> tests/quantization/), making the project easier to navigate.
Marker cleanup is correct: dropping the diffusion marker from pure config tests (test_fp8_config.py, test_gguf_config.py), adding cpu to test_inc_config.py which previously had no environment marker, and tagging GPU smoke tests with cuda + L4.
Splitting test_int8_config.py into config-only tests + test_int8_smoke.py for real-hardware tests is a clean separation of concerns.
Nightly pipeline additions for quantization quality tests are properly structured with the correct node selector, HF token injection, and lpips dependency installation.
Doc updates in adding_quantization_model.md correctly reference the new test paths.

Minor observations (non-blocking)

NPU test classes under CUDA markers: test_int8_smoke.py has module-level pytestmark = [pytest.mark.core_model, pytest.mark.cuda, pytest.mark.L4], but TestNPUInt8LinearMethod and TestNPUInt8Smoke are decorated with @npu_available (skipif not NPU). This means CUDA CI will collect these NPU test classes and then immediately skip every test in them. It works correctly but adds noise to test reports. Consider either (a) moving NPU tests to a separate file with NPU-appropriate markers, or (b) adding a class-level pytestmark override on the NPU classes. Low priority since it does not affect correctness.
Buildkite failure: Build #6514 is currently failing. Based on the discussion in PR comments, this appears to be the pre-existing fp8 numerical regression tracked in #2728, not something introduced by this PR. The suggestion to land with pytest.mark.xfail on the fp8 quality cases (linking to the tracking issue) seems like the right approach to avoid blocking this infrastructure improvement.

No blocking issues found. The changes are clean and well-motivated.

pjh4993 requested a review from hsliuustc0106 as a code owner April 9, 2026 01:06

pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch from 74637ad to fdfb4e6 Compare April 10, 2026 01:17

pjh4993 changed the title ~~[CI/Build] Add Buildkite step for diffusion quantization tests~~ [CI/Build] Move quantization tests to tests/quantization/ and align markers Apr 10, 2026

david6666666 added the ready label Apr 10, 2026

hsliuustc0106 requested a review from yenuo26 April 10, 2026 02:40

yenuo26 reviewed Apr 10, 2026


pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch 2 times, most recently from 7809058 to 627880a Compare April 10, 2026 05:57

david6666666 reviewed Apr 10, 2026


pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch from 627880a to fc75294 Compare April 11, 2026 07:25

hsliuustc0106 added the nightly-test label Apr 11, 2026

lishunyang12 approved these changes Apr 11, 2026


pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch from fc75294 to 516378e Compare April 12, 2026 02:22

pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch from 516378e to b33677f Compare April 13, 2026 07:31

pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch from b33677f to 662eb54 Compare April 13, 2026 09:29

pjh4993 and others added 2 commits April 13, 2026 13:36

pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch from 662eb54 to 9d3885f Compare April 13, 2026 13:56

pjh4993 and others added 2 commits April 13, 2026 14:04

pjh4993 force-pushed the chore/ghi-2614-ci-add-diffusion-quantization-tests branch from 9d3885f to 16db77c Compare April 13, 2026 14:05

lishunyang12 reviewed Apr 16, 2026


hsliuustc0106 removed the ready and nightly-test labels Apr 29, 2026

