Merged
56 commits
1d0e39c
[Core] Unified quantization framework for vLLM-OMNI
lishunyang12 Mar 10, 2026
0842adb
Clean up redundant comments and fix bugs in quantization framework
lishunyang12 Mar 11, 2026
aee99e9
Add integration tests and quantization docs
lishunyang12 Mar 11, 2026
1520d05
Update quantization docs with unified framework and per-component sup…
lishunyang12 Mar 11, 2026
f0195b4
Address review feedback: consolidate quantization_config field, simpl…
lishunyang12 Mar 12, 2026
0964f59
Add core_model and diffusion test markers
lishunyang12 Mar 12, 2026
7d0009b
Integrate quantization quality benchmark (LPIPS)
lishunyang12 Mar 12, 2026
382325e
[Doc] Add quantization contributor guide and overview docs
lishunyang12 Mar 12, 2026
798a628
[Doc] Update docs to upstream nav and new unified API
lishunyang12 Mar 12, 2026
393f951
address isotr0py's review: always use from_config, use current_platfo…
lishunyang12 Mar 13, 2026
415ce7a
remove validation.py — duplicates vLLM's own checks
lishunyang12 Mar 13, 2026
9ba80e2
fix stale docstring, add none-string test, soften compat shim to Depr…
lishunyang12 Mar 13, 2026
d4cf50c
make resolve() public, it's used across models
lishunyang12 Mar 13, 2026
6acdccb
fix pre-commit and improve benchmark script
lishunyang12 Mar 13, 2026
224e7c5
fix: use direct __init__ instead of from_config, handle "none" string…
lishunyang12 Mar 14, 2026
253be3c
add --quantization flag to bagel offline example
lishunyang12 Mar 14, 2026
90df6d1
fix: route quantization to diffusion stage only for multi-stage models
lishunyang12 Mar 14, 2026
01a0c37
add e2e tests for unified FP8 quantization across all models
lishunyang12 Mar 14, 2026
def93db
fix bash arithmetic under set -e
lishunyang12 Mar 14, 2026
50f1d1e
use production-quality params for e2e quantization tests
lishunyang12 Mar 14, 2026
8fc2b73
add LPIPS quality benchmark to e2e quantization tests
lishunyang12 Mar 14, 2026
a52ef41
support loading pre-quantized modelopt FP8 checkpoints for Qwen3-Omni
lishunyang12 Mar 15, 2026
768adaa
add model download verification script
lishunyang12 Mar 15, 2026
a1a9ee9
add FP8 stage config for Qwen3-Omni modelopt testing
lishunyang12 Mar 15, 2026
77556e7
make model name configurable in qwen3_omni end2end example
lishunyang12 Mar 15, 2026
c715812
fix FP8 testing: reduce max_model_len, handle single-stage sampling p…
lishunyang12 Mar 15, 2026
3f96a8d
add BF16 vs FP8 comparison script and BF16 stage config
lishunyang12 Mar 15, 2026
c970bec
fix: use request_output (singular) in comparison script
lishunyang12 Mar 15, 2026
a5575d2
fix: handle request_output as list in comparison script
lishunyang12 Mar 15, 2026
8e5e169
cap FP8 stage config to ~64GB for single-GPU testing
lishunyang12 Mar 15, 2026
6abdd4b
add full 3-stage FP8 pipeline config constrained to 64GB
lishunyang12 Mar 15, 2026
a6c160d
fix: add max_model_len to talker, increase memory for KV cache
lishunyang12 Mar 15, 2026
f2bde09
fix pre-commit: ruff format + fix omni_llm typo
lishunyang12 Mar 18, 2026
0b55248
remove temporary test files
lishunyang12 Mar 18, 2026
54e9cd8
remove dead compat shim vllm_omni.diffusion.quantization
lishunyang12 Mar 18, 2026
365c1d5
remove stale validation.py reference from quantization docs
lishunyang12 Mar 18, 2026
713b3fd
retrigger CI
lishunyang12 Mar 18, 2026
f375553
add NVFP4 support: test coverage and docs
lishunyang12 Mar 18, 2026
acbd9d6
add NVFP4 support: test coverage and docs
lishunyang12 Mar 18, 2026
c7a3920
add NVFP4 single-GPU stage config for RTX 5090
lishunyang12 Mar 18, 2026
bce7ea3
integrate Int8 quantization into unified framework
lishunyang12 Mar 19, 2026
c329d6c
update sensitive layers guide with Int8 benchmark results
lishunyang12 Mar 19, 2026
0db1c36
support pre-quantized checkpoints for all thinker subcomponents
lishunyang12 Mar 20, 2026
cc253c3
add quantization quality gate test and developer testing guide
lishunyang12 Mar 22, 2026
7254683
update quantization overview to cover omni models and tested results
lishunyang12 Mar 22, 2026
ad55a6b
move int8 into unified quantization framework, fix hunyuan-image-3 im…
lishunyang12 Mar 22, 2026
8a50ae4
fix int8 tests: use quantization_config instead of removed quantizati…
lishunyang12 Mar 22, 2026
7f20d1f
add 2-GPU stage config for HunyuanImage-3 FP8 testing
lishunyang12 Mar 22, 2026
7b257ab
Add 2-GPU FP8 stage config for HunyuanImage-3.0
lishunyang12 Mar 22, 2026
b15be5c
fix upstream GGUF test imports for unified quantization module
lishunyang12 Mar 23, 2026
548ee25
Merge branch 'main' into unified-quantization-framework
lishunyang12 Mar 23, 2026
408d681
Fix CI failures: GGUF tests, attribute error, pre-commit, docs
lishunyang12 Mar 23, 2026
f1308c6
Exclude quantization modules from API README generator
lishunyang12 Mar 23, 2026
657018f
address review: remove bench reports, move quantization doc to model/
lishunyang12 Mar 24, 2026
f92d34e
Merge remote-tracking branch 'origin/main' into unified-quantization-…
lishunyang12 Mar 24, 2026
bc6ba3a
Merge branch 'main' into unified-quantization-framework
lishunyang12 Mar 24, 2026
460 changes: 460 additions & 0 deletions benchmarks/diffusion/quantization_quality.py

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions docs/.nav.yml
@@ -79,6 +79,7 @@ nav:
- contributing/model/adding_omni_model.md
- contributing/model/adding_tts_model.md
- contributing/model/adding_diffusion_model.md
- contributing/model/adding_quantization_model.md
- CI: contributing/ci
- Design Documents:
- design/index.md