[Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method #5143
wangxiyuan merged 1 commit into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Force-pushed 5915e23 to 63de002
Any progress? If this PR is still alive, please rebase to main and make CI happy; otherwise you can close it. Thanks
The relevant colleagues are on holiday right now. I will push this PR to be merged next week after they come back.
Force-pushed 5c75162 to c2c9f39
whx-sjtu
left a comment
I remember there is an eagle3-related bug that decreases the acceptance rate to 0 for this quant method. Does this PR include the related fix? I can't find it.
Force-pushed c2c9f39 to 8338edf
Force-pushed 0c5a587 to 79aea0d
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Force-pushed 79aea0d to f13883c
Force-pushed e570420 to 1d871ac
add e2e ci

Signed-off-by: maxmgrdv <gordeev.maxim@huawei.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Force-pushed 1d871ac to 6190cd3
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (51 commits)
  - [Bugfix] Remove `use_aclgraph` in mtp_proposer and use `use_cuda_graph` (vllm-project#6032)
  - [BugFix] fix 3vl dense model load quant weight (vllm-project#6100)
  - [CP&SP] Integrate FIA operator in mla_cp._forward_decode (vllm-project#5641)
  - [CI][Doc] Upgrade wheel building's CANN to 8.5.0 and update the Docs (vllm-project#6145)
  - [CI] Install clang in dokerfile for triton ascend (vllm-project#4409)
  - [Main] Upgrade PTA to 2.9.0 (vllm-project#6112)
  - [Graph][Fusion] Add QKVNormRope and QKVNormRopeWithBias (vllm-project#5721)
  - [P/D][PCP] bugfix pcp force free twice caused logger error (vllm-project#6124)
  - [BugFix] converting pa get_workspace back to capturing (vllm-project#5833)
  - [CI] optimize lint term (vllm-project#5986)
  - [Bugfix] Fix Triton operator usage for multimodal models based on the `mrope_interleaved` parameter (vllm-project#6042)
  - [bugfix][npugraph_ex] fix the model output type issue caused by manually modify FX graph (vllm-project#6015)
  - [BugFix] Support setting tp=1 for the Eagle draft model to take effect (vllm-project#6097)
  - [Misc] Bump mooncake version to v0.3.8.post1 (vllm-project#6110)
  - [Feature] Enable DispatchGmmCombineDecode when eagle is moe with w8a8 or not moe [RFC: issue 5476] (vllm-project#5758)
  - [bugfix] adapt_remote_request_id (vllm-project#6051)
  - [Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method (vllm-project#5143)
  - [Feature] Support DSA-CP for Hybrid scenario (vllm-project#5702)
  - [CI] Upgrade CANN to 8.5.0 (vllm-project#6070)
  - Default enable MLAPO (vllm-project#5952)
  - ...
…llm-project#5143)

Introduce W4A4 LAOS Quantization for better model compression and inference efficiency on Ascend devices.

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
Introduce W4A4 LAOS Quantization for better model compression and inference efficiency on Ascend devices.
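For readers unfamiliar with the naming: "W4A4" means both weights and activations are quantized to 4-bit integers, and "dynamic" means activation scales are computed at runtime from each batch's actual value range rather than being pre-calibrated. The sketch below is NOT the PR's implementation (which targets Ascend NPU kernels and the LAOS scheme); it is only a minimal, hedged illustration of the symmetric 4-bit quantize/dequantize arithmetic that such methods build on. All function names here are hypothetical.

```python
# Minimal sketch of symmetric 4-bit (int4) quantization, for illustration only.
# Signed int4 values span [-8, 7]; a per-tensor scale maps floats onto that range.

def quantize_sym_int4(values):
    """Map floats to integers in [-8, 7] with a single dynamic scale.

    "Dynamic" here means the scale is derived from the live max-abs of
    the input, as done for activations in dynamic quantization schemes.
    """
    amax = max(abs(v) for v in values)
    scale = amax / 7.0 if amax > 0 else 1.0  # 7 = largest positive int4 value
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int4 codes and the scale."""
    return [x * scale for x in q]

weights = [0.12, -0.5, 0.33, 0.07]
q, s = quantize_sym_int4(weights)
deq = dequantize(q, s)
```

In a real W4A4 kernel the matmul is performed directly on the int4 codes with the scales applied to the accumulator, which is where the memory and throughput savings on Ascend hardware come from; this sketch only shows the number mapping.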