[Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method #5143

Merged
wangxiyuan merged 1 commit into vllm-project:main from maxmgrdv:w4a4_laos_quantization
Jan 22, 2026

[Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method#5143
wangxiyuan merged 1 commit intovllm-project:mainfrom
maxmgrdv:w4a4_laos_quantization

Conversation

@maxmgrdv
Contributor

@maxmgrdv maxmgrdv commented Dec 17, 2025

Introduce W4A4 LAOS Quantization for better model compression and inference efficiency on Ascend devices.
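The PR itself does not reproduce the algorithm, but for context, W4A4 dynamic quantization generally means 4-bit weights (quantized offline, per output channel) and 4-bit activations (quantized at run time, per token). The sketch below is a generic illustration of that scheme using symmetric INT4 ranges; the helper name `quantize_int4` and every detail here are illustrative assumptions, not the LAOS implementation or any Ascend kernel.

```python
import numpy as np

def quantize_int4(x, axis):
    """Symmetric 4-bit quantization along `axis` (integer range [-8, 7])."""
    scale = np.max(np.abs(x), axis=axis, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

# Weights: quantized once, offline, with one scale per output channel.
w = np.random.randn(16, 64).astype(np.float32)
qw, w_scale = quantize_int4(w, axis=1)

# Activations: quantized at run time ("dynamic"), one scale per token.
a = np.random.randn(4, 64).astype(np.float32)
qa, a_scale = quantize_int4(a, axis=1)

# Integer matmul, then dequantize with the product of the two scales.
y = (qa.astype(np.int32) @ qw.T.astype(np.int32)).astype(np.float32)
y = y * a_scale * w_scale.T  # shape (4, 16), approximates a @ w.T
```

The dynamic part is that `a_scale` is recomputed for every incoming batch of tokens, so no activation calibration pass is needed; the 4-bit weights keep the memory footprint roughly a quarter of FP16.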

@gemini-code-assist
Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

Collaborator

@whx-sjtu whx-sjtu left a comment
This new w4a4 quantization algorithm has significant accuracy and performance gains compared to the current FlatQuant w4a4. Please provide the accuracy and performance test results in a new comment, thanks @maxmgrdv.

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Fill in the PR description thoroughly so the commit message helps reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@maxmgrdv maxmgrdv force-pushed the w4a4_laos_quantization branch from 5915e23 to 63de002 on December 25, 2025 08:04
@wangxiyuan
Collaborator

Any progress? If this PR is still alive, please rebase to main and make CI happy, otherwise you can close it. Thanks

@whx-sjtu
Collaborator

whx-sjtu commented Jan 5, 2026

Any progress? If this PR is still alive, please rebase to main and make CI happy, otherwise you can close it. Thanks

The relevant colleagues are on holiday right now. I will push this PR toward merging next week, after they come back.

@maxmgrdv maxmgrdv force-pushed the w4a4_laos_quantization branch 2 times, most recently from 5c75162 to c2c9f39 on January 14, 2026 07:56
Collaborator

@whx-sjtu whx-sjtu left a comment

I remember that there is an eagle3-related bug that decreases the acceptance rate to 0 for this quant method. Is the related fix included in this PR? I can't find it.

@zhangxinyuehfad zhangxinyuehfad force-pushed the w4a4_laos_quantization branch 4 times, most recently from 0c5a587 to 79aea0d on January 16, 2026 03:45
@vllm-ascend-ci vllm-ascend-ci added the ready (read for review) and ready-for-test (start test by label for PR) labels Jan 16, 2026
@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@maxmgrdv maxmgrdv force-pushed the w4a4_laos_quantization branch 2 times, most recently from e570420 to 1d871ac on January 19, 2026 12:13
Signed-off-by: maxmgrdv <gordeev.maxim@huawei.com>

add e2e ci

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
@maxmgrdv maxmgrdv force-pushed the w4a4_laos_quantization branch from 1d871ac to 6190cd3 on January 21, 2026 06:27
@wangxiyuan wangxiyuan merged commit ef9d836 into vllm-project:main Jan 22, 2026
20 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 22, 2026
…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (51 commits)
  [Bugfix] Remove `use_aclgraph` in mtp_proposer and use `use_cuda_graph` (vllm-project#6032)
  [BugFix] fix 3vl dense model load quant weight (vllm-project#6100)
  [CP&SP] Integrate FIA operator in mla_cp._forward_decode (vllm-project#5641)
  [CI][Doc] Upgrade wheel building's CANN to 8.5.0 and update the Docs (vllm-project#6145)
  [CI]Install clang in dokerfile for triton ascend (vllm-project#4409)
  [Main] Upgrade PTA to 2.9.0 (vllm-project#6112)
  [Graph][Fusion] Add QKVNormRope and QKVNormRopeWithBias (vllm-project#5721)
  [P/D][PCP]bugfix pcp force free twice caused logger error (vllm-project#6124)
  [BugFix]converting pa get_workspace back to capturing (vllm-project#5833)
  [CI] optimize lint term (vllm-project#5986)
  [Bugfix] Fix Triton operator usage for multimodal models based on `the mrope_interleaved` parameter (vllm-project#6042)
  [bugfix][npugraph_ex]fix the model output type issue caused by manually modify FX graph (vllm-project#6015)
  [BugFix] Support setting tp=1 for the Eagle draft model to take effect (vllm-project#6097)
  [Misc] Bump mooncake version to v0.3.8.post1 (vllm-project#6110)
  [Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8 or not moe [RFC: issue 5476] (vllm-project#5758)
  [bugfix] adapt_remote_request_id (vllm-project#6051)
  [Feature] Add support of new W4A4_LAOS_DYNAMIC quantization method (vllm-project#5143)
  [Feature] Support DSA-CP for Hybrid scenario (vllm-project#5702)
  [CI] Upgrade CANN to 8.5.0 (vllm-project#6070)
  Default enable MLAPO (vllm-project#5952)
  ...
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…llm-project#5143)

Introduce W4A4 LAOS Quantization for better model compression and
inference efficiency on Ascend devices.

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…llm-project#5143)

Introduce W4A4 LAOS Quantization for better model compression and
inference efficiency on Ascend devices.

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
…llm-project#5143)

Introduce W4A4 LAOS Quantization for better model compression and
inference efficiency on Ascend devices.

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…llm-project#5143)

Introduce W4A4 LAOS Quantization for better model compression and
inference efficiency on Ascend devices.

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
…llm-project#5143)

Introduce W4A4 LAOS Quantization for better model compression and
inference efficiency on Ascend devices.

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: hfadzxy <starmoon_zhang@163.com>

Labels

module:quantization, ready (read for review), ready-for-test (start test by label for PR)


5 participants