[Feature][Quantization] auto_round format add support for regex #24024
mgoin merged 14 commits into vllm-project:main
Conversation
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Code Review
This pull request introduces support for regular expressions in AutoRound's extra_config, which is a valuable feature for defining quantization settings for groups of layers. However, the current implementation has a critical correctness issue where literal layer names can be misinterpreted as regex patterns, potentially leading to incorrect quantization. My review provides a comment with a suggested code change to address this by using a heuristic to differentiate between literal names and regex patterns, which also improves performance.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Heng Guo <heng.guo@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
@mgoin @robertgshaw2-redhat @tlrmchlsmth @yewentao256 could you please help review this PR?
yewentao256
left a comment
Could you provide more context for this PR?
E.g., which models use it, and what behavior looks like without this PR versus with it.
Also, lm_eval results for accuracy and vllm bench results for performance would be helpful.
@yewentao256 Currently, auto_round handles mixed-precision quantization models by saving the full name of every layer in the config. With the support of this PR, those settings can instead be expressed as regular expressions; this PR primarily adds reading of such regex configurations. All models quantized by auto_round will use this in the future. For example, if I use this script to generate a quantized Qwen model:

```python
from auto_round import AutoRound

model_path = "Qwen/Qwen3-15B-A2B-Base/"
layer_config = {
    "self_attn.[koqv]_proj$": {"bits": 8},
}
ar = AutoRound(model=model_path, scheme="W4A16", layer_config=layer_config, iters=1)
ar.quantize_and_save("Qwen3-15B-A2B-Base-vllm-regex-test")
```

The resulting config.json includes a parameter showing that all non-expert (attention projection) linears fall back to 8 bits. With the old format, it looks like this:

```json
"quantization_config": {
    "autoround_version": "0.8.0.dev",
    "bits": 4,
    "data_type": "int",
    "extra_config": {
        "model.layers.0.self_attn.k_proj": {
            "bits": 8
        },
        "model.layers.0.self_attn.o_proj": {
            "bits": 8
        },
        "model.layers.0.self_attn.q_proj": {
            "bits": 8
        },
        "model.layers.0.self_attn.v_proj": {
            "bits": 8
        },
        "model.layers.1.self_attn.k_proj": {
            "bits": 8
        },
        "model.layers.1.self_attn.o_proj": {
            "bits": 8
        },
        "model.layers.1.self_attn.q_proj": {
            "bits": 8
        },
```

And with the support of this PR, it can be simplified to:

```json
"quantization_config": {
    "autoround_version": "0.8.0.dev",
    "bits": 4,
    "data_type": "int",
    "extra_config": {
        "self_attn.[koqv]_proj$": {"bits": 8}
    }
```
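As a quick illustration (not part of the PR), a short snippet can confirm that the single regex entry in the new-style config covers the same layers the old-style config listed explicitly:

```python
import re

# Layer names the old-style config enumerated one by one
# (only layers 0 and 1 are shown here, matching the excerpt above).
layer_names = [
    f"model.layers.{i}.self_attn.{p}_proj"
    for i in range(2)
    for p in "koqv"
]

# The single regex entry from the new-style extra_config.
pattern = r"self_attn.[koqv]_proj$"

# Every explicitly listed layer matches the regex...
assert all(re.search(pattern, name) for name in layer_names)
# ...while expert/MLP layers do not, so they keep the default 4-bit scheme.
assert re.search(pattern, "model.layers.0.mlp.experts.0.down_proj") is None
print("regex covers all", len(layer_names), "listed layers")
```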
This PR does not affect model accuracy; here are our test results:
Signed-off-by: n1ck-guo <heng.guo@intel.com>
…-project#24024) Signed-off-by: n1ck-guo <heng.guo@intel.com> Signed-off-by: Heng Guo <heng.guo@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: 1994 <1994@users.noreply.github.com>
Purpose
Add support for regular expressions in the auto_round format's extra_config.
Test Plan
Load an auto_round-quantized model whose extra_config contains both regular expressions and full layer names.
Test Result
With this change, every linear layer whose name matches a regex in extra_config (for example, ".*mlp.down_proj": {"bits": 16}) receives the correct bit width.
Mixed-bit quantized models load successfully with the auto-round quant_method.
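For reference, a minimal sketch of how such a regex fallback might resolve a layer's bit width. This is an illustration, not the PR's actual code: `bits_for` is a hypothetical helper, and the use of `re.fullmatch` for the matching semantics is an assumption.

```python
import re

# Hypothetical extra_config mirroring the example above: down_proj layers
# fall back to 16 bits, everything else keeps the model-wide default.
extra_config = {".*mlp.down_proj": {"bits": 16}}
DEFAULT_BITS = 4  # the model-wide scheme from quantization_config["bits"]

def bits_for(layer_name: str) -> int:
    """Return the bit width for a layer: regex overrides win, else default."""
    for pattern, cfg in extra_config.items():
        if re.fullmatch(pattern, layer_name):
            return cfg["bits"]
    return DEFAULT_BITS

assert bits_for("model.layers.3.mlp.down_proj") == 16
assert bits_for("model.layers.3.self_attn.q_proj") == 4
```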