[Feat] dnnl build for AVX2 W8A8 Int8#41318

Merged
bigPYJ1151 merged 17 commits into vllm-project:main from tianmu-li:feat/avx2_w8a8
May 6, 2026
Conversation

@tianmu-li
Contributor

@tianmu-li tianmu-li commented Apr 30, 2026

Purpose

The CPU backend's W8A8 INT8 quantization ops (static_scaled_int8_quant, dynamic_scaled_int8_quant, onednn_scaled_mm) were gated behind __AVX512F__ and completely absent from the _C_AVX2 shared library. Running a compressed-tensors W8A8 INT8 model on an AVX2-only host (e.g., Xeon 6 with E-cores) therefore failed with a missing-symbol error at runtime. This PR links _C_AVX2 against the existing dnnl_ext library and adds the AVX2 operators needed for quantization. INT8 quantization is especially beneficial on AVX2, since bf16/fp16 models run at fp32 rate there.
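For context, a scalar Python sketch of what a dynamic per-tensor INT8 quantization op computes (the function name mirrors the op above, but the scale handling and rounding mode here are illustrative, not vLLM's exact kernel):

```python
# Illustrative sketch only: per-tensor dynamic INT8 quantization.
# scale = max(|x|) / 127, then round-to-nearest and clamp to [-128, 127].
def dynamic_scaled_int8_quant(x: list[float]) -> tuple[list[int], float]:
    scale = max(abs(v) for v in x) / 127.0 or 1.0  # avoid div-by-zero on all-zero input
    q = [max(-128, min(127, round(v / scale))) for v in x]
    return q, scale

q, s = dynamic_scaled_int8_quant([0.4, -0.8, 2.0])
# q == [25, -51, 127]; the largest-magnitude value maps to 127
```

The real kernels vectorize this loop; on AVX2 the lane width is half that of AVX512, which is why the AVX2-specific operators added here matter.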

Note: dnnl_ext now compiles with -mavx2. oneDNN detects the host ISA and JIT-compiles its kernels at runtime, so I don't expect this to be a problem.
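A minimal sketch of the kind of CMake change described (the target names `_C_AVX2` and `dnnl_ext` come from the PR text; the exact layout of vLLM's cmake files is assumed, not quoted):

```cmake
# Hypothetical fragment: build the oneDNN extension with AVX2 codegen and
# link it into the AVX2 variant of the C extension, not only the AVX512 one.
target_compile_options(dnnl_ext PRIVATE -mavx2)
target_link_libraries(_C_AVX2 PRIVATE dnnl_ext)
```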

Test Plan

Test platform: an AVX2-enabled (non-AVX512) host
Server

export VLLM_CPU_KVCACHE_SPACE=20
export VLLM_CPU_OMP_THREADS_BIND=0-47
MODEL_NAME={RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8 or meta-llama/Llama-3.1-8B-Instruct}
vllm serve $MODEL_NAME \
  --served-model-name meta-llama/Llama-3.1-8B-Instruct \
  --port 8868 --host 0.0.0.0 \
  --no-enable-prefix-caching \
  --max-model-len=16384 \
  --max-num-batched-tokens=8192

Client

MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
vllm bench serve --model $MODEL_NAME \
  --num-prompts 50 \
  --port=8868 \
  --random-input-len 128 \
  --random-output-len 128

Test Result

Dtype   Before throughput (toks/s)   After throughput (toks/s)
bf16    69                           72.5
int8    DNR (did not run)            172.7

AI assistance

This PR was developed with Claude Code assistance. All changed lines have been reviewed by the submitting author.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
…d for both avx2 and avx512

Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
@tianmu-li tianmu-li requested a review from bigPYJ1151 as a code owner April 30, 2026 02:16

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added ci/build cpu Related to CPU backends labels Apr 30, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enables oneDNN support for AVX2 architectures by updating CMake configurations and providing AVX2-compatible implementations for vector operations, including masked stores, clamping, and reductions. It also fixes a loop increment bug in the dynamic quantization kernel where the index was being incremented by one instead of the vector element count. I have no feedback to provide.

Member

@bigPYJ1151 bigPYJ1151 left a comment


Thanks! LGTM :)

@bigPYJ1151 bigPYJ1151 added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 30, 2026
@bigPYJ1151 bigPYJ1151 enabled auto-merge (squash) April 30, 2026 11:47
@mergify
Contributor

mergify Bot commented Apr 30, 2026

Hi @tianmu-li, the pre-commit checks have failed. Please run:

uv pip install 'pre-commit>=4.5.1'
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

1 similar comment

@tianmu-li tianmu-li marked this pull request as draft April 30, 2026 16:02
auto-merge was automatically disabled April 30, 2026 16:02

Pull request was converted to draft

@tianmu-li
Contributor Author

Found some issues in an Apple silicon smoke test (https://github.com/tianmu-li/vllm/actions/runs/25149522905/job/73716695058#logs); will need to merge/rebase after #41387. There are also some potential compilation issues on ARM with dnnl that need fixing.

@tianmu-li tianmu-li marked this pull request as ready for review April 30, 2026 20:55
@louie-tsai
Contributor

Looping in @louie-tsai

@bigPYJ1151 bigPYJ1151 merged commit e87e09a into vllm-project:main May 6, 2026
20 checks passed
chaojun-zhang pushed a commit to chaojun-zhang/vllm that referenced this pull request May 6, 2026
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
amd-mghanimi pushed a commit to amd-mghanimi/vllm that referenced this pull request May 6, 2026
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Signed-off-by: Mehdi Ghanimifard <mehdi.ghanimifard@amd.com>
ikaadil pushed a commit to ikaadil/vllm that referenced this pull request May 7, 2026
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
libinta pushed a commit to libinta/vllm that referenced this pull request May 8, 2026
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Signed-off-by: Libin Tang <libin.tang@intel.com>

Labels

ci/build cpu Related to CPU backends ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants