[Feat] dnnl build for AVX2 W8A8 Int8 #41318
Conversation
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
…d for both avx2 and avx512 Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Code Review
This pull request enables oneDNN support for AVX2 architectures by updating CMake configurations and providing AVX2-compatible implementations for vector operations, including masked stores, clamping, and reductions. It also fixes a loop increment bug in the dynamic quantization kernel where the index was being incremented by one instead of the vector element count. I have no feedback to provide.
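For context, a minimal sketch of the loop-increment fix the review describes, assuming a 256-bit AVX2 vector of fp32; `VEC_ELEM_NUM` and `dynamic_quant_loop` are illustrative names, not the exact vLLM identifiers:

```cpp
#include <cstddef>

constexpr std::size_t VEC_ELEM_NUM = 8;  // fp32 lanes in one 256-bit AVX2 register

void dynamic_quant_loop(const float* input, std::size_t num_elems) {
  // Bug: `i += 1` advanced one element per iteration even though each
  // iteration consumes VEC_ELEM_NUM elements, so loads overlapped.
  // Fix: advance by the vector element count.
  for (std::size_t i = 0; i < num_elems; i += VEC_ELEM_NUM) {
    // load VEC_ELEM_NUM floats from input + i, quantize, store ...
  }
  (void)input;  // placeholder body; the real kernel reads input in the loop above
}
```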
Hi @tianmu-li, the pre-commit checks have failed. Please run:

```
uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
Pull request was converted to draft
Found some issues in an Apple silicon smoke test (https://github.com/tianmu-li/vllm/actions/runs/25149522905/job/73716695058#logs); will need to merge/rebase after #41387. There are also some potential compilation issues on ARM using dnnl that need fixing.
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Looping in @louie-tsai.
Signed-off-by: Li, Tianmu <tianmu.li@intel.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Signed-off-by: Li, Tianmu <tianmu.li@intel.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com> Signed-off-by: Mehdi Ghanimifard <mehdi.ghanimifard@amd.com>
Signed-off-by: Li, Tianmu <tianmu.li@intel.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Li, Tianmu <tianmu.li@intel.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com> Signed-off-by: Libin Tang <libin.tang@intel.com>
Purpose
The CPU backend's W8A8 INT8 quantization ops (`static_scaled_int8_quant`, `dynamic_scaled_int8_quant`, `onednn_scaled_mm`) were gated behind `__AVX512F__` and completely absent from the `_C_AVX2` shared library. Running a compressed-tensors W8A8 INT8 model on an AVX2-only host (e.g., Xeon 6 with E-cores) resulted in a missing-symbol error at runtime. This PR links `_C_AVX2` against the existing `dnnl_ext` and adds the AVX2 operators needed for quantization. INT8 quantization is especially beneficial on AVX2, since bf16/fp16 models run at fp32 rate there.

Note: `dnnl_ext` now compiles with `-mavx2`. oneDNN detects the host ISA and JIT-compiles kernels at runtime, so I don't expect this to be a problem.
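To illustrate the kind of AVX2 operator this adds, here is a minimal sketch (with my own naming, not the exact vLLM code) of a masked-store fallback: AVX-512 has hardware k-mask registers, while on AVX2 the mask must be materialized as a vector whose per-lane sign bits drive `_mm256_maskstore_epi32`:

```cpp
#include <immintrin.h>

// Hypothetical helper: store only the first n of 8 int32 lanes to dst.
inline void store_first_n_i32(int* dst, __m256i values, int n) {
#if defined(__AVX512F__) && defined(__AVX512VL__)
  // AVX-512VL path: a k-mask selects the first n lanes directly.
  _mm256_mask_storeu_epi32(dst, static_cast<__mmask8>((1u << n) - 1u), values);
#else
  // AVX2 path: lanes with id < n get a set sign bit and are stored;
  // the rest are left untouched, so partial tails never overrun dst.
  alignas(32) static const int lane_ids[8] = {0, 1, 2, 3, 4, 5, 6, 7};
  const __m256i ids = _mm256_load_si256(reinterpret_cast<const __m256i*>(lane_ids));
  const __m256i mask = _mm256_cmpgt_epi32(_mm256_set1_epi32(n), ids);  // ids < n
  _mm256_maskstore_epi32(dst, mask, values);
#endif
}
```

Clamping and reductions follow the same pattern: keep the AVX-512 path intact and add a 256-bit equivalent behind the preprocessor guard.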
Test Plan
Test platform: an AVX2-enabled platform
Server
Client
Test Result
AI assistance
This PR was developed with Claude Code assistance. All changed lines have been reviewed by the submitting author.
Essential Elements of an Effective PR Description Checklist
(If applicable) Documentation update, such as `supported_models.md` and `examples` for a new model.