[Quantization][Refactor] Move CPU GPTQ kernel into MP linear by bigPYJ1151 · Pull Request #31801 · vllm-project/vllm

bigPYJ1151 · 2026-01-06T10:47:32Z

Purpose

Part of #31689.

Test Plan

https://buildkite.com/vllm/fastcheck/builds/44891#019b92e9-b55f-4146-a633-3e486ac0da68

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

gemini-code-assist

Code Review

This pull request refactors the CPU GPTQ kernel by integrating it into the Mixed-Precision (MP) linear layer framework. This is a good architectural improvement, moving the logic into a new CPUWNA16LinearKernel and extending the gptq_marlin quantization to support CPU. While the overall direction is positive, I've identified a couple of critical issues in the implementation that need to be addressed to ensure correctness.
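To illustrate the kind of dispatch the review summary describes, here is a minimal, self-contained sketch of choosing a linear kernel from a list of candidates. Everything in it is hypothetical and simplified for illustration: `LayerConfig`, `ToyGpuKernel`, `ToyCpuWNA16Kernel`, `CANDIDATES`, and `choose_kernel` are not vLLM names, and the 4-bit restriction is an assumption of the toy, not a statement about the real CPUWNA16LinearKernel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LayerConfig:
    """Hypothetical stand-in for a mixed-precision linear layer config."""
    weight_bits: int
    group_size: int
    device: str  # "cpu" or "cuda"


class Kernel:
    """Hypothetical base class: each kernel reports whether it fits a config."""

    @classmethod
    def can_implement(cls, cfg: LayerConfig) -> tuple[bool, Optional[str]]:
        raise NotImplementedError


class ToyGpuKernel(Kernel):
    @classmethod
    def can_implement(cls, cfg: LayerConfig) -> tuple[bool, Optional[str]]:
        if cfg.device != "cuda":
            return False, "requires a CUDA device"
        return True, None


class ToyCpuWNA16Kernel(Kernel):
    @classmethod
    def can_implement(cls, cfg: LayerConfig) -> tuple[bool, Optional[str]]:
        if cfg.device != "cpu":
            return False, "requires a CPU device"
        if cfg.weight_bits != 4:
            # Assumption of this toy only, not of the real kernel.
            return False, "toy kernel only handles 4-bit weights"
        return True, None


# Candidates are tried in order; the first one that accepts the config wins.
CANDIDATES = [ToyGpuKernel, ToyCpuWNA16Kernel]


def choose_kernel(cfg: LayerConfig) -> type:
    """Return the first candidate kernel class that can implement cfg."""
    reasons = []
    for kernel in CANDIDATES:
        ok, why = kernel.can_implement(cfg)
        if ok:
            return kernel
        reasons.append(f"{kernel.__name__}: {why}")
    raise ValueError("no kernel supports this config: " + "; ".join(reasons))
```

With this layout, adding a CPU path means registering one more candidate class rather than branching inside the quantization method itself, which is the general shape of the refactor described above.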

vllm/model_executor/layers/quantization/kernels/mixed_precision/cpu.py

Signed-off-by: jiang1.li <jiang1.li@intel.com>

…n/cpu.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Li, Jiang <bigpyj64@gmail.com> Signed-off-by: jiang1.li <jiang1.li@intel.com>

robertgshaw2-redhat · 2026-01-06T14:48:45Z

nice job

robertgshaw2-redhat · 2026-01-06T15:15:02Z

LGTM

Signed-off-by: jiang1.li <jiang1.li@intel.com>

ProExpertProg · 2026-01-06T17:14:03Z

vllm/model_executor/layers/quantization/kernels/mixed_precision/cpu.py

+    @classmethod
+    def get_min_capability(cls) -> int:
+        return -1


We should change this to an is_supported idiom instead of min_capability (#31821).
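A rough sketch of the difference between the two idioms, under stated assumptions: the class names and the `is_supported` signature below are hypothetical and only illustrate the idea; the interface actually proposed in #31821 may look different.

```python
from typing import Optional


class MinCapabilityKernel:
    """Gate expressed as a minimum GPU compute capability."""

    @classmethod
    def get_min_capability(cls) -> int:
        # A CPU kernel has no meaningful compute capability, so it has to
        # return a sentinel value such as -1.
        return -1


class IsSupportedKernel:
    """Gate expressed as an explicit support check (hypothetical signature)."""

    @classmethod
    def is_supported(
        cls, device_type: str, capability: Optional[int] = None
    ) -> tuple[bool, Optional[str]]:
        # The kernel states directly where it can run and why it cannot,
        # instead of encoding "not a GPU kernel" as a magic number.
        if device_type != "cpu":
            return False, f"CPU-only kernel, got device type {device_type!r}"
        return True, None
```

The second form lets a CPU-only kernel decline GPU platforms with a readable reason, whereas the sentinel in the first form carries no information about which platforms are acceptable.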

…oject#31801) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Li, Jiang <bigpyj64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

…oject#31801) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Li, Jiang <bigpyj64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

bigPYJ1151 requested review from ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, pavanimajety, robertgshaw2-redhat, tlrmchlsmth, yewentao256 and youkaichao as code owners January 6, 2026 10:47

mergify bot added the "cpu" (Related to CPU backends) label Jan 6, 2026

gemini-code-assist bot reviewed Jan 6, 2026

vllm/model_executor/layers/quantization/kernels/mixed_precision/cpu.py (outdated, resolved)

vllm/model_executor/layers/quantization/kernels/mixed_precision/cpu.py (resolved)

bigPYJ1151 and others added 2 commits January 6, 2026 14:09

refactore

ccd0a9f

Signed-off-by: jiang1.li <jiang1.li@intel.com>

bigPYJ1151 force-pushed the reorg_wna16 branch from 43c75fa to 05d7795 Compare January 6, 2026 14:10

robertgshaw2-redhat added this to MoE Refactor Jan 6, 2026

github-project-automation bot moved this to Backlog in MoE Refactor Jan 6, 2026

robertgshaw2-redhat moved this from Backlog to In progress in MoE Refactor Jan 6, 2026

robertgshaw2-redhat approved these changes Jan 6, 2026

robertgshaw2-redhat added the "ready" (ONLY add when PR is ready to merge/full CI is needed) label Jan 6, 2026

Merge branch 'main' into reorg_wna16

ee8f98d

robertgshaw2-redhat enabled auto-merge (squash) January 6, 2026 15:15

robertgshaw2-redhat moved this from In progress to In review in MoE Refactor Jan 6, 2026

fix exllama

7414027

Signed-off-by: jiang1.li <jiang1.li@intel.com>

ProExpertProg reviewed Jan 6, 2026

robertgshaw2-redhat merged commit 8becf14 into vllm-project:main Jan 6, 2026
56 checks passed

github-project-automation bot moved this from In review to Done in MoE Refactor Jan 6, 2026

robertgshaw2-redhat merged 4 commits into vllm-project:main from bigPYJ1151:reorg_wna16
