[Quantization][Refactor] Move CPU GPTQ kernel into MP linear#31801

Merged
robertgshaw2-redhat merged 4 commits into vllm-project:main from bigPYJ1151:reorg_wna16
Jan 6, 2026

Conversation

@bigPYJ1151
Member

bigPYJ1151 commented Jan 6, 2026

Purpose

Part of #31689.

Test Plan

https://buildkite.com/vllm/fastcheck/builds/44891#019b92e9-b55f-4146-a633-3e486ac0da68

Test Result



Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the CPU GPTQ kernel by integrating it into the Mixed-Precision (MP) linear layer framework. This is a good architectural improvement, moving the logic into a new CPUWNA16LinearKernel and extending the gptq_marlin quantization to support CPU. While the overall direction is positive, I've identified a couple of critical issues in the implementation that need to be addressed to ensure correctness.
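To illustrate the kind of structure the review describes, here is a minimal sketch of a kernel-selection loop in which a CPU kernel sits alongside a GPU one and each reports whether it can handle a given layer configuration. Every name below (`LayerConfigSketch`, `GpuWNA16Sketch`, `CpuWNA16Sketch`, `can_implement`, `choose_kernel`) is hypothetical and chosen for illustration; this is not the code added by this PR, and the real vLLM interfaces may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LayerConfigSketch:
    """Hypothetical stand-in for a mixed-precision linear layer config."""
    weight_bits: int
    group_size: int
    device_type: str  # e.g. "cpu" or "cuda"


class GpuWNA16Sketch:
    """Hypothetical GPU kernel: only claims CUDA devices."""

    @classmethod
    def can_implement(cls, cfg: LayerConfigSketch) -> Tuple[bool, Optional[str]]:
        if cfg.device_type != "cuda":
            return False, "requires a CUDA device"
        return True, None


class CpuWNA16Sketch:
    """Hypothetical CPU kernel: only claims 4-bit weights on CPU."""

    @classmethod
    def can_implement(cls, cfg: LayerConfigSketch) -> Tuple[bool, Optional[str]]:
        if cfg.device_type != "cpu":
            return False, "requires a CPU device"
        if cfg.weight_bits != 4:
            return False, "only 4-bit weights are handled in this sketch"
        return True, None


# Candidates are tried in order; the first one that accepts the config wins.
CANDIDATES: List[type] = [GpuWNA16Sketch, CpuWNA16Sketch]


def choose_kernel(cfg: LayerConfigSketch) -> type:
    """Return the first candidate kernel class that can implement cfg."""
    reasons = []
    for kernel in CANDIDATES:
        ok, reason = kernel.can_implement(cfg)
        if ok:
            return kernel
        reasons.append(f"{kernel.__name__}: {reason}")
    raise ValueError("no kernel can implement this config: " + "; ".join(reasons))
```

With this shape, supporting a new backend amounts to adding one class to the candidate list, and a failed selection reports why each candidate declined.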

bigPYJ1151 and others added 2 commits January 6, 2026 14:09
Signed-off-by: jiang1.li <jiang1.li@intel.com>
…n/cpu.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
@robertgshaw2-redhat
Collaborator

nice job

@github-project-automation github-project-automation bot moved this to Backlog in MoE Refactor Jan 6, 2026
@robertgshaw2-redhat robertgshaw2-redhat moved this from Backlog to In progress in MoE Refactor Jan 6, 2026
@robertgshaw2-redhat
Collaborator

LGTM

@robertgshaw2-redhat robertgshaw2-redhat added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jan 6, 2026
@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) January 6, 2026 15:15
@robertgshaw2-redhat robertgshaw2-redhat moved this from In progress to In review in MoE Refactor Jan 6, 2026
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Comment on lines +20 to +22
    @classmethod
    def get_min_capability(cls) -> int:
        return -1
Collaborator

We should change this to an is_supported idiom instead of min_capability; see #31821.
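One possible reading of this suggestion, sketched under assumptions: a minimum-capability integer only makes sense for GPUs, so a CPU kernel has to return a sentinel such as -1, whereas an `is_supported` check lets each kernel state its own requirements and give a reason when they are not met. The class names, the `is_supported` signature, and the threshold value below are hypothetical; the interface actually proposed in #31821 may look different.

```python
from abc import ABC, abstractmethod
from typing import Optional, Tuple


class KernelBaseSketch(ABC):
    """Hypothetical base class, not vLLM's actual interface."""

    @classmethod
    @abstractmethod
    def is_supported(
        cls, device_type: str, compute_capability: Optional[int] = None
    ) -> Tuple[bool, Optional[str]]:
        """Return (True, None) if usable, else (False, reason)."""


class CpuKernelSketch(KernelBaseSketch):
    @classmethod
    def is_supported(cls, device_type, compute_capability=None):
        # No sentinel value needed: the CPU kernel simply ignores capability.
        if device_type != "cpu":
            return False, "requires a CPU device"
        return True, None


class GpuKernelSketch(KernelBaseSketch):
    MIN_CAPABILITY = 80  # illustrative threshold, an assumption

    @classmethod
    def is_supported(cls, device_type, compute_capability=None):
        if device_type != "cuda":
            return False, "requires a CUDA device"
        if compute_capability is None or compute_capability < cls.MIN_CAPABILITY:
            return False, f"requires compute capability >= {cls.MIN_CAPABILITY}"
        return True, None
```

A caller would then ask `CpuKernelSketch.is_supported("cpu")` rather than compare the current device against a capability number that has no meaning on CPU.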

@robertgshaw2-redhat robertgshaw2-redhat merged commit 8becf14 into vllm-project:main Jan 6, 2026
56 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in MoE Refactor Jan 6, 2026
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
…oject#31801)

Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
…oject#31801)

Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…oject#31801)

Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…oject#31801)

Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Labels

cpu: Related to CPU backends
ready: ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done

3 participants