Fix TURBOQUANT backend selection in cuda.py #40060
Conversation
Added TURBOQUANT to the list of attention backends and removed the specialized TurboQuant KV cache handling. Signed-off-by: Michael Goin <mgoin64@gmail.com>
Code Review
This pull request integrates TURBOQUANT into the standard attention backend priority lists for CUDA platforms and removes the previous hardcoded bypass for TurboQuant KV cache types. However, moving TURBOQUANT to the end of the priority list is a regression that could lead to incorrect backend selection: an earlier backend will be chosen even when TurboQuant was explicitly requested. The recommendation is to use the kv_cache_dtype parameter to prioritize TURBOQUANT when it is explicitly requested, while still allowing standard validation.
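A minimal sketch of that recommendation, assuming a priority-list selection scheme; the enum members, dtype strings, and function below are hypothetical illustrations, not vLLM's actual API:

```python
from enum import Enum, auto


class AttentionBackend(Enum):
    # Hypothetical stand-ins for the real backend enum in vLLM.
    FLASH_ATTN = auto()
    FLASHINFER = auto()
    TURBOQUANT = auto()


DEFAULT_PRIORITY = [AttentionBackend.FLASH_ATTN, AttentionBackend.FLASHINFER]


def select_backend_priority(kv_cache_dtype: str | None) -> list[AttentionBackend]:
    """Return candidate backends in priority order (illustrative only)."""
    if kv_cache_dtype and kv_cache_dtype.startswith("turboquant"):
        # Explicitly requested TurboQuant KV cache: try TURBOQUANT first,
        # but keep the standard candidates so the usual validation and
        # fallback logic still apply.
        return [AttentionBackend.TURBOQUANT, *DEFAULT_PRIORITY]
    # Default path: TURBOQUANT is not silently preferred.
    return DEFAULT_PRIORITY
```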
Hi @mgoin, the pre-commit checks have failed. Please run:

uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.
Documentation preview: https://vllm--40060.org.readthedocs.build/en/40060/
Purpose
Added TURBOQUANT to the selection list of attention backends and removed the specialized TurboQuant KV cache handling; a sketch of the shape of the change follows.
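For context, a rough before/after sketch of the change, assuming backend selection walks a priority list; every name below is illustrative, not the actual cuda.py symbol:

```python
# Before: specialized handling short-circuited selection for TurboQuant
# KV cache dtypes, bypassing the validation applied to other backends.
def select_backend_before(kv_cache_dtype: str, priority: list[str]) -> str:
    if kv_cache_dtype.startswith("turboquant"):  # hypothetical dtype prefix
        return "TURBOQUANT"
    return priority[0]


# After: TURBOQUANT joins the ordinary priority list and is validated
# like any other backend.
BACKEND_PRIORITY = ["FLASH_ATTN", "FLASHINFER", "TURBOQUANT"]


def backend_is_supported(backend: str, kv_cache_dtype: str) -> bool:
    # Stand-in for vLLM's real capability checks.
    return backend != "TURBOQUANT" or kv_cache_dtype.startswith("turboquant")


def select_backend_after(kv_cache_dtype: str) -> str:
    for backend in BACKEND_PRIORITY:
        if backend_is_supported(backend, kv_cache_dtype):
            return backend
    raise RuntimeError("no supported attention backend")
```

Note that with TURBOQUANT appended last, any earlier supported backend wins even when a TurboQuant KV cache dtype was requested, which is the regression the review comment flags.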
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.