
fix(gguf): Auto-select compatible dtype for GGUF models on Blackwell #30365

Closed
kitaekatt wants to merge 1 commit into vllm-project:main from kitaekatt:fix-gemma3-gguf-dtype

Conversation

@kitaekatt (Contributor)

Summary

Fixes dtype conflict for Gemma3 GGUF models on Blackwell GPUs (SM 120+) where --dtype auto fails because:

  1. Gemma3 blocks float16 (numerical instability); it only allows [bfloat16, float32]
  2. GGUF on Blackwell blocks bfloat16 (the dequant kernels use fp16); it only allows [float16, float32]
  3. Intersection: only float32 works (a minimal sketch of this intersection follows below)
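As an illustration of the conflict, the two lists below are taken from the restrictions above; this is a sketch, not code from the patch:

    import torch

    # Dtypes Gemma3 accepts: float16 is blocked for numerical stability.
    model_supported = [torch.bfloat16, torch.float32]
    # Dtypes the GGUF dequant kernels accept on Blackwell: bfloat16 is blocked.
    quant_supported = [torch.float16, torch.float32]

    # The intersection leaves a single workable dtype.
    compatible = [dt for dt in model_supported if dt in quant_supported]
    print(compatible)  # [torch.float32]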

Error before fix:

torch.bfloat16 is not supported for quantization method gguf.
Supported dtypes: [torch.float16, torch.float32]

Changes

  1. gguf.py: Block bfloat16 on Blackwell (SM 120+) via current_platform.has_device_capability(120)

  2. vllm.py: Add _resolve_dtype_conflict() to find a compatible dtype when model restrictions and quantization restrictions conflict, falling back to float32 when no other option exists (see the sketch after this list).
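A rough sketch of the resolution logic: the helper name _resolve_dtype_conflict comes from this PR, but the signature, the preference order, and the body below are assumptions, not the actual diff:

    import torch

    # Assumed preference order: faster, lower-precision dtypes first,
    # float32 only as the last resort.
    _DTYPE_PREFERENCE = [torch.bfloat16, torch.float16, torch.float32]

    def _resolve_dtype_conflict(
        model_dtypes: list[torch.dtype],
        quant_dtypes: list[torch.dtype],
    ) -> torch.dtype:
        """Pick a dtype permitted by both the model and the quant method."""
        for dt in _DTYPE_PREFERENCE:
            if dt in model_dtypes and dt in quant_dtypes:
                return dt
        # No overlap at all: fall back to float32, which both sides
        # accept in the scenario this PR targets.
        return torch.float32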

Test Plan

  • Tested with google/gemma-3-1b-it GGUF on RTX 5090 (Blackwell)
  • Server starts successfully with --dtype auto (an example invocation follows this list)
  • Inference produces correct output (211 tok/s)
  • Non-Gemma3 GGUF models still work (Gemma2, Qwen, etc.)
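For reference, a typical invocation for this scenario might look like the following; the GGUF file name is illustrative, and vLLM loads GGUF checkpoints from a local file while taking the tokenizer from the original model:

    vllm serve ./gemma-3-1b-it-q4_0.gguf --tokenizer google/gemma-3-1b-it --dtype auto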

Related

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.


@gemini-code-assist (Bot) left a comment


Code Review

This pull request effectively resolves a dtype conflict for GGUF models on Blackwell GPUs, particularly for models like Gemma3 that carry their own dtype restrictions. The changes are well-implemented. In vllm/model_executor/layers/quantization/gguf.py, bfloat16 is correctly excluded on SM 120+ devices. The new logic in vllm/config/vllm.py for automatic dtype conflict resolution is robust: it finds a compatible dtype by intersecting the model-supported and quantization-supported types, selects the most performant option, and warns the user. This is a solid fix that also handles similar conflicts in the future. I have not identified any high or critical severity issues.

Commit message

Fixes Gemma3 GGUF models failing on Blackwell GPUs with --dtype auto.

Problem:
- Gemma3 blocks float16 (numerical instability)
- GGUF on Blackwell blocks bfloat16 (precision issues)
- Only float32 works, but dtype=auto picks bfloat16 → fails

Changes:
1. gguf.py: Block bfloat16 on SM 120+ (Blackwell) devices
2. vllm.py: Auto-select compatible dtype when model and quantization
   restrictions conflict, instead of failing with an error

This allows --dtype auto to work correctly with Gemma3 GGUF on Blackwell
by automatically falling back to float32.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mergify Bot commented Dec 10, 2025

⚠️ The sha of the head commit of this PR conflicts with #30410. Mergify cannot evaluate rules on this PR. ⚠️
