fix(gguf): Disable bfloat16 for GGUF on Blackwell device #30408
Conversation
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Code Review
This pull request aims to fix precision issues with GGUF models on Blackwell GPUs by defaulting to float16. The change correctly disables bfloat16 for GGUF on Blackwell devices. However, I've found a critical issue where the device capability check uses 120 instead of 100 for Blackwell, which would prevent the fix from being applied. I've provided a suggestion to correct this.
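For context on the 100-vs-120 point: Blackwell spans two compute capabilities, SM 10.x for datacenter parts (B100/B200) and SM 12.x for consumer cards like the RTX 5090. A minimal sketch of a check covering both, using only the standard `torch.cuda` API (the helper name here is hypothetical, not the PR's actual code):

```python
import torch

def is_blackwell() -> bool:
    """Heuristic: treat SM 10.x (B100/B200) and SM 12.x (RTX 50-series) as Blackwell."""
    if not torch.cuda.is_available():
        return False
    major, _ = torch.cuda.get_device_capability()
    return major in (10, 12)
```

The review comment above is concerned that gating on a single major would leave the other Blackwell family unpatched.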
yewentao256
left a comment
Thanks for the work! One minor update before landing.
Hi @kitaekatt, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
yewentao256
left a comment
It would be a little bit unclear for "SM (10.0+)"; let's just remove them since we all know Blackwell.
GGUF dequantization kernels use half precision (fp16) internally via the `dfloat` typedef. On Blackwell GPUs (sm_120), using bfloat16 causes garbage output due to dtype mismatch.

Approach taken (middle ground):
- arg_utils.py: Auto-set dtype to float16 when dtype="auto" for GGUF
- gguf.py: Keep bfloat16 in supported_act_dtypes for explicit override

This defaults to safe behavior while preserving user control. Users on hardware where bfloat16 works can still use --dtype bfloat16 explicitly.

Options considered:
1. Blanket removal of bfloat16 from GGUF - rejected (breaks working configs)
2. Blackwell-specific detection - rejected (maintenance burden, edge cases)
3. Default fp16 + allow explicit bf16 - chosen (simple, safe, preserves choice)

Tested on RTX 5090 (sm_120) with Qwen3-4B-GGUF: 583.8 tok/s

Signed-off-by: Christina <truffle@gmail.com>
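As a rough illustration of the middle-ground approach described above, a minimal sketch with hypothetical names (the real change was in vLLM's arg_utils.py, and a later commit below reverts this part because Gemma2 GGUF cannot run in FP16):

```python
import torch

def resolve_dtype(dtype: str, quantization: str | None) -> "torch.dtype | str":
    """Pick a safe default dtype; explicit user choices pass through untouched."""
    if dtype == "auto" and quantization == "gguf":
        # GGUF dequant kernels compute in fp16 via the `dfloat` typedef,
        # so default to float16 rather than letting bf16 win on newer GPUs.
        return torch.float16
    return dtype  # e.g. an explicit --dtype bfloat16 is still honored
```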
…ity check

Instead of removing bfloat16 support globally, use device capability detection to disable bfloat16 only on SM 120+ devices (Blackwell). This preserves bfloat16 support on older architectures where tests show it works correctly, while preventing precision issues on Blackwell.

Co-Authored-By: Isotr0py <isotr0py@users.noreply.github.com>
Signed-off-by: Christina <truffle@gmail.com>
Per review feedback: the arg_utils.py dtype override breaks Gemma2 GGUF, which doesn't support FP16. The Blackwell-specific bfloat16 restriction in gguf.py's get_supported_act_dtypes() is sufficient; let _resolve_auto_dtype handle dtype selection automatically.

Signed-off-by: Christina <truffle@gmail.com>
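For illustration, a standalone sketch of the shape this lands in; the simplified class below stands in for vLLM's real GGUFConfig, and the exact SM threshold (100 vs 120) was the point raised in review above:

```python
import logging

import torch

logger = logging.getLogger(__name__)

class GGUFConfig:
    def get_supported_act_dtypes(self) -> list[torch.dtype]:
        supported = [torch.float16, torch.bfloat16, torch.float32]
        if torch.cuda.is_available():
            major, minor = torch.cuda.get_device_capability()
            if (major, minor) >= (12, 0):  # Blackwell consumer parts, e.g. RTX 5090
                logger.warning(
                    "GGUF has precision issues with bfloat16 on Blackwell.")
                supported.remove(torch.bfloat16)
        return supported
```

With bfloat16 no longer advertised, _resolve_auto_dtype picks float16 for dtype="auto" without any override in arg_utils.py.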
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Head branch was pushed to by a user without write access
Force-pushed from 8493901 to cb5a036
Hi @kitaekatt, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
I know this PR is already approved, but I just did some isolated testing of this PR, so sharing results.

Tested on Blackwell hardware
GPU: RTX 5090 (SM 12.0, compute capability 120)
Test: Direct validation of `GGUFConfig().get_supported_act_dtypes()`

Before (upstream/main):
>>> GGUFConfig().get_supported_act_dtypes()
[torch.float16, torch.bfloat16, torch.float32]  # bfloat16 included ❌

After (this PR):
>>> GGUFConfig().get_supported_act_dtypes()
WARNING: GGUF has precision issues with bfloat16 on Blackwell.
[torch.float16, torch.float32]  # bfloat16 excluded ✓

Validated that bfloat16 is correctly excluded on Blackwell devices.
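If anyone wants to repeat this check, here is the validation above as a standalone script; the import path is my assumption of where GGUFConfig lives and should be verified against the tree:

```python
import torch
# Path assumed from the PR's reference to gguf.py; verify against your vLLM checkout.
from vllm.model_executor.layers.quantization.gguf import GGUFConfig

dtypes = GGUFConfig().get_supported_act_dtypes()
print(dtypes)
# On a Blackwell GPU, this PR should leave bfloat16 out of the list.
assert torch.bfloat16 not in dtypes, "expected bfloat16 to be excluded on Blackwell"
```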
Remove '(SM 10.0+)' from comment and warning message per reviewer feedback.

yewentao256: "It would be a little bit unclear for SM (10.0+), let's just remove them"

Signed-off-by: Christina Norman <christina@example.com>
Signed-off-by: Christina <truffle@gmail.com>
Force-pushed from cb5a036 to 157a9fe
Fixed DCO (added sign-off) and pre-commit (single-line format).
Cherry-pick the Blackwell dtype fix to ensure GGUF models work correctly on RTX 5090 and other Blackwell GPUs during metadata extraction testing. This fix excludes bfloat16 from supported dtypes on Blackwell devices to avoid precision issues with GGUF dequantization kernels.

Signed-off-by: Christina Norman <christina@example.com>
…t#30408)

Signed-off-by: Christina <truffle@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Christina Norman <christina@example.com>
Co-authored-by: Isotr0py <isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…t#30408)

Signed-off-by: Christina <truffle@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Christina Norman <christina@example.com>
Co-authored-by: Isotr0py <isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
Summary
Fixes incorrect output from GGUF models on Blackwell (SM 120+) GPUs by excluding bfloat16 from GGUF's supported activation dtypes, so that dtype="auto" resolves to float16.
Changes
- gguf.py: exclude torch.bfloat16 from GGUFConfig.get_supported_act_dtypes() on Blackwell (SM 120+) devices and log a warning, letting _resolve_auto_dtype fall back to float16 automatically.
Root Cause
GGUF dequantization kernels compute in half precision (fp16) internally via the `dfloat` typedef; feeding them bfloat16 activations on Blackwell therefore produces garbage output from the dtype mismatch.
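As an illustration of why a fp16/bf16 mismatch corrupts values (both dtypes are two bytes wide, so bits reinterpret silently); this is illustrative only, not the actual kernel path:

```python
import torch

# Reinterpreting fp16 bits as bf16 (same width, different bit layout) scrambles values.
x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float16)
print(x.view(torch.bfloat16))  # prints small nonsense values, not 1.0/2.0/3.0
```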
Testing
Tested with multiple GGUF models on an RTX 5090 (SM 12.0), including Qwen3-4B-GGUF at 583.8 tok/s.