
fix(gguf): Disable bfloat16 for GGUF on blackwell device #30408

Merged

Isotr0py merged 5 commits into vllm-project:main from kitaekatt:fix/30090-gguf-bfloat16-dtype on Dec 12, 2025

Conversation

@kitaekatt
Contributor

Summary

Fixes incorrect output from GGUF models on Blackwell (SM 120+) GPUs by defaulting to float16 dtype.

Changes

  • Default GGUF quantization to float16 (was auto-selecting bfloat16)
  • Add warning when bfloat16 is explicitly requested on Blackwell

Root Cause

The GGUF dequantization kernels use half precision (fp16) internally, so bfloat16 activations cause precision issues with GGUF quantized weights on the Blackwell architecture.

Testing

Tested with multiple GGUF models on RTX 5090.
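The fix described above can be sketched roughly as follows. This is a simplified stand-in, not the actual code in vllm/model_executor/layers/quantization/gguf.py: dtype names are plain strings and the compute capability is passed in as an integer (major * 10 + minor), whereas the real implementation works with torch dtypes and queries the device directly.

```python
import warnings

# Illustrative threshold: Blackwell starts at SM 10.0, encoded as 100.
BLACKWELL_MIN_CAPABILITY = 100


def get_supported_act_dtypes(device_capability: int) -> list[str]:
    """Return the activation dtypes supported for GGUF on a device.

    `device_capability` is the compute capability encoded as
    major * 10 + minor (e.g. 120 for an RTX 5090's SM 12.0).
    """
    dtypes = ["float16", "bfloat16", "float32"]
    if device_capability >= BLACKWELL_MIN_CAPABILITY:
        # GGUF dequantization kernels use fp16 internally, so bfloat16
        # activations produce incorrect output on Blackwell.
        warnings.warn("GGUF has precision issues with bfloat16 on Blackwell.")
        dtypes.remove("bfloat16")
    return dtypes
```

With bfloat16 filtered out of the supported list, vLLM's automatic dtype resolution falls back to float16 on Blackwell, while older architectures are unaffected.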



@gemini-code-assist (bot) left a comment


Code Review

This pull request aims to fix precision issues with GGUF models on Blackwell GPUs by defaulting to float16. The change correctly disables bfloat16 for GGUF on Blackwell devices. However, I've found a critical issue where the device capability check uses 120 instead of 100 for Blackwell, which would prevent the fix from being applied. I've provided a suggestion to correct this.

Comment thread vllm/model_executor/layers/quantization/gguf.py Outdated
@kitaekatt
Contributor Author

This PR is a re-opening of #30090. The original branch was accidentally deleted, preventing that PR from being reopened.

@Isotr0py You had approved the previous PR - would appreciate your review on this one as well.

@Isotr0py Isotr0py enabled auto-merge (squash) December 10, 2025 18:21
@Isotr0py Isotr0py changed the title fix(gguf): Default GGUF to float16 while preserving bfloat16 option fix(gguf): Disable bfloat16 for GGUF on sm120 device Dec 10, 2025
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 10, 2025
Member

@yewentao256 left a comment


Thanks for the work! One minor update before landing

Comment thread vllm/model_executor/layers/quantization/gguf.py Outdated
@Isotr0py Isotr0py changed the title fix(gguf): Disable bfloat16 for GGUF on sm120 device fix(gguf): Disable bfloat16 for GGUF on blackwell device Dec 11, 2025
@mergify

mergify Bot commented Dec 11, 2025

Hi @kitaekatt, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Member

@yewentao256 left a comment


It would be a little bit unclear for SM (10.0+), let's just remove them and we all know blackwell.

Comment thread vllm/model_executor/layers/quantization/gguf.py Outdated
Comment thread vllm/model_executor/layers/quantization/gguf.py Outdated
kitaekatt and others added 4 commits December 11, 2025 11:47
GGUF dequantization kernels use half precision (fp16) internally via the
`dfloat` typedef. On Blackwell GPUs (sm_120), using bfloat16 causes garbage
output due to dtype mismatch.

Approach taken (middle ground):
- arg_utils.py: Auto-set dtype to float16 when dtype="auto" for GGUF
- gguf.py: Keep bfloat16 in supported_act_dtypes for explicit override

This defaults to safe behavior while preserving user control. Users on
hardware where bfloat16 works can still use --dtype bfloat16 explicitly.

Options considered:
1. Blanket removal of bfloat16 from GGUF - rejected (breaks working configs)
2. Blackwell-specific detection - rejected (maintenance burden, edge cases)
3. Default fp16 + allow explicit bf16 - chosen (simple, safe, preserves choice)

Tested on RTX 5090 (sm_120) with Qwen3-4B-GGUF: 583.8 tok/s

Signed-off-by: Christina <truffle@gmail.com>
…ity check

Instead of removing bfloat16 support globally, use device capability
detection to disable bfloat16 only on SM 120+ devices (Blackwell).

This preserves bfloat16 support on older architectures where tests show
it works correctly, while preventing precision issues on Blackwell.

Co-Authored-By: Isotr0py <isotr0py@users.noreply.github.com>
Signed-off-by: Christina <truffle@gmail.com>
Per review feedback: the arg_utils.py dtype override breaks Gemma2 GGUF
which doesn't support FP16. The Blackwell-specific bfloat16 restriction
in gguf.py's get_supported_act_dtypes() is sufficient - let
_resolve_auto_dtype handle dtype selection automatically.

Signed-off-by: Christina <truffle@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
auto-merge was automatically disabled December 11, 2025 18:04

Head branch was pushed to by a user without write access

@kitaekatt kitaekatt force-pushed the fix/30090-gguf-bfloat16-dtype branch from 8493901 to cb5a036 Compare December 11, 2025 18:04

@kitaekatt
Contributor Author

I know this PR is already approved, but I just ran some isolated testing of this change, so I'm sharing the results.

Tested on Blackwell Hardware

GPU: RTX 5090 (SM 12.0, compute capability 120)

Test: Direct validation of GGUFConfig().get_supported_act_dtypes()

Before (upstream/main):

>>> GGUFConfig().get_supported_act_dtypes()
[torch.float16, torch.bfloat16, torch.float32]  # bfloat16 included ❌

After (this PR):

>>> GGUFConfig().get_supported_act_dtypes()
WARNING: GGUF has precision issues with bfloat16 on Blackwell.
[torch.float16, torch.float32]  # bfloat16 excluded ✓

Validated that bfloat16 is correctly excluded on Blackwell devices.
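For reference, the capability numbers quoted in this thread follow the usual major * 10 + minor encoding. On real hardware the (major, minor) tuple would come from torch.cuda.get_device_capability(); the helper below is just an illustration of the encoding:

```python
def to_capability_int(major: int, minor: int) -> int:
    """Encode a CUDA compute capability tuple as a single integer.

    SM 12.0 -> 120 (RTX 5090), SM 10.0 -> 100 (datacenter Blackwell),
    SM 8.9  -> 89  (Ada).
    """
    return major * 10 + minor
```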

Remove '(SM 10.0+)' from comment and warning message per reviewer feedback.
yewentao256: 'It would be a little bit unclear for SM (10.0+), let's just remove them'

Signed-off-by: Christina Norman <christina@example.com>
Signed-off-by: Christina <truffle@gmail.com>
@kitaekatt kitaekatt force-pushed the fix/30090-gguf-bfloat16-dtype branch from cb5a036 to 157a9fe Compare December 11, 2025 18:16
@kitaekatt
Contributor Author

Fixed DCO (added sign-off) and pre-commit (single-line format).

kitaekatt added a commit to kitaekatt/vllm that referenced this pull request Dec 11, 2025
Cherry-pick the Blackwell dtype fix to ensure GGUF models work correctly
on RTX 5090 and other Blackwell GPUs during metadata extraction testing.

This fix excludes bfloat16 from supported dtypes on Blackwell devices
to avoid precision issues with GGUF dequantization kernels.

Signed-off-by: Christina Norman <christina@example.com>
@Isotr0py Isotr0py merged commit dc13c99 into vllm-project:main Dec 12, 2025
54 checks passed
@kitaekatt kitaekatt deleted the fix/30090-gguf-bfloat16-dtype branch December 15, 2025 15:48
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Dec 15, 2025
…t#30408)

Signed-off-by: Christina <truffle@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Christina Norman <christina@example.com>
Co-authored-by: Isotr0py <isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Majid-Taheri pushed a commit to Majid-Taheri/vllm that referenced this pull request Dec 23, 2025
…t#30408)

Signed-off-by: Christina <truffle@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Christina Norman <christina@example.com>
Co-authored-by: Isotr0py <isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Ubuntu <mjtaheri68@gmail.com>
mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026
…t#30408)

Signed-off-by: Christina <truffle@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Christina Norman <christina@example.com>
Co-authored-by: Isotr0py <isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>