
[Bugfix] Fix xgrammar dtype mismatch on macOS CPU inference #32384

Merged
DarkLight1337 merged 3 commits into vllm-project:main from karanb192:fix/macos-cpu-xgrammar-dtype
Mar 14, 2026

Conversation


@karanb192 karanb192 commented Jan 15, 2026

Summary

  • Fixes guided decoding with xgrammar failing on macOS CPU inference due to dtype mismatch
  • The xgrammar CPU kernel requires float32 logits, but macOS CPU inference uses float16/bfloat16
  • Adds dtype conversion in apply_grammar_bitmask() to convert to float32 before applying the bitmask, then copy back to original dtype
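
The conversion described above can be sketched without any dependencies. This is a hedged, illustrative sketch, not the actual vLLM code: tensors are modeled as plain (values, dtype) pairs so it runs anywhere, and `apply_bitmask_fp32_only` is a hypothetical stand-in for the xgrammar CPU kernel, which in the affected versions accepted only float32 logits.

```python
def apply_bitmask_fp32_only(values, dtype, bitmask):
    # Stand-in for the xgrammar CPU kernel: it rejects anything but
    # float32 and sets logits of grammar-disallowed tokens to -inf.
    if dtype != "float32":
        raise ValueError("logits must be of type float32")
    return [v if ok else float("-inf") for v, ok in zip(values, bitmask)]

def apply_grammar_bitmask(values, dtype, bitmask):
    # The fix: up-cast to float32, apply the bitmask, then return the
    # result under the original dtype. (The real implementation works on
    # torch tensors and copies the masked values back in place.)
    if dtype != "float32":
        masked = apply_bitmask_fp32_only(values, "float32", bitmask)
        return masked, dtype  # copied back to the original dtype
    return apply_bitmask_fp32_only(values, dtype, bitmask), dtype

# float16 logits, as produced by macOS CPU inference
masked, out_dtype = apply_grammar_bitmask(
    [0.5, 1.5, -0.25], "float16", [True, False, True]
)
```

Without the up-cast, the call would raise the `ValueError` quoted in the test plan below; with it, disallowed tokens are masked to -inf and the caller still sees the original dtype.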

Credits

Fix approach based on @Inokinoki's suggestion in #31901.

Test Plan

Reproduction script from issue:

from pydantic import BaseModel
from vllm import LLM, SamplingParams
from vllm.sampling_params import StructuredOutputsParams

llm = LLM(model="Qwen/Qwen3-0.6B", max_model_len=512)

class OutputJsonFormat(BaseModel):
    python_code: str

base_schema = OutputJsonFormat.model_json_schema()

prompt = "Write a simple hello world in Python:"

sampling_params = SamplingParams(
    temperature=0.0,
    structured_outputs=StructuredOutputsParams(json=base_schema),
)

outputs = llm.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)

Before fix: ValueError: logits must be of type float32
After fix: Successfully generates structured output

Related Issue

Fixes #31901

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, a small and essential subset of tests that catches errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a dtype mismatch bug that occurs during guided decoding with xgrammar on macOS with CPU inference. The fix correctly converts logits to float32 before applying the bitmask to accommodate the CPU kernel's requirement. My review includes a suggestion to make this fix more targeted by adding a device check, which will prevent unnecessary data type conversions and potential performance overhead during GPU inference.

@karanb192

Test Results on macOS M3 Pro (36GB RAM)

Environment

  • Hardware: MacBook Pro M3 Pro, 36GB RAM
  • macOS: 15.x (arm64)
  • Python: 3.12.11
  • PyTorch: 2.9.1 (MPS available, CUDA not available)
  • xgrammar: 0.1.29
  • vLLM: Built from this branch

Test Results

Test | Status
--- | ---
Unit test (dtype conversion logic) | ✅ Passed
xgrammar library test (float32, float16, bfloat16) | ✅ Passed
Full integration test with Qwen/Qwen3-0.6B | ✅ Passed

Integration Test Output

Prompt: Write a simple hello world in Python:
Output: {"python_code": "print('Hello, World!')"}

The model successfully generated structured JSON output using xgrammar on macOS CPU with float16 dtype.


Response to Gemini Code Review

Accepted Gemini's suggestion: added a logits.device.type == "cpu" check to make the fix more targeted.

Analysis

I investigated the xgrammar library and found:

  1. xgrammar 0.1.29 (current) already supports float16/bfloat16 on CPU natively
  2. The original bug existed in older xgrammar versions that only supported float32
  3. The CPU kernel at line 22 in older versions raised ValueError: logits must be of type float32

Why keep this fix?

  • Provides backward compatibility with older xgrammar versions
  • Defensive coding in case of future regressions
  • Minimal overhead with the CPU device check

The updated code now includes the CPU device check as suggested:

if logits.device.type == "cpu" and logits.dtype != torch.float32:

This ensures:

  • No unnecessary dtype conversions on GPU inference
  • Fix only applies where needed (CPU with non-float32 dtype)
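
The guard's behavior can be shown as a small truth table. This is an illustrative sketch only: `FakeTensor` and `needs_conversion` are hypothetical stand-ins for a torch tensor and the condition quoted above, so the sketch runs without PyTorch installed.

```python
from collections import namedtuple

# Minimal stand-in for the two tensor attributes the guard inspects.
FakeTensor = namedtuple("FakeTensor", ["device_type", "dtype"])

def needs_conversion(t):
    # Mirrors: logits.device.type == "cpu" and logits.dtype != torch.float32
    return t.device_type == "cpu" and t.dtype != "float32"

cases = [
    FakeTensor("cpu", "float16"),    # macOS CPU inference -> convert
    FakeTensor("cpu", "bfloat16"),   # also non-float32 on CPU -> convert
    FakeTensor("cpu", "float32"),    # already float32 -> no-op
    FakeTensor("cuda", "float16"),   # GPU path -> left untouched
]
results = [needs_conversion(c) for c in cases]
```

Only the first two cases trigger the conversion, which is exactly the targeting the review asked for: GPU inference never pays for the round-trip.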

Karan Bansal added 2 commits January 15, 2026 10:53
The xgrammar CPU kernel requires float32 logits, but macOS CPU
inference uses float16/bfloat16. This causes a ValueError when
using guided decoding with xgrammar on macOS.

This fix converts logits to float32 before applying the token
bitmask, then copies the result back to the original dtype.

Fixes vllm-project#31901

Signed-off-by: Karan Bansal <karanb192@gmail.com>
Add logits.device.type == "cpu" check to make the fix more targeted
and avoid unnecessary dtype conversions on GPU inference.

Signed-off-by: Karan Bansal <karanb192@gmail.com>
@karanb192

Hi @DarkLight1337 @ywang96 - could you please review this PR when you have a chance?

This fixes issue #31901 - guided decoding with xgrammar failing on macOS CPU inference due to dtype mismatch. The fix has been tested locally on macOS M3 Pro and all CI checks are passing.

Thank you! 🙏

@LucasWilkinson

cc @aarnphm @khluu for structured outputs

@Sberm

Sberm commented Jan 26, 2026

You stole somebody’s code without any acknowledgement, and you belong in the circus, cause you’s a freaking clown.

@Inokinoki

Inokinoki commented Jan 26, 2026

> You stole somebody’s code without any acknowledgement, and you belong in the circus, cause you’s a freaking clown.

Don't take it too hard ;) There is a link in the comment sourcing the issue.

I think it's better to wait for the confirmation and the suggestions from the maintainers, in order to make sure that the fix can generalise.

@karanb192

Thanks @Inokinoki for clarifying. To be explicit: the fix approach was based on @Inokinoki's suggestion in #31901, which is linked in the PR description ("Fixes #31901"). I've added explicit credit in the PR description to make this clearer.

Happy to discuss the technical merits of the fix with maintainers.

@saattrupdan

@DarkLight1337 @ywang96, could you please have a look at this short PR? It would enable macOS support for all our vLLM projects using structured generation, including the EuroEval framework.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) February 7, 2026 13:00
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 7, 2026
@Inokinoki

> Thanks @Inokinoki for clarifying. To be explicit: the fix approach was based on @Inokinoki's suggestion in #31901, which is linked in the PR description ("Fixes #31901"). I've added explicit credit in the PR description to make this clearer.
>
> Happy to discuss the technical merits of the fix with maintainers.

Congrats that this is approved!

I would appreciate it if the maintainer @DarkLight1337 or you could add me as "Co-authored-by" in the commit message.
But if this blocks the merge, please don't; just keep me informed if this quick fix causes any issues.

@DarkLight1337

I did that already!

@saattrupdan
@DarkLight1337 With this PR being approved, are the failing tests the reason for not merging? @karanb192 or @Inokinoki, are you able to look at them, if it's needed?

@DarkLight1337

Rebased the branch; let's see if the tests pass now.

@DarkLight1337 DarkLight1337 merged commit 821fde2 into vllm-project:main Mar 14, 2026
46 checks passed
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026
…ject#32384)

Signed-off-by: Karan Bansal <karanb192@gmail.com>
Co-authored-by: Inokinoki <inoki@inoki.cc>
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
…ject#32384)

Signed-off-by: Karan Bansal <karanb192@gmail.com>
Co-authored-by: Inokinoki <inoki@inoki.cc>
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
…ject#32384)

Signed-off-by: Karan Bansal <karanb192@gmail.com>
Co-authored-by: Inokinoki <inoki@inoki.cc>
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
…ject#32384)

Signed-off-by: Karan Bansal <karanb192@gmail.com>
Co-authored-by: Inokinoki <inoki@inoki.cc>
Monishver11 pushed a commit to Monishver11/vllm that referenced this pull request Mar 27, 2026
…ject#32384)

Signed-off-by: Karan Bansal <karanb192@gmail.com>
Co-authored-by: Inokinoki <inoki@inoki.cc>
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>

Labels

  • bug: Something isn't working
  • ready: ONLY add when PR is ready to merge/full CI is needed
  • structured-output
  • v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: Guided decoding with xgrammar failed on macOS CPU inference

8 participants