[Bugfix] Fix xgrammar dtype mismatch on macOS CPU inference #32384
DarkLight1337 merged 3 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. You can ask your reviewers to trigger select CI tests. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Code Review
This pull request addresses a dtype mismatch bug that occurs during guided decoding with xgrammar on macOS with CPU inference. The fix correctly converts logits to float32 before applying the bitmask to accommodate the CPU kernel's requirement. My review includes a suggestion to make this fix more targeted by adding a device check, which will prevent unnecessary data type conversions and potential performance overhead during GPU inference.
Test Results on macOS M3 Pro (36GB RAM)

Environment

Test Results

Integration Test Output
The model successfully generated structured JSON output using xgrammar on macOS CPU.

Response to Gemini Code Review
✅ Accepted Gemini's suggestion - added the CPU device check.

Analysis
I investigated the xgrammar library.
Why keep this fix?
The updated code now includes the CPU device check as suggested: `if logits.device.type == "cpu" and logits.dtype != torch.float32:`. This ensures the conversion only happens on CPU, avoiding unnecessary dtype conversions during GPU inference.
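The shape of the fix can be sketched as a standalone wrapper (illustrative only, not the exact vLLM code; `apply_fn` is a hypothetical stand-in for xgrammar's in-place bitmask kernel):

```python
import torch


def apply_bitmask_with_dtype_fix(logits, bitmask, apply_fn):
    """Apply a token bitmask, working around a float32-only CPU kernel.

    Sketch of the fix discussed above: on CPU, half-precision logits are
    converted to float32 before the kernel runs, and the masked result is
    copied back into the original tensor in its original dtype. On other
    devices (or already-float32 logits) the kernel is called directly.
    """
    if logits.device.type == "cpu" and logits.dtype != torch.float32:
        logits32 = logits.float()          # float32 copy for the kernel
        apply_fn(logits32, bitmask)        # kernel mutates logits32 in place
        logits.copy_(logits32)             # copy back, casting to orig dtype
    else:
        apply_fn(logits, bitmask)
    return logits
```

With this structure, GPU inference pays no extra conversion cost, matching the review suggestion.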
The xgrammar CPU kernel requires float32 logits, but macOS CPU inference uses float16/bfloat16. This causes a ValueError when using guided decoding with xgrammar on macOS. This fix converts logits to float32 before applying the token bitmask, then copies the result back to the original dtype. Fixes vllm-project#31901 Signed-off-by: Karan Bansal <karanb192@gmail.com>
Add logits.device.type == "cpu" check to make the fix more targeted and avoid unnecessary dtype conversions on GPU inference. Signed-off-by: Karan Bansal <karanb192@gmail.com>
Hi @DarkLight1337 @ywang96 - could you please review this PR when you have a chance? This fixes issue #31901 - guided decoding with xgrammar failing on macOS CPU inference due to dtype mismatch. The fix has been tested locally on macOS M3 Pro and all CI checks are passing. Thank you! 🙏
You stole somebody’s code without any acknowledgement, and you belong in the circus, cause you’s a freaking clown.
Don't take it too hard ;) There is a link in the comment sourcing the issue. I think it's better to wait for the confirmation and the suggestions from the maintainers, in order to make sure that the fix can generalise.
Thanks @Inokinoki for clarifying. To be explicit: the fix approach was based on @Inokinoki's suggestion in #31901, which is linked in the PR description ("Fixes #31901"). I've added explicit credit in the PR description to make this clearer. Happy to discuss the technical merits of the fix with maintainers.
@DarkLight1337 @ywang96, could you please have a look at this short PR? It would enable macOS support for all our vLLM projects using structured generation, including the EuroEval framework.
Congrats on getting this approved! I would appreciate it if the maintainer @DarkLight1337 or you could add me as "Co-authored-by" in the commit message.
I did that already! |
@DarkLight1337 With this PR being approved, are the failing tests the reason for not merging? @karanb192 or @Inokinoki, are you able to look at them, if it's needed?
Rebased the branch, see if the tests pass now |
…ject#32384)
Signed-off-by: Karan Bansal <karanb192@gmail.com>
Co-authored-by: Inokinoki <inoki@inoki.cc>
…ject#32384)
Signed-off-by: Karan Bansal <karanb192@gmail.com>
Co-authored-by: Inokinoki <inoki@inoki.cc>
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
Summary
- The xgrammar CPU kernel requires `float32` logits, but macOS CPU inference uses `float16`/`bfloat16`
- Updated `apply_grammar_bitmask()` to convert to `float32` before applying the bitmask, then copy back to the original dtype

Credits
Fix approach based on @Inokinoki's suggestion in #31901.
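For context on the operation being patched, here is an illustrative sketch (not vLLM's or xgrammar's actual implementation) of applying a packed token bitmask to logits, assuming the common convention of one bit per vocabulary token packed into `int32` words, where a zero bit marks a disallowed token:

```python
import torch


def apply_packed_bitmask(logits: torch.Tensor, bitmask: torch.Tensor) -> torch.Tensor:
    """Mask disallowed tokens given a bit-packed grammar mask (sketch).

    Assumed shapes: logits (batch, vocab) and bitmask
    (batch, ceil(vocab / 32)) of dtype int32, one bit per token.
    A zero bit sets the corresponding logit to -inf so the token
    can never be sampled.
    """
    batch, vocab = logits.shape
    bits = torch.arange(32, device=logits.device)
    # Expand each int32 word into its 32 bits, then trim to vocab size.
    unpacked = ((bitmask.unsqueeze(-1) >> bits) & 1).reshape(batch, -1)[:, :vocab]
    return logits.masked_fill(unpacked == 0, float("-inf"))
```

A fused CPU kernel doing the same thing elementwise is where the float32-only restriction described in this PR comes from.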
Test Plan
Reproduction script from issue:
Before fix: `ValueError: logits must be of type float32`
After fix: Successfully generates structured output
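The failure mode can be mimicked with a small stand-in (this is not the real xgrammar kernel, just a toy dtype check shaped like the reported error, to show why converting with `.float()` resolves it):

```python
import torch


def strict_apply(logits: torch.Tensor) -> None:
    # Stand-in for the xgrammar CPU kernel's dtype check; the real kernel
    # rejects non-float32 logits on CPU with a similar error.
    if logits.dtype != torch.float32:
        raise ValueError("logits must be of type float32")


logits16 = torch.randn(2, 8, dtype=torch.float16)

try:
    strict_apply(logits16)            # before the fix: half precision fails
    raised = False
except ValueError:
    raised = True

strict_apply(logits16.float())        # after the fix: the float32 copy passes
```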
Related Issue
Fixes #31901