Skip to content

Fix apply_bitmask logit for both CPU and triton versions when shape and stride doesn't match#390

Merged
Ubospica merged 4 commits intomlc-ai:mainfrom
Jialin:bitmask_mismatch
Aug 11, 2025
Merged

Fix apply_bitmask logit for both CPU and triton versions when shape and stride doesn't match#390
Ubospica merged 4 commits intomlc-ai:mainfrom
Jialin:bitmask_mismatch

Conversation

@Jialin
Copy link
Contributor

@Jialin Jialin commented Aug 9, 2025

Motivation

In vLLM, in order to address Issue #19493, PR #19565 served as a workaround to mitigate the issue by switching to use torch.compile version instead of triton version.

Root cause

With some investigation, we found that the error would happen when logits.stride()[0] != logits.shape[-1].

Changes

  • Fix CPU and Triton-version apply_grammar_bitmask for this scenario
    • use stride[0] instead of shape[-1] when access rows
  • Add a unit test to reproduce the errors

Tests

We've verified

  • New unit tests would failed on trunk
  • New unit tests passed with the change (and no new failure, however, there're a lot of failing tests in trunk, 47 failed in tests/python/test_token_bitmask_operations.py)

Followup

We will follow up with a benchmark comparison to see if we should bring back triton-version or use the new cuda version on vLLM side.

Jialin added 3 commits August 8, 2025 14:45
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Copy link
Collaborator

@Ubospica Ubospica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your contribution!

strides are not considered previously. It's good that this PR fixes this issue.

In the future we will further use nanobind's ArrayList to simplify the logic of passing torch tensor to C++. See https://github.com/mlc-ai/xgrammar/blob/main/cpp/nanobind/nanobind.cc#L45. This can avoid the complex logic of passing shape, stride, and dtype, etc. separately to C++.

Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
@Jialin
Copy link
Contributor Author

Jialin commented Aug 11, 2025

@Ubospica The precommit failure is addressed. And also we tried to fix cuda implementation in #394

@Ubospica Ubospica merged commit c3da9db into mlc-ai:main Aug 11, 2025
38 checks passed
@Ubospica
Copy link
Collaborator

@Jialin Thank you so much for your contribution

@Jialin Jialin deleted the bitmask_mismatch branch August 13, 2025 04:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants