Add dynamic tokens-per-page TRTLLM-GEN GQA kernels #3259

Merged
qsang-nv merged 1 commit into flashinfer-ai:main from PerkzZheng:port-dynamic-tokens-per-page-gqa
May 11, 2026
Conversation

@PerkzZheng
Contributor

@PerkzZheng PerkzZheng commented May 7, 2026

Summary

  • update TRTLLM-GEN FMHA artifact path and checksum to the newly published cubins
  • select dynamic tokens-per-page kernels for paged GQA decode and prefill when page size is at least 128
  • keep MLA decode kernel selection on dense-mask keys and add Blackwell-only dynamic page-size coverage
  • relax fp16 Blackwell attention-sink context tolerance to match observed precision noise

Tests

  • pre-commit run on changed files
  • python3 -m pytest -q -n 8 --tb=short --maxfail=5 -k trtllm-gen-mla-filter tests/attention/test_trtllm_gen_mla.py: 7686 passed
  • python3 -m pytest -q -n 8 --tb=short tests/attention/test_attention_sink_blackwell.py: 144 passed

Summary by CodeRabbit

  • Chores

    • Updated an artifact path and its checksum manifest.
  • New Features

    • Dynamic page-size selection for generation attention kernels on supported GPUs.
    • Per-token variable sparse-MLA top‑K lengths for finer-grained sparse attention.
    • Expanded MLA-generation kernel support for additional head/dimension combos.
  • Tests

    • Added GPU-only tests for dynamic page-size prefill/decode scenarios.
    • Relaxed float16 numerical tolerance in an attention test.
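A rough arithmetic sketch of why a float16 absolute tolerance near 1e-3 is tight. The PR only cites "observed precision noise" and the review walkthrough reports the change as atol 1e-3 to 2e-3; the helper names below are illustrative and are not part of the test suite. The sketch uses Python's `struct` half-precision format to measure the gap between adjacent float16 values at a few magnitudes.

```python
import struct


def fp16_round(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_spacing(x: float) -> float:
    """Gap between fp16(x) and the next larger representable fp16 value.

    Valid for positive, finite x: incrementing the raw bit pattern of a
    positive float yields the next representable value.
    """
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    nxt = struct.unpack("<e", struct.pack("<H", bits + 1))[0]
    return nxt - fp16_round(x)


for magnitude in (0.5, 1.0, 2.0):
    print(magnitude, fp16_spacing(magnitude))
```

For outputs with magnitude in [1, 2) the spacing is 2^-10 (about 9.8e-4), and in [2, 4) it is 2^-9 (about 1.95e-3), so two results that differ by a single float16 rounding step can already sit at or above an absolute tolerance of 1e-3. Whether this is the exact source of the noise seen in the sink test is not stated in the PR.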

@coderabbitai
Contributor

coderabbitai Bot commented May 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a1b37111-fcd5-4d5e-9270-caac8390b54f

📥 Commits

Reviewing files that changed from the base of the PR and between dffd4b83dfd4da812afcbe277cee082dfeb1d45b and eb503ea.

📒 Files selected for processing (6)
  • flashinfer/artifacts.py
  • include/flashinfer/trtllm/fmha/fmhaKernels.cuh
  • include/flashinfer/trtllm/fmha/fmhaRunnerParams.h
  • include/flashinfer/trtllm/fmha/kernelParams.h
  • tests/attention/test_attention_sink_blackwell.py
  • tests/attention/test_trtllm_gen_attention.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • flashinfer/artifacts.py
  • tests/attention/test_attention_sink_blackwell.py
  • tests/attention/test_trtllm_gen_attention.py
  • include/flashinfer/trtllm/fmha/fmhaRunnerParams.h
  • include/flashinfer/trtllm/fmha/fmhaKernels.cuh

📝 Walkthrough

Walkthrough

Adds dynamic numTokensPerPage selection for TRTLLM FMHA kernels, extends parameter structs for sparse-MLA, updates kernel selection and hash encoding (including MLA Dense-mask handling), bumps the TRTLLM_GEN_FMHA artifact path and checksum, and adds Blackwell-gated tests plus a float16 tolerance tweak.

Changes

Dynamic numTokensPerPage Selection for TRTLLM FMHA

| Layer | File(s) | Summary |
|---|---|---|
| Parameter Contracts | include/flashinfer/trtllm/fmha/fmhaRunnerParams.h, include/flashinfer/trtllm/fmha/kernelParams.h | TllmGenSelectKernelParams adds mDynamicNumTokensPerPage (initialized false). KernelParams adds tmaKSlidingWindowKvPool_ and ptrSparseMlaTopKLens; setKernelParams initializes ptrSparseMlaTopKLens to nullptr. |
| Kernel Selection Logic | include/flashinfer/trtllm/fmha/fmhaKernels.cuh | Adds constants and selectNumTokensPerPage with eligibility rules; allows numTokensPerPage==0 in hash validation and encodes log2 as 0 when 0; centralizes page selection before heuristic probing; sets dynamic flag for kernel-keyed generated cubins; forces Dense mask for MLA generation; skips sliding-window/chunked causal mask override for MLA generation; debug/hash string includes dynamicNumTokensPerPage. |
| Artifact Versioning | flashinfer/artifacts.py | Updates ArtifactPath.TRTLLM_GEN_FMHA and CheckSumHash.TRTLLM_GEN_FMHA to new directory hash and checksum manifest SHA256. |
| Tests / Tolerance | tests/attention/test_trtllm_gen_attention.py, tests/attention/test_attention_sink_blackwell.py | Adds _skip_if_not_blackwell() and Blackwell-gated parametrized tests test_trtllm_batch_prefill_dynamic_page_size_gqa and test_trtllm_batch_decode_dynamic_page_size_gqa; increases float16 atol from 1e-3 to 2e-3 in the sink test; removes an earlier head-dim-512 prefill block. |
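The selection and hash-encoding behaviour summarized in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the code in fmhaKernels.cuh: the 128 threshold comes from the PR summary, while the eligibility conditions, the use of 0 as the key for dynamic kernels, the power-of-two check, and all Python names are assumptions made for the example.

```python
from dataclasses import dataclass

# Threshold taken from the PR summary ("page size is at least 128").
MIN_DYNAMIC_PAGE_SIZE = 128


@dataclass
class SelectParams:
    """Hypothetical stand-in for TllmGenSelectKernelParams."""

    num_tokens_per_page: int
    dynamic_num_tokens_per_page: bool = False  # mirrors mDynamicNumTokensPerPage


def select_num_tokens_per_page(
    page_size: int, is_paged_kv: bool, is_mla: bool
) -> SelectParams:
    # Assumption: only paged, non-MLA (GQA) workloads with large pages qualify.
    if is_paged_kv and not is_mla and page_size >= MIN_DYNAMIC_PAGE_SIZE:
        # Assumption: dynamic kernels are keyed with 0 and take the real
        # page size as a run-time value.
        return SelectParams(num_tokens_per_page=0, dynamic_num_tokens_per_page=True)
    return SelectParams(num_tokens_per_page=page_size)


def num_tokens_per_page_log2(num_tokens_per_page: int) -> int:
    """Hash field: log2 of the page size, with 0 encoded as 0."""
    if num_tokens_per_page == 0:
        return 0
    # Assumption: fixed page sizes are powers of two, as a log2 field implies.
    if num_tokens_per_page < 0 or num_tokens_per_page & (num_tokens_per_page - 1):
        raise ValueError("page size must be a positive power of two")
    return num_tokens_per_page.bit_length() - 1


print(select_num_tokens_per_page(128, is_paged_kv=True, is_mla=False))
print(select_num_tokens_per_page(64, is_paged_kv=True, is_mla=False))
print(num_tokens_per_page_log2(64), num_tokens_per_page_log2(0))
```

In this sketch a page size of 1 also encodes to a log2 of 0, so it is the separate dynamic flag, not the log2 field, that distinguishes a dynamic kernel from a fixed one.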

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant Selector
  participant Hasher
  participant KernelPicker
  Caller->>Selector: selectNumTokensPerPage(params)
  Selector-->>Caller: set mNumTokensPerPage, mDynamicNumTokensPerPage
  Caller->>Hasher: build/validate kernel hash (numTokensPerPageLog2)
  Hasher-->>KernelPicker: candidate hashes
  KernelPicker-->>Caller: selected cubin (Dense mask for MLA gen)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

run-ci

Suggested reviewers

  • sricketts
  • aleozlx
  • yzh119
  • cyx-6
  • bkryu
  • samuellees
  • yongwww
  • nv-yunzheq
  • jimmyzho
  • kahyunnam
  • yyihuang

Poem

🐰 I hopped through hashes, pages, and kernels bright,
Dynamic tokens per page now take flight.
Dense masks for MLA, tests gated with care,
Artifacts bumped — new cubins are there!
🥕✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description lacks the required template sections (Description, Related Issues, Checklist items) and doesn't follow the specified format, though it does contain relevant technical content. | Reformat the description to follow the repository's template, including the Description section, Related Issues section, and completion of the Pre-commit Checks and Tests checklists. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title accurately describes the main change: adding dynamic tokens-per-page TRTLLM-GEN GQA kernels, which aligns with the primary objective and code modifications. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@gemini-code-assist
Contributor

Warning

Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting /gemini review.

@PerkzZheng PerkzZheng force-pushed the port-dynamic-tokens-per-page-gqa branch from 3ddae55 to 8b17dc6 on May 7, 2026, 14:32
@PerkzZheng
Contributor Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !644 has been created, and the CI pipeline #50558261 is currently running. I'll report back once the pipeline job completes.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/flashinfer/trtllm/fmha/kernelParams.h`:
- Around line 47-48: The field CUtensorMap tmaKSlidingWindowKvPool_ is declared
but never initialized or used; either remove this dead member or initialize it
in setKernelParams by calling buildNdTmaDescriptor in the same sparse-MLA
conditional where tmaK_ and tmaV_ are built (mirror their construction/guards),
ensuring the descriptor parameters match the DSv4 sparse MLA sliding-window KV
pool layout if you intend to support that path; update any related cleanup/usage
accordingly.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: efb8b457-2332-4fab-9813-3e51975b0e44

📥 Commits

Reviewing files that changed from the base of the PR and between 3ddae5562724a157a2f9e06ba150a05f72a32cca and 8b17dc664bacc1e5685345ba66e92b4411465d84.

📒 Files selected for processing (6)
  • flashinfer/artifacts.py
  • include/flashinfer/trtllm/fmha/fmhaKernels.cuh
  • include/flashinfer/trtllm/fmha/fmhaRunnerParams.h
  • include/flashinfer/trtllm/fmha/kernelParams.h
  • tests/attention/test_attention_sink_blackwell.py
  • tests/attention/test_trtllm_gen_attention.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • include/flashinfer/trtllm/fmha/fmhaRunnerParams.h
  • flashinfer/artifacts.py
  • include/flashinfer/trtllm/fmha/fmhaKernels.cuh
  • tests/attention/test_trtllm_gen_attention.py

Comment on lines +47 to +48
// TMA descriptor for DSv4 sparse MLA sliding-window KV pool. Same format as tmaK_.
CUtensorMap tmaKSlidingWindowKvPool_;
Contributor

@coderabbitai coderabbitai Bot May 7, 2026

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all sites that set tmaKSlidingWindowKvPool_ to verify caller-side initialization.
rg -n "tmaKSlidingWindowKvPool_" --type cpp --type h -C 4

Repository: flashinfer-ai/flashinfer

Length of output: 800


🏁 Script executed:

# Find the setKernelParams function and buildNdTmaDescriptor calls
rg -n "setKernelParams|buildNdTmaDescriptor" --type cpp --type h -A 3 -B 1

Repository: flashinfer-ai/flashinfer

Length of output: 9188


🏁 Script executed:

# Check the memset call around line 639 in kernelParams.h
head -660 include/flashinfer/trtllm/fmha/kernelParams.h | tail -50

Repository: flashinfer-ai/flashinfer

Length of output: 2785


🏁 Script executed:

# Search for where tmaKSlidingWindowKvPool_ is actually used/read
rg -n "tmaKSlidingWindowKvPool_" --type cpp --type cu

Repository: flashinfer-ai/flashinfer

Length of output: 95


🏁 Script executed:

# Read the full setKernelParams function (roughly lines 634-850)
sed -n '634,850p' include/flashinfer/trtllm/fmha/kernelParams.h

Repository: flashinfer-ai/flashinfer

Length of output: 11415


🏁 Script executed:

# Search for any usage/reads of tmaKSlidingWindowKvPool_ without file type restriction
rg "tmaKSlidingWindowKvPool_" -C 4

Repository: flashinfer-ai/flashinfer

Length of output: 773


🏁 Script executed:

# Look for sparse MLA related code to understand if this field is relevant
rg "sparse.*MLA|SlidingWindow" -C 2

Repository: flashinfer-ai/flashinfer

Length of output: 6475


🏁 Script executed:

# Search for tmaKSlidingWindowKvPool_ in all files, focusing on kernel code
find . -type f \( -name "*.cuh" -o -name "*.cu" \) -exec grep -l "tmaKSlidingWindowKvPool_" {} \;

Repository: flashinfer-ai/flashinfer

Length of output: 50


🏁 Script executed:

# Also search in any generated or header files for kernel access
rg "tmaKSlidingWindowKvPool_" . --type-list

Repository: flashinfer-ai/flashinfer

Length of output: 458


🏁 Script executed:

# Look for the sparse MLA kernel to see if it uses this field
rg -n "sparse.*kernel|slidingWindow" --type-list 2>/dev/null | head -20

Repository: flashinfer-ai/flashinfer

Length of output: 50


🏁 Script executed:

# Search for kernel code more broadly - look for .cuh and .cu files
find . -type f \( -name "*.cuh" -o -name "*.cu" \) | head -20

Repository: flashinfer-ai/flashinfer

Length of output: 668


🏁 Script executed:

# Search all files for any reference to this field with broader patterns
rg "SlidingWindowKvPool" .

Repository: flashinfer-ai/flashinfer

Length of output: 156


🏁 Script executed:

# Check git history to see if this field was recently added
git log --oneline --all -- "include/flashinfer/trtllm/fmha/kernelParams.h" | head -10

Repository: flashinfer-ai/flashinfer

Length of output: 842


🏁 Script executed:

# See the actual commit that added this field
git log -p --all -S "tmaKSlidingWindowKvPool_" -- "include/flashinfer/trtllm/fmha/kernelParams.h" | head -100

Repository: flashinfer-ai/flashinfer

Length of output: 50


🏁 Script executed:

# Look at recent commits to kernelParams.h to find when this field was added
git log -p --all -- "include/flashinfer/trtllm/fmha/kernelParams.h" | grep -B 10 -A 10 "tmaKSlidingWindowKvPool_" | head -50

Repository: flashinfer-ai/flashinfer

Length of output: 50


🏁 Script executed:

# Also check if there are any TODOs or comments about this field being unused
rg -B 5 -A 5 "tmaKSlidingWindowKvPool_"

Repository: flashinfer-ai/flashinfer

Length of output: 893


🏁 Script executed:

# Search for any test or reference that might use sparse MLA with sliding window
rg "sparse.*sliding|slidingWindow.*sparse" -C 3

Repository: flashinfer-ai/flashinfer

Length of output: 633


🏁 Script executed:

# Check the fmhaKernels.cuh file that handles sparse MLA selection
cat csrc/trtllm_fmha_kernel_launcher.cu | grep -A 20 -B 5 "sparse_mla"

Repository: flashinfer-ai/flashinfer

Length of output: 8495


Remove tmaKSlidingWindowKvPool_ or implement its initialization for sparse MLA sliding-window support.

The field is declared but never populated via buildNdTmaDescriptor and is unused throughout the codebase. Either remove it as dead code, or if intended for future sparse MLA sliding-window support, add the corresponding buildNdTmaDescriptor call inside setKernelParams (guarded by the sparse-MLA path, similar to how tmaK_ and tmaV_ are constructed).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@include/flashinfer/trtllm/fmha/kernelParams.h` around lines 47 - 48, The
field CUtensorMap tmaKSlidingWindowKvPool_ is declared but never initialized or
used; either remove this dead member or initialize it in setKernelParams by
calling buildNdTmaDescriptor in the same sparse-MLA conditional where tmaK_ and
tmaV_ are built (mirror their construction/guards), ensuring the descriptor
parameters match the DSv4 sparse MLA sliding-window KV pool layout if you intend
to support that path; update any related cleanup/usage accordingly.

Collaborator

Not sure if I'm missing something, but what's the reason for this never actually being initialized?

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for any TODO/FIXME near this field or sparse MLA sliding window
rg -n "TODO|FIXME|tmaKSlidingWindowKvPool_|SlidingWindowKvPool|sparse.*sliding|sliding.*sparse" -C 3

# Also check if the cubin kernels might reference struct offsets that imply this field must exist
rg -n "sparseMla|sparse_mla|mSparseMla" --type cpp --type h -C 3

# Check the full sparse MLA block in setKernelParams
grep -n "sparseMla\|SlidingWindow" include/flashinfer/trtllm/fmha/kernelParams.h

Length of output: 210871


@saltyminty Great question! The root cause is binary ABI compatibility with the precompiled cubins.

KernelParams is the struct passed directly to the TRTLLM-GEN GPU kernels, so its memory layout (field offsets) must exactly match what the precompiled cubins expect. This PR updates the cubin artifacts to a new version (as noted in the PR description), and the new cubins appear to include tmaKSlidingWindowKvPool_ in their expected KernelParams layout. If the field were absent from the host-side struct, every field declared after it would be at the wrong byte offset, causing silent data corruption in the kernel.

The memset zero-initialization acts as a safe "disabled" sentinel — the kernel can check whether the descriptor is valid before using it, and a zeroed CUtensorMap signals that this feature path is inactive.

You can see the same pattern with ptrSparseMlaTopKLens (line 122): it's declared, zero-initialized by memset, and explicitly pinned to nullptr at line 867 — all without a full host-side implementation. Both fields are effectively forward-ported struct layout stubs that maintain cubin ABI compatibility while the actual initialization logic is deferred to a follow-up.

So to summarize: the field must exist to keep the struct binary-compatible with the new cubins, and the zero-initialization from memset is the correct default for the "feature not active" case.
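The layout argument in the reply above can be illustrated with a small `ctypes` sketch. It is not the real KernelParams definition: the field order, the snake_case field names, and the 128-byte opaque stand-in for CUtensorMap are assumptions made for the example. The point is only that omitting one member on the host side shifts the offset of every member declared after it.

```python
import ctypes


class TensorMap(ctypes.Structure):
    """Opaque 128-byte blob standing in for CUtensorMap (assumed size)."""

    _fields_ = [("opaque", ctypes.c_uint64 * 16)]


class ParamsExpectedByKernel(ctypes.Structure):
    """Hypothetical layout that a precompiled kernel was built against."""

    _fields_ = [
        ("tma_k", TensorMap),
        ("tma_k_sliding_window_kv_pool", TensorMap),
        ("ptr_sparse_mla_top_k_lens", ctypes.c_void_p),
    ]


class ParamsMissingField(ctypes.Structure):
    """Same struct with the sliding-window descriptor left out on the host."""

    _fields_ = [
        ("tma_k", TensorMap),
        ("ptr_sparse_mla_top_k_lens", ctypes.c_void_p),
    ]


expected = ParamsExpectedByKernel.ptr_sparse_mla_top_k_lens.offset
shifted = ParamsMissingField.ptr_sparse_mla_top_k_lens.offset
print(expected, shifted)  # the later field moves by sizeof(TensorMap)

# Zero-fill the whole struct, analogous to a memset before fields are set.
params = ParamsExpectedByKernel()
ctypes.memset(ctypes.addressof(params), 0, ctypes.sizeof(params))
```

With the stand-in sizes used here the pointer sits at byte 256 in the full layout and at byte 128 when the descriptor is dropped, so a kernel reading offset 256 would see unrelated data.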

Contributor Author

This is actually used by a new kind of kernel; see #3269.

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment thread include/flashinfer/trtllm/fmha/fmhaKernels.cuh
@PerkzZheng PerkzZheng force-pushed the port-dynamic-tokens-per-page-gqa branch 2 times, most recently from 92c6075 to dffd4b8 on May 8, 2026, 07:45
Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/attention/test_attention_sink_blackwell.py (1)

215-215: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove debug print statement.

Line 215 (print(q.shape, k.shape, v.shape)) is a debug artifact that produces noise in CI output. It should be removed before merging.

🐛 Proposed fix
-    print(q.shape, k.shape, v.shape)
-
     o_ref = sink_attention_unified(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/attention/test_attention_sink_blackwell.py` at line 215, Remove the
debug print that prints tensor shapes — delete the statement printing q.shape,
k.shape, v.shape (the `print(q.shape, k.shape, v.shape)` line) from the test so
it no longer emits noisy output during CI; leave the surrounding test logic and
any assertions intact.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f785676b-3ed8-4591-90db-4b98cb5c0c67

📥 Commits

Reviewing files that changed from the base of the PR and between 8b17dc664bacc1e5685345ba66e92b4411465d84 and 92c6075b657b93a1c865835647f534560e9137c1.

📒 Files selected for processing (6)
  • flashinfer/artifacts.py
  • include/flashinfer/trtllm/fmha/fmhaKernels.cuh
  • include/flashinfer/trtllm/fmha/fmhaRunnerParams.h
  • include/flashinfer/trtllm/fmha/kernelParams.h
  • tests/attention/test_attention_sink_blackwell.py
  • tests/attention/test_trtllm_gen_attention.py
✅ Files skipped from review due to trivial changes (2)
  • flashinfer/artifacts.py
  • include/flashinfer/trtllm/fmha/fmhaKernels.cuh
🚧 Files skipped from review as they are similar to previous changes (2)
  • include/flashinfer/trtllm/fmha/fmhaRunnerParams.h
  • tests/attention/test_trtllm_gen_attention.py

Collaborator

@qsang-nv qsang-nv left a comment

LGTM

Collaborator

@saltyminty saltyminty left a comment

Approved but see comment about tmaKSlidingWindowKvPool_

@PerkzZheng PerkzZheng force-pushed the port-dynamic-tokens-per-page-gqa branch from dffd4b8 to eb503ea on May 9, 2026, 02:02
@PerkzZheng
Contributor Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !644 has been updated with latest changes, and the CI pipeline #50739882 is currently running. I'll report back once the pipeline job completes.

@PerkzZheng
Contributor Author

PerkzZheng commented May 11, 2026

@qsang-nv @saltyminty The CI failures appear to be related to machine-allocation issues rather than to this change. Feel free to merge if it looks good to you. Thanks!

The B300 tests passed, so it should be good.

@qsang-nv
Collaborator

/bot run

@flashinfer-bot
Collaborator

GitLab MR !644 has been created, and the CI pipeline #50869866 is currently running. I'll report back once the pipeline job completes.

@qsang-nv qsang-nv merged commit 4f072c4 into flashinfer-ai:main May 11, 2026
42 of 43 checks passed