
fix: Decode benchmark's fa2_tc uses backend=fa2 in wrapper #2302

Merged
bkryu merged 1 commit into flashinfer-ai:main from bkryu:benchmark_decode_fix on Jan 7, 2026

Conversation

@bkryu
Collaborator

@bkryu bkryu commented Jan 7, 2026

📌 Description

In flashinfer_benchmark.py's attention benchmark, selecting fa2_tc ("FlashAttention2 with tensor cores enabled") passed the incorrect backend name "fa2_tc" to the wrapper when it should have been "fa2". The bug previously had no visible effect, but recent commits have caused it to surface.

This PR changes the benchmark code to fix the issue.

There are no library or unit test code changes, so this will not trigger unit tests.
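For reference, here is a minimal, self-contained sketch of the fix. The helper name build_decode_wrapper, the workspace-buffer size, and the "NHD" layout are illustrative assumptions for this example; the actual change in benchmarks/routines/attention.py only normalizes the backend string before constructing the wrapper.

import torch
import flashinfer

def build_decode_wrapper(backend: str):
    # "fa2_tc" is a benchmark-only alias meaning "fa2 with tensor cores enabled";
    # the wrapper itself only understands the canonical backend name "fa2".
    actual_backend = "fa2" if backend == "fa2_tc" else backend
    # Scratch space for the wrapper's internal planning/execution buffers.
    workspace_buffer = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device="cuda")
    return flashinfer.BatchDecodeWithPagedKVCacheWrapper(
        workspace_buffer,
        "NHD",
        use_tensor_cores=(backend == "fa2_tc"),
        backend=actual_backend,
    )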

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Tests
    • Improved backend configuration handling in batch decoding benchmarks to ensure correct parameter mapping during wrapper instantiation.


@gemini-code-assist
Contributor

Summary of Changes

Hello @bkryu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue in the attention decode benchmark where the fa2_tc backend identifier was not correctly interpreted by the flashinfer wrapper. By introducing a mapping from fa2_tc to fa2 before passing the parameter, the change ensures that the benchmark accurately reflects the intended backend configuration and provides reliable performance measurements.

Highlights

  • Backend Mapping Correction: Implemented a mapping for the benchmark-specific fa2_tc backend identifier to fa2 when initializing the BatchDecodeWithPagedKVCacheWrapper.
  • Benchmark Accuracy: Ensured that the decode benchmark for attention routines correctly utilizes the fa2 backend when fa2_tc is specified, resolving a potential misconfiguration in backend identification.


@coderabbitai
Contributor

coderabbitai bot commented Jan 7, 2026

📝 Walkthrough

The testBatchDecodeWithPagedKVCacheWrapper function in benchmarks/routines/attention.py now normalizes backend identifiers before constructing the wrapper. When the backend is "fa2_tc", it maps to "fa2" for wrapper instantiation, while preserving existing plan_kv_indptr handling for trtllm-gen.

Changes

Cohort / File(s): Backend Name Normalization (benchmarks/routines/attention.py)
Summary: Introduces a backend identifier mapping in testBatchDecodeWithPagedKVCacheWrapper to normalize "fa2_tc" to "fa2" when instantiating BatchDecodeWithPagedKVCacheWrapper, ensuring the wrapper receives the canonical backend name.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • Anerudhan
  • cyx-6
  • jiahanc
  • nvmbreughe

Poem

🐰 Hops through the benchmarks with glee,
Backend names aligned perfectly—
fa2_tc becomes fa2 so bright,
Tests run smooth, logic just right!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title check: Passed. The title clearly and specifically describes the main change: fixing the decode benchmark to use backend=fa2 (instead of fa2_tc) in the wrapper.
  • Description check: Passed. The description includes a completed Description section explaining the bug and fix, but lacks completed Related Issues, Pre-commit Checks confirmation, and Tests sections.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between edb37cd and 71206f2.

📒 Files selected for processing (1)
  • benchmarks/routines/attention.py
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/routines/attention.py (1)
flashinfer/comm/trtllm_mnnvl_ar.py (1)
  • backend (227-228)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs
🔇 Additional comments (1)
benchmarks/routines/attention.py (1)

476-487: The wrapper correctly expects backend="fa2" rather than backend="fa2_tc".

The normalization logic on lines 476-478 is correct and necessary: fa2_tc is a benchmark-specific alias meaning "fa2 with tensor cores," and mapping it to the actual backend name "fa2" ensures the wrapper receives the expected parameter. Tensor core behavior is properly preserved via the use_tensor_cores parameter.

However, note that testBatchPrefillWithPagedKVCacheWrapper (lines 1033-1046) and testBatchPrefillWithRaggedKVCacheWrapper (lines 1588-1611) do not handle fa2_tc at all, using only ["fa2", "fa3", "trtllm-gen"] and ["cutlass", "fa2", "fa3", "trtllm-gen"] respectively. If fa2_tc is a valid backend choice (as indicated by its inclusion in parse_attention_args), consider whether prefill tests should also support it for consistency.
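
Building on that consistency note, one possible direction (a sketch only, not part of this PR; the helper names are hypothetical) is a shared alias table that every benchmark routine consults before instantiating its wrapper:

# Hypothetical shared helpers for benchmarks/routines/attention.py
BENCHMARK_BACKEND_ALIASES = {"fa2_tc": "fa2"}

def normalize_backend(backend: str) -> str:
    # Map benchmark-specific aliases (e.g. "fa2_tc") to canonical backend names;
    # names without an alias pass through unchanged.
    return BENCHMARK_BACKEND_ALIASES.get(backend, backend)

def wants_tensor_cores(backend: str) -> bool:
    # Whether the benchmark-specific name asks for tensor cores.
    return backend == "fa2_tc"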



Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes an issue in the decode benchmark for the fa2_tc backend by mapping it to the fa2 backend when initializing the BatchDecodeWithPagedKVCacheWrapper. The change is straightforward and effective. I've added one suggestion to improve maintainability by using a dictionary for backend mapping, which could make future extensions cleaner. Overall, the fix is correct and the PR is in good shape.

Comment on lines +476 to +478
# Map fa2_tc to fa2 for the actual backend parameter
# fa2_tc is a benchmark-specific name meaning "fa2 with tensor cores"
actual_backend = "fa2" if backend == "fa2_tc" else backend

Severity: medium

For better maintainability and extensibility, consider using a dictionary to map benchmark-specific backend names to their actual counterparts. This approach makes it easier to add more mappings in the future without adding more if/else conditions.

Suggested change
-# Map fa2_tc to fa2 for the actual backend parameter
-# fa2_tc is a benchmark-specific name meaning "fa2 with tensor cores"
-actual_backend = "fa2" if backend == "fa2_tc" else backend
+# Map benchmark-specific backend names to actual backend parameters.
+backend_map = {"fa2_tc": "fa2"}
+actual_backend = backend_map.get(backend, backend)

@Anerudhan Anerudhan self-assigned this Jan 7, 2026
@bkryu bkryu merged commit df8015c into flashinfer-ai:main Jan 7, 2026
4 checks passed
@bkryu bkryu deleted the benchmark_decode_fix branch January 15, 2026 18:06