
fix: Decode benchmark's fa2_tc uses backend=fa2 in wrapper #2302

Merged
bkryu merged 1 commit into flashinfer-ai:main from bkryu:benchmark_decode_fix on Jan 7, 2026

Conversation

@bkryu
Collaborator

@bkryu bkryu commented Jan 7, 2026

📌 Description

In flashinfer_benchmark.py's attention benchmark, selecting fa2_tc ("FlashAttention2 with tensor cores enabled") passed the incorrect backend name "fa2_tc" to the wrapper when it should have been "fa2". The bug previously had no visible effect, but recent commits have caused it to surface.

This PR changes the benchmark code to fix the issue.

There are no library or unit test code changes, so this will not trigger unit tests.
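For reference, here is a minimal, self-contained sketch of the fix. The helper name build_decode_wrapper, the workspace-buffer size, and the "NHD" layout are illustrative assumptions for this example; the actual change in benchmarks/routines/attention.py only normalizes the backend string before constructing the wrapper.

import torch
import flashinfer

def build_decode_wrapper(backend: str):
    # "fa2_tc" is a benchmark-only alias meaning "fa2 with tensor cores enabled";
    # the wrapper itself only understands the canonical backend name "fa2".
    actual_backend = "fa2" if backend == "fa2_tc" else backend
    # Scratch space for the wrapper's internal planning/execution buffers.
    workspace_buffer = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device="cuda")
    return flashinfer.BatchDecodeWithPagedKVCacheWrapper(
        workspace_buffer,
        "NHD",
        use_tensor_cores=(backend == "fa2_tc"),
        backend=actual_backend,
    )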

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Tests
    • Improved backend configuration handling in batch decoding benchmarks to ensure correct parameter mapping during wrapper instantiation.


@gemini-code-assist
Contributor

Summary of Changes

Hello @bkryu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue in the attention decode benchmark where the fa2_tc backend identifier was not correctly interpreted by the flashinfer wrapper. By introducing a mapping from fa2_tc to fa2 before passing the parameter, the change ensures that the benchmark accurately reflects the intended backend configuration and provides reliable performance measurements.

Highlights

  • Backend Mapping Correction: Implemented a mapping for the benchmark-specific fa2_tc backend identifier to fa2 when initializing the BatchDecodeWithPagedKVCacheWrapper.
  • Benchmark Accuracy: Ensured that the decode benchmark for attention routines correctly utilizes the fa2 backend when fa2_tc is specified, resolving a potential misconfiguration in backend identification.


@coderabbitai
Contributor

coderabbitai bot commented Jan 7, 2026

📝 Walkthrough

The testBatchDecodeWithPagedKVCacheWrapper function in benchmarks/routines/attention.py now normalizes backend identifiers before constructing the wrapper. When the backend is "fa2_tc", it maps to "fa2" for wrapper instantiation, while preserving existing plan_kv_indptr handling for trtllm-gen.

Changes

Cohort / File(s): Backend Name Normalization (benchmarks/routines/attention.py)
Summary: Introduces a backend identifier mapping in testBatchDecodeWithPagedKVCacheWrapper to normalize "fa2_tc" to "fa2" when instantiating BatchDecodeWithPagedKVCacheWrapper, ensuring the wrapper receives the canonical backend name.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • Anerudhan
  • cyx-6
  • jiahanc
  • nvmbreughe

Poem

🐰 Hops through the benchmarks with glee,
Backend names aligned perfectly—
fa2_tc becomes fa2 so bright,
Tests run smooth, logic just right!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title check: Passed. The title clearly and specifically describes the main change: fixing the decode benchmark to use backend=fa2 (instead of fa2_tc) in the wrapper.
  • Description check: Passed. The description includes a completed Description section explaining the bug and fix, but lacks completed Related Issues, Pre-commit Checks confirmation, and Tests sections.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between edb37cd and 71206f2.

📒 Files selected for processing (1)
  • benchmarks/routines/attention.py
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/routines/attention.py (1)
flashinfer/comm/trtllm_mnnvl_ar.py (1)
  • backend (227-228)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs
🔇 Additional comments (1)
benchmarks/routines/attention.py (1)

476-487: The wrapper correctly expects backend="fa2" rather than backend="fa2_tc".

The normalization logic on lines 476-478 is correct and necessary: fa2_tc is a benchmark-specific alias meaning "fa2 with tensor cores," and mapping it to the actual backend name "fa2" ensures the wrapper receives the expected parameter. Tensor core behavior is properly preserved via the use_tensor_cores parameter.

However, note that testBatchPrefillWithPagedKVCacheWrapper (lines 1033-1046) and testBatchPrefillWithRaggedKVCacheWrapper (lines 1588-1611) do not handle fa2_tc at all, using only ["fa2", "fa3", "trtllm-gen"] and ["cutlass", "fa2", "fa3", "trtllm-gen"] respectively. If fa2_tc is a valid backend choice (as indicated by its inclusion in parse_attention_args), consider whether prefill tests should also support it for consistency.
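
Building on that consistency note, one possible direction (a sketch only, not part of this PR; the helper names are hypothetical) is a shared alias table that every benchmark routine consults before instantiating its wrapper:

# Hypothetical shared helpers for benchmarks/routines/attention.py
BENCHMARK_BACKEND_ALIASES = {"fa2_tc": "fa2"}

def normalize_backend(backend: str) -> str:
    # Map benchmark-specific aliases (e.g. "fa2_tc") to canonical backend names;
    # names without an alias pass through unchanged.
    return BENCHMARK_BACKEND_ALIASES.get(backend, backend)

def wants_tensor_cores(backend: str) -> bool:
    # Whether the benchmark-specific name asks for tensor cores.
    return backend == "fa2_tc"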



Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes an issue in the decode benchmark for the fa2_tc backend by mapping it to the fa2 backend when initializing the BatchDecodeWithPagedKVCacheWrapper. The change is straightforward and effective. I've added one suggestion to improve maintainability by using a dictionary for backend mapping, which could make future extensions cleaner. Overall, the fix is correct and the PR is in good shape.

Comment on lines +476 to +478
# Map fa2_tc to fa2 for the actual backend parameter
# fa2_tc is a benchmark-specific name meaning "fa2 with tensor cores"
actual_backend = "fa2" if backend == "fa2_tc" else backend

Severity: medium

For better maintainability and extensibility, consider using a dictionary to map benchmark-specific backend names to their actual counterparts. This approach makes it easier to add more mappings in the future without adding more if/else conditions.

Suggested change
-# Map fa2_tc to fa2 for the actual backend parameter
-# fa2_tc is a benchmark-specific name meaning "fa2 with tensor cores"
-actual_backend = "fa2" if backend == "fa2_tc" else backend
+# Map benchmark-specific backend names to actual backend parameters.
+backend_map = {"fa2_tc": "fa2"}
+actual_backend = backend_map.get(backend, backend)

@Anerudhan Anerudhan self-assigned this Jan 7, 2026
@bkryu bkryu merged commit df8015c into flashinfer-ai:main Jan 7, 2026
4 checks passed
@bkryu bkryu deleted the benchmark_decode_fix branch January 15, 2026 18:06