fix: Decode benchmark's fa2_tc uses backend=fa2 in wrapper #2302
bkryu merged 1 commit into flashinfer-ai:main from
Conversation
Code Review
This pull request correctly fixes an issue in the decode benchmark for the fa2_tc backend by mapping it to the fa2 backend when initializing the BatchDecodeWithPagedKVCacheWrapper. The change is straightforward and effective. I've added one suggestion to improve maintainability by using a dictionary for backend mapping, which could make future extensions cleaner. Overall, the fix is correct and the PR is in good shape.
# Map fa2_tc to fa2 for the actual backend parameter
# fa2_tc is a benchmark-specific name meaning "fa2 with tensor cores"
actual_backend = "fa2" if backend == "fa2_tc" else backend
For better maintainability and extensibility, consider using a dictionary to map benchmark-specific backend names to their actual counterparts. This approach makes it easier to add more mappings in the future without adding more if/else conditions.
Suggested change:

# Map fa2_tc to fa2 for the actual backend parameter
# fa2_tc is a benchmark-specific name meaning "fa2 with tensor cores"
actual_backend = "fa2" if backend == "fa2_tc" else backend

becomes:

# Map benchmark-specific backend names to actual backend parameters.
backend_map = {"fa2_tc": "fa2"}
actual_backend = backend_map.get(backend, backend)
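The reviewer's dictionary-based mapping can be sketched as a small standalone helper. The `resolve_backend` function name is hypothetical (the benchmark code assigns inline rather than calling a helper); only the `backend_map` dictionary and the `dict.get` fallback come from the suggestion itself:

```python
# Map benchmark-specific backend aliases to the names the wrapper
# actually accepts; unknown names pass through unchanged.
backend_map = {"fa2_tc": "fa2"}

def resolve_backend(backend: str) -> str:
    """Return the real backend name for a benchmark-specific alias."""
    return backend_map.get(backend, backend)

print(resolve_backend("fa2_tc"))  # prints "fa2"
print(resolve_backend("fa2"))     # non-alias names pass through unchanged
```

Adding another alias later is then a one-line change to `backend_map`, with no new `if`/`else` branches.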
📌 Description
During flashinfer_benchmark.py's attention benchmark, using `fa2_tc` for "FlashAttention2 with tensor cores enabled" would pass the incorrect backend name `"fa2_tc"` to the wrapper when it should be `"fa2"`. This bug previously caused no visible issues, but recent commits have caused it to surface. This PR changes the benchmark code to fix the issue.
No library code or unit test code is changed, so this PR will not trigger unit tests.
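The behavior the fix guarantees can be illustrated with a stub in place of `BatchDecodeWithPagedKVCacheWrapper` (the stub class and `make_wrapper` helper are hypothetical stand-ins; only the alias-resolution line mirrors the patch):

```python
class StubDecodeWrapper:
    """Stand-in for BatchDecodeWithPagedKVCacheWrapper that records
    which backend name it was constructed with."""
    def __init__(self, backend: str):
        self.backend = backend

def make_wrapper(backend: str) -> StubDecodeWrapper:
    # fa2_tc is a benchmark-specific name meaning "fa2 with tensor
    # cores"; the wrapper itself only understands "fa2".
    actual_backend = "fa2" if backend == "fa2_tc" else backend
    return StubDecodeWrapper(backend=actual_backend)

# The benchmark alias never reaches the wrapper; real names pass through.
assert make_wrapper("fa2_tc").backend == "fa2"
assert make_wrapper("fa2").backend == "fa2"
```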
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
pre-commitby runningpip install pre-commit(or used your preferred method).pre-commit install.pre-commit run --all-filesand fixed any reported issues.🧪 Tests
unittest, etc.).Reviewer Notes