refactor: decorate all operators with @flashinfer_api#2311
Conversation
Add @flashinfer_api decorator to the remaining public API functions: - flashinfer/topk.py: can_implement_filtered_topk() - flashinfer/topk.py: top_k() After comprehensive search through all Python API files, these were the only two public operator APIs missing the decorator. All other modules already have the decorator properly applied. The @flashinfer_api decorator provides: - Zero-overhead API logging when disabled (FLASHINFER_LOGLEVEL=0) - Crash-safe input logging for debugging CUDA crashes - CUDA graph compatible tensor statistics - Multiple verbosity levels (0, 1, 3, 5) Fixes #2310 Co-authored-by: Zihao Ye <yzh119@users.noreply.github.com>
📝 WalkthroughWalkthroughAdded Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
💤 Files with no reviewable changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @bkryu, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request completes the integration of the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request correctly adds the @flashinfer_api decorator to the can_implement_filtered_topk and top_k functions in flashinfer/topk.py. This change aligns these public APIs with others in the codebase, ensuring consistent logging and debugging capabilities. The implementation is straightforward and correct. Well done.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flashinfer/topk.py (1)
141-252: Exportcan_implement_filtered_topkinflashinfer/__init__.py.The
top_kfunction is already exported (line 150), butcan_implement_filtered_topkis missing. Since it's marked as a public API with@flashinfer_api, it should be exported in the package's__init__.pyto be accessible to users.Add the following import to
flashinfer/__init__.py:from .topk import can_implement_filtered_topk as can_implement_filtered_topk
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
flashinfer/topk.py
🧰 Additional context used
📓 Path-based instructions (1)
flashinfer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
flashinfer/**/*.py: Use@functools.cachedecorator on Python API functions to implement module-level caching and avoid recompilation
Use@flashinfer_apidecorator for debugging API calls, enable viaFLASHINFER_LOGLEVELenvironment variable (0=off, 1=basic, 3=detailed, 5=with stats)
Files:
flashinfer/topk.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/**/*.py : Use `flashinfer_api` decorator for debugging API calls, enable via `FLASHINFER_LOGLEVEL` environment variable (0=off, 1=basic, 3=detailed, 5=with stats)
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/__init__.py : Export new operations in `flashinfer/__init__.py` to make them available as public API
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/**/*.py : Use `functools.cache` decorator on Python API functions to implement module-level caching and avoid recompilation
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/__init__.py : Export new operations in `flashinfer/__init__.py` to make them available as public API
Applied to files:
flashinfer/topk.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/**/*.py : Use `flashinfer_api` decorator for debugging API calls, enable via `FLASHINFER_LOGLEVEL` environment variable (0=off, 1=basic, 3=detailed, 5=with stats)
Applied to files:
flashinfer/topk.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to tests/**/*.py : Test implementations should use `flashinfer.utils` functions (`get_compute_capability`, `is_sm90a_supported`, `is_sm100a_supported`, etc.) to skip tests on unsupported GPU architectures
Applied to files:
flashinfer/topk.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (3)
flashinfer/topk.py (3)
141-153: LGTM! Decorator addition aligns with PR objectives.The
@flashinfer_apidecorator has been correctly applied to expose this as a public API with logging capabilities. The function implementation and documentation are clear.
156-160: Verify: Sorted parameter addition extends beyond stated PR scope.The
@flashinfer_apidecorator addition aligns with the PR objective. However, the addition of thesortedparameter extends beyond the stated goal of "decorate all operators with @flashinfer_api". While this is a useful enhancement that improvestorch.topkcompatibility, it represents additional functionality not mentioned in the PR description.Consider documenting this enhancement in the PR description or splitting it into a separate change for clarity.
246-250: The sorting logic is correct and already compliant with coding guidelines.The implementation correctly reorders both values and indices in descending order by value using
torch.sortandtorch.gather. Thetest_top_k_sortedtest validates this by verifying:
- Values are sorted descending within each row
- Indices correctly map to the original logits via
torch.gatherThe
top_kfunction already includes the required@flashinfer_apidecorator.
|
@claude, you missed at least one API, grouped_gemm_nt_masked. Can you give another pass at checking and decorating operators? |
|
@claude , you missed at least one API, grouped_gemm_nt_masked. Can you give another pass at checking and decorating operators? |
|
Claude finished @bkryu's task in 8m 15s —— View job ✅ Added @flashinfer_api decorator to all missing public APIsThanks for catching that! I did a comprehensive search and found 6 additional public API functions missing the decorator (on top of the 2 already added in the initial PR). Decorated functions:
Summary:
All public API functions now have the --- | |
Found and decorated 6 additional public API functions: - grouped_gemm_nt_masked in flashinfer/cute_dsl/blockscaled_gemm.py - get_seq_lens in flashinfer/page.py - prepare_low_latency_gemm_weights in flashinfer/trtllm_low_latency_gemm.py - fused_topk_deepseek in flashinfer/fused_moe/fused_routing_dsv3.py - create_allreduce_fusion_workspace in flashinfer/comm/allreduce.py - allreduce_fusion in flashinfer/comm/allreduce.py Co-authored-by: Brian K. Ryu <bkryu@users.noreply.github.com>
Code Review: PR #2311 - Decorate all operators with @flashinfer_apiThank you for this comprehensive effort to standardize API logging across the FlashInfer codebase! This is a solid refactoring that improves debugging capabilities. Here's my detailed review: ✅ Strengths
📋 Files ModifiedThe changes span 6 files across different modules:
🔍 Code Quality AssessmentImport Statements: ✅
Decorator Placement: ✅
Function Selection: ✅
🎯 Alignment with CLAUDE.md GuidelinesAccording to the project's CLAUDE.md:
The PR correctly applies these principles to all modified functions. 🧪 Testing ConsiderationsRecommended Testing:
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
flashinfer/trtllm_low_latency_gemm.py (1)
121-126: Consider adding@flashinfer_apitotrtllm_low_latency_gemmas well.The
trtllm_low_latency_gemmfunction at line 121 appears to be a public API (has docstring with parameters and examples) but is not decorated with@flashinfer_api. Per the PR objective to ensure all public operator APIs are annotated, this function may have been missed.♻️ Suggested fix
+@flashinfer_api def trtllm_low_latency_gemm( A: torch.Tensor, B: torch.Tensor, global_scale: torch.Tensor, out: torch.Tensor, ) -> None:
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
flashinfer/comm/allreduce.pyflashinfer/cute_dsl/blockscaled_gemm.pyflashinfer/fused_moe/fused_routing_dsv3.pyflashinfer/page.pyflashinfer/trtllm_low_latency_gemm.py
🧰 Additional context used
📓 Path-based instructions (1)
flashinfer/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
flashinfer/**/*.py: Use@functools.cachedecorator on Python API functions to implement module-level caching and avoid recompilation
Use@flashinfer_apidecorator for debugging API calls, enable viaFLASHINFER_LOGLEVELenvironment variable (0=off, 1=basic, 3=detailed, 5=with stats)
Files:
flashinfer/fused_moe/fused_routing_dsv3.pyflashinfer/comm/allreduce.pyflashinfer/page.pyflashinfer/trtllm_low_latency_gemm.pyflashinfer/cute_dsl/blockscaled_gemm.py
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/**/*.py : Use `flashinfer_api` decorator for debugging API calls, enable via `FLASHINFER_LOGLEVEL` environment variable (0=off, 1=basic, 3=detailed, 5=with stats)
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/__init__.py : Export new operations in `flashinfer/__init__.py` to make them available as public API
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/**/*.py : Use `functools.cache` decorator on Python API functions to implement module-level caching and avoid recompilation
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/aot.py : Register new operations in `flashinfer/aot.py` by calling the `gen_*_module()` function for AOT (Ahead-Of-Time) pre-compilation support
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/**/*.py : Use `flashinfer_api` decorator for debugging API calls, enable via `FLASHINFER_LOGLEVEL` environment variable (0=off, 1=basic, 3=detailed, 5=with stats)
Applied to files:
flashinfer/fused_moe/fused_routing_dsv3.pyflashinfer/comm/allreduce.pyflashinfer/page.pyflashinfer/trtllm_low_latency_gemm.pyflashinfer/cute_dsl/blockscaled_gemm.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/__init__.py : Export new operations in `flashinfer/__init__.py` to make them available as public API
Applied to files:
flashinfer/fused_moe/fused_routing_dsv3.pyflashinfer/comm/allreduce.pyflashinfer/page.pyflashinfer/trtllm_low_latency_gemm.pyflashinfer/cute_dsl/blockscaled_gemm.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/**/*.py : Use `functools.cache` decorator on Python API functions to implement module-level caching and avoid recompilation
Applied to files:
flashinfer/fused_moe/fused_routing_dsv3.pyflashinfer/comm/allreduce.pyflashinfer/page.pyflashinfer/trtllm_low_latency_gemm.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/aot.py : Register new operations in `flashinfer/aot.py` by calling the `gen_*_module()` function for AOT (Ahead-Of-Time) pre-compilation support
Applied to files:
flashinfer/fused_moe/fused_routing_dsv3.pyflashinfer/trtllm_low_latency_gemm.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/jit/**/*.py : JIT module generators in `flashinfer/jit/` must follow the pattern: compute URI → create directory → (optional) render Jinja template → copy sources → return JitSpec
Applied to files:
flashinfer/fused_moe/fused_routing_dsv3.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to flashinfer/jit/**/*.py : Use `gen_jit_spec()` function to return a properly configured JitSpec from module generators with appropriate `sources` and `extra_cuda_cflags`
Applied to files:
flashinfer/fused_moe/fused_routing_dsv3.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to include/**/*.cuh : Kernel code in `include/flashinfer/` is automatically picked up by JIT compilation on changes - no pip reinstall needed
Applied to files:
flashinfer/fused_moe/fused_routing_dsv3.pyflashinfer/trtllm_low_latency_gemm.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to tests/**/*.py : Test implementations should use `flashinfer.utils` functions (`get_compute_capability`, `is_sm90a_supported`, `is_sm100a_supported`, etc.) to skip tests on unsupported GPU architectures
Applied to files:
flashinfer/comm/allreduce.pyflashinfer/cute_dsl/blockscaled_gemm.py
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Use `FLASHINFER_CUDA_ARCH_LIST` environment variable to specify target GPU architectures (e.g., '8.0 9.0a') and `FLASHINFER_NVCC_THREADS` to control parallel compilation threads
Applied to files:
flashinfer/comm/allreduce.pyflashinfer/cute_dsl/blockscaled_gemm.py
🧬 Code graph analysis (5)
flashinfer/fused_moe/fused_routing_dsv3.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-565)
flashinfer/comm/allreduce.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-565)
flashinfer/page.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-565)
flashinfer/trtllm_low_latency_gemm.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-565)
flashinfer/cute_dsl/blockscaled_gemm.py (1)
flashinfer/api_logging.py (1)
flashinfer_api(464-565)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Deploy Docs
- GitHub Check: claude-review
🔇 Additional comments (10)
flashinfer/fused_moe/fused_routing_dsv3.py (2)
1-1: LGTM!Import statement for
flashinfer_apiis correctly added at the top of the file.
119-120: LGTM!The
@flashinfer_apidecorator is correctly applied as the outermost decorator. This ensures API logging occurs before the@backend_requirementcheck, which is the expected behavior for crash-safe input logging.flashinfer/cute_dsl/blockscaled_gemm.py (2)
58-58: LGTM!Import statement for
flashinfer_apiis correctly added alongside other flashinfer imports.
2946-2947: LGTM!The
@flashinfer_apidecorator is correctly applied to thegrouped_gemm_nt_maskedpublic API function. The underlying kernel compilation is properly cached via@functools.cacheonget_cute_dsl_compiled_masked_gemm_kernelat line 2738, which aligns with the coding guidelines.flashinfer/trtllm_low_latency_gemm.py (2)
22-22: LGTM!Import statement for
flashinfer_apiis correctly added.
197-198: LGTM!The
@flashinfer_apidecorator is correctly applied toprepare_low_latency_gemm_weights.flashinfer/comm/allreduce.py (3)
56-57: LGTM!Import statement for
flashinfer_apiis correctly added.
275-276: LGTM!The
@flashinfer_apidecorator is correctly applied tocreate_allreduce_fusion_workspace, enabling API logging for workspace creation.
446-447: LGTM!The
@flashinfer_apidecorator is correctly applied toallreduce_fusion, enabling API logging for the main AllReduce fusion operation.flashinfer/page.py (1)
198-222: LGTM!The
@flashinfer_apidecorator is correctly applied toget_seq_lens. This aligns with the other public API functions in this file (get_batch_indices_positions,append_paged_mla_kv_cache,append_paged_kv_cache) that are already decorated.
|
|
||
|
|
||
| @flashinfer_api | ||
| def get_seq_lens( |
There was a problem hiding this comment.
This is a utility function we probably don't want to label.
|
|
||
|
|
||
| @flashinfer_api | ||
| def can_implement_filtered_topk() -> bool: |
|
Claude finished @bkryu's task in 1m 28s —— View job ✅ Review comments addressedCompleted tasks:
SummaryI've removed the
The
Changes committed and pushed to branch --- | |
Remove decorator from can_implement_filtered_topk and get_seq_lens as these are utility functions, not main operator APIs. Co-authored-by: Zihao Ye <yzh119@users.noreply.github.com>
Code Review - PR #2311: Decorate all operators with @flashinfer_apiThank you for this comprehensive PR adding the ✅ Strengths
|
|
@claude , can you address the "Critical Issue: Decorator Ordering Inconsistency Problem" in your own PR review?
|
|
Claude finished @bkryu's task in 1m 51s —— View job ✅ Decorator ordering inconsistency fixedCompleted tasks:
SummaryFixed the decorator ordering inconsistency in two files to match the established pattern in Changed decorator order from: @flashinfer_api
@backend_requirement({}, common_check=...)
def function_name(...):To the correct order: @backend_requirement({}, common_check=...)
@flashinfer_api
def function_name(...):Files fixed:
Why this matters: With Changes committed and pushed to branch --- | |
Place @backend_requirement before @flashinfer_api to ensure validation happens before API logging. This is the established pattern in gemm_base.py and makes logs cleaner by catching invalid inputs before they're logged. Fixed in: - flashinfer/fused_moe/fused_routing_dsv3.py - flashinfer/gemm/routergemm_dsv3.py Co-authored-by: Brian K. Ryu <bkryu@users.noreply.github.com>
Code Review for PR #2311: Decorate all operators with @flashinfer_apiThank you for this systematic improvement! This PR correctly applies the ✅ Strengths
📝 Minor Observations
🎯 Impact & BenefitsThis PR provides important debugging infrastructure:
✅ RecommendationAPPROVE - This is a clean, well-executed refactoring that improves the debugging infrastructure without any functional changes. The code follows FlashInfer conventions correctly and maintains backward compatibility. 📚 Documentation NoteThe PR description accurately references the Generated with Claude Code |
|
@claude can you perform a final round of check? If it meets criterion, let's remove the NOTE about incomplete coverage in CLAUDE.md |
|
Claude finished @yzh119's task in 4m 37s —— View job ✅ Final verification completeAll public operator APIs now have the Completed tasks:
SummaryWhat was verified:
Documentation updated:
Result: The criteria are met! All public operator APIs that launch CUDA kernels now have the Changes pushed to branch: |
All public operator APIs now have the @flashinfer_api decorator. Utility functions (can_implement_filtered_topk, get_seq_lens) correctly do not have the decorator as they don't launch CUDA kernels. Co-authored-by: Zihao Ye <yzh119@users.noreply.github.com>
Code Review SummaryThis PR successfully completes the task of adding the Strengths
Code Quality
Potential Considerations
Security & Performance
VerdictLGTM - This is a clean refactoring that improves the debugging infrastructure of FlashInfer. The PR is ready to merge. The incremental commits show good collaboration and iterative refinement (e.g., adding decorators, then removing from utility functions, then fixing decorator ordering). The final state is correct and well-considered. Review generated with Claude Code |
Add
@flashinfer_apidecorator to the remaining public API functions:After comprehensive search through all Python API files, these were the only two public operator APIs missing the decorator. All other modules already have the decorator properly applied.
The
@flashinfer_apidecorator provides:Fixes #2310
Generated with Claude Code
Summary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.