Conversation
📝 Walkthrough

This change introduces a new CUDA scale kernel to the flashinfer public API, including tutorial documentation with benchmarking guidance, a Python API wrapper, test coverage, and a benchmark script demonstrating performance measurement across multiple sizes and data types.
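For orientation, here is a minimal usage sketch of the wrapper described above. The `flashinfer.scale` call and its elementwise-multiply semantics are assumptions inferred from the benchmark script reviewed below, not confirmed public API:

```python
import torch
import flashinfer  # assumption: the scale wrapper from this change is installed

x = torch.randn(4096, dtype=torch.float16, device="cuda")
y = flashinfer.scale(x, 2.0)  # hypothetical semantics: multiply every element by 2.0
torch.testing.assert_close(y, x * 2.0)  # sanity check under the assumed semantics
```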
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks: ❌ Failed checks (1 inconclusive) · ✅ Passed checks (2 passed)
Summary of Changes

Hello @yzh119, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a bug where Claude skills were not functioning as intended due to inconsistencies in file naming conventions and a lack of necessary metadata. By standardizing skill file names to `SKILL.md` and adding the required metadata, it ensures the skills are picked up and take effect as intended.
Code Review
This pull request fixes the Claude skills by renaming the skill files to SKILL.md and adding the required metadata. It also enhances the add-cuda-kernel skill by adding a new step for benchmarking. The changes are correct and align with the PR's goal. I've found a minor issue in the example code for benchmarking and provided a suggestion to fix it.
The benchmarking example under review:

```python
import torch
from flashinfer.testing import bench_gpu_time


def bench_scale():
    """Benchmark scale kernel."""
    import flashinfer

    sizes = [1024, 4096, 16384, 65536, 262144]
    dtypes = [torch.float16, torch.bfloat16]

    print("Scale Kernel Benchmark")
    print("-" * 60)
    print(f"{'Size':>10} {'Dtype':>10} {'Time (us)':>12} {'Std (us)':>10}")
    print("-" * 60)

    for size in sizes:
        for dtype in dtypes:
            x = torch.randn(size, dtype=dtype, device="cuda")

            # Benchmark with CUPTI (auto-fallback to CUDA events)
            median_time, std_time = bench_gpu_time(
                flashinfer.scale,
                args=(x, 2.0),
                enable_cupti=True,
                dry_run_iters=10,
                repeat_iters=100,
            )

            print(f"{size:>10} {str(dtype):>10} {median_time*1e6:>12.2f} {std_time*1e6:>10.2f}")


if __name__ == "__main__":
    bench_scale()
```
The example benchmark script has a few issues that would prevent it from running correctly:

- It's missing an import for `numpy`, which is needed to calculate `np.median` and `np.std`.
- The `bench_gpu_time` function returns a list of execution times in milliseconds, not the median and standard deviation directly. The example code should be updated to calculate these statistics from the returned list.
- The conversion to microseconds (`us`) should be from milliseconds, so the multiplication factor should be `1000`, not `1e6` (which would be for seconds-to-microseconds).
Here is a corrected version of the script.
```python
import torch
import numpy as np
from flashinfer.testing import bench_gpu_time


def bench_scale():
    """Benchmark scale kernel."""
    import flashinfer

    sizes = [1024, 4096, 16384, 65536, 262144]
    dtypes = [torch.float16, torch.bfloat16]

    print("Scale Kernel Benchmark")
    print("-" * 60)
    print(f"{'Size':>10} {'Dtype':>10} {'Time (us)':>12} {'Std (us)':>10}")
    print("-" * 60)

    for size in sizes:
        for dtype in dtypes:
            x = torch.randn(size, dtype=dtype, device="cuda")

            # Benchmark with CUPTI (auto-fallback to CUDA events)
            times_ms = bench_gpu_time(
                flashinfer.scale,
                args=(x, 2.0),
                enable_cupti=True,
                dry_run_iters=10,
                repeat_iters=100,
            )

            # bench_gpu_time returns a list of times in milliseconds
            median_time_us = np.median(times_ms) * 1000
            std_time_us = np.std(times_ms) * 1000
            print(f"{size:>10} {str(dtype):>10} {median_time_us:>12.2f} {std_time_us:>10.2f}")


if __name__ == "__main__":
    bench_scale()
```
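A brief note on the design of this fix: reporting the median of the returned per-iteration times (rather than the mean) keeps the headline number robust to occasional outliers, such as iterations measured before GPU clocks stabilize, while the standard deviation still surfaces run-to-run variance.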
📌 Description
Skills defined in #2240 don't take effect because of missing metadata and an incorrect file name.
This PR fixes the issue.
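For illustration, here is a minimal sketch of what a fixed skill file could look like. The path and frontmatter fields follow the common Claude skill convention (a `SKILL.md` file that opens with YAML frontmatter carrying `name` and `description`); they are assumptions for illustration, not the exact contents of this PR:

```yaml
---
# hypothetical frontmatter for .claude/skills/add-cuda-kernel/SKILL.md
name: add-cuda-kernel
description: Walk through adding a new CUDA kernel to flashinfer, including the Python wrapper, tests, and a benchmark step.
---
```

Without the expected file name and this leading metadata block, the skill is never loaded, which matches the "doesn't take effect" symptom described above.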
🔍 Related Issues
🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

- [ ] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests

- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).

Reviewer Notes
Summary by CodeRabbit

- New Features: a scale kernel exposed through the flashinfer public API, with a Python wrapper and a benchmark script.
- Documentation: tutorial documentation with benchmarking guidance.
- Tests: test coverage for the new kernel.