[Enhancement] Improve GitHub Actions permissions check and refine performance regression testing #1519
`examples/flash_attention/example_gqa_fwd_bshd_wgmma_pipelined.py`:

```diff
@@ -237,17 +237,11 @@ def run_regression_perf(
     dim: int = 128,
     is_causal: bool = False,
     groups: int = 16,
     tune: bool = False,
 ):
     flops_per_matmul = 2.0 * batch * heads * seq_len * seq_len * dim
     total_flops = 2 * flops_per_matmul
     if is_causal:
         total_flops *= 0.5
     kernel = flashattn(batch, heads, seq_len, dim, is_causal, groups=groups, block_M=128, block_N=128, num_stages=2, threads=256)
     profiler = kernel.get_profiler(tensor_supply_type=tilelang.TensorSupplyType.Normal)
-    return profiler.do_bench(warmup=500, backend="cupti")
+    return profiler.do_bench(backend="cupti")
```
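The FLOP accounting in the hunk above is the standard estimate for attention: two matmuls (scores and output), each costing 2·B·H·S²·D, with the total roughly halved under a causal mask. A standalone sketch of that arithmetic (the function name is illustrative, not part of the PR):

```python
def attention_flops(batch, heads, seq_len, dim, is_causal=False):
    # Each of the two attention matmuls multiplies (seq_len x dim) by
    # (dim x seq_len) shapes per batch and head: 2*S*S*D FLOPs each.
    flops_per_matmul = 2.0 * batch * heads * seq_len * seq_len * dim
    total_flops = 2 * flops_per_matmul
    if is_causal:
        # A causal mask skips roughly half of the score matrix.
        total_flops *= 0.5
    return total_flops

print(attention_flops(1, 1, 128, 128))                  # 8388608.0
print(attention_flops(1, 1, 128, 128, is_causal=True))  # 4194304.0
```

Dividing this count by the measured latency from `do_bench` gives the achieved TFLOPS figure that regression runs compare.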
Contributor (CodeRabbit)

🧩 Analysis chain

🏁 Script executed:

```shell
# First, let's read the file to see the code around lines 216-244
cat -n examples/flash_attention/example_gqa_fwd_bshd_wgmma_pipelined.py | sed -n '210,250p'
```

Repository: tile-ai/tilelang — Length of output: 2204

🏁 Script executed:

```shell
# Find the do_bench method definition
rg -n "def do_bench" --type=py -B 2 -A 10
```

Length of output: 5613

🏁 Script executed:

```shell
# Search for CUPTI references in GitHub Actions workflows and CI configuration
fd -e yml -e yaml . .github/workflows/ 2>/dev/null --exec rg -l 'cupti|CUPTI' \;
# Also check for any CI configuration files
fd -e yml -e yaml -e toml -e cfg . -maxdepth 3 2>/dev/null --exec rg -l 'cupti|CUPTI' \;
```

Length of output: 42

🏁 Script executed:

```shell
# Let's check what CI workflows exist
fd -e yml -e yaml . .github/workflows/ 2>/dev/null
# Also search for any CUPTI references in the entire codebase
rg -i 'cupti' --type=py --type=yml
```

Length of output: 274

🏁 Script executed:

```shell
# Search for CUPTI references in CI workflows without type restriction
rg -i 'cupti' .github/workflows/
# Also check the profiler backend configuration to understand CUPTI usage
rg -i 'cupti' examples/flash_attention/
# Check if there are any requirements or setup for CUPTI
rg -i 'cupti\|profiler' .github/workflows/pr-regression-test-bot.yml
```

Length of output: 1570

🏁 Script executed:

```shell
# Check the pr-regression-test-bot.yml workflow to see what environment is used
cat .github/workflows/pr-regression-test-bot.yml
# Also check ci.yml for any CUPTI setup
cat .github/workflows/ci.yml
```

Length of output: 30049

🏁 Script executed:

```shell
# Let's check the regression test script to understand how it's used
cat maint/scripts/test_perf_regression.py | head -100
# Also check for similar patterns in other flash attention files
rg -n "warmup=" examples/flash_attention/ -A 1 -B 1 | head -60
```

Length of output: 8040

🏁 Script executed:

```shell
# Check if there are environment considerations or failures related to CUPTI in tests
rg -n "CUPTI|cupti|backend.*cupti" testing/ --type py -B 2 -A 2
# Let's also check similar regression tests to understand the pattern better
grep -r "def run_regression_perf" examples/flash_attention/ -A 10
```

Length of output: 14240

**Add warmup parameter to match `main()` profiling for consistent regression testing.**

This change drops `warmup=500` from the `do_bench` call, while similar functions in the codebase (e.g., …) pass an explicit warmup. Restoring it keeps regression measurements comparable:

```diff
- return profiler.do_bench(backend="cupti")
+ return profiler.do_bench(warmup=500, backend="cupti")
```
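The reviewer's point about warmup can be illustrated with a generic timing helper. This is a hypothetical stand-in for a `do_bench`-style API, not tilelang's actual implementation: without untimed warmup iterations, one-time costs (JIT compilation, first-call allocation, cache population) land inside the measured window and skew the reported latency.

```python
import time

def do_bench_sketch(fn, warmup=25, rep=100):
    """Hypothetical do_bench-style helper: run `fn` `warmup` times
    untimed to amortize one-time costs, then return the mean latency
    in milliseconds over `rep` timed iterations."""
    for _ in range(warmup):
        fn()  # warmup runs are discarded, not measured
    start = time.perf_counter()
    for _ in range(rep):
        fn()
    return (time.perf_counter() - start) / rep * 1e3

latency_ms = do_bench_sketch(lambda: sum(range(1000)))
print(f"mean latency: {latency_ms:.4f} ms")
```

A real GPU profiler additionally needs device synchronization around the timed region; the sketch only shows why the warmup phase exists.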
```diff
@@ -227,11 +227,11 @@ def main():
 def run_regression_perf():
     M, N, K = 128, 128, 128
     out_dtype, accum_dtype = "float32", "float32"
-    in_dtype = "float8_e4m3"
+    in_dtype = T.float8_e4m3fn
     kernel_e4m3 = tl_matmul(M, N, K, in_dtype, out_dtype, accum_dtype)
     profiler_e4m3 = kernel_e4m3.get_profiler(tilelang.TensorSupplyType.Integer)
     latency_e4m3 = profiler_e4m3.do_bench(warmup=25, backend="cupti")
-    in_dtype = "float8_e5m2"
+    in_dtype = T.float8_e5m2
     kernel_e5m2 = tl_matmul(M, N, K, in_dtype, out_dtype, accum_dtype)
     profiler_e5m2 = kernel_e5m2.get_profiler(tilelang.TensorSupplyType.Integer)
     latency_e5m2 = profiler_e5m2.do_bench(warmup=25, backend="cupti")
```

Contributor (CodeRabbit)

**Critical: Inconsistent dtype specification causes assertion failure.**

Line 229 uses string literals (`"float32"`) for `out_dtype` and `accum_dtype`, while the `in_dtype` assignments now use typed sentinels (`T.float8_e4m3fn`, `T.float8_e5m2`). Mixing the two representations trips the dtype consistency assertion.

🔎 Proposed fix:

```diff
- out_dtype, accum_dtype = "float32", "float32"
+ out_dtype, accum_dtype = T.float32, T.float32
```
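The failure mode described above — string dtype literals mixed with typed dtype objects tripping an internal assertion — can be mimicked with a minimal validator. This is purely illustrative; tilelang's actual dtype machinery differs:

```python
class DType:
    """Minimal stand-in for a typed dtype sentinel like T.float32."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return f"DType({self.name!r})"

float32 = DType("float32")
float8_e4m3fn = DType("float8_e4m3fn")

def check_dtypes(*dtypes):
    # A consistency check of this shape rejects mixed representations:
    # every dtype must be a DType object, never a bare string.
    if not all(isinstance(d, DType) for d in dtypes):
        raise AssertionError("all dtypes must be DType objects, not strings")
    return True

check_dtypes(float8_e4m3fn, float32, float32)        # consistent: passes
try:
    check_dtypes(float8_e4m3fn, "float32", "float32")  # mixed: fails
except AssertionError as e:
    print("rejected:", e)
```

Switching every dtype argument to the `T.*` sentinels, as the proposed fix does, is the uniform representation that satisfies such a check.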
This file was deleted.
Contributor (CodeRabbit)

🧩 Analysis chain (verification script contents elided in this capture; Repository: tile-ai/tilelang)
**Clear gradients between backward iterations for consistent timing.**

The `run1()` closure calls `O.backward(dO, retain_graph=True)` repeatedly without clearing gradients. Since `do_bench()` does not zero gradients between iterations, `Q.grad`, `K.grad`, and `V.grad` will accumulate values across warmup and benchmark iterations. This causes timing variations between the first iteration (gradient buffer allocation) and subsequent iterations, contaminating the measurement.

Add `Q.grad = None; K.grad = None; V.grad = None` before the backward call in `run1()`, or use `torch.no_grad()` if gradient tracking isn't needed for the benchmark.

The `backend="cupti"` choice is intentional and consistent with other backward examples in the codebase.

🧰 Tools

🪛 Ruff (0.14.10)

351-351: Ambiguous variable name: `O` (E741)
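The accumulation hazard described in the comment can be shown without a GPU. The sketch below uses a tiny mock of autograd's accumulate-into-`.grad` behavior (the `Tensor` class and `backward` function are illustrative stand-ins, not torch APIs):

```python
class Tensor:
    """Tiny stand-in for a tensor with a .grad slot (illustrative only)."""
    def __init__(self):
        self.grad = None

def backward(params):
    # Mimics autograd semantics: gradients accumulate into .grad
    # rather than overwriting it on each backward pass.
    for p in params:
        p.grad = (p.grad or 0.0) + 1.0

Q, K, V = Tensor(), Tensor(), Tensor()

# Without clearing, state leaks across benchmark iterations:
for _ in range(3):
    backward([Q, K, V])
print(Q.grad)  # 3.0 -- three iterations' gradients piled up

# The suggested fix: reset before each timed backward call,
# so every iteration does the same amount of work.
for _ in range(3):
    Q.grad = K.grad = V.grad = None
    backward([Q, K, V])
print(Q.grad)  # 1.0 -- each iteration starts from a clean slate
```

In the real benchmark the difference is not the numeric value but the work profile: the first iteration allocates gradient buffers while later ones reuse them, which is exactly the variation the reviewer wants eliminated.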