[Refactor]: Change the params in pytest to avoid oom error during ci #1170
Conversation
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run the project's checks. We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
Walkthrough

Reduced test/workload sizes across multiple example tests; parameterized one example.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Test as pytest test
    participant Example as example_mha_inference.main
    participant Flop as FlopCalc
    Note over Test,Example: Test calls parameterized example with explicit args
    Test->>Example: main(BATCH, H, Q_CTX, KV_CTX, D_HEAD, causal)
    Example->>Flop: compute total_flops = f(BATCH,H,Q_CTX,KV_CTX,D_HEAD)
    alt causal == True
        Flop-->>Example: total_flops * 0.5
    else causal == False
        Flop-->>Example: total_flops
    end
    Example-->>Test: return profiling/metrics
```
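The FLOPs calculation in this flow can be sketched in plain Python. This is a hypothetical reconstruction based on the diagram's description, not the actual code in `examples/flash_decoding/example_mha_inference.py` (which also runs profiling):

```python
def total_attention_flops(batch, heads, q_ctx, kv_ctx, d_head, causal=False):
    """Estimate attention FLOPs for the shapes described in the diagram."""
    # Two GEMMs per attention op (Q @ K^T and P @ V), 2 FLOPs per multiply-add.
    flops_per_matmul = 2.0 * batch * heads * q_ctx * kv_ctx * d_head
    total_flops = 2 * flops_per_matmul
    if causal:
        # A causal mask visits roughly half of the score matrix.
        total_flops *= 0.5
    return total_flops
```

This mirrors the `alt causal` branch above: the same base count is computed either way, and the causal case simply halves it.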
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)
- examples/flash_attention/test_example_flash_attention.py

🧰 Additional context used
🧬 Code graph analysis (1)
- examples/flash_attention/test_example_flash_attention.py (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🔇 Additional comments (4)
Actionable comments posted: 0
🧹 Nitpick comments (1)
examples/blocksparse_attention/test_example_blocksparse_attention.py (1)
38-47: Consider reducing batch/heads parameters for consistency. The mask variant only reduces `max_cache_seqlen` (4096→1024) while keeping `batch=16`, `heads=16`, and `heads_kv=8` unchanged. In contrast, the indice variant (lines 27-35) reduces all these parameters. If both tests have similar memory profiles, consider reducing the batch and heads parameters here as well for consistency.
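A minimal sketch of the suggested change and why it matters for memory. Parameter names come from the review comment; the head dimension and the cache-size formula are illustrative assumptions, not taken from the actual test file:

```python
# Current mask-variant shapes per the review: only max_cache_seqlen was reduced.
mask_variant = dict(batch=16, heads=16, heads_kv=8, max_cache_seqlen=1024)

# Suggested for consistency with the indice variant, which halves batch/heads too.
mask_variant_reduced = dict(batch=8, heads=8, heads_kv=4, max_cache_seqlen=1024)

def approx_kv_cache_elems(cfg, dim=128):
    """Rough K+V element count; dim is an assumed head dimension."""
    return 2 * cfg["batch"] * cfg["heads_kv"] * cfg["max_cache_seqlen"] * dim
```

Halving both `batch` and `heads_kv` cuts the estimated KV-cache footprint to a quarter, which is the kind of reduction the OOM fix is after.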
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- examples/blocksparse_attention/test_example_blocksparse_attention.py (2 hunks)
- examples/cast/test_example_cast.py (1 hunks)
- examples/deepseek_v32/test_tilelang_example_deepseek_v32.py (2 hunks)
- examples/flash_attention/test_example_flash_attention.py (2 hunks)
- examples/flash_decoding/example_mha_inference.py (1 hunks)
- examples/flash_decoding/test_example_flash_decoding.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
examples/flash_decoding/example_mha_inference.py (3)
- examples/flash_attention/example_mha_fwd_varlen.py (2)
  - main (94-206), main (211-283)
- examples/deepseek_mla/example_mla_decode.py (1)
  - main (283-305)
- examples/flash_decoding/example_gqa_decode.py (1)
  - main (442-489)
examples/flash_attention/test_example_flash_attention.py (2)
- examples/flash_attention/example_mha_bwd_wgmma_pipelined.py (1)
  - main (275-321)
- examples/flash_attention/example_gqa_fwd_bshd_wgmma_pipelined.py (2)
  - main (121-164), main (194-238)
examples/flash_decoding/test_example_flash_decoding.py (1)
- examples/flash_decoding/example_mha_inference.py (1)
  - main (305-323)
examples/deepseek_v32/test_tilelang_example_deepseek_v32.py (2)
- examples/deepseek_v32/fp8_lighting_indexer.py (1)
  - test_fp8_lighting_indexer (260-302)
- examples/deepseek_v32/sparse_mla_bwd.py (1)
  - test_sparse_mla_bwd (334-384)
examples/cast/test_example_cast.py (2)
- examples/cast/example_per_token_cast_to_fp8.py (1)
  - main (80-114)
- examples/cast/example_group_per_split_token_cast_to_fp8.py (1)
  - main (164-204)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Quick Lint
🔇 Additional comments (12)
examples/deepseek_v32/test_tilelang_example_deepseek_v32.py (3)
16-16: LGTM! Workload reduced to prevent OOM. The parameter reduction (S: 1024→512, SKV: 2048→1024) appropriately lowers memory usage for CI testing while maintaining test coverage.

32-32: LGTM! Workload reduced to prevent OOM. The parameter reduction (SKV: 1024→512) appropriately lowers memory usage for CI testing.

39-39: LGTM! Workload reduced to prevent OOM. The parameter reduction (SKV: 1024→512) appropriately lowers memory usage for CI testing.
examples/blocksparse_attention/test_example_blocksparse_attention.py (1)
27-35: LGTM! Workload reduced to prevent OOM. The parameter reductions (batch: 16→8, heads: 16→8, heads_kv: 8→4, max_cache_seqlen: 4096→2048) appropriately lower memory usage for CI testing.
examples/cast/test_example_cast.py (2)
7-7: Verify M=4196 is intentional. The value M=4196 is unusual: it is neither a power of 2 nor exactly half of the presumed original (8192). Most other reductions in this PR are clean halvings. Please confirm this is intentional (e.g., testing non-power-of-2 sizes or specific alignment) rather than a typo for 4096.

11-11: LGTM! Workload reduced to prevent OOM. The parameter reductions (M: 8192→2048, N: 2048→512, blk_m: 16→8) appropriately lower memory usage for CI testing.
examples/flash_attention/test_example_flash_attention.py (4)
36-42: LGTM! Explicit parameterization improves clarity. The change from implicit defaults to explicit parameters (BATCH=1, H=16, N_CTX=512, D_HEAD=64, causal=False) improves test readability and ensures appropriate workload sizing for CI.

47-53: LGTM! Explicit parameterization improves clarity. The change to explicit parameters ensures appropriate workload sizing for CI and improves test clarity.

59-65: LGTM! Explicit parameterization improves clarity. The change to explicit parameters ensures appropriate workload sizing for CI and improves test clarity.

105-105: LGTM! Explicit parameterization improves clarity. The change from no parameters to explicit parameters (batch=4, heads=16, seq_len=512, dim=64) improves test readability and ensures appropriate workload sizing for CI.
examples/flash_decoding/test_example_flash_decoding.py (1)
15-15: LGTM! Explicit parameterization with reduced KV context. The change to explicit parameters with KV_CTX reduced from the default 8192 to 2048 appropriately lowers memory usage for CI testing while maintaining test coverage.
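As a rough sanity check on why shrinking KV_CTX helps, the KV-cache footprint scales linearly with it. The estimate below is illustrative (assumed fp16 storage and assumed batch/head/dim shapes, not the test's exact configuration):

```python
def kv_cache_bytes(batch, heads, kv_ctx, d_head, bytes_per_elem=2):
    """Approximate K+V cache size: two tensors of [batch, kv_ctx, heads, d_head]."""
    return 2 * batch * heads * kv_ctx * d_head * bytes_per_elem

# Reducing KV_CTX 8192 -> 2048 cuts the cache to a quarter of its size.
default_bytes = kv_cache_bytes(batch=64, heads=8, kv_ctx=8192, d_head=128)
ci_bytes = kv_cache_bytes(batch=64, heads=8, kv_ctx=2048, d_head=128)
```

Since the cache is linear in KV_CTX, any further CI pressure can be relieved by halving it again without touching the other dimensions.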
examples/flash_decoding/example_mha_inference.py (1)
305-323: LGTM! Parameterization enables configurable testing. The extension of `main()` from no parameters to a parameterized signature with sensible defaults enables external configuration of batch size, dimensions, and causal behavior. The causal flag is correctly incorporated into the FLOPs calculation (lines 308-309), and backward compatibility is maintained through default values.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- examples/blocksparse_attention/test_example_blocksparse_attention.py (2 hunks)
- examples/cast/example_group_per_split_token_cast_to_fp8.py (2 hunks)
- examples/cast/test_example_cast.py (1 hunks)
- examples/deepseek_v32/test_tilelang_example_deepseek_v32.py (2 hunks)
- examples/flash_attention/test_example_flash_attention.py (2 hunks)
- examples/flash_decoding/example_mha_inference.py (1 hunks)
- examples/flash_decoding/test_example_flash_decoding.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- examples/flash_decoding/test_example_flash_decoding.py
- examples/deepseek_v32/test_tilelang_example_deepseek_v32.py
- examples/blocksparse_attention/test_example_blocksparse_attention.py
🧰 Additional context used
🧬 Code graph analysis (3)
examples/flash_decoding/example_mha_inference.py (2)
- examples/flash_attention/example_mha_fwd_varlen.py (2)
  - main (94-206), main (211-283)
- examples/flash_decoding/example_gqa_decode.py (1)
  - main (442-489)
examples/cast/test_example_cast.py (2)
- examples/cast/example_group_per_split_token_cast_to_fp8.py (1)
  - main (164-204)
- examples/cast/example_per_token_cast_to_fp8.py (1)
  - main (80-114)
examples/flash_attention/test_example_flash_attention.py (3)
- examples/flash_attention/example_mha_bwd_wgmma_pipelined.py (1)
  - main (275-321)
- examples/flash_attention/example_gqa_bwd.py (1)
  - main (464-522)
- examples/flash_attention/example_gqa_fwd_bshd.py (2)
  - main (154-192), main (222-264)
🪛 Ruff (0.14.2)
examples/cast/example_group_per_split_token_cast_to_fp8.py
164-164: Do not use mutable data structures for argument defaults
Replace with `None`; initialize within function
(B006)
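The flagged pattern and its conventional fix can be sketched as follows. The `batch_sizes` default `[128, 896]` comes from the test in this PR; the function body is illustrative, not the actual example code:

```python
# Flagged by Ruff B006: a mutable default is evaluated once at definition
# time and shared across every call, so in-place mutation leaks between calls.
#
#   def main(batch_sizes=[128, 896]): ...
#
# Conventional fix: default to None and initialize inside the function body.
def main(batch_sizes=None):
    if batch_sizes is None:
        batch_sizes = [128, 896]  # fresh list on every call
    return batch_sizes
```

With this pattern each call gets its own list, and callers can still pass smaller workloads explicitly for CI.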
🔇 Additional comments (7)
examples/cast/example_group_per_split_token_cast_to_fp8.py (1)
173-173: Good parameterization for test flexibility. The change from hardcoded values to parameterized `batch_sizes` enables tests to pass smaller workloads, aligning well with the PR objective to avoid OOM errors during CI.

examples/flash_attention/test_example_flash_attention.py (3)
36-42: LGTM: Reduced workload parameters for CI. The explicit parameters reduce memory requirements while maintaining test coverage. The smaller values (BATCH=1, H=16, N_CTX=512, D_HEAD=64) are appropriate for avoiding OOM errors during CI.

47-53: LGTM: Reduced workload parameters for CI. Consistent parameter reduction pattern applied to the BHSD variant, ensuring both test variants use appropriate memory footprints for CI environments.

99-99: LGTM: Reduced workload parameters for CI. The variable-length attention test now uses smaller dimensions (batch=4, heads=16, seq_len=512, dim=64) suitable for CI resource constraints.
examples/flash_decoding/example_mha_inference.py (1)
305-309: LGTM: Clean parameterization enables flexible test sizing. The function signature now accepts explicit configuration parameters, allowing tests to specify smaller workload sizes for CI. The `causal` parameter properly adjusts the `total_flops` calculation, and the implementation follows patterns established in related files.
7-8: LGTM: Significant workload reduction for CI. The parameter reduction (M: 8192→1024, N: 2048→1024, blk_m: 8→4) substantially decreases memory usage while maintaining test coverage. The `batch_sizes=[128, 896]` parameter aligns with the updated function signature.

12-12: LGTM: Reduced parameters for CI. The parameter reduction (M: 8192→2048, N: 2048→512) appropriately reduces memory footprint for CI environments while preserving test validity.
…ile-ai#1170)

* [Refactor]: Change the params in pytest to avoid oom error during ci
* format
* fix
* Update test_example_cast.py
* Update parameters in test_example_cast
* Update test_example_flash_attention.py
* update
* format
* fix
* fix
* format
* [Test] Add cp async to avoid register spill
* [BugFix] GQA fwd and bwd
  - Fix the undefined behavior of -inf in acc_s
  - Fix the causal loop range in varlen scenario
* [TMA] Move on to TMA and locate the register spill issue
* [Debug] Not the reason of zero-assignment. Probably the combination of Parallel op & conditional qkT
* [Debug] The SIMT copy in producer occupies too many registers
* [BugFix] Use 3D lse and delta to avoid illegal instruction
* [Perf] Relaxed order for dQ and SIMT store for dKdV
* [Feat] For atomic add version
* [Lint]
* [Bugfix] Enable code lowering with producer‑copy‑only program (#1168)
* bugfix
* lint fix
* Enhance warp group register allocation to handle missing consumer bodies gracefully. Updated logic to annotate producer side when consumer is absent, ensuring robustness in degenerate warp-specialized patterns.
* Refactor VisitExpr_ method in inject_tma_barrier.cc for improved readability. Adjusted formatting and spacing for clarity in barrier handling logic.
* Update barrier handling in inject_tma_barrier.cc to accommodate newly appended entries. Adjusted the size of the replace vector to ensure it covers the full needed length, and modified the logic for appending barriers based on the updated replace conditions.
* [Bugfix] Support 16bits shfl_sync (#1169)
* Add type-safe warp shuffle helpers for 16-bit float types in common.h
  - Introduced generic passthrough functions for warp shuffle operations: `shfl_xor_sync`, `shfl_down_sync`, `shfl_up_sync`, and `shfl_sync`.
  - Added specializations for `cutlass::half_t` and `cutlass::bfloat16_t` to ensure type safety during shuffle operations.
  - Updated `reduce.h` to utilize the new shuffle functions, enhancing code clarity and maintainability.
* lint fix
* [Testing] Move TMA 1D and test for its functionality (#1167)
* [Testing] Move TMA 1D and test for its functionality
* [Lint]
* [Refactor]: Change the params in pytest to avoid oom error during ci (#1170)
* [Refactor]: Change the params in pytest to avoid oom error during ci
* format
* fix
* Update test_example_cast.py
* Update parameters in test_example_cast
* Update test_example_flash_attention.py
* update
* format
* fix
* fix
* format
* [Bugfix] Fix tvm import path for editable build (#1172)
* [Language] Expose `T.warpgroup_fence_operand` for nvcc code motion (#986)
* remove debug print
* pipeline fix
* use the correct buffer access scope
* rs support
* warp warpgroup_fence_operand
* fix
* fp8 dtype ptx enhance
* mma fix
* TCGEN05 Interface
* tcgen05 support
* rebase
* update
* Enhance TCGEN05 support by adding new intrinsic operations and descriptors. Introduced `ptx_tcgen05_mma_ts` for tensor-memory to shared-memory instructions and `tcgen05_mma_arrive` for signaling barrier completion. Updated existing descriptors and code generation logic to accommodate these changes, ensuring compatibility with new instruction sets. Refactored related allocation functions and improved handling of shared memory descriptors.
* lint fix
* Refactor buffer reference handling in CUDA code generation and update test execution in tilelang. Ensure default annotations for unrolling are set correctly in TIR IR module.
* wgmma fix

---------
Co-authored-by: Zhiwen Mo <[email protected]>

* [Language] Add Correctness and performance check scripts for V2 (#1174)
* fix
* lint fix
* fix
* lint fix
* fix
* upd
* [Bugfix] Legalize Datatype for mma intrinisc codegen (#1179)
* fix
* lint fix
* Enhance CUDA code generation by updating register type handling for float data types. Introduced a workaround for TF32 type compatibility and improved the registration of MMA register types for A and B operands.
* [Perf] Add layout and use_tma to boost performance
* [Lint]
* [Note]

---------
Co-authored-by: Lei Wang <[email protected]>
Co-authored-by: Yuqi Dong <[email protected]>
Co-authored-by: Zhiwen Mo <[email protected]>
Summary by CodeRabbit
Tests
Chores