[Bugfix] Fallback to a Linear Layout instead of raising errors #1521

LeiWang1999 · 2025-12-24T09:04:50Z

as title, previous pr introduced atomatically tma swizzle lowering, but some tma shape can not apply swizzling and then raise errors, we should fallback it into linear gemm layout.

Summary by CodeRabbit

Release Notes

Bug Fixes
- Refined memory layout selection for 64-bit elements. Now uses linear layout fallback when stride alignment requirements are not met, preventing attempts to use bank swizzle layouts in incompatible configurations.
- Enhanced stride alignment validation across supported GPU architectures for improved consistency and robustness.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

…mat_stride % 8 != 0. Refactor swizzling layout conditions to check mat_stride before mat_continuous, improving layout selection logic for better performance.

github-actions · 2025-12-24T09:04:59Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2025-12-24T09:05:25Z

📝 Walkthrough

Walkthrough

A single file modification that adds stride alignment checks for 64-bit element sizes in GemmABLayout Hopper and Sm100 paths. When mat_stride is not divisible by 8, the code now falls back to a linear layout instead of attempting bank swizzle layouts, preventing misaligned stride scenarios.

Changes

Cohort / File(s)	Summary
GemmABLayout Stride Alignment `src/layout/gemm_layouts.cc`	Added stride alignment guards (`mat_stride % 8 == 0`) for 64-bit element handling in both Hopper and Sm100 architectural paths. Non-aligned strides now return linear layout instead of proceeding to bank swizzle layout branches, preventing unsafe stride misalignment scenarios.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

A rabbit hops through stride-aligned lands,
With checks for eight-division in hand,
When paths don't align, we take the straight way,
A linear fallback saves the day! 🐰✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly describes the main change: introducing a fallback to linear layout instead of raising errors, which aligns with the PR's core objective to handle TMA swizzle lowering failures gracefully.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0006621 and 44eeb56.

📒 Files selected for processing (1)

src/layout/gemm_layouts.cc

🧰 Additional context used

🧠 Learnings (3)

📓 Common learnings

Learnt from: botbw
Repo: tile-ai/tilelang PR: 691
File: src/tl_templates/cuda/gemm_sp_sm80.h:81-85
Timestamp: 2025-09-15T10:51:06.985Z
Learning: In CUTLASS tensor operation layouts, crosswise constants should be computed using sizeof(T) (bytes), not cutlass::sizeof_bits<T>::value (bits). However, the layout template parameter should use sizeof_bits<T>::value (bits). This is the established pattern in the official CUTLASS codebase, as seen in default_mma_core_sparse_sm80.h where Crosswise uses sizeof(ElementA) but the layout template uses sizeof_bits<ElementA>::value.

Learnt from: botbw
Repo: tile-ai/tilelang PR: 691
File: src/tl_templates/cuda/gemm_sp_sm80.h:81-85
Timestamp: 2025-09-15T10:51:06.985Z
Learning: In CUTLASS tensor operation layouts, crosswise constants should be computed using sizeof(T) (bytes), not cutlass::sizeof_bits<T>::value (bits). This is the established pattern in the official CUTLASS codebase, as seen in default_mma_core_sparse_sm80.h.

📚 Learning: 2025-09-15T10:51:06.985Z

Learnt from: botbw
Repo: tile-ai/tilelang PR: 691
File: src/tl_templates/cuda/gemm_sp_sm80.h:81-85
Timestamp: 2025-09-15T10:51:06.985Z
Learning: In CUTLASS tensor operation layouts, crosswise constants should be computed using sizeof(T) (bytes), not cutlass::sizeof_bits<T>::value (bits). However, the layout template parameter should use sizeof_bits<T>::value (bits). This is the established pattern in the official CUTLASS codebase, as seen in default_mma_core_sparse_sm80.h where Crosswise uses sizeof(ElementA) but the layout template uses sizeof_bits<ElementA>::value.

Applied to files:

src/layout/gemm_layouts.cc

📚 Learning: 2025-09-15T10:51:06.985Z

Learnt from: botbw
Repo: tile-ai/tilelang PR: 691
File: src/tl_templates/cuda/gemm_sp_sm80.h:81-85
Timestamp: 2025-09-15T10:51:06.985Z
Learning: In CUTLASS tensor operation layouts, crosswise constants should be computed using sizeof(T) (bytes), not cutlass::sizeof_bits<T>::value (bits). This is the established pattern in the official CUTLASS codebase, as seen in default_mma_core_sparse_sm80.h.

Applied to files:

src/layout/gemm_layouts.cc

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)

🔇 Additional comments (3)

src/layout/gemm_layouts.cc (3)

760-765: LGTM! Fallback prevents crashes for misaligned float64 strides.

The early fallback to linear layout when mat_stride % 8 != 0 correctly prevents the downstream ICHECK(stride % 8 == 0) failure in makeQuarterBankSwizzleLayout (line 388). This is appropriately placed after the more specialized F64_Kouter and F64_Kinner layouts are attempted.

769-779: LGTM! Stride alignment guard prevents swizzle layout failures.

Wrapping the swizzle layout selection with if (mat_stride % 8 == 0) correctly prevents calls to makeFullBankSwizzleLayout, makeHalfBankSwizzleLayout, and makeQuarterBankSwizzleLayout when the stride constraint isn't met. All three functions require stride % 8 == 0 (lines 388, 407, 426), so this guard is necessary.

797-807: LGTM! Consistent stride alignment guard for SM100.

The changes mirror the Hopper implementation, correctly guarding swizzle layout selection with mat_stride % 8 == 0. This ensures consistency across architectures and prevents the same class of failures.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/layout/gemm_layouts.cc (1)

732-751: Add stride alignment guard to makeGemmABLayout before calling bank swizzle layouts.

The makeGemmABLayout function lacks the mat_stride % 8 == 0 guard before calling swizzle layouts (lines 745, 747), while both makeGemmABLayoutHopper (line 769) and makeGemmABLayoutSm100 (line 797) include this check. All three bank swizzle functions—makeFullBankSwizzleLayout, makeHalfBankSwizzleLayout, and makeQuarterBankSwizzleLayout—have ICHECK(stride % 8 == 0) assertions. This inconsistency could cause crashes if makeGemmABLayout is called with misaligned strides when appropriate continuous divisibility conditions are met. Add the stride alignment guard to match the protective pattern used in the newer variants.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0006621 and 44eeb56.

📒 Files selected for processing (1)

src/layout/gemm_layouts.cc

🧰 Additional context used

🧠 Learnings (3)

📓 Common learnings

Learnt from: botbw
Repo: tile-ai/tilelang PR: 691
File: src/tl_templates/cuda/gemm_sp_sm80.h:81-85
Timestamp: 2025-09-15T10:51:06.985Z
Learning: In CUTLASS tensor operation layouts, crosswise constants should be computed using sizeof(T) (bytes), not cutlass::sizeof_bits<T>::value (bits). However, the layout template parameter should use sizeof_bits<T>::value (bits). This is the established pattern in the official CUTLASS codebase, as seen in default_mma_core_sparse_sm80.h where Crosswise uses sizeof(ElementA) but the layout template uses sizeof_bits<ElementA>::value.

Learnt from: botbw
Repo: tile-ai/tilelang PR: 691
File: src/tl_templates/cuda/gemm_sp_sm80.h:81-85
Timestamp: 2025-09-15T10:51:06.985Z
Learning: In CUTLASS tensor operation layouts, crosswise constants should be computed using sizeof(T) (bytes), not cutlass::sizeof_bits<T>::value (bits). This is the established pattern in the official CUTLASS codebase, as seen in default_mma_core_sparse_sm80.h.

📚 Learning: 2025-09-15T10:51:06.985Z

Learnt from: botbw
Repo: tile-ai/tilelang PR: 691
File: src/tl_templates/cuda/gemm_sp_sm80.h:81-85
Timestamp: 2025-09-15T10:51:06.985Z
Learning: In CUTLASS tensor operation layouts, crosswise constants should be computed using sizeof(T) (bytes), not cutlass::sizeof_bits<T>::value (bits). However, the layout template parameter should use sizeof_bits<T>::value (bits). This is the established pattern in the official CUTLASS codebase, as seen in default_mma_core_sparse_sm80.h where Crosswise uses sizeof(ElementA) but the layout template uses sizeof_bits<ElementA>::value.

Applied to files:

src/layout/gemm_layouts.cc

📚 Learning: 2025-09-15T10:51:06.985Z

Learnt from: botbw
Repo: tile-ai/tilelang PR: 691
File: src/tl_templates/cuda/gemm_sp_sm80.h:81-85
Timestamp: 2025-09-15T10:51:06.985Z
Learning: In CUTLASS tensor operation layouts, crosswise constants should be computed using sizeof(T) (bytes), not cutlass::sizeof_bits<T>::value (bits). This is the established pattern in the official CUTLASS codebase, as seen in default_mma_core_sparse_sm80.h.

Applied to files:

src/layout/gemm_layouts.cc

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)

🔇 Additional comments (3)

src/layout/gemm_layouts.cc (3)

760-765: LGTM! Fallback prevents crashes for misaligned float64 strides.

The early fallback to linear layout when mat_stride % 8 != 0 correctly prevents the downstream ICHECK(stride % 8 == 0) failure in makeQuarterBankSwizzleLayout (line 388). This is appropriately placed after the more specialized F64_Kouter and F64_Kinner layouts are attempted.

769-779: LGTM! Stride alignment guard prevents swizzle layout failures.

Wrapping the swizzle layout selection with if (mat_stride % 8 == 0) correctly prevents calls to makeFullBankSwizzleLayout, makeHalfBankSwizzleLayout, and makeQuarterBankSwizzleLayout when the stride constraint isn't met. All three functions require stride % 8 == 0 (lines 388, 407, 426), so this guard is necessary.

797-807: LGTM! Consistent stride alignment guard for SM100.

The changes mirror the Hopper implementation, correctly guarding swizzle layout selection with mat_stride % 8 == 0. This ensures consistency across architectures and prevents the same class of failures.

LeiWang1999 added 2 commits December 24, 2025 16:59

Enhance GEMM layout functions to include a fallback for float64 when …

9b55961

…mat_stride % 8 != 0. Refactor swizzling layout conditions to check mat_stride before mat_continuous, improving layout selection logic for better performance.

lint fix

44eeb56

LeiWang1999 changed the title ~~[Bugfix] Fallback to a Linear Layout instead of raise errors~~ [Bugfix] Fallback to a Linear Layout instead of raising errors Dec 24, 2025

LeiWang1999 merged commit cfccd63 into tile-ai:main Dec 24, 2025
6 of 7 checks passed

coderabbitai bot reviewed Dec 24, 2025

View reviewed changes

coderabbitai bot mentioned this pull request Jan 27, 2026

Add swizzle layout detection and automatic merging for layout conflicts #1736

Merged

2 tasks

coderabbitai bot mentioned this pull request Feb 5, 2026

[Feature] Introduce higher-dimensional gemm layout support #1798

Open

1 task

kurisu6912 mentioned this pull request Feb 11, 2026

[LoopVectorize] Loop Independent Var Optimization in IfThenElse Expr kurisu6912/tilelang#2

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bugfix] Fallback to a Linear Layout instead of raising errors #1521

[Bugfix] Fallback to a Linear Layout instead of raising errors #1521

Uh oh!

LeiWang1999 commented Dec 24, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

github-actions bot commented Dec 24, 2025

Uh oh!

coderabbitai bot commented Dec 24, 2025 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

[Bugfix] Fallback to a Linear Layout instead of raising errors #1521

[Bugfix] Fallback to a Linear Layout instead of raising errors #1521

Uh oh!

Conversation

LeiWang1999 commented Dec 24, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

github-actions bot commented Dec 24, 2025

Uh oh!

coderabbitai bot commented Dec 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

LeiWang1999 commented Dec 24, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 24, 2025 •

edited

Loading