-
Notifications
You must be signed in to change notification settings - Fork 449
[Bugfix] Fallback to a Linear Layout instead of raising errors #1521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…mat_stride % 8 != 0. Refactor swizzling layout conditions to check mat_stride before mat_continuous, improving layout selection logic for better performance.
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
📝 WalkthroughWalkthroughA single file modification that adds stride alignment checks for 64-bit element sizes in GemmABLayout Hopper and Sm100 paths. When mat_stride is not divisible by 8, the code now falls back to a linear layout instead of attempting bank swizzle layouts, preventing misaligned stride scenarios. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (3)📓 Common learnings📚 Learning: 2025-09-15T10:51:06.985ZApplied to files:
📚 Learning: 2025-09-15T10:51:06.985ZApplied to files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
🔇 Additional comments (3)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/layout/gemm_layouts.cc (1)
732-751: Add stride alignment guard tomakeGemmABLayoutbefore calling bank swizzle layouts.The
makeGemmABLayoutfunction lacks themat_stride % 8 == 0guard before calling swizzle layouts (lines 745, 747), while bothmakeGemmABLayoutHopper(line 769) andmakeGemmABLayoutSm100(line 797) include this check. All three bank swizzle functions—makeFullBankSwizzleLayout,makeHalfBankSwizzleLayout, andmakeQuarterBankSwizzleLayout—haveICHECK(stride % 8 == 0)assertions. This inconsistency could cause crashes ifmakeGemmABLayoutis called with misaligned strides when appropriate continuous divisibility conditions are met. Add the stride alignment guard to match the protective pattern used in the newer variants.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/layout/gemm_layouts.cc
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: botbw
Repo: tile-ai/tilelang PR: 691
File: src/tl_templates/cuda/gemm_sp_sm80.h:81-85
Timestamp: 2025-09-15T10:51:06.985Z
Learning: In CUTLASS tensor operation layouts, crosswise constants should be computed using sizeof(T) (bytes), not cutlass::sizeof_bits<T>::value (bits). However, the layout template parameter should use sizeof_bits<T>::value (bits). This is the established pattern in the official CUTLASS codebase, as seen in default_mma_core_sparse_sm80.h where Crosswise uses sizeof(ElementA) but the layout template uses sizeof_bits<ElementA>::value.
Learnt from: botbw
Repo: tile-ai/tilelang PR: 691
File: src/tl_templates/cuda/gemm_sp_sm80.h:81-85
Timestamp: 2025-09-15T10:51:06.985Z
Learning: In CUTLASS tensor operation layouts, crosswise constants should be computed using sizeof(T) (bytes), not cutlass::sizeof_bits<T>::value (bits). This is the established pattern in the official CUTLASS codebase, as seen in default_mma_core_sparse_sm80.h.
📚 Learning: 2025-09-15T10:51:06.985Z
Learnt from: botbw
Repo: tile-ai/tilelang PR: 691
File: src/tl_templates/cuda/gemm_sp_sm80.h:81-85
Timestamp: 2025-09-15T10:51:06.985Z
Learning: In CUTLASS tensor operation layouts, crosswise constants should be computed using sizeof(T) (bytes), not cutlass::sizeof_bits<T>::value (bits). However, the layout template parameter should use sizeof_bits<T>::value (bits). This is the established pattern in the official CUTLASS codebase, as seen in default_mma_core_sparse_sm80.h where Crosswise uses sizeof(ElementA) but the layout template uses sizeof_bits<ElementA>::value.
Applied to files:
src/layout/gemm_layouts.cc
📚 Learning: 2025-09-15T10:51:06.985Z
Learnt from: botbw
Repo: tile-ai/tilelang PR: 691
File: src/tl_templates/cuda/gemm_sp_sm80.h:81-85
Timestamp: 2025-09-15T10:51:06.985Z
Learning: In CUTLASS tensor operation layouts, crosswise constants should be computed using sizeof(T) (bytes), not cutlass::sizeof_bits<T>::value (bits). This is the established pattern in the official CUTLASS codebase, as seen in default_mma_core_sparse_sm80.h.
Applied to files:
src/layout/gemm_layouts.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (3)
src/layout/gemm_layouts.cc (3)
760-765: LGTM! Fallback prevents crashes for misaligned float64 strides.The early fallback to linear layout when
mat_stride % 8 != 0correctly prevents the downstreamICHECK(stride % 8 == 0)failure inmakeQuarterBankSwizzleLayout(line 388). This is appropriately placed after the more specialized F64_Kouter and F64_Kinner layouts are attempted.
769-779: LGTM! Stride alignment guard prevents swizzle layout failures.Wrapping the swizzle layout selection with
if (mat_stride % 8 == 0)correctly prevents calls tomakeFullBankSwizzleLayout,makeHalfBankSwizzleLayout, andmakeQuarterBankSwizzleLayoutwhen the stride constraint isn't met. All three functions requirestride % 8 == 0(lines 388, 407, 426), so this guard is necessary.
797-807: LGTM! Consistent stride alignment guard for SM100.The changes mirror the Hopper implementation, correctly guarding swizzle layout selection with
mat_stride % 8 == 0. This ensures consistency across architectures and prevents the same class of failures.
as title, previous pr introduced atomatically tma swizzle lowering, but some tma shape can not apply swizzling and then raise errors, we should fallback it into linear gemm layout.
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.