[BUG] Fix dsa_sparse_finetune/sparse_mla_bwd.py bug#1588
LeiWang1999 merged 2 commits into tile-ai:main from
Conversation
📝 Walkthrough

Replaces vectorized 4-wide atomic operations with single-element atomic updates in the sparse backward kernel. The loop structure changes from iterating over (BS // split_store, D // 4) to (BS // split_store, D), with the index mapping shifted from block-based (d_i * 4) to element-wise (d_i) addressing, modifying the atomic accumulation scheme for the gradient computation.
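The indexing change described above can be sketched as follows. This is an illustrative NumPy model of the accumulation pattern, not the actual TileLang kernel; the names `BS`, `D`, and `split_store` come from the walkthrough, and everything else (shapes, data) is assumed for demonstration:

```python
import numpy as np

BS, D, split_store = 8, 16, 2
partial = np.random.default_rng(0).random((BS // split_store, D))

# Before: 4-wide "atomic_addx4"-style updates over (BS // split_store, D // 4),
# each touching a contiguous block starting at d_i * 4.
grad_vec = np.zeros(D)
for s_i in range(BS // split_store):
    for d_i in range(D // 4):
        base = d_i * 4                      # block-based addressing
        grad_vec[base:base + 4] += partial[s_i, base:base + 4]

# After: single-element atomic updates over (BS // split_store, D).
grad_elem = np.zeros(D)
for s_i in range(BS // split_store):
    for d_i in range(D):                    # element-wise addressing
        grad_elem[d_i] += partial[s_i, d_i]

# When D is a multiple of 4 and addresses are well-aligned, the two
# schemes accumulate the same values.
assert np.allclose(grad_vec, grad_elem)
```

The fix trades vectorized throughput for per-element updates that make no alignment or contiguity assumptions about the destination.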
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
Surprised to find that atomic_addx4 is buggy here, I'll also take a look :)
atomic_addx2 also seems to cause issues. My guess is it might be a padding issue related to the thd format. Thanks for commenting.
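The padding hypothesis in the comment above can be illustrated with a small sketch. This is a hypothetical NumPy model of why a vectorized 4-wide update can go wrong when the valid extent of a dimension is not a multiple of the vector width (as a padded thd-style layout might produce); it is not the actual kernel, and all names and shapes here are assumptions:

```python
import numpy as np

D = 10  # valid head-dim extent, deliberately not a multiple of 4
dst = np.zeros(D)
src = np.ones(D)

# A 4-wide update at block index d_i assumes 4 valid elements from d_i * 4.
# With D = 10, the third block (base = 8) would need elements 8..11, but
# only 8..9 are valid -- the kind of out-of-range access that padding can
# hide until it corrupts neighboring data.
covered = 0
for d_i in range((D + 3) // 4):
    base = d_i * 4
    if base + 4 > D:
        break  # the last partial block cannot be written as a full 4-vector
    dst[base:base + 4] += src[base:base + 4]
    covered = base + 4

# Element-wise updates never step outside the valid region and cover all of D.
dst_elem = np.zeros(D)
for d_i in range(D):
    dst_elem[d_i] += src[d_i]

assert covered == 8          # 4-wide scheme only safely covers 8 of 10 elements
assert dst_elem.sum() == D   # scalar scheme covers everything
```

Under this reading, switching to single-element atomics sidesteps both the alignment requirement and the tail-handling problem at once.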
LGTM, and after PR #1677,