
[BUG] Fix dsa_sparse_finetune/sparse_mla_bwd.py bug #1588

Merged

LeiWang1999 merged 2 commits into tile-ai:main from xiuhu17:dev_ on Jan 18, 2026

Conversation

@xiuhu17 (Contributor) commented Dec 31, 2025

#1558

Summary by CodeRabbit

  • Refactor
  • Updated the backward kernel's atomic accumulation to use single-element atomic updates, changing the data access pattern and loop structure of the gradient accumulation.


coderabbitai bot (Contributor) commented Dec 31, 2025

📝 Walkthrough

Replaces vectorized 4-wide atomic operations with single-element atomic updates in the sparse MLA backward kernel. The loop structure changes from iterating over (BS // split_store, D // 4) to (BS // split_store, D), and the index mapping shifts from block-based (d_i * 4) to element-wise (d_i) addressing, modifying the atomic accumulation scheme for the gradient computation.

Changes

Cohort / File(s): Sparse MLA Backward Kernel Atomics (examples/dsa_sparse_finetune/sparse_mla_bwd.py)
Summary: Changed the atomic accumulation from vectorized 4-wide block updates (atomic_addx4) to single-element updates (atomic_add); adjusted loop bounds from D // 4 to D and D_tail // 4 to D_tail; updated the index mapping from d_i * 4 to d_i for the dKV and dKV_tail gradient accumulators (see the sketch below).
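
The following is a rough before/after sketch of the change described above, not the file's actual code. It assumes TileLang-style T.Parallel loops inside the kernel body (with import tilelang.language as T), and the buffer and index names (dKV, dkv_shared, kv_base, bs_i, d_i) are illustrative placeholders inferred from the summary.

```python
# Illustrative sketch only, not the verbatim diff; names are assumptions.

# Before: 4-wide vectorized atomics over D // 4 column blocks
for bs_i, d_i in T.Parallel(BS // split_store, D // 4):
    T.atomic_addx4(dKV[kv_base + bs_i, d_i * 4], dkv_shared[bs_i, d_i * 4])

# After: single-element atomics over all D columns
for bs_i, d_i in T.Parallel(BS // split_store, D):
    T.atomic_add(dKV[kv_base + bs_i, d_i], dkv_shared[bs_i, d_i])
```

The per-element form sidesteps the alignment assumptions of the 4-wide variant, which the discussion below suspects interact badly with thd-format padding.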

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • LeiWang1999

Poem

🐰 Hop, hop—those atomics dance,
Four-wide blocks take their last prance,
Now singles flutter, swift and lean,
The finest gradients I've ever seen!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately identifies the specific file and the nature of the change, clearly indicating a bug fix in sparse_mla_bwd.py.

✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcacc5a and 6fbbed1.

📒 Files selected for processing (1)
  • examples/dsa_sparse_finetune/sparse_mla_bwd.py
🔇 Additional comments (2)
examples/dsa_sparse_finetune/sparse_mla_bwd.py (2)

229-233: LGTM!

The fix correctly replaces the 4-wide vectorized atomic with single-element atomic updates. The loop bounds (BS // split_store, D) match the shared memory view shape, and the indexing is consistent with the copy operation at lines 221-223.
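
A minimal sketch of the pattern this comment refers to, under the assumption that the block-local partial gradient is staged in shared memory before accumulation. The names dkv_accum, dkv_shared, and kv_base are hypothetical, not the file's identifiers, and the fragment would sit inside the kernel body.

```python
# Hypothetical names; shapes follow those quoted in the review.
T.copy(dkv_accum, dkv_shared)  # stage the block-local partial dKV gradient in shared memory
for bs_i, d_i in T.Parallel(BS // split_store, D):
    # element-wise atomic accumulation into the global dKV buffer
    T.atomic_add(dKV[kv_base + bs_i, d_i], dkv_shared[bs_i, d_i])
```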


236-240: LGTM!

The tail dimension atomic update mirrors the main dimension fix correctly. Loop bounds (BS // split_store, D_tail) match the shared view, and the D + d_i offset correctly addresses the tail portion of dKV.
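
A matching sketch for the tail dimension, again with hypothetical names; the only differences from the main-dimension loop are the D_tail bound and the D + d_i column offset.

```python
# Hypothetical names; D + d_i addresses the tail columns that follow the
# first D columns of dKV, matching the offset described in this comment.
for bs_i, d_i in T.Parallel(BS // split_store, D_tail):
    T.atomic_add(dKV[kv_base + bs_i, D + d_i], dkv_tail_shared[bs_i, d_i])
```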



@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@LeiWang1999 (Member)

Surprised to find that atomic_addx4 is buggy here; I'll also take a look :)

@xiuhu17 (Contributor, Author) commented Jan 1, 2026

The x2 variant also seems to cause issues. My guess is that it might be a padding issue related to the thd format. Thanks for commenting.

@LeiWang1999 (Member)

LGTM, and after PR #1677, atomic_add will be automatically lowered into atomic_addx4 where possible :)

@LeiWang1999 LeiWang1999 merged commit bb7f30c into tile-ai:main Jan 18, 2026
2 checks passed
