[Atomic] Use ptr for atomicAdd dst instead of reference #1425
…o destination

* Modified AtomicAdd in CUDA to take a pointer instead of a reference for the destination argument.
* Updated related code in atomicadd_vectorize.cc to ensure compatibility with the new signature.
* Adjusted the Python interface in atomic.py to pass the destination by pointer, aligning with device function requirements.

* Updated AtomicAddRet in both CUDA and HIP to take a pointer instead of a reference for the address argument, improving consistency with the AtomicAdd function.
* Adjusted the implementation to ensure proper reinterpretation of the address type for atomic operations.

… pointer

* Updated the MakeSIMTLoop function to build a pointer to the destination element using tvm_access_ptr instead of loading the destination value directly.
* Simplified the handling of source and destination predicates, improving clarity and maintainability of the code.
* Ensured compatibility with the new pointer-based approach for atomic operations.
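As a rough sketch of the calling-convention change these commits describe (illustrative only — `AtomicAddByRef`, `AtomicAddByPtr`, and the demo kernel are invented names, not the project's actual templates):

```cpp
#include <cuda_runtime.h>

// Before (sketch): destination bound by reference inside the device template.
template <typename T1, typename T2>
__device__ void AtomicAddByRef(T1 &ref, T2 value) {
  atomicAdd(&ref, static_cast<T1>(value));
}

// After (sketch): destination passed as a pointer, matching what the IR now
// emits by taking the address of the destination element.
template <typename T1, typename T2>
__device__ void AtomicAddByPtr(T1 *address, T2 value) {
  atomicAdd(address, static_cast<T1>(value));
}

// Call sites therefore pass an explicit address rather than an lvalue.
__global__ void demo(float *out, const float *in, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    AtomicAddByPtr(&out[0], in[i]);
  }
}
```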
Walkthrough

Change atomic-add call sites and device templates to pass destination addresses (via `address_of` / pointers) instead of references.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 0
🧹 Nitpick comments (4)
src/op/atomic_add.cc (1)
281-288: Nit: comment says `tvm_access_ptr` but you're using `builtin::address_of()`.

The code comment (line 281) references `tvm_access_ptr`, but the implementation uses `builtin::address_of()` (lines 282-284). Either rename the comment or (if required by downstream) switch to the intended builtin.

src/tl_templates/hip/common.h (1)
119-122: HIP `AtomicAddRet` pointer migration looks consistent with the new call convention.

Only a small nit: `address` is already a `T1 *`, so `reinterpret_cast<T1 *>(address)` is redundant unless you expect "pointer-as-integer" call patterns.
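A hedged sketch of the suggested simplification (illustrative, not the actual common.h template; assumes a `T1` for which HIP's `atomicAdd` overload exists):

```cpp
#include <hip/hip_runtime.h>

// Sketch only: with a T1* parameter there is nothing to reinterpret, so the
// pointer can be forwarded to atomicAdd directly.
template <typename T1, typename T2>
__device__ T1 AtomicAddRetSketch(T1 *address, T2 value) {
  return atomicAdd(address, static_cast<T1>(value));  // no reinterpret_cast needed
}
```

src/transform/atomicadd_vectorize.cc (1)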
261-284: Fallback path may still pass a non-address first arg (e.g., `IfThenElse`) under some producers.

The fallback only wraps `BufferLoad` into `address_of(...)`. If `node->args[0]` can be an `IfThenElse` (or other wrappers that evaluate to a BufferLoad/value), the new pointer-first contract could be violated. If those forms are still possible, consider handling `IfThenElseNode` by producing an `IfThenElse(cond, address_of(then_load), address_of(else_load))` (when both branches are `BufferLoad`) or forcing earlier canonicalization.

src/tl_templates/cuda/atomic.h (1)
172-250: `AtomicAdd` pointer migration is good; consider removing uninitialized reads before inline-asm outputs.

In the half/bf16 inline-PTX paths, `ret_val` is declared and then used to initialize `ret_val_cast` via a read (e.g., `*reinterpret_cast<unsigned short *>(&ret_val)`). That's an uninitialized read in C++ even if the asm output overwrites `ret_val_cast`. Prefer initializing the scalar directly:

```diff
- __half ret_val;
- unsigned short ret_val_cast =
-     *reinterpret_cast<unsigned short *>(&ret_val);
+ unsigned short ret_val_cast = 0;
```

(and similarly for bf16).
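For the half path, the resulting shape could look like the sketch below (illustrative only — the PTX mnemonic, operand constraints, and helper intrinsics are assumptions about how such a path is typically written, not a quote of atomic.h):

```cpp
#include <cuda_fp16.h>

// Sketch of an f16 atomic add that seeds the 16-bit asm output with a defined
// value instead of reading an uninitialized __half through reinterpret_cast.
__device__ __half AtomicAddF16Sketch(__half *address, __half val) {
  unsigned short ret_bits = 0;                      // defined start; asm overwrites it
  unsigned short val_bits = __half_as_ushort(val);  // bit pattern of the addend
  asm volatile("atom.global.add.noftz.f16 %0, [%1], %2;"
               : "=h"(ret_bits)
               : "l"(address), "h"(val_bits)
               : "memory");
  return __ushort_as_half(ret_bits);                // previous value at *address
}
```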
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- `src/op/atomic_add.cc` (1 hunks)
- `src/tl_templates/cuda/atomic.h` (2 hunks)
- `src/tl_templates/hip/common.h` (1 hunks)
- `src/transform/atomicadd_vectorize.cc` (2 hunks)
- `tilelang/language/atomic.py` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/op/atomic_add.cc (1)
src/op/copy.cc (2)
- `MakePredicate` (325-354)
- `MakePredicate` (325-327)
tilelang/language/atomic.py (1)
tilelang/language/tir/op.py (2)
- `call_extern` (173-195)
- `address_of` (464-480)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
🔇 Additional comments (4)
src/transform/atomicadd_vectorize.cc (1)
249-255: Scalar vectorize path: `AtomicAdd(address_of(dst), value, memory_order)` is the right layout.

This aligns with the new pointer-based `AtomicAdd` signature.
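A hedged sketch of what that layout looks like when constructed from TVM's C++ IR builders (not TileLang's actual rewriter code; the return dtype and function name are illustrative assumptions):

```cpp
#include <tvm/tir/builtin.h>
#include <tvm/tir/expr.h>

using namespace tvm;
using namespace tvm::tir;

// Sketch: an extern AtomicAdd call whose first argument is address_of(dst),
// matching the pointer-first layout noted above.
PrimExpr MakeAtomicAddCallSketch(BufferLoad dst_load, PrimExpr value,
                                 PrimExpr memory_order) {
  PrimExpr dst_addr =
      Call(DataType::Handle(), builtin::address_of(), {dst_load});
  // The handle dtype is a placeholder; the real call uses whatever the
  // lowering expects for extern calls.
  return Call(DataType::Handle(), builtin::call_extern(),
              {StringImm("AtomicAdd"), dst_addr, value, memory_order});
}
```

src/tl_templates/cuda/atomic.h (1)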
252-328: `AtomicAddRet(T1 *address, ...)` update looks consistent with callsites passing pointers.

The pointer-first API matches the updated IR/frontend; the remaining implementation still uses `*address` for `cuda::atomic_ref` and the raw pointer for `atomicAdd`, which is coherent.
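For reference, a minimal sketch of that pattern (an assumed shape, not the file's actual code, and assuming libcu++'s `cuda::atomic_ref` is available on the target):

```cpp
#include <cuda/atomic>

// Sketch: the destination arrives as a pointer; the libcu++ path binds an
// atomic_ref to the pointee, while a raw atomicAdd fallback could take the
// pointer directly.
template <typename T>
__device__ T AtomicAddRetRefSketch(T *address, T value) {
  cuda::atomic_ref<T, cuda::thread_scope_device> ref(*address);  // bind to *address
  return ref.fetch_add(value);                                   // returns the previous value
}
```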
src/op/atomic_add.cc (1)

270-293: [Rewritten review comment]
[Classification tag]

tilelang/language/atomic.py (1)
175-193: The proposed fix uses non-existent TVM API functions and misunderstands how let-bindings work in TVM expressions; however, there is a real underlying issue: the scalar path attempts to call `address_of()` on non-buffer types where it cannot work.

The scalar branch (line 178) is taken when `dst_extent is None and src_extent is None`, meaning neither `dst` nor `value` is a `BufferLoad` or `BufferRegion`. However, `tvm.tir.address_of()` only works on specific expression types (like `BufferLoad`), not plain `Var` objects. Testing confirms that `address_of(Var)` raises an exception.

The review's proposed solution is flawed:

- `T.has_let_value()` and `T.get_let_value()` do not exist in `tvm.tir`
- In TVM, let-bindings are statement-level constructs (`LetStmt`), not properties of `Var` objects
- A `Var` doesn't "contain" a let-value that can be resolved

The actual bug: if `dst` is a plain `Var` (when `get_extent` returns None), the subsequent `T.address_of(dst)` call will fail. Either this scalar path should not be reachable with non-buffer types, or the approach needs to be fundamentally different from using `address_of`.
Local test can pass, merged :)
This pull request updates the atomic add infrastructure to consistently pass destination arguments by pointer rather than by reference or value. This change ensures correct semantics and compatibility across CUDA, HIP, and the internal IR, and unifies the calling conventions in both C++ and Python code.
Device-side atomic add API changes:
- Updated the `AtomicAdd` and `AtomicAddRet` functions in the CUDA and HIP templates to take a pointer (`T1 *address`) instead of a reference (`T1 &ref`), and updated all internal calls accordingly. [1] [2] [3]

IR and transformation logic updates:
- Updated `AtomicAddNode::MakeSIMTLoop` and `AtomicAddVectorizeRewriter` to use `address_of` on the destination, ensuring the first argument is always a pointer. [1] [2] [3]

Python frontend alignment:
- Updated the Python frontend (`tilelang/language/atomic.py`) to always pass the destination as a pointer using `T.address_of(dst)` in calls to `AtomicAdd` and `AtomicAddRet`, matching the new device signature.