-
Notifications
You must be signed in to change notification settings - Fork 450
[BugFix] Add int64_t support for AtomicAdd #1716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
CUDA's atomicAdd doesn't have an int64_t overload, only unsigned long long. This caused compilation errors when using atomic_add with int64 tensors. Add normalize_atomic_type specialization for int64_t to map it to unsigned long long, which has the same bit representation and works correctly for atomic add operations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
📝 WalkthroughWalkthroughA template specialization is added to Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/tl_templates/cuda/atomic.h`:
- Around line 34-36: Update the normalize_atomic_type<int64_t> specialization to
document the signed/unsigned semantics: add a comment above template <> struct
normalize_atomic_type<int64_t> stating that mapping int64_t to unsigned long
long is done to support AtomicAdd (two's complement) but that
AtomicMax/AtomicMin (which call cuda::atomic_ref<NT1>::fetch_max()/fetch_min())
will perform unsigned comparisons and therefore int64_t usage with
AtomicMax/AtomicMin is not supported; keep the using type = unsigned long long
unchanged.
| template <> struct normalize_atomic_type<int64_t> { | ||
| using type = unsigned long long; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the structure of the atomic.h file
wc -l src/tl_templates/cuda/atomic.hRepository: tile-ai/tilelang
Length of output: 94
🏁 Script executed:
# Read the relevant sections of the file to understand the implementation
cat -n src/tl_templates/cuda/atomic.h | head -200Repository: tile-ai/tilelang
Length of output: 8917
🏁 Script executed:
# Search for AtomicMax and AtomicMin definitions and their usage of normalize_atomic_type
rg -n "normalize_atomic_type|AtomicMax|AtomicMin" src/tl_templates/cuda/atomic.h -A 5Repository: tile-ai/tilelang
Length of output: 3496
🏁 Script executed:
# Search for usages of AtomicMax and AtomicMin to see if they're called with int64_t
rg -n "AtomicMax|AtomicMin" --type cpp --type py -B 2 -A 2 | grep -v "^src/tl_templates/cuda/atomic.h" | head -100Repository: tile-ai/tilelang
Length of output: 7421
🏁 Script executed:
# Look for any comments in atomic.h explaining the int64_t design choice
cat src/tl_templates/cuda/atomic.h | grep -A 5 -B 5 "int64_t"Repository: tile-ai/tilelang
Length of output: 333
🏁 Script executed:
# Check git history to understand why this mapping was added
git log -p --all -- src/tl_templates/cuda/atomic.h | grep -A 10 -B 10 "int64_t" | head -60Repository: tile-ai/tilelang
Length of output: 42
🏁 Script executed:
# Search for tests or documentation about int64_t atomic operations
find . -name "*.py" -o -name "*.md" -o -name "*.txt" | xargs grep -l "int64.*atomic\|atomic.*int64" 2>/dev/null | head -20Repository: tile-ai/tilelang
Length of output: 42
🏁 Script executed:
# Check if there are any CUDA driver/runtime constraints on int64_t atomics
# Look for comments in the code about why int64_t is normalized to unsigned long long
rg -n "normalize_atomic_type|int64.*unsigned" src/ -B 3 -A 3 | grep -A 5 -B 5 "int64"Repository: tile-ai/tilelang
Length of output: 5124
🏁 Script executed:
# Check what versions of CUDA atomic support int64_t
# Look at the CUDART_VERSION guards in atomic.h
cat -n src/tl_templates/cuda/atomic.h | sed -n '28,80p'Repository: tile-ai/tilelang
Length of output: 2259
🏁 Script executed:
# Search for test files that might use atomic operations
find . -type f \( -name "*test*.py" -o -name "*test*.cc" -o -name "*test*.h" \) | head -20Repository: tile-ai/tilelang
Length of output: 1074
🏁 Script executed:
# Check HIP atomic header to see how it handles int64_t
cat -n src/tl_templates/hip/atomic.h | head -100Repository: tile-ai/tilelang
Length of output: 4870
🏁 Script executed:
# Search for any recent commits or PRs that added int64_t normalization
git log --all --oneline -- src/tl_templates/cuda/atomic.h | head -20Repository: tile-ai/tilelang
Length of output: 208
🏁 Script executed:
# Look at the specific commit that added int64_t support
git show c11e07d --statRepository: tile-ai/tilelang
Length of output: 50372
🏁 Script executed:
# Get the full details of what changed in that commit
git show c11e07dRepository: tile-ai/tilelang
Length of output: 50376
🏁 Script executed:
# Look at the atomic test file to see what's being tested
cat -n testing/python/language/test_tilelang_language_atomic.py | grep -A 5 -B 5 "int64\|AtomicMax\|AtomicMin" | head -150Repository: tile-ai/tilelang
Length of output: 42
🏁 Script executed:
# Search more broadly for actual usage of AtomicMax/Min with int64_t
rg "AtomicMax|AtomicMin" testing/ -A 3 -B 3 | grep -i "int64\|i64" -A 5 -B 5 | head -80Repository: tile-ai/tilelang
Length of output: 42
🏁 Script executed:
# Search for any actual usage of int64_t with atomic operations
rg "int64.*Atomic|Atomic.*int64" . -g "!.git" --type cpp --type py | head -30Repository: tile-ai/tilelang
Length of output: 42
🏁 Script executed:
# Check if there are any tests that combine int64 and atomic operations
rg "test.*atomic" . -i --type py | xargs grep -l "int64\|i64" | head -10Repository: tile-ai/tilelang
Length of output: 2946
The mapping is correct for AtomicAdd, but introduces incorrect semantics for AtomicMax/AtomicMin if used with int64_t.
The normalize_atomic_type<int64_t> specialization maps to unsigned long long for AtomicAdd, which works correctly due to two's complement arithmetic. However, AtomicMax and AtomicMin use cuda::atomic_ref<NT1>::fetch_max() / fetch_min(), which will perform unsigned comparisons instead of signed comparisons when NT1 is unsigned long long.
Example: AtomicMax(&val, 1) where val = int64_t(-1) should return 1, but unsigned comparison yields -1 as the maximum (since 0xFFFFFFFFFFFFFFFF > 1 in unsigned arithmetic).
Since no tests or codebase usage of AtomicMax/AtomicMin with int64_t was found, this appears to be an unsupported use case. Add a comment documenting this limitation:
template <> struct normalize_atomic_type<int64_t> {
// Note: This mapping is valid for AtomicAdd (two's complement arithmetic),
// but AtomicMax/AtomicMin with int64_t will use unsigned comparison semantics
// and are not supported.
using type = unsigned long long;
};🤖 Prompt for AI Agents
In `@src/tl_templates/cuda/atomic.h` around lines 34 - 36, Update the
normalize_atomic_type<int64_t> specialization to document the signed/unsigned
semantics: add a comment above template <> struct normalize_atomic_type<int64_t>
stating that mapping int64_t to unsigned long long is done to support AtomicAdd
(two's complement) but that AtomicMax/AtomicMin (which call
cuda::atomic_ref<NT1>::fetch_max()/fetch_min()) will perform unsigned
comparisons and therefore int64_t usage with AtomicMax/AtomicMin is not
supported; keep the using type = unsigned long long unchanged.
Summary
normalize_atomic_typespecialization forint64_tto map it tounsigned long longatomic_addwith int64 tensorsProblem
CUDA's
atomicAdddoesn't have anint64_toverload - it only supportsunsigned long longfor 64-bit integers. This caused compilation errors like:Solution
Since
int64_tandunsigned long longhave the same bit representation, we can safely mapint64_ttounsigned long longvia thenormalize_atomic_typetrait, which is already used forhalf_tandbfloat16_t.Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.