4 changes: 4 additions & 0 deletions src/tl_templates/cuda/atomic.h
@@ -31,6 +31,10 @@ template <> struct normalize_atomic_type<bfloat16_t> {
};
#endif

template <> struct normalize_atomic_type<int64_t> {
  using type = unsigned long long;
};
Comment on lines +34 to +36

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the structure of the atomic.h file
wc -l src/tl_templates/cuda/atomic.h

Repository: tile-ai/tilelang

Length of output: 94


🏁 Script executed:

# Read the relevant sections of the file to understand the implementation
cat -n src/tl_templates/cuda/atomic.h | head -200

Repository: tile-ai/tilelang

Length of output: 8917


🏁 Script executed:

# Search for AtomicMax and AtomicMin definitions and their usage of normalize_atomic_type
rg -n "normalize_atomic_type|AtomicMax|AtomicMin" src/tl_templates/cuda/atomic.h -A 5

Repository: tile-ai/tilelang

Length of output: 3496


🏁 Script executed:

# Search for usages of AtomicMax and AtomicMin to see if they're called with int64_t
rg -n "AtomicMax|AtomicMin" --type cpp --type py -B 2 -A 2 | grep -v "^src/tl_templates/cuda/atomic.h" | head -100

Repository: tile-ai/tilelang

Length of output: 7421


🏁 Script executed:

# Look for any comments in atomic.h explaining the int64_t design choice
cat src/tl_templates/cuda/atomic.h | grep -A 5 -B 5 "int64_t"

Repository: tile-ai/tilelang

Length of output: 333


🏁 Script executed:

# Check git history to understand why this mapping was added
git log -p --all -- src/tl_templates/cuda/atomic.h | grep -A 10 -B 10 "int64_t" | head -60

Repository: tile-ai/tilelang

Length of output: 42


🏁 Script executed:

# Search for tests or documentation about int64_t atomic operations
find . -name "*.py" -o -name "*.md" -o -name "*.txt" | xargs grep -l "int64.*atomic\|atomic.*int64" 2>/dev/null | head -20

Repository: tile-ai/tilelang

Length of output: 42


🏁 Script executed:

# Check if there are any CUDA driver/runtime constraints on int64_t atomics
# Look for comments in the code about why int64_t is normalized to unsigned long long
rg -n "normalize_atomic_type|int64.*unsigned" src/ -B 3 -A 3 | grep -A 5 -B 5 "int64"

Repository: tile-ai/tilelang

Length of output: 5124


🏁 Script executed:

# Check what versions of CUDA atomic support int64_t
# Look at the CUDART_VERSION guards in atomic.h
cat -n src/tl_templates/cuda/atomic.h | sed -n '28,80p'

Repository: tile-ai/tilelang

Length of output: 2259


🏁 Script executed:

# Search for test files that might use atomic operations
find . -type f \( -name "*test*.py" -o -name "*test*.cc" -o -name "*test*.h" \) | head -20

Repository: tile-ai/tilelang

Length of output: 1074


🏁 Script executed:

# Check HIP atomic header to see how it handles int64_t
cat -n src/tl_templates/hip/atomic.h | head -100

Repository: tile-ai/tilelang

Length of output: 4870


🏁 Script executed:

# Search for any recent commits or PRs that added int64_t normalization
git log --all --oneline -- src/tl_templates/cuda/atomic.h | head -20

Repository: tile-ai/tilelang

Length of output: 208


🏁 Script executed:

# Look at the specific commit that added int64_t support
git show c11e07d --stat

Repository: tile-ai/tilelang

Length of output: 50372


🏁 Script executed:

# Get the full details of what changed in that commit
git show c11e07d

Repository: tile-ai/tilelang

Length of output: 50376


🏁 Script executed:

# Look at the atomic test file to see what's being tested
cat -n testing/python/language/test_tilelang_language_atomic.py | grep -A 5 -B 5 "int64\|AtomicMax\|AtomicMin" | head -150

Repository: tile-ai/tilelang

Length of output: 42


🏁 Script executed:

# Search more broadly for actual usage of AtomicMax/Min with int64_t
rg "AtomicMax|AtomicMin" testing/ -A 3 -B 3 | grep -i "int64\|i64" -A 5 -B 5 | head -80

Repository: tile-ai/tilelang

Length of output: 42


🏁 Script executed:

# Search for any actual usage of int64_t with atomic operations
rg "int64.*Atomic|Atomic.*int64" . -g "!.git" --type cpp --type py | head -30

Repository: tile-ai/tilelang

Length of output: 42


🏁 Script executed:

# Check if there are any tests that combine int64 and atomic operations
rg "test.*atomic" . -i --type py | xargs grep -l "int64\|i64" | head -10

Repository: tile-ai/tilelang

Length of output: 2946


The mapping is correct for AtomicAdd, but introduces incorrect semantics for AtomicMax/AtomicMin if used with int64_t.

The normalize_atomic_type<int64_t> specialization maps int64_t to unsigned long long. For AtomicAdd this is correct: two's-complement addition is bit-identical for signed and unsigned operands. However, AtomicMax and AtomicMin use cuda::atomic_ref<NT1>::fetch_max() / fetch_min(), which perform unsigned comparisons instead of signed comparisons when NT1 is unsigned long long.
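
A minimal CUDA sketch of that call path (illustrative only; AtomicMaxSketch and the surrounding scaffolding are assumptions reconstructed from the review text, not the actual tilelang source):

#include <cuda/atomic>
#include <cstdint>

template <typename T> struct normalize_atomic_type { using type = T; };
template <> struct normalize_atomic_type<int64_t> {
  using type = unsigned long long; // the specialization under review
};

template <typename T1>
__device__ T1 AtomicMaxSketch(T1 *address, T1 val) {
  using NT1 = typename normalize_atomic_type<T1>::type;
  cuda::atomic_ref<NT1, cuda::thread_scope_device> ref(
      *reinterpret_cast<NT1 *>(address));
  // For T1 = int64_t, NT1 = unsigned long long, so fetch_max compares the
  // operands as unsigned: the bit pattern of -1 is the largest value.
  return static_cast<T1>(ref.fetch_max(static_cast<NT1>(val)));
}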

Example: AtomicMax(&val, 1) where val = int64_t(-1) should leave val == 1, but the unsigned comparison keeps -1 as the maximum (since its bit pattern 0xFFFFFFFFFFFFFFFF > 1 in unsigned arithmetic).
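
A host-side C++ demonstration of both behaviors (hypothetical, for illustration only; not part of the patch):

#include <algorithm>
#include <cstdint>
#include <cstdio>

int main() {
  int64_t val = -1;
  // Same bit pattern viewed as unsigned: 0xFFFFFFFFFFFFFFFF.
  unsigned long long bits = static_cast<unsigned long long>(val);

  // Addition is bit-identical under two's complement, so AtomicAdd is safe:
  printf("signed add:   %lld\n", (long long)(val + 1));            // 0
  printf("unsigned add: %lld\n", (long long)(int64_t)(bits + 1));  // 0

  // Comparison is not: the unsigned view of -1 is the maximum value.
  printf("signed max:   %lld\n",
         (long long)std::max<int64_t>(val, 1));                    // 1
  printf("unsigned max: %lld\n",
         (long long)(int64_t)std::max(bits, 1ULL));                // -1
  return 0;
}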

Since no tests or codebase usage of AtomicMax/AtomicMin with int64_t was found, this appears to be an unsupported use case. Add a comment documenting this limitation:

template <> struct normalize_atomic_type<int64_t> {
  // Note: This mapping is valid for AtomicAdd (two's complement arithmetic),
  // but AtomicMax/AtomicMin with int64_t will use unsigned comparison semantics
  // and are not supported.
  using type = unsigned long long;
};
🤖 Prompt for AI Agents
In `@src/tl_templates/cuda/atomic.h` around lines 34-36, update the
normalize_atomic_type<int64_t> specialization to document the signed/unsigned
semantics: add a comment above template <> struct normalize_atomic_type<int64_t>
stating that mapping int64_t to unsigned long long is done to support AtomicAdd
(two's complement), but that AtomicMax/AtomicMin (which call
cuda::atomic_ref<NT1>::fetch_max()/fetch_min()) will perform unsigned
comparisons and therefore int64_t usage with AtomicMax/AtomicMin is not
supported; keep the using type = unsigned long long unchanged.


template <typename T1, typename T2> TL_DEVICE T1 cuda_cast(T2 val) {
  return T1(val);
}