Skip to content

Conversation

@LeiWang1999
Copy link
Member

@LeiWang1999 LeiWang1999 commented Jan 22, 2026

Summary

  • Add normalize_atomic_type specialization for int64_t to map it to unsigned long long
  • Fixes compilation error when using atomic_add with int64 tensors

Problem

CUDA's atomicAdd doesn't have an int64_t overload - it only supports unsigned long long for 64-bit integers. This caused compilation errors like:

error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (NT1 *, int64_t)

Solution

Since int64_t and unsigned long long have the same bit representation, we can safely map int64_t to unsigned long long via the normalize_atomic_type trait, which is already used for half_t and bfloat16_t.

Test plan

  • Verified compilation succeeds with int64 atomic_add kernel

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Improved atomic operation handling for 64-bit signed integers in CUDA, ensuring correct type normalization and consistent behavior across operations.

✏️ Tip: You can customize this high-level summary in your review settings.

CUDA's atomicAdd doesn't have an int64_t overload, only unsigned long long.
This caused compilation errors when using atomic_add with int64 tensors.

Add normalize_atomic_type specialization for int64_t to map it to
unsigned long long, which has the same bit representation and works
correctly for atomic add operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@github-actions
Copy link

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 22, 2026

📝 Walkthrough

Walkthrough

A template specialization is added to src/tl_templates/cuda/atomic.h to map int64_t to unsigned long long for CUDA atomic type normalization. This ensures 64-bit signed integers are represented using an unsigned long long underlying type in atomic operations.

Changes

Cohort / File(s) Summary
CUDA Atomic Type Mapping
src/tl_templates/cuda/atomic.h
Adds template specialization normalize_atomic_type<int64_t> with type = unsigned long long, normalizing 64-bit signed integer representation in atomic contexts.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A type maps to another with care,
Int64 becomes unsigned so rare,
Atomic operations aligned,
With unsigned long long in mind! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding int64_t support for atomic operations through a type specialization.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/tl_templates/cuda/atomic.h`:
- Around line 34-36: Update the normalize_atomic_type<int64_t> specialization to
document the signed/unsigned semantics: add a comment above template <> struct
normalize_atomic_type<int64_t> stating that mapping int64_t to unsigned long
long is done to support AtomicAdd (two's complement) but that
AtomicMax/AtomicMin (which call cuda::atomic_ref<NT1>::fetch_max()/fetch_min())
will perform unsigned comparisons and therefore int64_t usage with
AtomicMax/AtomicMin is not supported; keep the using type = unsigned long long
unchanged.

Comment on lines +34 to +36
template <> struct normalize_atomic_type<int64_t> {
using type = unsigned long long;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the structure of the atomic.h file
wc -l src/tl_templates/cuda/atomic.h

Repository: tile-ai/tilelang

Length of output: 94


🏁 Script executed:

# Read the relevant sections of the file to understand the implementation
cat -n src/tl_templates/cuda/atomic.h | head -200

Repository: tile-ai/tilelang

Length of output: 8917


🏁 Script executed:

# Search for AtomicMax and AtomicMin definitions and their usage of normalize_atomic_type
rg -n "normalize_atomic_type|AtomicMax|AtomicMin" src/tl_templates/cuda/atomic.h -A 5

Repository: tile-ai/tilelang

Length of output: 3496


🏁 Script executed:

# Search for usages of AtomicMax and AtomicMin to see if they're called with int64_t
rg -n "AtomicMax|AtomicMin" --type cpp --type py -B 2 -A 2 | grep -v "^src/tl_templates/cuda/atomic.h" | head -100

Repository: tile-ai/tilelang

Length of output: 7421


🏁 Script executed:

# Look for any comments in atomic.h explaining the int64_t design choice
cat src/tl_templates/cuda/atomic.h | grep -A 5 -B 5 "int64_t"

Repository: tile-ai/tilelang

Length of output: 333


🏁 Script executed:

# Check git history to understand why this mapping was added
git log -p --all -- src/tl_templates/cuda/atomic.h | grep -A 10 -B 10 "int64_t" | head -60

Repository: tile-ai/tilelang

Length of output: 42


🏁 Script executed:

# Search for tests or documentation about int64_t atomic operations
find . -name "*.py" -o -name "*.md" -o -name "*.txt" | xargs grep -l "int64.*atomic\|atomic.*int64" 2>/dev/null | head -20

Repository: tile-ai/tilelang

Length of output: 42


🏁 Script executed:

# Check if there are any CUDA driver/runtime constraints on int64_t atomics
# Look for comments in the code about why int64_t is normalized to unsigned long long
rg -n "normalize_atomic_type|int64.*unsigned" src/ -B 3 -A 3 | grep -A 5 -B 5 "int64"

Repository: tile-ai/tilelang

Length of output: 5124


🏁 Script executed:

# Check what versions of CUDA atomic support int64_t
# Look at the CUDART_VERSION guards in atomic.h
cat -n src/tl_templates/cuda/atomic.h | sed -n '28,80p'

Repository: tile-ai/tilelang

Length of output: 2259


🏁 Script executed:

# Search for test files that might use atomic operations
find . -type f \( -name "*test*.py" -o -name "*test*.cc" -o -name "*test*.h" \) | head -20

Repository: tile-ai/tilelang

Length of output: 1074


🏁 Script executed:

# Check HIP atomic header to see how it handles int64_t
cat -n src/tl_templates/hip/atomic.h | head -100

Repository: tile-ai/tilelang

Length of output: 4870


🏁 Script executed:

# Search for any recent commits or PRs that added int64_t normalization
git log --all --oneline -- src/tl_templates/cuda/atomic.h | head -20

Repository: tile-ai/tilelang

Length of output: 208


🏁 Script executed:

# Look at the specific commit that added int64_t support
git show c11e07d --stat

Repository: tile-ai/tilelang

Length of output: 50372


🏁 Script executed:

# Get the full details of what changed in that commit
git show c11e07d

Repository: tile-ai/tilelang

Length of output: 50376


🏁 Script executed:

# Look at the atomic test file to see what's being tested
cat -n testing/python/language/test_tilelang_language_atomic.py | grep -A 5 -B 5 "int64\|AtomicMax\|AtomicMin" | head -150

Repository: tile-ai/tilelang

Length of output: 42


🏁 Script executed:

# Search more broadly for actual usage of AtomicMax/Min with int64_t
rg "AtomicMax|AtomicMin" testing/ -A 3 -B 3 | grep -i "int64\|i64" -A 5 -B 5 | head -80

Repository: tile-ai/tilelang

Length of output: 42


🏁 Script executed:

# Search for any actual usage of int64_t with atomic operations
rg "int64.*Atomic|Atomic.*int64" . -g "!.git" --type cpp --type py | head -30

Repository: tile-ai/tilelang

Length of output: 42


🏁 Script executed:

# Check if there are any tests that combine int64 and atomic operations
rg "test.*atomic" . -i --type py | xargs grep -l "int64\|i64" | head -10

Repository: tile-ai/tilelang

Length of output: 2946


The mapping is correct for AtomicAdd, but introduces incorrect semantics for AtomicMax/AtomicMin if used with int64_t.

The normalize_atomic_type<int64_t> specialization maps to unsigned long long for AtomicAdd, which works correctly due to two's complement arithmetic. However, AtomicMax and AtomicMin use cuda::atomic_ref<NT1>::fetch_max() / fetch_min(), which will perform unsigned comparisons instead of signed comparisons when NT1 is unsigned long long.

Example: AtomicMax(&val, 1) where val = int64_t(-1) should return 1, but unsigned comparison yields -1 as the maximum (since 0xFFFFFFFFFFFFFFFF > 1 in unsigned arithmetic).

Since no tests or codebase usage of AtomicMax/AtomicMin with int64_t was found, this appears to be an unsupported use case. Add a comment documenting this limitation:

template <> struct normalize_atomic_type<int64_t> {
  // Note: This mapping is valid for AtomicAdd (two's complement arithmetic),
  // but AtomicMax/AtomicMin with int64_t will use unsigned comparison semantics
  // and are not supported.
  using type = unsigned long long;
};
🤖 Prompt for AI Agents
In `@src/tl_templates/cuda/atomic.h` around lines 34 - 36, Update the
normalize_atomic_type<int64_t> specialization to document the signed/unsigned
semantics: add a comment above template <> struct normalize_atomic_type<int64_t>
stating that mapping int64_t to unsigned long long is done to support AtomicAdd
(two's complement) but that AtomicMax/AtomicMin (which call
cuda::atomic_ref<NT1>::fetch_max()/fetch_min()) will perform unsigned
comparisons and therefore int64_t usage with AtomicMax/AtomicMin is not
supported; keep the using type = unsigned long long unchanged.

@LeiWang1999 LeiWang1999 merged commit f1c19fd into tile-ai:main Jan 22, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant