[BugFix] Add int64_t support for AtomicAdd #1716

LeiWang1999 · 2026-01-22T10:27:27Z

Summary

- Add normalize_atomic_type specialization for int64_t to map it to unsigned long long
- Fixes compilation error when using atomic_add with int64 tensors

Problem

CUDA's atomicAdd doesn't have an int64_t overload - it only supports unsigned long long for 64-bit integers. This caused compilation errors like:

error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (NT1 *, int64_t)
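
The failure mode can be reproduced on the host without a GPU. The sketch below is illustrative only: `fake_atomic_add` is a hypothetical stand-in that mirrors the integer signatures of `atomicAdd` (it is not the CUDA intrinsic and is not atomic), and `has_fake_atomic_add` is a hypothetical detection helper. It shows that a call with `(int64_t *, int64_t)` matches no overload, because a pointer to a signed 64-bit integer does not convert implicitly to `unsigned long long *`.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical host-side stand-ins that mirror the integer overloads of
// CUDA's atomicAdd. They are not atomic and exist only for illustration.
inline int fake_atomic_add(int *addr, int val) {
  int old = *addr;
  *addr = old + val;
  return old;
}
inline unsigned fake_atomic_add(unsigned *addr, unsigned val) {
  unsigned old = *addr;
  *addr = old + val;
  return old;
}
inline unsigned long long fake_atomic_add(unsigned long long *addr,
                                          unsigned long long val) {
  unsigned long long old = *addr;
  *addr = old + val;
  return old;
}

// Detection idiom: value is true only when fake_atomic_add(T*, T) resolves
// to one of the overloads declared above.
template <typename T, typename = void>
struct has_fake_atomic_add : std::false_type {};

template <typename T>
struct has_fake_atomic_add<
    T, decltype(void(fake_atomic_add(std::declval<T *>(), std::declval<T>())))>
    : std::true_type {};

static_assert(has_fake_atomic_add<unsigned long long>::value,
              "the unsigned 64-bit overload exists");
static_assert(!has_fake_atomic_add<std::int64_t>::value,
              "no overload accepts (int64_t *, int64_t)");
```

The second `static_assert` corresponds to the compiler error quoted above: overload resolution fails on the pointer argument, not on the value argument.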

Solution

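int64_t and unsigned long long have the same size, and two's complement addition produces the same bit pattern whether the operands are interpreted as signed or unsigned. We can therefore safely map int64_t to unsigned long long via the normalize_atomic_type trait, which is already used for half_t and bfloat16_t.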
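
A minimal host-side sketch of how such a trait can route an int64_t add through the unsigned 64-bit path. Only the `normalize_atomic_type<int64_t>` specialization is taken from this PR; the identity primary template, the `fake_atomic_add` stand-in (not atomic, not the CUDA intrinsic), and the `atomic_add_sketch` wrapper are assumptions made for illustration and are not copied from `atomic.h`.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Assumed primary template: a type normalizes to itself unless specialized.
template <typename T> struct normalize_atomic_type { using type = T; };

// The specialization added by this PR.
template <> struct normalize_atomic_type<std::int64_t> {
  using type = unsigned long long;
};

static_assert(sizeof(std::int64_t) == sizeof(unsigned long long),
              "the mapping relies on both types being 64 bits wide");

// Hypothetical stand-in for atomicAdd(unsigned long long *, unsigned long long).
inline unsigned long long fake_atomic_add(unsigned long long *addr,
                                          unsigned long long val) {
  unsigned long long old = *addr;
  *addr = old + val; // unsigned addition wraps modulo 2^64
  return old;
}

// Hypothetical wrapper: normalize the type, add on the raw bits, store back.
// memcpy is used instead of a pointer cast to keep the host code free of
// aliasing problems.
template <typename T> void atomic_add_sketch(T *addr, T val) {
  using NT = typename normalize_atomic_type<T>::type;
  NT bits;
  std::memcpy(&bits, addr, sizeof(NT));
  fake_atomic_add(&bits, static_cast<NT>(val));
  std::memcpy(addr, &bits, sizeof(NT));
}
```

With this sketch, adding a negative int64_t value works because the wrapped unsigned sum has the same bits as the signed sum. The same argument does not extend to operations that compare values.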

Test plan

Verified compilation succeeds with int64 atomic_add kernel

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Improved atomic operation handling for 64-bit signed integers in CUDA, ensuring correct type normalization and consistent behavior across operations.

CUDA's atomicAdd doesn't have an int64_t overload, only unsigned long long. This caused compilation errors when using atomic_add with int64 tensors.

Add normalize_atomic_type specialization for int64_t to map it to unsigned long long, which has the same bit representation and works correctly for atomic add operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

github-actions · 2026-01-22T10:27:38Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2026-01-22T10:27:48Z

Walkthrough

A template specialization is added to src/tl_templates/cuda/atomic.h to map int64_t to unsigned long long for CUDA atomic type normalization. This ensures 64-bit signed integers are represented using an unsigned long long underlying type in atomic operations.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CUDA Atomic Type Mapping `src/tl_templates/cuda/atomic.h` | Adds template specialization `normalize_atomic_type<int64_t>` with `type = unsigned long long`, normalizing 64-bit signed integer representation in atomic contexts. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A type maps to another with care,
Int64 becomes unsigned so rare,
Atomic operations aligned,
With unsigned long long in mind! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding int64_t support for atomic operations through a type specialization. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@src/tl_templates/cuda/atomic.h`:
- Around line 34-36: Update the normalize_atomic_type<int64_t> specialization to
document the signed/unsigned semantics: add a comment above template <> struct
normalize_atomic_type<int64_t> stating that mapping int64_t to unsigned long
long is done to support AtomicAdd (two's complement) but that
AtomicMax/AtomicMin (which call cuda::atomic_ref<NT1>::fetch_max()/fetch_min())
will perform unsigned comparisons and therefore int64_t usage with
AtomicMax/AtomicMin is not supported; keep the using type = unsigned long long
unchanged.

coderabbitai · 2026-01-22T10:31:44Z

src/tl_templates/cuda/atomic.h

+template <> struct normalize_atomic_type<int64_t> {
+  using type = unsigned long long;
+};


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the structure of the atomic.h file
wc -l src/tl_templates/cuda/atomic.h

Repository: tile-ai/tilelang

Length of output: 94

🏁 Script executed:

# Read the relevant sections of the file to understand the implementation
cat -n src/tl_templates/cuda/atomic.h | head -200

Repository: tile-ai/tilelang

Length of output: 8917

🏁 Script executed:

# Search for AtomicMax and AtomicMin definitions and their usage of normalize_atomic_type
rg -n "normalize_atomic_type|AtomicMax|AtomicMin" src/tl_templates/cuda/atomic.h -A 5

Repository: tile-ai/tilelang

Length of output: 3496

🏁 Script executed:

# Search for usages of AtomicMax and AtomicMin to see if they're called with int64_t
rg -n "AtomicMax|AtomicMin" --type cpp --type py -B 2 -A 2 | grep -v "^src/tl_templates/cuda/atomic.h" | head -100

Repository: tile-ai/tilelang

Length of output: 7421

🏁 Script executed:

# Look for any comments in atomic.h explaining the int64_t design choice
cat src/tl_templates/cuda/atomic.h | grep -A 5 -B 5 "int64_t"

Repository: tile-ai/tilelang

Length of output: 333

🏁 Script executed:

# Check git history to understand why this mapping was added
git log -p --all -- src/tl_templates/cuda/atomic.h | grep -A 10 -B 10 "int64_t" | head -60

Repository: tile-ai/tilelang

Length of output: 42

🏁 Script executed:

# Search for tests or documentation about int64_t atomic operations
find . -name "*.py" -o -name "*.md" -o -name "*.txt" | xargs grep -l "int64.*atomic\|atomic.*int64" 2>/dev/null | head -20

Repository: tile-ai/tilelang

Length of output: 42

🏁 Script executed:

# Check if there are any CUDA driver/runtime constraints on int64_t atomics
# Look for comments in the code about why int64_t is normalized to unsigned long long
rg -n "normalize_atomic_type|int64.*unsigned" src/ -B 3 -A 3 | grep -A 5 -B 5 "int64"

Repository: tile-ai/tilelang

Length of output: 5124

🏁 Script executed:

# Check what versions of CUDA atomic support int64_t
# Look at the CUDART_VERSION guards in atomic.h
cat -n src/tl_templates/cuda/atomic.h | sed -n '28,80p'

Repository: tile-ai/tilelang

Length of output: 2259

🏁 Script executed:

# Search for test files that might use atomic operations
find . -type f \( -name "*test*.py" -o -name "*test*.cc" -o -name "*test*.h" \) | head -20

Repository: tile-ai/tilelang

Length of output: 1074

🏁 Script executed:

# Check HIP atomic header to see how it handles int64_t
cat -n src/tl_templates/hip/atomic.h | head -100

Repository: tile-ai/tilelang

Length of output: 4870

🏁 Script executed:

# Search for any recent commits or PRs that added int64_t normalization
git log --all --oneline -- src/tl_templates/cuda/atomic.h | head -20

Repository: tile-ai/tilelang

Length of output: 208

🏁 Script executed:

# Look at the specific commit that added int64_t support
git show c11e07d --stat

Repository: tile-ai/tilelang

Length of output: 50372

🏁 Script executed:

# Get the full details of what changed in that commit
git show c11e07d

Repository: tile-ai/tilelang

Length of output: 50376

🏁 Script executed:

# Look at the atomic test file to see what's being tested
cat -n testing/python/language/test_tilelang_language_atomic.py | grep -A 5 -B 5 "int64\|AtomicMax\|AtomicMin" | head -150

Repository: tile-ai/tilelang

Length of output: 42

🏁 Script executed:

# Search more broadly for actual usage of AtomicMax/Min with int64_t
rg "AtomicMax|AtomicMin" testing/ -A 3 -B 3 | grep -i "int64\|i64" -A 5 -B 5 | head -80

Repository: tile-ai/tilelang

Length of output: 42

🏁 Script executed:

# Search for any actual usage of int64_t with atomic operations
rg "int64.*Atomic|Atomic.*int64" . -g "!.git" --type cpp --type py | head -30

Repository: tile-ai/tilelang

Length of output: 42

🏁 Script executed:

# Check if there are any tests that combine int64 and atomic operations
rg "test.*atomic" . -i --type py | xargs grep -l "int64\|i64" | head -10

Repository: tile-ai/tilelang

Length of output: 2946

The mapping is correct for AtomicAdd, but introduces incorrect semantics for AtomicMax/AtomicMin if used with int64_t.

The normalize_atomic_type<int64_t> specialization maps to unsigned long long for AtomicAdd, which works correctly due to two's complement arithmetic. However, AtomicMax and AtomicMin use cuda::atomic_ref<NT1>::fetch_max() / fetch_min(), which will perform unsigned comparisons instead of signed comparisons when NT1 is unsigned long long.

Example: with val = int64_t(-1), AtomicMax(&val, 1) should leave val equal to 1, but an unsigned comparison keeps -1 as the maximum (since 0xFFFFFFFFFFFFFFFF > 1 when both values are compared as unsigned).

Since no tests or other usages of AtomicMax/AtomicMin with int64_t were found in the codebase, this appears to be an unsupported use case. Add a comment documenting this limitation:

template <> struct normalize_atomic_type<int64_t> {
  // Note: This mapping is valid for AtomicAdd (two's complement arithmetic),
  // but AtomicMax/AtomicMin with int64_t will use unsigned comparison semantics
  // and are not supported.
  using type = unsigned long long;
};
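
The difference between the add and max cases raised in this comment can be checked on the host. The sketch below is illustrative only: the helper names are hypothetical and none of this is TileLang or CUDA code. It compares a signed max, a max taken after converting both operands to unsigned long long, and an add performed on the unsigned values.

```cpp
#include <algorithm>
#include <cstdint>

// Max with the signed comparison a caller would expect for int64_t.
inline std::int64_t signed_max(std::int64_t a, std::int64_t b) {
  return std::max(a, b);
}

// Max computed after converting both operands to unsigned long long, which is
// what an unsigned fetch_max would effectively compare.
inline std::int64_t max_via_unsigned(std::int64_t a, std::int64_t b) {
  unsigned long long ua = static_cast<unsigned long long>(a);
  unsigned long long ub = static_cast<unsigned long long>(b);
  // Converting back is modular on two's complement targets
  // (guaranteed by the language from C++20 onward).
  return static_cast<std::int64_t>(std::max(ua, ub));
}

// Add computed on the unsigned values: wraps modulo 2^64, so the resulting
// bits match the signed sum.
inline std::int64_t add_via_unsigned(std::int64_t a, std::int64_t b) {
  unsigned long long ua = static_cast<unsigned long long>(a);
  unsigned long long ub = static_cast<unsigned long long>(b);
  return static_cast<std::int64_t>(ua + ub);
}
```

For the operands -1 and 1, `signed_max` yields 1 while `max_via_unsigned` yields -1, whereas `add_via_unsigned` yields 0, the same as signed addition. This is why the mapping is safe for AtomicAdd but not for comparison-based atomics.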

coderabbitai bot reviewed Jan 22, 2026

LeiWang1999 merged commit f1c19fd into tile-ai:main Jan 22, 2026
7 checks passed

kurisu6912 mentioned this pull request Feb 11, 2026

[LoopVectorize] Loop Independent Var Optimization in IfThenElse Expr kurisu6912/tilelang#2

Closed
