@LeiWang1999
Member

@LeiWang1999 LeiWang1999 commented Dec 14, 2025

This pull request refactors the codebase to move the GemmWarpPolicy and related GEMM primitive logic from the primitives module to a new tileop module, and removes unused or redundant code. It updates all relevant imports and removes obsolete files and test cases. The changes help to clarify module boundaries and simplify the codebase.

Module refactoring and import updates:

  • Moved GemmWarpPolicy from tilelang.primitives.gemm.base to tilelang.tileop.base, and updated all imports in the codebase to reference the new location (examples/amd/example_amd_flash_attn_bwd.py, examples/amd/example_amd_flash_attn_fwd.py, tilelang/language/__init__.py, tilelang/language/experimental/gemm_sp.py, tilelang/language/gemm.py, tilelang/tileop/__init__.py, tilelang/tileop/base.py).
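For downstream code that still imports from the old path, the move amounts to a one-line import change. The following is a minimal sketch of that rewrite, not part of this PR: the helper name `migrate_gemm_warp_policy_import` is hypothetical, and it only handles the simple single-name `from ... import GemmWarpPolicy` form.

```python
import re

# Hypothetical helper (not shipped with TileLang): rewrites only the import path.
OLD_IMPORT = re.compile(
    r"^([ \t]*)from\s+tilelang\.primitives\.gemm\.base\s+import\s+GemmWarpPolicy\b",
    re.MULTILINE,
)
NEW_IMPORT = r"\1from tilelang.tileop.base import GemmWarpPolicy"


def migrate_gemm_warp_policy_import(source: str) -> str:
    """Return `source` with the old GemmWarpPolicy import path replaced."""
    return OLD_IMPORT.sub(NEW_IMPORT, source)


before = (
    "import tilelang.language as T\n"
    "from tilelang.primitives.gemm.base import GemmWarpPolicy\n"
)
after = migrate_gemm_warp_policy_import(before)
print(after)
```

Lines that import other names from the removed module are left for manual review, since this sketch does not know where (or whether) those names now live.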

Code cleanup and removal:

  • Removed the entire tilelang/primitives/gemm directory, including the base.py and gemm_mma.py files and their associated logic, as this functionality has been moved or is no longer needed (tilelang/primitives/gemm/__init__.py, tilelang/primitives/gemm/gemm_mma.py).
  • Deleted the tilelang/primitives/__init__.py file as it is no longer required after the refactor.

Test and example updates:

  • Removed the test file testing/python/primitives/test_tilelang_primitives_mma.py as it depended on the now-removed primitives GEMM implementation.
  • Modified the test entry point in testing/python/language/test_tilelang_language_atomic_add.py to directly run run_atomic_max instead of the removed tilelang.testing.main().

Summary by CodeRabbit

  • Refactor

    • Moved GemmWarpPolicy to a new module and updated import paths across the codebase.
    • Removed an older GEMM primitive implementation and its internal helpers.
  • Tests

    • Removed the legacy MMA/GEMM primitive test suite and related scaffolding.
  • Breaking Changes

    • CUDA atomic interfaces and language-level atomic calls now take pointer/address-based arguments instead of buffer/reference objects.

@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Contributor

coderabbitai bot commented Dec 14, 2025

Walkthrough

Moved and re-exported GemmWarpPolicy to tilelang.tileop.base, removed MMA-based GEMM primitives and tests, updated many import sites to the new location, changed CUDA atomic APIs to pointer-based signatures, and updated language-level atomic call sites to pass buffer addresses.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| GemmWarpPolicy import updates: examples/amd/example_amd_flash_attn_bwd.py, examples/amd/example_amd_flash_attn_fwd.py, tilelang/language/experimental/gemm_sp.py, tilelang/language/gemm.py, tilelang/tileop/gemm/gemm_base.py, tilelang/tileop/gemm_sp/__init__.py, tilelang/tileop/gemm_sp/gemm_sp_base.py | Updated imports to source GemmWarpPolicy from tilelang.tileop.base (was from tilelang.primitives.gemm.base or tilelang.ir). No behavioral changes. |
| Primitives package cleanup: tilelang/primitives/__init__.py | Removed bootstrap docstring and removed top-level gemm re-export from the package namespace. |
| Removed GEMM MMA primitives & factory: tilelang/primitives/gemm/__init__.py, tilelang/primitives/gemm/gemm_mma.py | Deleted modules exposing the gemm() factory and GemmPrimitiveMMA class and their public exports. |
| TileOp surface changes: tilelang/tileop/__init__.py, tilelang/tileop/base.py, tilelang/tileop/gemm/__init__.py | Re-exported GemmWarpPolicy from tilelang.tileop.base; removed GemmBaseParams and unused imports in tilelang/tileop/base.py; removed redundant alias in gemm submodule. |
| Language package re-exports: tilelang/language/__init__.py | Adjusted imports so GemmWarpPolicy is sourced from tilelang.tileop.base while gemm, gemm_v1, gemm_v2 remain imported from .gemm. |
| Tests removed: testing/python/primitives/test_tilelang_primitives_mma.py | Entire test module for MMA GEMM variants removed (all helper/test functions and Torch-based reference checks deleted). |
| CUDA atomic API changes: src/tl_templates/cuda/atomic.h | Changed atomic function signatures from reference parameters (T&) to pointer parameters (T*); updated internal address handling and all specialized/vector variants to use pointer-based addressing. |
| Language atomic callsite updates: tilelang/language/atomic.py | Updated atomic callsites to pass buffer addresses (T.address_of(dst) / T.address_of(src)) to externs instead of Buffer objects, matching pointer-based CUDA atomic signatures. |

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas to focus during review:
    • Search for remaining imports referencing removed tilelang.primitives.gemm.* and usages of GemmPrimitiveMMA / gemm().
    • Verify downstream callsites and public API consumers of GemmWarpPolicy still resolve correctly after re-export.
    • Review CUDA atomic signature changes for ABI/callsite consistency and ensure all language-level extern invocations were updated (including specialized/vector paths).
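The first review item above (hunting for leftover imports of the removed package) can be approximated with a small AST scan. This is a sketch under assumptions rather than a tool from the repository: the function name `find_stale_imports` is hypothetical, and it only inspects absolute `import` / `from ... import` statements, so relative imports and dynamic imports are not covered.

```python
import ast

# Module prefix named in the review checklist as removed by this PR.
REMOVED_PREFIX = "tilelang.primitives.gemm"


def find_stale_imports(source: str) -> list[tuple[int, str]]:
    """Return (line number, module) pairs for imports under REMOVED_PREFIX."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            if module == REMOVED_PREFIX or module.startswith(REMOVED_PREFIX + "."):
                hits.append((node.lineno, module))
    return sorted(hits)


sample = (
    "from tilelang.tileop.base import GemmWarpPolicy\n"
    "from tilelang.primitives.gemm import gemm\n"
    "import tilelang.primitives.gemm.gemm_mma\n"
)
stale = find_stale_imports(sample)
print(stale)
```

Running the same function over each `.py` file in a checkout would give a list of lines to fix by hand.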

Suggested reviewers

  • chengyupku
  • Rachmanino

Poem

🐰 I hopped through modules, nudged Gemm to new turf,

Policies moved, old prims went off to surf.
Atomics now point where the addresses play,
Tests took a nap, imports found a new way.
🥕✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly describes the main refactoring: phasing out the primitives folder by consolidating its GEMM design into the tileop module, which aligns with the extensive codebase changes. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

Updated the `from_warp_partition` method in the `GemmWarpPolicy` class to return the type `GemmWarpPolicy` instead of a string, enhancing type safety and clarity in the codebase. Removed an unnecessary blank line for improved readability.
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tilelang/tileop/base.py (1)

161-186: Fix misleading docstring examples (and optionally drop the redundant forward-ref quotes).
The examples call from_block_row_cols, but the method is from_warp_partition, so the docstring is currently wrong.

 @classmethod
-    def from_warp_partition(cls, m_warp: int, n_warp: int) -> "GemmWarpPolicy":
+    def from_warp_partition(cls, m_warp: int, n_warp: int) -> GemmWarpPolicy:
@@
-            >>> GemmWarpPolicy.from_block_row_cols(4, 1)  # All warps in rows
+            >>> GemmWarpPolicy.from_warp_partition(4, 1)  # All warps in rows
             GemmWarpPolicy.FullRow
-            >>> GemmWarpPolicy.from_block_row_cols(1, 4)  # All warps in columns
+            >>> GemmWarpPolicy.from_warp_partition(1, 4)  # All warps in columns
             GemmWarpPolicy.FullCol
-            >>> GemmWarpPolicy.from_block_row_cols(2, 2)  # Balanced distribution
+            >>> GemmWarpPolicy.from_warp_partition(2, 2)  # Balanced distribution
             GemmWarpPolicy.Square
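To make the corrected docstring examples concrete, here is a standalone sketch of an enum with a `from_warp_partition` classmethod that reproduces the three documented results. It is an illustration, not the code in tilelang/tileop/base.py: the member names come from the docstring, while the numeric values and the branching logic are assumptions. It also keeps the quoted return annotation on purpose, because the bare name is only resolvable when annotations are evaluated lazily (for example under `from __future__ import annotations`).

```python
from enum import IntEnum


class GemmWarpPolicy(IntEnum):
    # Member names come from the docstring; the numeric values are assumed.
    Square = 0
    FullRow = 1
    FullCol = 2

    @classmethod
    def from_warp_partition(cls, m_warp: int, n_warp: int) -> "GemmWarpPolicy":
        """Map a warp grid (m_warp x n_warp) to a policy (assumed logic).

        The quoted annotation is a forward reference: the class name is not
        bound yet while the class body is being executed.
        """
        if n_warp == 1 and m_warp > 1:
            return cls.FullRow  # all warps laid out along the M (row) dimension
        if m_warp == 1 and n_warp > 1:
            return cls.FullCol  # all warps laid out along the N (column) dimension
        return cls.Square  # any other grid is treated as balanced


rows = GemmWarpPolicy.from_warp_partition(4, 1)
cols = GemmWarpPolicy.from_warp_partition(1, 4)
balanced = GemmWarpPolicy.from_warp_partition(2, 2)
print(rows.name, cols.name, balanced.name)
```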
tilelang/tileop/gemm_sp/__init__.py (1)

10-55: Fix incorrect GemmWarpPolicy import for FFI field annotation.

The policy field (line 54) is a TVM FFI Object that should be annotated with the GemmWarpPolicy from tilelang.ir (a Node/Scriptable class), not the IntEnum from tilelang.tileop.base imported on line 11. The C++ implementation constructs policy as a GemmWarpPolicyNode ObjectRef, which maps to the IR version. Replace the import or add a qualified reference to avoid the name collision and ensure the type annotation correctly reflects the actual FFI field type.

🧹 Nitpick comments (1)
tilelang/language/__init__.py (1)

62-63: Import reorganization is correct.

The imports are properly split: GemmWarpPolicy now comes from the new centralized location tilelang.tileop.base, while the gemm functions continue to be imported from the local .gemm module. This correctly exposes both through the tilelang.language namespace.

Optional cleanup: Static analysis indicates the # noqa: F401 directives may be unnecessary if the F401 rule isn't enabled in your linter configuration. Consider removing them for cleaner code, or keep them as defensive annotations for re-exports.

-from tilelang.tileop.base import GemmWarpPolicy  # noqa: F401
-from .gemm import gemm, gemm_v1, gemm_v2  # noqa: F401
+from tilelang.tileop.base import GemmWarpPolicy
+from .gemm import gemm, gemm_v1, gemm_v2
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00dd738 and 4e359a2.

📒 Files selected for processing (15)
  • examples/amd/example_amd_flash_attn_bwd.py (1 hunks)
  • examples/amd/example_amd_flash_attn_fwd.py (1 hunks)
  • testing/python/primitives/test_tilelang_primitives_mma.py (0 hunks)
  • tilelang/language/__init__.py (1 hunks)
  • tilelang/language/experimental/gemm_sp.py (1 hunks)
  • tilelang/language/gemm.py (1 hunks)
  • tilelang/primitives/__init__.py (0 hunks)
  • tilelang/primitives/gemm/__init__.py (0 hunks)
  • tilelang/primitives/gemm/gemm_mma.py (0 hunks)
  • tilelang/tileop/__init__.py (1 hunks)
  • tilelang/tileop/base.py (1 hunks)
  • tilelang/tileop/gemm/__init__.py (0 hunks)
  • tilelang/tileop/gemm/gemm_base.py (1 hunks)
  • tilelang/tileop/gemm_sp/__init__.py (1 hunks)
  • tilelang/tileop/gemm_sp/gemm_sp_base.py (1 hunks)
💤 Files with no reviewable changes (5)
  • tilelang/tileop/gemm/__init__.py
  • tilelang/primitives/__init__.py
  • tilelang/primitives/gemm/gemm_mma.py
  • tilelang/primitives/gemm/__init__.py
  • testing/python/primitives/test_tilelang_primitives_mma.py
🧰 Additional context used
🧬 Code graph analysis (7)
tilelang/tileop/gemm_sp/gemm_sp_base.py (2)
tilelang/tileop/base.py (1)
  • GemmWarpPolicy (6-186)
tilelang/ir.py (1)
  • GemmWarpPolicy (26-33)
examples/amd/example_amd_flash_attn_bwd.py (1)
tilelang/tileop/base.py (1)
  • GemmWarpPolicy (6-186)
tilelang/tileop/__init__.py (2)
tilelang/tileop/base.py (1)
  • GemmWarpPolicy (6-186)
tilelang/ir.py (1)
  • GemmWarpPolicy (26-33)
examples/amd/example_amd_flash_attn_fwd.py (1)
tilelang/tileop/base.py (1)
  • GemmWarpPolicy (6-186)
tilelang/language/gemm.py (2)
tilelang/tileop/base.py (1)
  • GemmWarpPolicy (6-186)
tilelang/ir.py (1)
  • GemmWarpPolicy (26-33)
tilelang/language/experimental/gemm_sp.py (3)
tilelang/tileop/base.py (1)
  • GemmWarpPolicy (6-186)
tilelang/ir.py (1)
  • GemmWarpPolicy (26-33)
src/op/gemm.h (4)
  • GemmWarpPolicy (59-83)
  • GemmWarpPolicy (64-68)
  • GemmWarpPolicy (70-74)
  • GemmWarpPolicy (76-82)
tilelang/tileop/gemm/gemm_base.py (2)
tilelang/tileop/base.py (1)
  • GemmWarpPolicy (6-186)
tilelang/ir.py (1)
  • GemmWarpPolicy (26-33)
🪛 Ruff (0.14.8)
tilelang/tileop/__init__.py

1-1: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

tilelang/language/__init__.py

62-62: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


63-63: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

🔇 Additional comments (5)
tilelang/language/experimental/gemm_sp.py (1)

3-6: Import path update looks correct and consistent with the refactor.

The change from tilelang.primitives to tilelang.tileop.base for GemmWarpPolicy aligns with the module reorganization.

tilelang/tileop/gemm/gemm_base.py (1)

6-6: Import path update looks correct.

The import is correctly updated to use the new centralized location at tilelang.tileop.base. This aligns with the PR's goal of consolidating GEMM-related types into the tileop module.

examples/amd/example_amd_flash_attn_bwd.py (1)

5-5: Import path update is correct and consistent.

The import is properly updated to use tilelang.tileop.base. The file correctly uses both the direct import (GemmWarpPolicy.FullRow in the forward kernel) and the module-qualified access (T.GemmWarpPolicy.FullRow in the backward kernel), both of which resolve to the same relocated type.

tilelang/language/gemm.py (1)

4-4: Import path update is correct.

The import is properly updated to the new centralized location. The GemmWarpPolicy type is consistently used throughout the file for type annotations and default parameter values, and this change maintains full backward compatibility for the public API.

examples/amd/example_amd_flash_attn_fwd.py (1)

5-5: Import path update is correct.

The import is properly updated to tilelang.tileop.base, consistent with the codebase-wide refactoring.

Comment on lines +1 to 3
from .base import GemmWarpPolicy # noqa: F401
from .gemm import GemmPy # noqa: F401
from .gemm_sp import GemmSPPy # noqa: F401
Contributor

⚠️ Potential issue | 🟡 Minor

Remove the stale noqa to satisfy Ruff (RUF100) while keeping the re-export.

-from .base import GemmWarpPolicy  # noqa: F401
+from .base import GemmWarpPolicy
📝 Committable suggestion

Suggested change
-from .base import GemmWarpPolicy  # noqa: F401
+from .base import GemmWarpPolicy
 from .gemm import GemmPy  # noqa: F401
 from .gemm_sp import GemmSPPy  # noqa: F401
Suggested change
-from .base import GemmWarpPolicy  # noqa: F401
-from .gemm import GemmPy  # noqa: F401
-from .gemm_sp import GemmSPPy  # noqa: F401
+from .base import GemmWarpPolicy
+from .gemm import GemmPy
+from .gemm_sp import GemmSPPy
🧰 Tools
🪛 Ruff (0.14.8)

1-1: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


2-2: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


3-3: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

🤖 Prompt for AI Agents
In tilelang/tileop/__init__.py lines 1-3, remove the stale "# noqa: F401"
comments and keep the imports as real exports, then add an explicit __all__ =
["GemmWarpPolicy", "GemmPy", "GemmSPPy"] so the names are considered used and
the re-exports remain while satisfying Ruff (RUF100).
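The `__all__` approach described in the prompt above can be tried in isolation. The sketch below builds a throwaway package on disk and checks what a star-import exposes; the package name `demo_tileop` and the stub classes are hypothetical stand-ins, so this shows the re-export pattern rather than the real tilelang/tileop/__init__.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package (hypothetical name) shaped like tilelang/tileop.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_tileop"
pkg.mkdir()
(pkg / "base.py").write_text("class GemmWarpPolicy: ...\n")
(pkg / "gemm.py").write_text("class GemmPy: ...\n")
(pkg / "gemm_sp.py").write_text("class GemmSPPy: ...\n")
(pkg / "__init__.py").write_text(
    "from .base import GemmWarpPolicy\n"
    "from .gemm import GemmPy\n"
    "from .gemm_sp import GemmSPPy\n"
    "\n"
    # Listing the names here marks them as intentional re-exports.
    '__all__ = ["GemmWarpPolicy", "GemmPy", "GemmSPPy"]\n'
)

sys.path.insert(0, str(root))
demo_tileop = importlib.import_module("demo_tileop")

# A star-import only picks up the names listed in __all__.
namespace = {}
exec("from demo_tileop import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Because the names appear in `__all__`, unused-import checks such as F401 treat them as used, which is why the `# noqa` comments become unnecessary in this layout.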

Comment on lines 5 to 8
 from tilelang.utils.language import is_shared, is_fragment
-from tilelang.ir import GemmWarpPolicy
+from tilelang.tileop.base import GemmWarpPolicy
 from tvm.ir.base import Node

Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 1) Find any remaining imports of the IR-side GemmWarpPolicy
echo "=== Searching for imports from tilelang.ir GemmWarpPolicy ==="
rg -n --type=py 'from\s+tilelang\.ir\s+import\s+GemmWarpPolicy' || echo "No matches found"

# 2) Find call sites that might expect the IR method signature compute_warp_partition
echo -e "\n=== Searching for compute_warp_partition calls ==="
rg -n --type=py 'compute_warp_partition\s*\(' -B2 -A3 || echo "No matches found"

# 3) Find explicit references to tilelang.ir.GemmWarpPolicy
echo -e "\n=== Searching for explicit IR.GemmWarpPolicy references ==="
rg -n --type=py 'tilelang\.ir\.GemmWarpPolicy|from tilelang\.ir import.*GemmWarpPolicy' || echo "No matches found"

# 4) Check if tilelang.ir.GemmWarpPolicy still exists
echo -e "\n=== Checking if tilelang/ir module has GemmWarpPolicy ==="
find . -name "*.py" -path "*/tilelang/ir*" -type f | head -20

Repository: tile-ai/tilelang

Length of output: 8294


🏁 Script executed:

#!/bin/bash
# Check the GemmWarpPolicy in tileop/base.py
echo "=== GemmWarpPolicy in tileop/base.py ==="
rg -n --type=py -A15 'class GemmWarpPolicy' tilelang/tileop/base.py || true

# Check how self.policy is defined and typed in gemm_base.py or similar
echo -e "\n=== Checking policy attribute definition in GemmBase ==="
rg -n --type=py 'self\.policy' tilelang/tileop/base.py | head -20 || true

# Check the actual GemmWarpPolicy definitions in tilelang/ir.py
echo -e "\n=== GemmWarpPolicy definitions in tilelang/ir.py ==="
rg -n --type=py -B3 -A8 'class GemmWarpPolicy|class GemmSPWarpPolicy' tilelang/ir.py || true

Repository: tile-ai/tilelang

Length of output: 1593


🏁 Script executed:

#!/bin/bash
# Check GemmSPBase class and its parent to understand self.policy
echo "=== GemmSPBase class definition ==="
rg -n --type=py -B2 -A20 'class GemmSPBase' tilelang/tileop/gemm_sp/gemm_sp_base.py || true

# Check GemmBase class definition
echo -e "\n=== GemmBase class definition ==="
rg -n --type=py -B2 -A30 'class GemmBase' tilelang/tileop/base.py | head -50 || true

# Look for self.policy assignment in GemmSPBase or GemmBase
echo -e "\n=== self.policy assignments in codebase ==="
rg -n --type=py 'self\.policy\s*=' tilelang/tileop/ -B2 -A2 || true

Repository: tile-ai/tilelang

Length of output: 936


🏁 Script executed:

#!/bin/bash
# Check full gemm_sp_base.py file
echo "=== Full gemm_sp_base.py content ==="
cat -n tilelang/tileop/gemm_sp/gemm_sp_base.py || true

# Search for @property in the codebase for policy
echo -e "\n=== @property definitions related to policy ==="
rg -n --type=py -B1 -A5 '@property' tilelang/tileop/gemm_sp/gemm_sp_base.py || true

# Check parent class definitions
echo -e "\n=== GemmBase class in tileop/base.py ==="
rg -n --type=py -B2 -A50 'class GemmBase' tilelang/tileop/base.py | head -80 || true

Repository: tile-ai/tilelang

Length of output: 6790


🏁 Script executed:

#!/bin/bash
# Check git history to see what the original import was
echo "=== Checking git log for the import change ==="
git log --oneline -20 tilelang/tileop/gemm_sp/gemm_sp_base.py 2>/dev/null || echo "Git history not available"

# Verify what gemm_sp_node.policy actually returns
echo -e "\n=== Checking gemm_sp_node type definition ==="
rg -n --type=py 'gemm_sp_node\.policy' tilelang/ -B2 -A2 || true

Repository: tile-ai/tilelang

Length of output: 402


Incorrect type annotation: GemmWarpPolicy import should be from tilelang.ir, not tilelang.tileop.base.

The @property policy on line 130 declares return type GemmWarpPolicy but imports it from tilelang.tileop.base (an IntEnum with values 0, 1, 2). However, the actual runtime value self.gemm_sp_node.policy returns the IR-side GemmWarpPolicy (a Node/Scriptable object with compute_warp_partition method). All call sites in gemm_sp_mma.py and elsewhere invoke self.policy.compute_warp_partition(...), which requires the IR-side type. The type annotation now misleads IDEs and type checkers about the actual type.

Fix: Change line 6 to from tilelang.ir import GemmWarpPolicy to match the actual return type and usage.
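The mismatch can be reproduced without TileLang installed. In the sketch below, `EnumGemmWarpPolicy` and `NodeGemmWarpPolicy` are hypothetical stand-ins for the two same-named classes (the IntEnum in tilelang.tileop.base and the IR-side Node in tilelang.ir); the enum values and the placeholder method body are assumptions. It shows that an annotation naming the enum does not change what the property returns at runtime, and that only the node-like object offers `compute_warp_partition`.

```python
from enum import IntEnum


class EnumGemmWarpPolicy(IntEnum):
    """Stand-in for the IntEnum in tilelang.tileop.base (values assumed)."""

    Square = 0
    FullRow = 1
    FullCol = 2


class NodeGemmWarpPolicy:
    """Stand-in for the IR-side Node; the method body is a placeholder."""

    def compute_warp_partition(self, *args):
        return ("m_warp", "n_warp")  # placeholder result, not the real algorithm


class GemmSPBaseSketch:
    def __init__(self, node_policy):
        self._policy = node_policy

    @property
    def policy(self) -> EnumGemmWarpPolicy:  # the annotation claims the enum
        return self._policy  # the runtime value is the node-like object


op = GemmSPBaseSketch(NodeGemmWarpPolicy())
runtime_is_enum = isinstance(op.policy, EnumGemmWarpPolicy)
enum_has_method = hasattr(EnumGemmWarpPolicy.Square, "compute_warp_partition")
node_has_method = hasattr(op.policy, "compute_warp_partition")
print(runtime_is_enum, enum_has_method, node_has_method)
```

Python does not enforce return annotations, so the code keeps running; the cost is that type checkers and IDEs would flag `self.policy.compute_warp_partition(...)` as an unknown attribute on the enum.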

🤖 Prompt for AI Agents
In tilelang/tileop/gemm_sp/gemm_sp_base.py around lines 5 to 8, the type
annotation imports GemmWarpPolicy from tilelang.tileop.base but the property
policy actually returns the IR-side GemmWarpPolicy (a Node with methods like
compute_warp_partition); change the import to from tilelang.ir import
GemmWarpPolicy so the annotated type matches runtime usage and
IDE/type-checkers; update the single import line accordingly and ensure no other
references rely on the old IntEnum import.

@LeiWang1999 LeiWang1999 merged commit 89521e6 into tile-ai:main Dec 15, 2025
4 of 6 checks passed
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/tl_templates/cuda/atomic.h (1)

609-648: Pre-existing: PTX instruction ordering inconsistency in AtomicAddx4Ret.

Not introduced by this PR, but worth noting: the PTX instructions in AtomicAddx4Ret use atom.global.gpu.release (lines 621, 629, 638) while AtomicAddx4 uses atom.release.gpu.global (lines 581, 589, 598).

Per PTX ISA, the correct ordering is .sem.scope.space (e.g., atom.release.gpu.global). The AtomicAddx4Ret variant appears to have the modifiers in the wrong order. Consider addressing this in a follow-up.
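A follow-up could guard against this kind of ordering slip with a simple check over the opcode strings embedded in the inline assembly. The sketch below is a hypothetical lint, not something in the repository: it only verifies that the `.sem`, `.scope`, and `.space` qualifiers of an `atom` opcode appear in that relative order, and the two sample opcodes are illustrative (only their `atom.release.gpu.global` and `atom.global.gpu.release` prefixes come from the comment; the `.add.v4.f32` suffix is assumed).

```python
# Qualifier groups in the order they are expected to appear on `atom`.
SEMANTICS = {"relaxed", "acquire", "release", "acq_rel"}
SCOPES = {"cta", "cluster", "gpu", "sys"}
SPACES = {"global", "shared"}

RANK = {}
RANK.update({name: 0 for name in SEMANTICS})
RANK.update({name: 1 for name in SCOPES})
RANK.update({name: 2 for name in SPACES})


def atom_qualifiers_in_order(opcode: str) -> bool:
    """Return True if .sem, .scope, .space appear in that relative order."""
    parts = opcode.split(".")
    if parts[0] != "atom":
        raise ValueError("expected an atom opcode")
    # Strip sub-qualifiers such as `shared::cta` before looking up the group.
    seen = [RANK[p.split("::")[0]] for p in parts[1:] if p.split("::")[0] in RANK]
    return seen == sorted(seen)


good = atom_qualifiers_in_order("atom.release.gpu.global.add.v4.f32")
bad = atom_qualifiers_in_order("atom.global.gpu.release.add.v4.f32")
print(good, bad)
```

The check ignores operation, vector, and type suffixes, so it says nothing about whether the rest of the instruction is valid.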

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bfd7721 and 461d11c.

📒 Files selected for processing (2)
  • src/tl_templates/cuda/atomic.h (5 hunks)
  • tilelang/language/atomic.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
🧬 Code graph analysis (1)
tilelang/language/atomic.py (1)
tilelang/language/tir/op.py (2)
  • call_extern (173-195)
  • address_of (464-480)
🔇 Additional comments (9)
tilelang/language/atomic.py (4)

59-68: Pointer-based extern calls look correct.

The change from passing dst directly to T.address_of(dst) aligns with the updated CUDA AtomicMax signature that now expects T1 *ref instead of T1 &ref. The implementation correctly handles both the no-memory-order and memory-order variants.


110-119: LGTM!

Consistent with atomic_max - correctly passes pointer via T.address_of(dst) for both memory-order variants.


340-340: LGTM!

Correctly updated to pass T.address_of(src) matching the pointer-based AtomicLoad(T *ref, ...) signature in the CUDA header.


393-393: LGTM!

Correctly updated to pass T.address_of(dst) for the destination while keeping src as a value, matching the AtomicStore(T1 *ref, T2 value, ...) signature.

src/tl_templates/cuda/atomic.h (5)

48-77: Pointer-based signature change looks correct.

The conversion from T1 &ref to T1 *ref is properly implemented:

  • Parameter changed to pointer type
  • Internal assignment simplified from T1 *address = &ref; to T1 *address = ref;
  • The rest of the function logic (CAS loop for half/bf16, atomic_ref for other types) correctly uses the pointer

79-108: LGTM!

Consistent pointer-based implementation matching AtomicMax. The return semantics are preserved correctly.


110-170: LGTM!

Both AtomicMin and AtomicMinRet correctly updated to pointer-based signatures with the same pattern as the max variants.


693-700: LGTM!

Clean conversion to pointer-based signature. The cuda::atomic_ref<T, cuda::thread_scope_device> aref(*ref) correctly dereferences the pointer.


702-711: LGTM!

Correctly updated to pointer-based signature, consistent with AtomicLoad.
