@Rachmanino (Collaborator) commented Jan 29, 2026

Summary by CodeRabbit

  • New Features
    • Added message-only debug printing, allowing users to output custom debug messages without printing a value.
    • Extended the print function to support a message-only mode when no object is provided; the message is printed together with the block and thread indices.


@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Contributor

coderabbitai bot commented Jan 29, 2026

📝 Walkthrough

A message-only debug printing feature is added to the TileLang language. A device function, debug_print_msg, is introduced in both the CUDA and HIP templates; it prints the message together with the block and thread indices. The public print() API is extended to accept an optional obj parameter, allowing message-only printing when obj is None.

Changes

  • CUDA/HIP Debug Functions (src/tl_templates/cuda/debug.h, src/tl_templates/hip/debug.h): Added a new __device__ void debug_print_msg(const char *msg) function in both files for device-scoped, message-only debug printing with block and thread information.
  • Python Debug Tests (testing/python/debug/test_tilelang_debug_print.py): Added a msg_only parameter to the debug_print_msg() test function to conditionally print either only the message or both the computed value and the message; updated test cases to exercise both code paths.
  • Language Print API (tilelang/language/print_op.py): Modified the print() signature to accept an optional obj parameter (defaults to None); introduced a new print_msg(msg: str) macro; added conditional logic to delegate to print_msg() when obj is None.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

  • LeiWang1999

Poem

🐰 A message hops through the debug stream,
No variables cluttering the beam,
Just words and threads in harmony,
The print now dances wild and free! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 60.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title '[Feature] Support message-only debug print' directly and clearly describes the main change: adding support for message-only debug printing.
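CodeRabbit does not say here how it counts definitions for the docstring check, so the following is only a rough local proxy: a hypothetical docstring_coverage helper (not part of TileLang or CodeRabbit) built on the standard ast module, which reports the share of function and class definitions that carry a docstring. In the toy module below, three of five functions are documented, giving 60.00%; the counting rules of the actual check may differ.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Fraction of function and class definitions in `source` that have a docstring."""
    tree = ast.parse(source)
    defs = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    if not defs:
        return 1.0  # nothing to document
    documented = sum(1 for node in defs if ast.get_docstring(node) is not None)
    return documented / len(defs)


example = '''
def a():
    """Documented."""

def b():
    """Documented."""

def c():
    """Documented."""

def d():
    pass

def e():
    pass
'''
print(f"{docstring_coverage(example):.2%}")  # 60.00%
```

Documenting one more function in the toy module would bring it to 4/5 = 80%, the threshold quoted by the check.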


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tilelang/language/print_op.py (1)

163-245: Make the msg-required behavior explicit when obj is None.

print() now defaults to obj=None and msg="", so calling it with neither argument passes an empty string to print_msg and trips its assertions. If message-only printing is intended, raise a clear error when msg is empty.

🔧 Proposed fix
-    elif obj is None:
-        return print_msg(msg)
+    elif obj is None:
+        if not msg:
+            raise ValueError("msg is required when obj is None")
+        return print_msg(msg)
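To make the proposed control flow concrete, here is a plain-Python model of the dispatch. The names print_model and print_msg_model are hypothetical stand-ins: they return descriptive strings instead of emitting TIR, and the real print() handles further cases before reaching the obj-is-None branch. The sketch only shows how the suggested guard changes what an argument-less call does.

```python
from typing import Any, Optional


def print_msg_model(msg: str) -> str:
    """Hypothetical stand-in for the print_msg macro: validate msg, then describe the extern call."""
    if not isinstance(msg, str):
        raise TypeError("msg must be a string")
    if msg == "":
        raise ValueError("msg must not be empty")
    return f"call_extern debug_print_msg({msg!r})"


def print_model(obj: Optional[Any] = None, msg: str = "") -> str:
    """Hypothetical stand-in for print(): message-only when obj is None."""
    if obj is None:
        # Reviewer-suggested guard: fail with a clear error instead of tripping print_msg's checks.
        if not msg:
            raise ValueError("msg is required when obj is None")
        return print_msg_model(msg)
    # Value printing is collapsed into one branch here; the real print() distinguishes more cases.
    return f"print value {obj!r} with msg={msg!r}"


print(print_model(msg="hello world"))  # call_extern debug_print_msg('hello world')
try:
    print_model()  # neither obj nor msg
except ValueError as err:
    print(err)  # msg is required when obj is None
```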
🤖 Fix all issues with AI agents
In `@src/tl_templates/cuda/debug.h`:
- Around lines 130-134: The header-defined device function debug_print_msg should be marked inline like the other utilities to avoid multiple-definition/linkage issues. Change its declaration to use the same inlining macro (e.g., TL_DEVICE, or static inline with __device__/__forceinline__) as device_assert/device_assert_with_msg, so that the function is emitted inline across compilation units while keeping the same signature debug_print_msg(const char *msg).

In `@testing/python/debug/test_tilelang_debug_print.py`:
- Around lines 100-106: The parameter Q in the T.prim_func named program is unused and triggers Ruff ARG001. Add a no-op use of Q inside program (e.g., a harmless read of Q or its shape) so the symbol is referenced without changing behavior; place this no-op before or inside the Kernel block in program to preserve buffer naming and silence the lint.
🧹 Nitpick comments (2)
tilelang/language/print_op.py (1)

101-108: Prefer explicit exceptions over assert for API validation.

Asserts are stripped with -O, and they produce AssertionError rather than a user-friendly exception. Raise TypeError/ValueError instead, so the checks always run and callers get a meaningful exception type.

🔧 Proposed fix
 @macro
 def print_msg(msg: str) -> tir.PrimExpr:
     """
     Prints a message string.
     """
-    assert isinstance(msg, str), "msg must be a string"
-    assert msg != "", "msg must not be empty"
+    if not isinstance(msg, str):
+        raise TypeError("msg must be a string")
+    if msg == "":
+        raise ValueError("msg must not be empty")
     tir.call_extern("handle", "debug_print_msg", msg)
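The general Python behavior behind this nitpick can be checked with the standard library alone. The sketch below runs the same empty-message check in a fresh interpreter with and without -O; the snippets are plain Python that mirror print_msg's check rather than importing TileLang. Whether -O actually affects print_msg depends on how TileLang's @macro processes the function body, which this sketch does not model.

```python
import os
import subprocess
import sys


def exit_code(snippet: str, optimize: bool) -> int:
    """Run `snippet` in a fresh interpreter (optionally with -O) and return its exit status."""
    # Drop PYTHONOPTIMIZE so only the explicit -O flag controls assert stripping.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    cmd = [sys.executable] + (["-O"] if optimize else []) + ["-c", snippet]
    return subprocess.run(cmd, env=env, capture_output=True).returncode


# Both snippets mirror the empty-message check in print_msg, in plain Python.
with_assert = 'msg = ""\nassert msg != "", "msg must not be empty"'
with_raise = 'msg = ""\nif msg == "":\n    raise ValueError("msg must not be empty")'

print(exit_code(with_assert, optimize=False))  # 1: AssertionError
print(exit_code(with_assert, optimize=True))   # 0: assert removed by -O
print(exit_code(with_raise, optimize=True))    # 1: ValueError survives -O
```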
src/tl_templates/hip/debug.h (1)

103-107: Consistency note: While bare __device__ functions in headers can theoretically cause multiple-definition issues, this pattern is consistent throughout the codebase (HIP copy.h has similar functions, CUDA debug.h uses the same approach). If improving consistency is desired, consider using __forceinline__ __device__ (via TL_DEVICE macro) or static __device__ inline, as already used for other inline device functions. However, this is not required since the current approach works in practice.

Comment on lines +130 to +134
// Specialization for msg-only debug print
__device__ void debug_print_msg(const char *msg) {
  printf("msg='%s' BlockIdx=(%d, %d, %d), ThreadIdx=(%d, %d, %d)\n", msg,
         blockIdx.x, blockIdx.y, blockIdx.z, threadIdx.x, threadIdx.y,
         threadIdx.z);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed (repository: tile-ai/tilelang):

  • cat -n src/tl_templates/cuda/debug.h | sed -n '120,140p'
  • rg -n "define TL_DEVICE" src/tl_templates/
  • cat -n src/tl_templates/cuda/debug.h | head -50
  • find . -name "*.h" -o -name "*.hip" | xargs grep -l "debug.h\|debug_print" | head -20
  • cat -n src/tl_templates/hip/debug.h | sed -n '120,140p'
  • rg -n "debug_print_msg" src/tl_templates/
  • cat -n src/tl_templates/hip/debug.h | sed -n '95,115p'
  • cat -n src/tl_templates/hip/debug.h | sed -n '1,20p'
  • rg -n "includes.*debug.h" src/
  • git ls-files | xargs grep -l "debug.h" | head -20
  • cat -n src/target/codegen_cuda.cc | grep -A 5 -B 5 "debug.h"
  • cat -n src/target/codegen_hip.cc | grep -A 5 -B 5 "debug.h"
  • cd src/tl_templates && find . -name "*.h" | xargs grep -l "__device__.*{" | head -10
  • rg -n "static __device__|inline.*__device__|__device__.*inline" src/tl_templates/ | head -20


Mark debug_print_msg inline (or static inline) for consistency and to avoid potential device-function linkage issues.

The function debug_print_msg is defined as a non-inline, non-template __device__ function in a header file. Other device utility functions in the same file (device_assert, device_assert_with_msg) use TL_DEVICE which includes __forceinline__, but debug_print_msg lacks this. This inconsistency and the lack of inline annotation can lead to multiple definition issues when the header is included across multiple compilation units. Apply the same inlining pattern as adjacent functions.

🔧 Proposed fix
-__device__ void debug_print_msg(const char *msg) {
+static __device__ inline void debug_print_msg(const char *msg) {
   printf("msg='%s' BlockIdx=(%d, %d, %d), ThreadIdx=(%d, %d, %d)\n", msg,
          blockIdx.x, blockIdx.y, blockIdx.z, threadIdx.x, threadIdx.y,
          threadIdx.z);
 }
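To catch this pattern in both debug.h headers, a line-based heuristic could flag __device__ function definitions that lack an inline qualifier. The helper find_non_inline_device_defs below is hypothetical (not part of TileLang) and is not a C++ parser: it only inspects single-line signatures, skips primary templates declared on the preceding line, and still checks full specializations (template <>), which are subject to the same one-definition rule as ordinary functions. Lines using TL_DEVICE are not flagged because they do not contain __device__ textually. Whether a flagged definition actually breaks a build depends on whether the header ends up in more than one translation unit that is linked together, e.g. under relocatable device code.

```python
from typing import List


def find_non_inline_device_defs(header_text: str) -> List[int]:
    """Return 1-based line numbers of __device__ function definitions with no inline qualifier.

    Hypothetical, line-based heuristic; not part of TileLang.
    """
    hits: List[int] = []
    previous = ""
    for lineno, line in enumerate(header_text.splitlines(), start=1):
        stripped = line.strip()
        is_definition = (
            "__device__" in stripped
            and "(" in stripped
            and stripped.endswith("{")
            and not stripped.startswith("#")  # ignore macro definitions
        )
        # Primary templates may be defined in several TUs; full specializations may not.
        primary_template = previous.startswith("template") and previous.replace(" ", "") != "template<>"
        # "inline" also matches "__forceinline__".
        if is_definition and "inline" not in stripped and not primary_template:
            hits.append(lineno)
        if stripped:
            previous = stripped
    return hits


sample = (
    "#define TL_DEVICE __forceinline__ __device__\n"
    "TL_DEVICE void device_assert(bool cond) {\n"
    "}\n"
    "__device__ void debug_print_msg(const char *msg) {\n"
    "}\n"
    "static __device__ inline void patched(const char *msg) {\n"
    "}\n"
    "template <typename T>\n"
    "__device__ void debug_print_var(const char *msg, T var) {\n"
    "}\n"
    "template <>\n"
    "__device__ void debug_print_var<int>(const char *msg, int var) {\n"
    "}\n"
)
print(find_non_inline_device_defs(sample))  # [4, 12]
```

Run against the real headers, such a check may also flag existing functions, which matches the HIP nitpick's observation that the bare __device__ pattern is used throughout the codebase.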

Comment on lines 100 to +106
 @T.prim_func
 def program(Q: T.Tensor((M, N), dtype)):
     with T.Kernel(4, 4, 2, threads=128 * 2) as (bx, by, bz):
         tid = T.get_thread_binding()
         if tid == 0:
-            T.print(bx + by + bz, msg="hello world")
+            if msg_only:
+                T.print(msg="hello world")
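The kernel in this hunk launches a 4 x 4 x 2 grid (T.Kernel(4, 4, 2, ...)) and only the thread with tid == 0 prints, so the message-only branch should produce one line per block, 32 in total. The hypothetical helpers below (not part of the test file) rebuild those lines from the printf format string in the CUDA debug_print_msg, assuming tid is threadIdx.x of a 1-D block so the printing thread reports ThreadIdx=(0, 0, 0). Device printf output from different blocks arrives in no guaranteed order, so the sketch sorts the lines; a test comparing captured kernel output would need to be order-independent as well.

```python
from itertools import product
from typing import List, Tuple

Dim3 = Tuple[int, int, int]


def format_debug_msg(msg: str, block: Dim3, thread: Dim3) -> str:
    """Host-side mirror of the printf format used by debug_print_msg (trailing newline omitted)."""
    return "msg='%s' BlockIdx=(%d, %d, %d), ThreadIdx=(%d, %d, %d)" % ((msg,) + block + thread)


def expected_lines(grid: Dim3, msg: str) -> List[str]:
    """Sorted lines expected when only thread 0 of every block prints `msg`."""
    gx, gy, gz = grid
    return sorted(
        format_debug_msg(msg, (bx, by, bz), (0, 0, 0))
        for bx, by, bz in product(range(gx), range(gy), range(gz))
    )


lines = expected_lines((4, 4, 2), "hello world")
print(len(lines))  # 32 = 4 * 4 * 2 blocks, one line each
print(lines[0])    # msg='hello world' BlockIdx=(0, 0, 0), ThreadIdx=(0, 0, 0)
```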

⚠️ Potential issue | 🟡 Minor

Silence unused Q to keep Ruff ARG001 green.

Ruff reports Q unused in this program definition. If lint is enforced, this will fail. Consider a no-op use to preserve buffer naming.

🔧 Proposed fix
     @T.prim_func
     def program(Q: T.Tensor((M, N), dtype)):
+        _ = Q
         with T.Kernel(4, 4, 2, threads=128 * 2) as (bx, by, bz):
             tid = T.get_thread_binding()
             if tid == 0:
🧰 Tools
🪛 Ruff (0.14.14)

101-101: Unused function argument: Q

(ARG001)


@LeiWang1999 LeiWang1999 merged commit 3fbf562 into tile-ai:main Jan 29, 2026
6 of 7 checks passed
@Rachmanino Rachmanino deleted the debug-print-str branch January 29, 2026 07:53
@Rachmanino Rachmanino restored the debug-print-str branch January 29, 2026 07:54
@Rachmanino Rachmanino deleted the debug-print-str branch January 29, 2026 07:54