-
Notifications
You must be signed in to change notification settings - Fork 446
[Feature] Support message-only debug print #1755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
📝 WalkthroughWalkthroughA message-only debug printing feature is added to the TileLang language. Device functions Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tilelang/language/print_op.py (1)
163-245: Make the msg-required behavior explicit whenobj is None.
print()now defaults toobj=None, msg=""; this currently trips theprint_msgassertions. If message-only is intended, emit a clear error whenmsgis empty.🔧 Proposed fix
- elif obj is None: - return print_msg(msg) + elif obj is None: + if not msg: + raise ValueError("msg is required when obj is None") + return print_msg(msg)
🤖 Fix all issues with AI agents
In `@src/tl_templates/cuda/debug.h`:
- Around line 130-134: The header-defined device function debug_print_msg should
be marked inline like the other utilities to avoid multiple-definition/linkage
issues; change its declaration to use the same inlining macro (e.g., TL_DEVICE
or static inline with __device__/__forceinline__) as
device_assert/device_assert_with_msg so the function is emitted inline across
compilation units while keeping the same signature debug_print_msg(const char
*msg).
In `@testing/python/debug/test_tilelang_debug_print.py`:
- Around line 100-106: The parameter Q in the T.prim_func named program is
unused and triggers Ruff ARG001; add a no-op use of Q inside program (e.g., a
harmless read or evaluate of Q or its shape) so the symbol is referenced without
changing behavior—place this no-op before or inside the Kernel block in function
program to preserve buffer naming and silence the lint.
🧹 Nitpick comments (2)
tilelang/language/print_op.py (1)
101-108: Prefer explicit exceptions overassertfor API validation.Asserts are stripped with
-O, and they produceAssertionErrorrather than a user-friendly exception. UseTypeError/ValueErrorso behavior is consistent.🔧 Proposed fix
`@macro` def print_msg(msg: str) -> tir.PrimExpr: """ Prints a message string. """ - assert isinstance(msg, str), "msg must be a string" - assert msg != "", "msg must not be empty" + if not isinstance(msg, str): + raise TypeError("msg must be a string") + if msg == "": + raise ValueError("msg must not be empty") tir.call_extern("handle", "debug_print_msg", msg)src/tl_templates/hip/debug.h (1)
103-107: Consistency note: While bare__device__functions in headers can theoretically cause multiple-definition issues, this pattern is consistent throughout the codebase (HIP copy.h has similar functions, CUDA debug.h uses the same approach). If improving consistency is desired, consider using__forceinline__ __device__(viaTL_DEVICEmacro) orstatic __device__ inline, as already used for other inline device functions. However, this is not required since the current approach works in practice.
| // Specialization for msg-only debug print | ||
| __device__ void debug_print_msg(const char *msg) { | ||
| printf("msg='%s' BlockIdx=(%d, %d, %d), ThreadIdx=(%d, %d, %d)\n", msg, | ||
| blockIdx.x, blockIdx.y, blockIdx.z, threadIdx.x, threadIdx.y, | ||
| threadIdx.z); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
cat -n src/tl_templates/cuda/debug.h | sed -n '120,140p'Repository: tile-ai/tilelang
Length of output: 647
🏁 Script executed:
rg -n "define TL_DEVICE" src/tl_templates/Repository: tile-ai/tilelang
Length of output: 464
🏁 Script executed:
cat -n src/tl_templates/cuda/debug.h | head -50Repository: tile-ai/tilelang
Length of output: 2770
🏁 Script executed:
find . -name "*.h" -o -name "*.hip" | xargs grep -l "debug.h\|debug_print" | head -20Repository: tile-ai/tilelang
Length of output: 122
🏁 Script executed:
cat -n src/tl_templates/hip/debug.h | sed -n '120,140p'Repository: tile-ai/tilelang
Length of output: 42
🏁 Script executed:
rg -n "debug_print_msg" src/tl_templates/Repository: tile-ai/tilelang
Length of output: 228
🏁 Script executed:
cat -n src/tl_templates/hip/debug.h | sed -n '95,115p'Repository: tile-ai/tilelang
Length of output: 716
🏁 Script executed:
cat -n src/tl_templates/hip/debug.h | sed -n '1,20p'Repository: tile-ai/tilelang
Length of output: 1014
🏁 Script executed:
rg -n "includes.*debug.h" src/Repository: tile-ai/tilelang
Length of output: 42
🏁 Script executed:
git ls-files | xargs grep -l "debug.h" | head -20Repository: tile-ai/tilelang
Length of output: 299
🏁 Script executed:
cat -n src/target/codegen_cuda.cc | grep -A 5 -B 5 "debug.h"Repository: tile-ai/tilelang
Length of output: 624
🏁 Script executed:
cat -n src/target/codegen_hip.cc | grep -A 5 -B 5 "debug.h"Repository: tile-ai/tilelang
Length of output: 613
🏁 Script executed:
cd src/tl_templates && find . -name "*.h" | xargs grep -l "__device__.*{" | head -10Repository: tile-ai/tilelang
Length of output: 189
🏁 Script executed:
rg -n "static __device__|inline.*__device__|__device__.*inline" src/tl_templates/ | head -20Repository: tile-ai/tilelang
Length of output: 2313
Use inline or static inline with debug_print_msg for consistency and to avoid potential device function linkage issues.
The function debug_print_msg is defined as a non-inline, non-template __device__ function in a header file. Other device utility functions in the same file (device_assert, device_assert_with_msg) use TL_DEVICE which includes __forceinline__, but debug_print_msg lacks this. This inconsistency and the lack of inline annotation can lead to multiple definition issues when the header is included across multiple compilation units. Apply the same inlining pattern as adjacent functions.
🔧 Proposed fix
-__device__ void debug_print_msg(const char *msg) {
+static __device__ inline void debug_print_msg(const char *msg) {
printf("msg='%s' BlockIdx=(%d, %d, %d), ThreadIdx=(%d, %d, %d)\n", msg,
blockIdx.x, blockIdx.y, blockIdx.z, threadIdx.x, threadIdx.y,
threadIdx.z);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Specialization for msg-only debug print | |
| __device__ void debug_print_msg(const char *msg) { | |
| printf("msg='%s' BlockIdx=(%d, %d, %d), ThreadIdx=(%d, %d, %d)\n", msg, | |
| blockIdx.x, blockIdx.y, blockIdx.z, threadIdx.x, threadIdx.y, | |
| threadIdx.z); | |
| // Specialization for msg-only debug print | |
| static __device__ inline void debug_print_msg(const char *msg) { | |
| printf("msg='%s' BlockIdx=(%d, %d, %d), ThreadIdx=(%d, %d, %d)\n", msg, | |
| blockIdx.x, blockIdx.y, blockIdx.z, threadIdx.x, threadIdx.y, | |
| threadIdx.z); | |
| } |
🤖 Prompt for AI Agents
In `@src/tl_templates/cuda/debug.h` around lines 130 - 134, The header-defined
device function debug_print_msg should be marked inline like the other utilities
to avoid multiple-definition/linkage issues; change its declaration to use the
same inlining macro (e.g., TL_DEVICE or static inline with
__device__/__forceinline__) as device_assert/device_assert_with_msg so the
function is emitted inline across compilation units while keeping the same
signature debug_print_msg(const char *msg).
| @T.prim_func | ||
| def program(Q: T.Tensor((M, N), dtype)): | ||
| with T.Kernel(4, 4, 2, threads=128 * 2) as (bx, by, bz): | ||
| tid = T.get_thread_binding() | ||
| if tid == 0: | ||
| T.print(bx + by + bz, msg="hello world") | ||
| if msg_only: | ||
| T.print(msg="hello world") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silence unused Q to keep Ruff ARG001 green.
Ruff reports Q unused in this program definition. If lint is enforced, this will fail. Consider a no-op use to preserve buffer naming.
🔧 Proposed fix
`@T.prim_func`
def program(Q: T.Tensor((M, N), dtype)):
+ _ = Q
with T.Kernel(4, 4, 2, threads=128 * 2) as (bx, by, bz):
tid = T.get_thread_binding()
if tid == 0:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @T.prim_func | |
| def program(Q: T.Tensor((M, N), dtype)): | |
| with T.Kernel(4, 4, 2, threads=128 * 2) as (bx, by, bz): | |
| tid = T.get_thread_binding() | |
| if tid == 0: | |
| T.print(bx + by + bz, msg="hello world") | |
| if msg_only: | |
| T.print(msg="hello world") | |
| `@T.prim_func` | |
| def program(Q: T.Tensor((M, N), dtype)): | |
| _ = Q | |
| with T.Kernel(4, 4, 2, threads=128 * 2) as (bx, by, bz): | |
| tid = T.get_thread_binding() | |
| if tid == 0: | |
| if msg_only: | |
| T.print(msg="hello world") |
🧰 Tools
🪛 Ruff (0.14.14)
101-101: Unused function argument: Q
(ARG001)
🤖 Prompt for AI Agents
In `@testing/python/debug/test_tilelang_debug_print.py` around lines 100 - 106,
The parameter Q in the T.prim_func named program is unused and triggers Ruff
ARG001; add a no-op use of Q inside program (e.g., a harmless read or evaluate
of Q or its shape) so the symbol is referenced without changing behavior—place
this no-op before or inside the Kernel block in function program to preserve
buffer naming and silence the lint.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.