[Feature] add kUseCooperativeLaunch tag for tvm_ffi #1572
Conversation
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run […] before submitting. We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
📝 Walkthrough

Adds detection of the "use_cooperative_groups" function attribute in CUDA runtime info extraction and tags the launch parameters with kUseCooperativeLaunch.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
Actionable comments posted: 1
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/target/rt_mod_cuda.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
```cpp
if (f->HasNonzeroAttr("use_cooperative_groups")) {
  info.launch_param_tags.push_back(
      runtime::launch_param::kUseCooperativeLaunch);
}
```
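For readers following the review, the hunk above can be paraphrased in Python. This is a hedged sketch, not the real TVM/TileLang API: `extract_func_info`, the attribute dict, and the tag string value are illustrative stand-ins.

```python
# Illustrative stand-in for the tag constant; the actual string value in
# TVM's runtime (if/when it is defined) may differ.
K_USE_COOPERATIVE_LAUNCH = "tir.use_cooperative_launch"


def extract_func_info(func_attrs):
    """Mimic the C++ hunk: tag functions that request cooperative groups.

    `func_attrs` stands in for the PrimFunc attribute map; only the
    "use_cooperative_groups" key matters here.
    """
    launch_param_tags = []
    # Analogue of f->HasNonzeroAttr("use_cooperative_groups")
    if func_attrs.get("use_cooperative_groups"):
        launch_param_tags.append(K_USE_COOPERATIVE_LAUNCH)
    return {"launch_param_tags": launch_param_tags}
```

A module loader could later inspect `launch_param_tags` to decide whether the kernel must be launched cooperatively.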
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Searching for kUseCooperativeLaunch definition ==="
rg -n 'kUseCooperativeLaunch' --type cpp --type c --type h
echo -e "\n=== Searching for cudaLaunchCooperativeKernel usage ==="
rg -n 'cudaLaunchCooperativeKernel' --type cpp --type c --type h
echo -e "\n=== Checking launch_param enum/constants definitions ==="
rg -n 'launch_param' --type cpp --type c --type h -A 3 | head -80
```

Repository: tile-ai/tilelang · Length of output: 5337

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Searching for CUDAModuleCreate implementation ==="
rg -n 'CUDAModuleCreate' --type cpp --type c --type h -B 2 -A 15
echo -e "\n=== Searching for launch_param_tags processing ==="
rg -n 'launch_param_tags' --type cpp --type c --type h -B 2 -A 2 | head -100
```

Length of output: 4241

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Check ExtractFuncInfo function context ==="
rg -n 'ExtractFuncInfo' --type cpp --type c --type h -B 2 -A 20 | head -100
echo -e "\n=== Search for use_cooperative_groups attribute ==="
rg -n 'use_cooperative_groups' --type cpp --type c --type h
```

Length of output: 6485

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Searching for launch_param namespace/enum definitions ==="
rg -n 'namespace.*launch_param|enum.*launch_param' --type cpp --type c --type h
echo -e "\n=== Searching for kUseCooperativeLaunch in entire codebase ==="
rg -n 'kUseCooperativeLaunch' -B 3 -A 3
echo -e "\n=== Searching for runtime::CUDAModuleCreate definition ==="
rg -n 'CUDAModuleCreate.*{' -A 10 --type cpp --type c
```

Length of output: 834

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Check if launch_param_tags is defined in FunctionInfo ==="
rg -n 'struct FunctionInfo|class FunctionInfo' --type cpp --type c --type h -A 15
echo -e "\n=== Search for where launch_param_tags are used in runtime ==="
rg -n 'launch_param_tags' --type cpp --type c --type h -B 2 -A 2
```

Length of output: 2527

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Searching for FunctionInfo struct with launch_param_tags ==="
rg -n 'launch_param_tags' -B 15 | head -60
echo -e "\n=== Search in runtime headers for FunctionInfo ==="
fd -e h -e hpp | xargs grep -l 'FunctionInfo' | head -10
echo -e "\n=== Search for runtime module implementation files ==="
fd -e cc -e cpp | xargs grep -l 'CUDAModuleCreate' 2>/dev/null | head -10
```

Length of output: 4136

🏁 Script executed:

```shell
#!/bin/bash
echo "=== List all header files to locate runtime definitions ==="
fd -path '*/runtime/*' -name '*.h' -o -path '*/runtime/*' -name '*.hpp' | head -20
echo -e "\n=== Search for launch_param in all files ==="
rg 'launch_param' --type cpp --type c --type h -l | head -10
```

Length of output: 475

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Search for FunctionInfo definition ==="
rg 'struct FunctionInfo' -B 2 -A 30 --type cpp --type c
echo -e "\n=== Search for runtime namespace in code ==="
rg 'namespace runtime' --type cpp --type c -B 2 -A 5 | head -50
echo -e "\n=== Check what runtime is being used ==="
rg '#include.*runtime' --type cpp --type c | head -20
```

Length of output: 4983

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Search for where launch_param constants might be defined ==="
rg 'kUse|launch_param::' --type cpp --type c --type h | head -30
echo -e "\n=== Check the exact context of kUseCooperativeLaunch ==="
cat src/target/rt_mod_cuda.cc | sed -n '1,80p'
```

Length of output: 3542

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Search for runtime/cuda/cuda_module.h ==="
find . -name "cuda_module.h" -type f
echo -e "\n=== Search for runtime FunctionInfo definition ==="
rg 'struct FunctionInfo|class FunctionInfo' --type cpp --type c -B 5 -A 20
echo -e "\n=== Search for launch_param namespace definition ==="
rg 'namespace launch_param|const char.*kUse' --type cpp --type c --type h -B 2 -A 2
```

Length of output: 2303

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Check runtime directory structure ==="
find runtime -type f -name "*.h" 2>/dev/null | head -20
echo -e "\n=== Look for FunctionInfo in runtime files ==="
find . -path '*/runtime/*' -name '*.h' -o -path '*/runtime/*' -name '*.cc' 2>/dev/null | xargs grep -l 'FunctionInfo' 2>/dev/null
echo -e "\n=== Check if there is an external TVM dependency ==="
cat CMakeLists.txt 2>/dev/null | head -30 || echo "No CMakeLists found"
```

Length of output: 1283

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Check for TVM submodule ==="
ls -la | grep tvm
find . -name "*.h" -path "*/tvm/*" | head -5
echo -e "\n=== Look for where runtime::FunctionInfo comes from ==="
rg 'FunctionInfo' --type cpp --type c -B 3 | grep -E '(include|namespace|runtime)' | head -20
echo -e "\n=== Check what is imported from runtime ==="
cat src/target/rt_mod_cuda.cc | head -20
```

Length of output: 1819

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Search all includes in rt_mod_cuda.cc ==="
rg '#include' src/target/rt_mod_cuda.cc
echo -e "\n=== Check for runtime module implementations elsewhere ==="
find . -type d -name runtime
echo -e "\n=== Look for TVM in dependencies ==="
cat CMakeLists.txt | grep -i tvm | head -10
```

Length of output: 945

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Find cuda_module.h in local runtime ==="
find ./src/runtime -name "cuda_module.h" -o -name "*.h" | grep -i cuda
echo -e "\n=== Check runtime directory structure ==="
ls -la ./src/runtime/
echo -e "\n=== Search for FunctionInfo definition in src/runtime ==="
grep -r 'struct FunctionInfo\|class FunctionInfo' ./src/runtime/ 2>/dev/null
echo -e "\n=== Search for launch_param in src/runtime ==="
grep -r 'launch_param' ./src/runtime/ 2>/dev/null | head -20
```

Length of output: 621

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Check cuda_module.h file ==="
cat ./src/runtime/cuda/cuda_module.h 2>/dev/null | head -100
echo -e "\n=== Look for all header files in runtime/cuda ==="
ls -la ./src/runtime/cuda/ 2>/dev/null
```

Length of output: 143

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Check for TVM submodule or external dependency ==="
find . -path "./3rdparty/*" -o -path "./tvm/*" -o -path "./third_party/*" 2>/dev/null | head -10
echo -e "\n=== Look at CMakeLists to understand TVM dependency ==="
grep -A 10 'load_tvm.cmake\|find_package.*TVM' CMakeLists.txt
echo -e "\n=== Check if TVM is available as a system library ==="
pkg-config --cflags --libs tvm 2>/dev/null || echo "TVM not found via pkg-config"
echo -e "\n=== Search for where kUseCooperativeLaunch might be defined ==="
grep -r 'kUseCooperativeLaunch' . 2>/dev/null
```

Length of output: 888

🏁 Script executed:

```shell
#!/bin/bash
echo "=== List the cmake load_tvm.cmake file ==="
cat cmake/load_tvm.cmake 2>/dev/null | head -50
echo -e "\n=== Look for TVM headers in the build directory ==="
find . -name "runtime.h" -path "*/tvm/*" 2>/dev/null | head -5
```

Length of output: 1048

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Search for kUseCooperativeLaunch in TVM submodule ==="
grep -r 'kUseCooperativeLaunch' ./3rdparty/tvm/ 2>/dev/null
echo -e "\n=== Search for launch_param definition in TVM ==="
grep -r 'namespace launch_param\|kUseCooperative' ./3rdparty/tvm/ 2>/dev/null | head -20
echo -e "\n=== Search for FunctionInfo struct in TVM ==="
grep -r 'struct FunctionInfo' ./3rdparty/tvm/src/runtime/ 2>/dev/null | head -20
```

Length of output: 215

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Search for cudaLaunchCooperativeKernel usage in TVM ==="
grep -r 'cudaLaunchCooperativeKernel' ./3rdparty/tvm/src/runtime/ 2>/dev/null
echo -e "\n=== Look for CUDAModuleCreate in TVM ==="
grep -r 'CUDAModuleCreate' ./3rdparty/tvm/src/runtime/ 2>/dev/null | head -10
echo -e "\n=== Search for launch_param in TVM runtime ==="
grep -r 'launch_param' ./3rdparty/tvm/include/ 2>/dev/null | head -20
```

Length of output: 209

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Check if code compiles by looking at build artifacts ==="
find . -name "*.o" -o -name "*.a" -o -name "*.so" 2>/dev/null | head -5
echo -e "\n=== Search for any definition of kUseCooperativeLaunch in entire repo ==="
grep -r 'kUseCooperativeLaunch' . --include="*.h" --include="*.cc" --include="*.cpp" 2>/dev/null
echo -e "\n=== Check git log for recent changes to rt_mod_cuda.cc ==="
git log --oneline src/target/rt_mod_cuda.cc 2>/dev/null | head -10
echo -e "\n=== Check git diff to see what changed ==="
git diff HEAD src/target/rt_mod_cuda.cc 2>/dev/null | head -100
```

Length of output: 422

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Look for all constants defined in launch_param-like patterns ==="
grep -r 'const.*char.*kUse\|const.*string.*kUse' ./3rdparty/tvm/ 2>/dev/null | grep -i launch
echo -e "\n=== Check if launch_param is defined elsewhere ==="
find ./3rdparty/tvm -name "*.h" -exec grep -l 'launch_param' {} \; 2>/dev/null
echo -e "\n=== Look at what the runtime header actually contains ==="
head -200 ./3rdparty/tvm/include/tvm/runtime/c_runtime_api.h 2>/dev/null | grep -A 50 'launch'
```

Length of output: 239

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Check the commit that added this ==="
git show c0fac3a 2>/dev/null
echo -e "\n=== Search for where constants should be defined ==="
grep -r 'kUseDynamicSharedMemoryTag' ./3rdparty/tvm/ 2>/dev/null | head -5
echo -e "\n=== Check if it's in a TVM include file ==="
find ./3rdparty/tvm/include -name "*.h" -exec grep -l 'launch_param\|FunctionInfo' {} \; 2>/dev/null
```

Length of output: 50376

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Look for TVM runtime module includes ==="
grep -r 'include.*tvm/runtime' ./3rdparty/tvm/src/runtime/cuda/ 2>/dev/null | head -10
echo -e "\n=== Check the actual TVM cuda module implementation ==="
find ./3rdparty/tvm -name "*cuda_module*" -type f
echo -e "\n=== Look at includes in tilelang's rt_mod_cuda.cc more carefully ==="
head -10 ./src/target/rt_mod_cuda.cc
```

Length of output: 497
The referenced constant runtime::launch_param::kUseCooperativeLaunch does not exist in this codebase or the TVM dependency, preventing compilation.
The code at lines 34-37 correctly checks for the "use_cooperative_groups" attribute and attempts to append a launch parameter tag, but it references runtime::launch_param::kUseCooperativeLaunch, which is not defined anywhere. While a similar pattern exists with tvm::runtime::launch_param::kUseDynamicSharedMemoryTag in lower_device_kernel_launch.cc, the specific cooperative launch constant is missing from the TVM runtime headers. This will cause a linker/compilation error.
Define or import the missing constant from the TVM runtime, or verify that it exists in the TVM version being used. Additionally, confirm that the CUDA runtime module downstream actually uses this tag to invoke cudaLaunchCooperativeKernel.
🤖 Prompt for AI Agents

In src/target/rt_mod_cuda.cc around lines 34-37: the code references runtime::launch_param::kUseCooperativeLaunch, which does not exist and prevents compilation. Fix by either (A) replacing that identifier with the correct existing TVM runtime constant (confirm the exact name in your TVM headers and update the code/import to use it), or (B) if your TVM version lacks such a tag, add a single definition for the missing constant in the appropriate runtime::launch_param namespace/header used by this project (matching the type/value convention used for other tags), and then ensure the CUDA runtime module checks that tag and calls cudaLaunchCooperativeKernel where required. Also verify and document the TVM version compatibility so future builds use the correct symbol.
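To make the reviewer's last point concrete, here is a minimal Python sketch of how a runtime module wrapper could dispatch on the tag, choosing the cooperative path only when the tag is present. The tag string is an assumed value and `select_launch_api` is hypothetical; the real fix belongs in the C++ CUDA module, where `cudaLaunchCooperativeKernel` and `cuLaunchKernel` are the actual CUDA APIs involved.

```python
def select_launch_api(launch_param_tags):
    """Pick which CUDA launch API a module wrapper would invoke.

    The "tir.use_cooperative_launch" string is an assumed tag value;
    the returned names are the real CUDA driver/runtime entry points a
    C++ implementation would call.
    """
    if "tir.use_cooperative_launch" in launch_param_tags:
        return "cudaLaunchCooperativeKernel"
    return "cuLaunchKernel"
```

Without a branch like this downstream, defining the constant alone would tag the function but never change how the kernel is actually launched.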
Can we add a simple test for it?
Actionable comments posted: 0
🧹 Nitpick comments (1)
testing/python/language/test_tilelang_language_cooperative.py (1)
32-32: Consider using `torch.zeros` or `torch.empty` for clarity.

Since the kernel overwrites all values in the first loop, the initial random values are never used. Using `torch.zeros` or `torch.empty` would make this clearer and avoid confusion.

🔎 Suggested refactor

```diff
- tensor = torch.rand((N), dtype=torch.float32, device="cuda")
+ tensor = torch.zeros((N), dtype=torch.float32, device="cuda")
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
testing/python/language/test_tilelang_language_cooperative.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
testing/python/language/test_tilelang_language_cooperative.py
📚 Learning: 2025-12-18T04:50:00.512Z
Learnt from: silentCoder-dev
Repo: tile-ai/tilelang PR: 1464
File: testing/python/language/test_tilelang_language_rand.py:14-14
Timestamp: 2025-12-18T04:50:00.512Z
Learning: In `testing/python/language/test_tilelang_language_rand.py`, the TileLang kernel uses `blk_M = M` (single block) and calls `rng_rand()` four times per element to align results with the Triton implementation, which uses `blk_M = 128` (multiple blocks) and calls the RNG once per element. These differences compensate for internal RNG behavior differences between TileLang and Triton.
Applied to files:
testing/python/language/test_tilelang_language_cooperative.py
🧬 Code graph analysis (1)
testing/python/language/test_tilelang_language_cooperative.py (2)
tilelang/language/builtin.py (1)
sync_grid (721-723)
tilelang/testing/__init__.py (1)
requires_cuda_compute_version_ge (107-108)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (5)
testing/python/language/test_tilelang_language_cooperative.py (5)
1-4: LGTM: Imports are appropriate.

The imports correctly include all necessary modules for the cooperative grid synchronization test.
7-23: LGTM: Kernel logic correctly validates cooperative synchronization.

The kernel design is clever:
- First phase writes `A[i] = i`
- Grid synchronization ensures all blocks complete phase 1
- Second phase computes `A[i] += A[N-i-1]`, which reads values written by other blocks
- Expected result: all elements become `N-1` (1023)

This effectively validates that cooperative grid sync works correctly across all blocks.
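The expected value can be checked with a sequential simulation of the two phases. This is only a model of the kernel's arithmetic: the list copy stands in for the grid-wide barrier that makes every phase-1 write visible before phase 2 reads it.

```python
N = 1024
A = [0.0] * N

# Phase 1: each "thread" writes its own index.
for i in range(N):
    A[i] = float(i)

# In the real kernel a grid sync happens here; sequentially, snapshotting
# the array models "all phase-1 writes are visible to all blocks".
snapshot = list(A)

# Phase 2: add the mirrored element, which another block wrote in phase 1.
for i in range(N):
    A[i] = snapshot[i] + snapshot[N - i - 1]

# i + (N - i - 1) == N - 1 for every i, so all elements equal 1023.
assert all(x == N - 1 for x in A)
```

If grid synchronization were broken, some blocks could read mirrored elements before they were written, and the uniform `N-1` result would not hold.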
26-28: LGTM: Test requirements are correctly specified.

The decorators appropriately require CUDA and compute capability ≥6.0, which is the minimum for cooperative group support (Pascal architecture and later).
38-39: LGTM: Standard test entry point.

The main block correctly uses `tilelang.testing.main()` for test execution.
26-35: Test adequately validates cooperative launch functionality.

This test effectively addresses the PR objectives by:
- Verifying the kernel source includes the cooperative_groups sync call (line 31)
- Ensuring the kernel executes without the runtime failure described in issue #1501 (line 33)
- Validating the synchronization semantics are correct through output verification (line 35)
If the kUseCooperativeLaunch tag were not properly set, the kernel would fail with the CUDA launch error reported in the issue, and this test would catch that regression.
Resolve #1501