Skip to content
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 39 additions & 5 deletions .clang-tidy
Original file line number Diff line number Diff line change
@@ -1,10 +1,44 @@
Checks: >
# 1. 保留的类别:更容易发现 bug/性能问题
clang-analyzer-*,
cppcoreguidelines-*,
modernize-*,
cppcoreguidelines-pro-type-static-cast-downcast,
cppcoreguidelines-pro-type-member-init,
cppcoreguidelines-pro-bounds-array-to-pointer-decay,
cppcoreguidelines-pro-bounds-pointer-arithmetic,
cppcoreguidelines-slicing,
cppcoreguidelines-narrowing-conversions,
performance-*,
readability-*,
-readability-identifier-length

# 2. 可读性:只保留实用的
readability-braces-around-statements,
readability-container-size-empty,
readability-delete-null-pointer,
readability-redundant-member-init,
readability-redundant-smartptr-get,
readability-redundant-string-cstr,

# 3. 禁用所有侵入式/风格大改的规则
-readability-identifier-length,
-readability-avoid-const-params-in-decls,
-readability-else-after-return,
-cppcoreguidelines-avoid-magic-numbers,
-modernize-use-trailing-return-type,
-modernize-use-nodiscard,
-modernize-use-auto,
-modernize-pass-by-value,
-modernize-return-braced-init-list,
-modernize-use-default-member-init,
-modernize-loop-convert,
-modernize-concat-nested-namespaces,
-llvm-include-order,
-bugprone-unused-return-value,
-clang-diagnostic-unused-result,
-cppcoreguidelines-special-member-functions,
-performance-noexcept-move-constructor,
-cppcoreguidelines-narrowing-conversions,
-clang-diagnostic-error,
-cppcoreguidelines-pro-type-member-init,
Comment on lines +38 to +40
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There are duplicated rules that are both enabled and disabled. cppcoreguidelines-pro-type-member-init is enabled on line 5 and disabled on line 40. cppcoreguidelines-narrowing-conversions is enabled on line 9 and disabled on line 38. This is likely unintentional. Please remove one of each pair to resolve the conflict.


WarningsAsErrors: '*'

HeaderFilterRegex: '^(?!.*(3rdparty|build)).*$'
HeaderFilterRegex: '^(?!.*(3rdparty|build)).*$'
3 changes: 3 additions & 0 deletions .github/workflows/amd_ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,8 @@ jobs:
- name: Run format check
run: |
source "${{ runner.tool_cache }}/${{ env.VENV_DIR }}/bin/activate"
mkdir -p build
cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_ROCM=ON; cd ..
if ! output=$(./format.sh 2>&1); then
Comment on lines +52 to 54
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Harden format step (ROCm): fail on cmake error and verify compile_commands.json.

Mirror NVIDIA workflow hardening.

-        mkdir -p build
-        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_ROCM=ON; cd ..
+        set -euo pipefail
+        cmake -S . -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_ROCM=ON
+        test -f build/compile_commands.json || { echo "compile_commands.json not found"; exit 2; }
@@
-        rm -rf build
+        rm -rf build

Also applies to: 61-61

🤖 Prompt for AI Agents
.github/workflows/amd_ci.yml around lines 51-53 (and line 61): the format step
does not fail fast on cmake errors and doesn't verify that
build/compile_commands.json was generated; update the workflow to check the
cmake exit status (fail if cmake returns non-zero) and after running cmake
verify that build/compile_commands.json exists (fail the job if missing) before
invoking ./format.sh so the pipeline mirrors the NVIDIA hardening behavior.

echo "------------------------------------"
echo "message:"
Expand All @@ -56,6 +58,7 @@ jobs:
echo "------------------------------------"
exit 1
fi
rm -rf build

- name: Commit and Push Changes
uses: stefanzweifel/git-auto-commit-action@v5
Expand Down
4 changes: 4 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,9 @@ jobs:
- name: Run format check
run: |
source "${{ runner.tool_cache }}/${{ env.VENV_DIR }}/bin/activate"
mkdir -p build
# run cmake to create the build directory with compile_commands.json
cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_CUDA=ON; cd ..
if ! output=$(./format.sh 2>&1); then
Comment on lines +51 to 54
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Harden format step: fail fast on cmake errors and verify compile_commands.json.

Currently, cmake failures won’t fail the step. Also check the file exists before running format.sh.

-        mkdir -p build
-        # run cmake to create the build directory with compile_commands.json
-        cd build; cmake .. -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_CUDA=ON; cd ..
+        set -euo pipefail
+        cmake -S . -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DUSE_CUDA=ON
+        test -f build/compile_commands.json || { echo "compile_commands.json not found"; exit 2; }
@@
-        rm -rf build
+        rm -rf build

Also applies to: 61-61

🤖 Prompt for AI Agents
In .github/workflows/ci.yml around lines 50-53 and also at line 61, the workflow
currently runs cmake but does not fail the step on cmake errors and proceeds to
run format.sh without ensuring compile_commands.json exists; modify the script
so that the cmake invocation is checked (fail fast if cmake returns non-zero)
and then verify that build/compile_commands.json exists before calling
./format.sh (fail the job with a clear error if it's missing), applying the same
checks to the later occurrence at line 61.

echo "------------------------------------"
echo "message:"
Expand All @@ -55,6 +58,7 @@ jobs:
echo "------------------------------------"
exit 1
fi
rm -rf build

- name: Commit and Push Changes
uses: stefanzweifel/git-auto-commit-action@v5
Expand Down
67 changes: 67 additions & 0 deletions format.sh
Original file line number Diff line number Diff line change
Expand Up @@ -249,6 +249,73 @@ else
fi
echo 'tile-lang clang-format: Done'

echo 'tile-lang clang-tidy: Check Start'
# If clang-tidy is available, run it; otherwise, skip
if command -v run-clang-tidy &>/dev/null; then
# Check if clang-tidy is available
if ! command -v clang-tidy &>/dev/null; then
echo "clang-tidy not found. Skipping clang-tidy checks."
else
# Get clang-tidy version
CLANG_TIDY_VERSION=$(clang-tidy --version | head -n1 | awk '{print $4}')
echo "Using clang-tidy version: $CLANG_TIDY_VERSION"

# Check if build directory exists
if [ ! -d "build" ]; then
echo "Build directory not found. Skipping clang-tidy checks."
else
# Run clang-tidy on specified files
clang_tidy_files() {
run-clang-tidy -j 64 "$@" -p build
}

# Run clang-tidy on all C/C++ source files
clang_tidy_all() {
run-clang-tidy -j 64 src/*.cc -p build
Comment on lines +273 to +274
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of src/*.cc will not find files in subdirectories of src, and it can fail if the number of files is very large, exceeding command-line length limits. Using find with xargs is a more robust way to handle this.

Suggested change
clang_tidy_all() {
run-clang-tidy -j 64 src/*.cc -p build
find src -name '*.cc' | xargs run-clang-tidy -j 64 -p build

}

# Run clang-tidy on changed C/C++ files relative to main
clang_tidy_changed() {
if git show-ref --verify --quiet refs/remotes/origin/main; then
BASE_BRANCH="origin/main"
else
BASE_BRANCH="main"
fi

MERGEBASE="$(git merge-base $BASE_BRANCH HEAD)"

# Get changed C/C++ files
CHANGED_FILES=$(git diff --name-only --diff-filter=ACM "$MERGEBASE" -- '*.c' '*.cc' '*.cpp' '*.h' '*.hpp' 2>/dev/null || true)

if [ -n "$CHANGED_FILES" ]; then
echo "Running clang-tidy on changed files:"
echo "$CHANGED_FILES"
# Convert newline-separated files to space-separated and run clang-tidy once
CHANGED_FILES_SPACE=$(echo "$CHANGED_FILES" | tr '\n' ' ')
run-clang-tidy -j 64 $CHANGED_FILES_SPACE -p build
else
echo "No C/C++ files changed. Skipping clang-tidy."
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation for getting changed files and running clang-tidy on them can fail if filenames contain spaces. Using xargs with null-terminated filenames is a more robust approach.

Suggested change
CHANGED_FILES=$(git diff --name-only --diff-filter=ACM "$MERGEBASE" -- '*.c' '*.cc' '*.cpp' '*.h' '*.hpp' 2>/dev/null || true)
if [ -n "$CHANGED_FILES" ]; then
echo "Running clang-tidy on changed files:"
echo "$CHANGED_FILES"
# Convert newline-separated files to space-separated and run clang-tidy once
CHANGED_FILES_SPACE=$(echo "$CHANGED_FILES" | tr '\n' ' ')
run-clang-tidy -j 64 $CHANGED_FILES_SPACE -p build
else
echo "No C/C++ files changed. Skipping clang-tidy."
fi
# Get changed C/C++ files
if ! git diff --diff-filter=ACM --quiet --exit-code "$MERGEBASE" -- '*.c' '*.cc' '*.cpp' '*.h' '*.hpp' &>/dev/null; then
echo "Running clang-tidy on changed files:"
git diff --name-only --diff-filter=ACM -z "$MERGEBASE" -- '*.c' '*.cc' '*.cpp' '*.h' '*.hpp' | tee /dev/stderr | xargs -0 run-clang-tidy -j 64 -p build
else
echo "No C/C++ files changed. Skipping clang-tidy."
fi

}

if [[ "$1" == '--files' ]]; then
# If --files is given, run clang-tidy only on the provided files
clang_tidy_files "${@:2}"
elif [[ "$1" == '--all' ]]; then
# If --all is given, run clang-tidy on all source files
clang_tidy_all
else
# Otherwise, run clang-tidy only on changed C/C++ files
clang_tidy_changed
fi
fi
fi
else
echo "run-clang-tidy not found. Skipping clang-tidy checks."
echo "To install clang-tidy tools, you may need to install clang-tidy and run-clang-tidy."
fi
echo 'tile-lang clang-tidy: Done'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Harden clang-tidy integration: detect binaries, honor custom build dir, and lint all TUs

Current block:

  • Assumes run-clang-tidy and clang-tidy exact names; misses version-suffixed binaries (e.g., run-clang-tidy-18).
  • Hardcodes build path to build and only lints src/*.cc for “--all”, which skips nested sources and headers compiled from compile_commands.json.
  • Uses fixed -j 64; better to adapt to host cores and allow override.

Proposed update keeps behavior but improves robustness.

-echo 'tile-lang clang-tidy: Check Start'
-# If clang-tidy is available, run it; otherwise, skip
-if command -v run-clang-tidy &>/dev/null; then
-    # Check if clang-tidy is available
-    if ! command -v clang-tidy &>/dev/null; then
-        echo "clang-tidy not found. Skipping clang-tidy checks."
-    else
-        # Get clang-tidy version
-        CLANG_TIDY_VERSION=$(clang-tidy --version | head -n1 | awk '{print $4}')
-        echo "Using clang-tidy version: $CLANG_TIDY_VERSION"
-
-        # Check if build directory exists
-        if [ ! -d "build" ]; then
-            echo "Build directory not found. Skipping clang-tidy checks."
-        else
-            # Run clang-tidy on specified files
-            clang_tidy_files() {
-                run-clang-tidy -j 64 "$@" -p build
-            }
-
-            # Run clang-tidy on all C/C++ source files
-            clang_tidy_all() {
-                run-clang-tidy -j 64 src/*.cc -p build
-            }
-
-            # Run clang-tidy on changed C/C++ files relative to main
-            clang_tidy_changed() {
-                if git show-ref --verify --quiet refs/remotes/origin/main; then
-                    BASE_BRANCH="origin/main"
-                else
-                    BASE_BRANCH="main"
-                fi
-
-                MERGEBASE="$(git merge-base $BASE_BRANCH HEAD)"
-
-                # Get changed C/C++ files
-                CHANGED_FILES=$(git diff --name-only --diff-filter=ACM "$MERGEBASE" -- '*.c' '*.cc' '*.cpp' '*.h' '*.hpp' 2>/dev/null || true)
-                
-                if [ -n "$CHANGED_FILES" ]; then
-                    echo "Running clang-tidy on changed files:"
-                    echo "$CHANGED_FILES"
-                    # Convert newline-separated files to space-separated and run clang-tidy once
-                    CHANGED_FILES_SPACE=$(echo "$CHANGED_FILES" | tr '\n' ' ')
-                    run-clang-tidy -j 64 $CHANGED_FILES_SPACE -p build
-                else
-                    echo "No C/C++ files changed. Skipping clang-tidy."
-                fi
-            }
-
-            if [[ "$1" == '--files' ]]; then
-               # If --files is given, run clang-tidy only on the provided files
-               clang_tidy_files "${@:2}"
-            elif [[ "$1" == '--all' ]]; then
-               # If --all is given, run clang-tidy on all source files
-               clang_tidy_all
-            else
-               # Otherwise, run clang-tidy only on changed C/C++ files
-               clang_tidy_changed
-            fi
-        fi
-    fi
-else
-    echo "run-clang-tidy not found. Skipping clang-tidy checks."
-    echo "To install clang-tidy tools, you may need to install clang-tidy and run-clang-tidy."
-fi
-echo 'tile-lang clang-tidy: Done'
+echo 'tile-lang clang-tidy: Check Start'
+# Prefer explicit environment overrides when provided:
+#  - RUN_CLANG_TIDY_BIN / CLANG_TIDY_BIN
+#  - BUILD_DIR (or CMAKE_BUILD_DIR)
+RUN_CLANG_TIDY_BIN="${RUN_CLANG_TIDY_BIN:-$(command -v run-clang-tidy || command -v run-clang-tidy-18 || command -v run-clang-tidy-17 || true)}"
+CLANG_TIDY_BIN="${CLANG_TIDY_BIN:-$(command -v clang-tidy || command -v clang-tidy-18 || command -v clang-tidy-17 || true)}"
+BUILD_DIR="${BUILD_DIR:-${CMAKE_BUILD_DIR:-build}}"
+JOBS="${CLANG_TIDY_JOBS:-$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 8)}"
+
+if [[ -z "$RUN_CLANG_TIDY_BIN" ]]; then
+    echo "run-clang-tidy not found. Skipping clang-tidy checks."
+    echo "To install: llvm (provides run-clang-tidy)."
+elif [[ -z "$CLANG_TIDY_BIN" ]]; then
+    echo "clang-tidy not found. Skipping clang-tidy checks."
+elif [[ ! -f "$BUILD_DIR/compile_commands.json" ]]; then
+    echo "compile_commands.json not found under '$BUILD_DIR'. Skipping clang-tidy checks."
+else
+    CLANG_TIDY_VERSION="$($CLANG_TIDY_BIN --version | head -n1)"
+    echo "Using $CLANG_TIDY_VERSION"
+
+    clang_tidy_files() {
+        "$RUN_CLANG_TIDY_BIN" -j "$JOBS" "$@" -p "$BUILD_DIR"
+    }
+    # Run on all TUs from compile_commands.json
+    clang_tidy_all() {
+        "$RUN_CLANG_TIDY_BIN" -j "$JOBS" -p "$BUILD_DIR"
+    }
+    clang_tidy_changed() {
+        if git show-ref --verify --quiet refs/remotes/origin/main; then
+            BASE_BRANCH="origin/main"
+        else
+            BASE_BRANCH="main"
+        fi
+        MERGEBASE="$(git merge-base "$BASE_BRANCH" HEAD)"
+        CHANGED_FILES="$(git diff --name-only --diff-filter=ACM "$MERGEBASE" -- '*.c' '*.cc' '*.cpp' '*.h' '*.hpp' 2>/dev/null || true)"
+        if [[ -n "$CHANGED_FILES" ]]; then
+            echo "Running clang-tidy on changed files:"
+            echo "$CHANGED_FILES"
+            # Space-separate and run once (run-clang-tidy handles parallelism)
+            CHANGED_FILES_SPACE="$(echo "$CHANGED_FILES" | tr '\n' ' ')"
+            "$RUN_CLANG_TIDY_BIN" -j "$JOBS" $CHANGED_FILES_SPACE -p "$BUILD_DIR"
+        else
+            echo "No C/C++ files changed. Skipping clang-tidy."
+        fi
+    }
+    if [[ "$1" == '--files' ]]; then
+        clang_tidy_files "${@:2}"
+    elif [[ "$1" == '--all' ]]; then
+        clang_tidy_all
+    else
+        clang_tidy_changed
+    fi
+fi
+echo 'tile-lang clang-tidy: Done'
🤖 Prompt for AI Agents
In format.sh around lines 252 to 318, the clang-tidy block assumes exact binary
names, hardcodes the build dir and file globs, and uses a fixed -j 64; update it
to (1) detect version-suffixed binaries by searching for run-clang-tidy* and
clang-tidy* and pick the first available, storing their names in variables; (2)
allow a configurable build directory (e.g., use BUILD_DIR env var or fallback to
"build") and pass it to -p; (3) for --all prefer invoking run-clang-tidy with
just -p $BUILD_DIR (or if you must supply files, recursively find source files
under src) so all translation units from compile_commands.json are linted; (4)
replace the hardcoded -j 64 with JOBS=${CLANG_TIDY_JOBS:-$(nproc)} so users can
override parallelism; and (5) quote variables and safely convert changed-file
lists to arguments when calling run-clang-tidy.

# Check if there are any uncommitted changes after all formatting steps.
# If there are, ask the user to review and stage them.
if ! git diff --quiet &>/dev/null; then
Expand Down
1 change: 1 addition & 0 deletions requirements-lint.txt
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,4 @@ tomli==2.0.1
ruff==0.6.5
codespell==2.3.0
clang-format==15.0.7
clang-tidy==18.1.8
75 changes: 39 additions & 36 deletions src/ir.cc
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,8 @@
#include <tvm/ffi/reflection/registry.h>
#include <tvm/script/ir_builder/tir/ir.h>

#include <utility>

namespace tvm {
namespace tl {

Expand All @@ -19,8 +21,8 @@ using namespace script::ir_builder::tir;
static Var CreateEnvThread(String name, String thread_tag, DataType dtype) {
using namespace tvm::tir;
using namespace tvm::script::ir_builder;
IterVar iter_var(Range{nullptr}, Var(name, dtype),
tvm::tir::IterVarType::kThreadIndex, thread_tag);
IterVar iter_var(Range{nullptr}, Var(std::move(name), dtype),
tvm::tir::IterVarType::kThreadIndex, std::move(thread_tag));
Var var = iter_var->var;
if (Optional<PrimFuncFrame> opt_frame =
IRBuilder::Current()->FindFrame<PrimFuncFrame>()) {
Expand All @@ -31,24 +33,24 @@ static Var CreateEnvThread(String name, String thread_tag, DataType dtype) {
return var;
}

static ForFrame MakeIterVarFrame(std::string name, PrimExpr dom) {
static ForFrame MakeIterVarFrame(const std::string &name, const PrimExpr &dom) {
using namespace tvm::tir;
Var var = Var(name, dom->dtype);
// Create a frame that represents a loop over the given domain.
ObjectPtr<ForFrameNode> n = make_object<ForFrameNode>();
n->vars.push_back(var);
n->doms.push_back(Range(0, dom));
n->f_make_for_loop = [](Array<Var> vars, Array<Range> doms,
Stmt body) -> Stmt {
n->f_make_for_loop = [](const Array<Var> &vars, const Array<Range> &doms,
const Stmt &body) -> Stmt {
ICHECK_EQ(vars.size(), 1);
ICHECK_EQ(doms.size(), 1);
return For(vars[0], doms[0]->min, doms[0]->extent, ForKind::kSerial, body);
};
return ForFrame(n);
}

ForFrame ParallelFor(Array<PrimExpr> extents,
Map<String, ObjectRef> annotations) {
ForFrame ParallelFor(const Array<PrimExpr> &extents,
const Map<String, ObjectRef> &annotations) {
using namespace tvm::tir;
ObjectPtr<ForFrameNode> n = make_object<ForFrameNode>();
n->vars.reserve(extents.size());
Expand All @@ -58,59 +60,59 @@ ForFrame ParallelFor(Array<PrimExpr> extents,
n->vars.push_back(Var("v", extent.dtype()));
n->doms.push_back(Range(make_const(dtype, 0), extent));
}
n->f_make_for_loop = [annotations](Array<Var> vars, Array<Range> doms,
n->f_make_for_loop = [annotations](const Array<Var> &vars,
const Array<Range> &doms,
Stmt body) -> Stmt {
ICHECK_EQ(vars.size(), doms.size());
int n = vars.size();
for (int i = n - 1; i >= 0; --i) {
Range dom = doms[i];
Var var = vars[i];
body =
For(var, dom->min, dom->extent, ForKind::kParallel, std::move(body),
/*thread_binding=*/std::nullopt, /*annotations=*/annotations);
body = For(var, dom->min, dom->extent, ForKind::kParallel, body,
/*thread_binding=*/std::nullopt, /*annotations=*/annotations);
Comment on lines +71 to +72
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The std::move(body) was removed from the For constructor call. Since body is a local variable that is being consumed to create a new Stmt and then reassigned within the loop, std::move is appropriate and more efficient. This change introduces a performance regression due to an unnecessary copy.

      body = For(var, dom->min, dom->extent, ForKind::kParallel, std::move(body),
                 /*thread_binding=*/std::nullopt, /*annotations=*/annotations);

}
return body;
};
return ForFrame(n);
}

ForFrame PipelinedFor(PrimExpr start, PrimExpr stop, int num_stages,
Array<PrimExpr> order, Array<PrimExpr> stages,
Array<Array<PrimExpr>> sync,
Array<Array<PrimExpr>> groups) {
ForFrame PipelinedFor(PrimExpr start, const PrimExpr &stop, int num_stages,
const Array<PrimExpr> &order,
const Array<PrimExpr> &stages,
const Array<Array<PrimExpr>> &sync,
const Array<Array<PrimExpr>> &groups) {
using namespace tvm::tir;
ObjectPtr<ForFrameNode> n = make_object<ForFrameNode>();
DataType dtype = stop.dtype();
n->vars.push_back(Var("v", dtype));
n->doms.push_back(Range(start, stop));
n->f_make_for_loop = [=](Array<Var> vars, Array<Range> doms,
n->doms.push_back(Range(std::move(start), stop));
n->f_make_for_loop = [=](const Array<Var> &vars, const Array<Range> &doms,
Stmt body) -> Stmt {
ICHECK_EQ(vars.size(), doms.size());
int n = vars.size();
ICHECK(n == 1);
Map<String, ObjectRef> anno;
if (num_stages > 0)
anno.Set("num_stages", PrimExpr(num_stages));
if (order.size() > 0)
if (!order.empty())
anno.Set("tl_pipeline_order", order);
if (stages.size() > 0)
if (!stages.empty())
anno.Set("tl_pipeline_stage", stages);
if (sync.size() > 0)
if (!sync.empty())
anno.Set("tl_pipeline_sync", sync);
if (groups.size() > 0)
if (!groups.empty())
anno.Set("tl_pipeline_group", groups);
body = For(vars[0], doms[0]->min, doms[0]->extent, ForKind::kSerial,
std::move(body),
body = For(vars[0], doms[0]->min, doms[0]->extent, ForKind::kSerial, body,
/*thread_binding=*/std::nullopt, /*annotations=*/anno);
Comment on lines +105 to 106
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The std::move(body) was removed from the For constructor call. Since body is a local variable that is being consumed to create a new Stmt, std::move is appropriate and more efficient. This change introduces a performance regression due to an unnecessary copy.

    body = For(vars[0], doms[0]->min, doms[0]->extent, ForKind::kSerial, std::move(body),
               /*thread_binding=*/std::nullopt, /*annotations=*/anno);

return body;
};
return ForFrame(n);
}

ForFrame PersistentFor(Array<PrimExpr> domain, PrimExpr wave_size,
PrimExpr index, PrimExpr group_size) {
ForFrame PersistentFor(const Array<PrimExpr> &domain, const PrimExpr &wave_size,
const PrimExpr &index, PrimExpr group_size) {
using namespace tvm::tir;
ICHECK(domain.size() > 0);
ICHECK(!domain.empty());
ObjectPtr<ForFrameNode> n = make_object<ForFrameNode>();
n->vars.reserve(domain.size());
n->doms.reserve(domain.size());
Expand Down Expand Up @@ -139,8 +141,8 @@ ForFrame PersistentFor(Array<PrimExpr> domain, PrimExpr wave_size,
}
grouped_domain.push_back(group_size);

n->f_make_for_loop = [=](Array<Var> vars, Array<Range> doms,
Stmt body) -> Stmt {
n->f_make_for_loop = [=](const Array<Var> &vars, const Array<Range> &doms,
const Stmt &body) -> Stmt {
ICHECK_EQ(vars.size(), doms.size());
Map<String, ObjectRef> anno;
Array<PrimExpr> idxs(grouped_domain.size(), PrimExpr());
Expand Down Expand Up @@ -220,9 +222,9 @@ class KernelLaunchFrame : public TIRFrame {
KernelLaunchFrameNode);
};

KernelLaunchFrame KernelLaunch(Array<PrimExpr> grid_size,
Optional<Array<PrimExpr>> block_size_opt,
Map<String, ffi::Any> attrs) {
KernelLaunchFrame KernelLaunch(const Array<PrimExpr> &grid_size,
const Optional<Array<PrimExpr>> &block_size_opt,
const Map<String, ffi::Any> &attrs) {
ObjectPtr<KernelLaunchFrameNode> n = make_object<KernelLaunchFrameNode>();

// If the kernel is a CPU kernel, we don't need to launch any threads.
Expand All @@ -234,7 +236,7 @@ KernelLaunchFrame KernelLaunch(Array<PrimExpr> grid_size,
if (is_cpu_kernel_frame) {
// Launch CPU Kernel
ICHECK(grid_size.size() >= 0);
ICHECK(block_size.size() == 0) << "CPU kernel cannot have block size";
ICHECK(block_size.empty()) << "CPU kernel cannot have block size";
ICHECK(attrs.defined());
// create grid loop var
for (int i = 0; i < grid_size.size(); i++) {
Expand All @@ -244,7 +246,7 @@ KernelLaunchFrame KernelLaunch(Array<PrimExpr> grid_size,
} else {
// Launch GPU Kernel
ICHECK(grid_size.size() <= 3);
if (grid_size.size() > 0)
if (!grid_size.empty())
n->frames.push_back(LaunchThread(
CreateEnvThread("bx", "blockIdx.x", grid_size[0].dtype()),
grid_size[0]));
Expand All @@ -258,7 +260,7 @@ KernelLaunchFrame KernelLaunch(Array<PrimExpr> grid_size,
grid_size[2]));
if (block_size.defined()) {
ICHECK(block_size.size() <= 3);
if (block_size.size() > 0) {
if (!block_size.empty()) {
n->frames.push_back(LaunchThread(
CreateEnvThread("tx", "threadIdx.x", block_size[0].dtype()),
block_size[0]));
Expand Down Expand Up @@ -333,12 +335,13 @@ class WarpSpecializeFrame : public TIRFrame {
WarpSpecializeFrameNode);
};

WarpSpecializeFrame WarpSpecialize(Array<IntImm> warp_group_ids,
PrimExpr thread_idx,
WarpSpecializeFrame WarpSpecialize(const Array<IntImm> &warp_group_ids,
const PrimExpr &thread_idx,
int warp_group_size = 128) {
ObjectPtr<WarpSpecializeFrameNode> n = make_object<WarpSpecializeFrameNode>();
PrimExpr condition;
std::vector<int> warp_groups;
warp_groups.reserve(warp_group_ids.size());
for (int i = 0; i < warp_group_ids.size(); i++) {
warp_groups.push_back(Downcast<IntImm>(warp_group_ids[i])->value);
}
Expand Down
2 changes: 0 additions & 2 deletions src/op/atomic_add.h
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,6 @@ using namespace tir;

class AtomicAddNode : public TileOperatorNode {
public:
Array<PrimExpr> args_;

Buffer src, dst;
Array<Range> src_range, dst_range;
IntImm coalesced_width;
Expand Down
Loading