Skip to content

[clang-tidy] Speed up deduplicating warnings from alias checks#174237

Merged
localspook merged 2 commits intollvm:mainfrom
localspook:optimize-error-deduplication
Jan 4, 2026
Merged

[clang-tidy] Speed up deduplicating warnings from alias checks#174237
localspook merged 2 commits intollvm:mainfrom
localspook:optimize-error-deduplication

Conversation

@localspook
Copy link
Contributor

@localspook localspook commented Jan 2, 2026

Right now, the deduplication algorithm runs in O(n²) time, because it goes warning-by-warning (an O(n) operation), removing duplicates using std::vector::erase (another O(n) operation). This starts taking up a noticeable amount of time as you start getting a lot of warnings. For example, running all checks over clang/lib/Sema/Sema.cpp and the headers it includes:

time ./build/release/bin/clang-tidy -p build/debug --checks=* clang/lib/Sema/Sema.cpp -header-filter=.* > /dev/null

...takes 2m9s on my system before this change and 1m52s after. Now, this scenario is a bit artificial; I imagine runs with so many warnings are uncommon in practice. On the other hand, the change is quite small, so... thoughts?

@llvmbot
Copy link
Member

llvmbot commented Jan 2, 2026

@llvm/pr-subscribers-clang-tools-extra

Author: Victor Chernyakin (localspook)

Changes

Right now, the deduplication algorithm runs in O(n²) time, because whenever it finds a duplicate warning, it removes it using std::vector::erase (which is itself O(n)). This starts taking up a noticeable amount of time as you start getting a lot of warnings. For example, running all checks over clang/lib/Sema/Sema.cpp and the headers it includes:

time ./build/release/bin/clang-tidy -p build/debug --checks=* clang/lib/Sema/Sema.cpp -header-filter=.* > /dev/null

...takes 2m28s on my system before this change and 2m6s after. Now, this scenario is a bit artificial; I imagine runs with so many warnings are uncommon in practice. On the other hand, the change is quite small, so... thoughts?


Full diff: https://github.com/llvm/llvm-project/pull/174237.diff

1 Files Affected:

  • (modified) clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp (+20-25)
diff --git a/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp b/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp
index 0355eccc397e5..5718b3f602da9 100644
--- a/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp
+++ b/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp
@@ -783,38 +783,33 @@ std::vector<ClangTidyError> ClangTidyDiagnosticConsumer::take() {
   return std::move(Errors);
 }
 
-namespace {
-struct LessClangTidyErrorWithoutDiagnosticName {
-  bool operator()(const ClangTidyError *LHS, const ClangTidyError *RHS) const {
-    const tooling::DiagnosticMessage &M1 = LHS->Message;
-    const tooling::DiagnosticMessage &M2 = RHS->Message;
-
-    return std::tie(M1.FilePath, M1.FileOffset, M1.Message) <
-           std::tie(M2.FilePath, M2.FileOffset, M2.Message);
-  }
-};
-} // end anonymous namespace
-
 void ClangTidyDiagnosticConsumer::removeDuplicatedDiagnosticsOfAliasCheckers() {
-  using UniqueErrorSet =
-      std::set<ClangTidyError *, LessClangTidyErrorWithoutDiagnosticName>;
-  UniqueErrorSet UniqueErrors;
+  if (Errors.size() <= 1)
+    return;
+
+  static constexpr auto Projection = [](const ClangTidyError &E) {
+    const tooling::DiagnosticMessage &M = E.Message;
+    return std::tie(M.FilePath, M.FileOffset, M.Message);
+  };
 
-  auto IT = Errors.begin();
-  while (IT != Errors.end()) {
-    ClangTidyError &Error = *IT;
-    const std::pair<UniqueErrorSet::iterator, bool> Inserted =
-        UniqueErrors.insert(&Error);
+  // This orders any duplicates into consecutive runs.
+  llvm::stable_sort(Errors,
+                    [](const ClangTidyError &LHS, const ClangTidyError &RHS) {
+                      return Projection(LHS) < Projection(RHS);
+                    });
 
+  auto LastUniqueError = Errors.begin();
+  for (ClangTidyError &Error : llvm::drop_begin(Errors, 1)) {
+    ClangTidyError &ExistingError = *LastUniqueError;
     // Unique error, we keep it and move along.
-    if (Inserted.second) {
-      ++IT;
+    if (Projection(Error) != Projection(ExistingError)) {
+      ++LastUniqueError;
+      *LastUniqueError = std::move(Error);
     } else {
-      ClangTidyError &ExistingError = **Inserted.first;
       const llvm::StringMap<tooling::Replacements> &CandidateFix =
           Error.Message.Fix;
       const llvm::StringMap<tooling::Replacements> &ExistingFix =
-          (*Inserted.first)->Message.Fix;
+          ExistingError.Message.Fix;
 
       if (CandidateFix != ExistingFix) {
         // In case of a conflict, don't suggest any fix-it.
@@ -833,7 +828,7 @@ void ClangTidyDiagnosticConsumer::removeDuplicatedDiagnosticsOfAliasCheckers() {
 
       // Since it is the same error, we should take it as alias and remove it.
       ExistingError.EnabledDiagnosticAliases.emplace_back(Error.DiagnosticName);
-      IT = Errors.erase(IT);
     }
   }
+  Errors.erase(LastUniqueError + 1, Errors.end());
 }

@llvmbot
Copy link
Member

llvmbot commented Jan 2, 2026

@llvm/pr-subscribers-clang-tidy

Author: Victor Chernyakin (localspook)

Changes

Right now, the deduplication algorithm runs in O(n²) time, because whenever it finds a duplicate warning, it removes it using std::vector::erase (which is itself O(n)). This starts taking up a noticeable amount of time as you start getting a lot of warnings. For example, running all checks over clang/lib/Sema/Sema.cpp and the headers it includes:

time ./build/release/bin/clang-tidy -p build/debug --checks=* clang/lib/Sema/Sema.cpp -header-filter=.* &gt; /dev/null

...takes 2m28s on my system before this change and 2m6s after. Now, this scenario is a bit artificial; I imagine runs with so many warnings are uncommon in practice. On the other hand, the change is quite small, so... thoughts?


Full diff: https://github.com/llvm/llvm-project/pull/174237.diff

1 Files Affected:

  • (modified) clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp (+20-25)
diff --git a/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp b/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp
index 0355eccc397e5..5718b3f602da9 100644
--- a/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp
+++ b/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp
@@ -783,38 +783,33 @@ std::vector<ClangTidyError> ClangTidyDiagnosticConsumer::take() {
   return std::move(Errors);
 }
 
-namespace {
-struct LessClangTidyErrorWithoutDiagnosticName {
-  bool operator()(const ClangTidyError *LHS, const ClangTidyError *RHS) const {
-    const tooling::DiagnosticMessage &M1 = LHS->Message;
-    const tooling::DiagnosticMessage &M2 = RHS->Message;
-
-    return std::tie(M1.FilePath, M1.FileOffset, M1.Message) <
-           std::tie(M2.FilePath, M2.FileOffset, M2.Message);
-  }
-};
-} // end anonymous namespace
-
 void ClangTidyDiagnosticConsumer::removeDuplicatedDiagnosticsOfAliasCheckers() {
-  using UniqueErrorSet =
-      std::set<ClangTidyError *, LessClangTidyErrorWithoutDiagnosticName>;
-  UniqueErrorSet UniqueErrors;
+  if (Errors.size() <= 1)
+    return;
+
+  static constexpr auto Projection = [](const ClangTidyError &E) {
+    const tooling::DiagnosticMessage &M = E.Message;
+    return std::tie(M.FilePath, M.FileOffset, M.Message);
+  };
 
-  auto IT = Errors.begin();
-  while (IT != Errors.end()) {
-    ClangTidyError &Error = *IT;
-    const std::pair<UniqueErrorSet::iterator, bool> Inserted =
-        UniqueErrors.insert(&Error);
+  // This orders any duplicates into consecutive runs.
+  llvm::stable_sort(Errors,
+                    [](const ClangTidyError &LHS, const ClangTidyError &RHS) {
+                      return Projection(LHS) < Projection(RHS);
+                    });
 
+  auto LastUniqueError = Errors.begin();
+  for (ClangTidyError &Error : llvm::drop_begin(Errors, 1)) {
+    ClangTidyError &ExistingError = *LastUniqueError;
     // Unique error, we keep it and move along.
-    if (Inserted.second) {
-      ++IT;
+    if (Projection(Error) != Projection(ExistingError)) {
+      ++LastUniqueError;
+      *LastUniqueError = std::move(Error);
     } else {
-      ClangTidyError &ExistingError = **Inserted.first;
       const llvm::StringMap<tooling::Replacements> &CandidateFix =
           Error.Message.Fix;
       const llvm::StringMap<tooling::Replacements> &ExistingFix =
-          (*Inserted.first)->Message.Fix;
+          ExistingError.Message.Fix;
 
       if (CandidateFix != ExistingFix) {
         // In case of a conflict, don't suggest any fix-it.
@@ -833,7 +828,7 @@ void ClangTidyDiagnosticConsumer::removeDuplicatedDiagnosticsOfAliasCheckers() {
 
       // Since it is the same error, we should take it as alias and remove it.
       ExistingError.EnabledDiagnosticAliases.emplace_back(Error.DiagnosticName);
-      IT = Errors.erase(IT);
     }
   }
+  Errors.erase(LastUniqueError + 1, Errors.end());
 }

@localspook localspook marked this pull request as draft January 2, 2026 19:54
@localspook localspook force-pushed the optimize-error-deduplication branch 2 times, most recently from 22267ce to ed8d0af Compare January 2, 2026 21:10
@localspook localspook force-pushed the optimize-error-deduplication branch from ed8d0af to 59b3946 Compare January 2, 2026 21:47
@localspook localspook marked this pull request as ready for review January 2, 2026 22:00
@github-actions
Copy link

github-actions bot commented Jan 3, 2026

🐧 Linux x64 Test Results

  • 3076 tests passed
  • 7 tests skipped

All executed tests passed, but another part of the build failed. Click on a failure below to see the details.

libc/test/src/math/smoke/CMakeFiles/libc.test.src.math.smoke.log1p_test.__unit__ /home/gha/actions-runner/_work/llvm-project/llvm-project/build/runtimes/runtimes-bins/libc/test/src/math/smoke/CMakeFiles/libc.test.src.math.smoke.log1p_test.__unit__ (Likely Already Failing) This test is already failing at the base commit.
FAILED: libc/test/src/math/smoke/CMakeFiles/libc.test.src.math.smoke.log1p_test.__unit__ /home/gha/actions-runner/_work/llvm-project/llvm-project/build/runtimes/runtimes-bins/libc/test/src/math/smoke/CMakeFiles/libc.test.src.math.smoke.log1p_test.__unit__
cd /home/gha/actions-runner/_work/llvm-project/llvm-project/build/runtimes/runtimes-bins/libc/test/src/math/smoke && /home/gha/actions-runner/_work/llvm-project/llvm-project/build/runtimes/runtimes-bins/libc/test/src/math/smoke/libc.test.src.math.smoke.log1p_test.__unit__.__build__
libc/test/src/math/smoke/CMakeFiles/libc.test.src.math.smoke.log1p_test.__unit__.__NO_FMA_OPT /home/gha/actions-runner/_work/llvm-project/llvm-project/build/runtimes/runtimes-bins/libc/test/src/math/smoke/CMakeFiles/libc.test.src.math.smoke.log1p_test.__unit__.__NO_FMA_OPT (Likely Already Failing) This test is already failing at the base commit.
FAILED: libc/test/src/math/smoke/CMakeFiles/libc.test.src.math.smoke.log1p_test.__unit__.__NO_FMA_OPT /home/gha/actions-runner/_work/llvm-project/llvm-project/build/runtimes/runtimes-bins/libc/test/src/math/smoke/CMakeFiles/libc.test.src.math.smoke.log1p_test.__unit__.__NO_FMA_OPT
cd /home/gha/actions-runner/_work/llvm-project/llvm-project/build/runtimes/runtimes-bins/libc/test/src/math/smoke && /home/gha/actions-runner/_work/llvm-project/llvm-project/build/runtimes/runtimes-bins/libc/test/src/math/smoke/libc.test.src.math.smoke.log1p_test.__unit__.__NO_FMA_OPT.__build__

If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the infrastructure label.

@carlosgalvezp
Copy link
Contributor

One test that would be interesting is to run clang-tidy on a .cpp file that includes all C++ standard headers, and enable warnings in system headers, and enable all checks.

Run that on trunk and on the patch and compare the results.

@localspook
Copy link
Contributor Author

Given this file all_headers.cpp:

#include <algorithm>
#include <any>
#include <array>
#include <atomic>
#include <barrier>
#include <bit>
#include <bitset>
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cfenv>
#include <cfloat>
#include <charconv>
#include <chrono>
#include <cinttypes>
#include <climits>
#include <clocale>
#include <cmath>
#include <codecvt>
#include <compare>
#include <complex>
#include <concepts>
#include <condition_variable>
#include <coroutine>
#include <csetjmp>
#include <csignal>
#include <cstdarg>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <cuchar>
#include <cwchar>
#include <cwctype>
#include <deque>
#include <exception>
#include <execution>
#include <expected>
#include <filesystem>
#include <format>
#include <forward_list>
#include <fstream>
#include <functional>
#include <future>
#include <generator>
#include <initializer_list>
#include <iomanip>
#include <ios>
#include <iosfwd>
#include <iostream>
#include <istream>
#include <iterator>
#include <latch>
#include <limits>
#include <list>
#include <locale>
#include <map>
#include <mdspan>
#include <memory>
#include <memory_resource>
#include <mutex>
#include <new>
#include <numbers>
#include <numeric>
#include <optional>
#include <ostream>
#include <print>
#include <queue>
#include <random>
#include <ranges>
#include <ratio>
#include <regex>
#include <scoped_allocator>
#include <semaphore>
#include <set>
#include <shared_mutex>
#include <source_location>
#include <span>
#include <spanstream>
#include <sstream>
#include <stack>
#include <stacktrace>
#include <stdexcept>
#include <stop_token>
#include <streambuf>
#include <string>
#include <string_view>
#include <strstream>
#include <syncstream>
#include <system_error>
#include <thread>
#include <tuple>
#include <type_traits>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <valarray>
#include <variant>
#include <vector>
#include <version>

I ran

time ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23 > /dev/null

which generated 134090 warnings and took 1m22s before this change and 51s after (this is with the MSVC STL).

@localspook localspook merged commit fd0bf4a into llvm:main Jan 4, 2026
11 checks passed
@localspook localspook deleted the optimize-error-deduplication branch January 4, 2026 21:42
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Jan 6, 2026
…174237)

Right now, the deduplication algorithm runs in O(n²) time, because it
goes warning-by-warning (an O(n) operation), removing duplicates using
`std::vector::erase` (another O(n) operation). This starts taking up a
noticeable amount of time as you start getting a lot of warnings. For
example, running all checks over `clang/lib/Sema/Sema.cpp` and the
headers it includes:

```sh
time ./build/release/bin/clang-tidy -p build/debug --checks=* clang/lib/Sema/Sema.cpp -header-filter=.* > /dev/null
```

...takes 2m9s on my system before this change and 1m52s after. Now, this
scenario *is* a bit artificial; I imagine runs with so many warnings are
uncommon in practice. On the other hand, the change is quite small,
so we're not really going out of our way to improve it.
localspook added a commit that referenced this pull request Jan 8, 2026
…es (#174357)

The LLVM docs give a good description of [why `std::` containers are
slower than LLVM
alternatives](https://llvm.org/docs/ProgrammersManual.html#set). To see
what difference switching to the LLVM ones made, I [reused the
approach](#174237 (comment))
of measuring how long it takes to run all checks over all standard
library headers (MSVC STL in my case). Using hyperfine (which basically
runs a program multiple times and computes how long it took):
```sh
hyperfine --shell=none './build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23'
```
...the results were:

Before:
```
Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23
  Time (mean ± σ):     53.253 s ±  0.089 s    [User: 46.480 s, System: 6.748 s]
  Range (min … max):   53.118 s … 53.440 s    10 runs
```
After:
```txt
Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23
  Time (mean ± σ):     51.798 s ±  0.126 s    [User: 45.194 s, System: 6.575 s]
  Range (min … max):   51.620 s … 51.995 s    10 runs
```

...which is a nice little speedup for just switching some containers. I
didn't investigate which checks in particular were the source of the
speedup though.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 8, 2026
… `std::` ones (#174357)

The LLVM docs give a good description of [why `std::` containers are
slower than LLVM
alternatives](https://llvm.org/docs/ProgrammersManual.html#set). To see
what difference switching to the LLVM ones made, I [reused the
approach](llvm/llvm-project#174237 (comment))
of measuring how long it takes to run all checks over all standard
library headers (MSVC STL in my case). Using hyperfine (which basically
runs a program multiple times and computes how long it took):
```sh
hyperfine --shell=none './build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23'
```
...the results were:

Before:
```
Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23
  Time (mean ± σ):     53.253 s ±  0.089 s    [User: 46.480 s, System: 6.748 s]
  Range (min … max):   53.118 s … 53.440 s    10 runs
```
After:
```txt
Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23
  Time (mean ± σ):     51.798 s ±  0.126 s    [User: 45.194 s, System: 6.575 s]
  Range (min … max):   51.620 s … 51.995 s    10 runs
```

...which is a nice little speedup for just switching some containers. I
didn't investigate which checks in particular were the source of the
speedup though.
kshitijvp pushed a commit to kshitijvp/llvm-project that referenced this pull request Jan 9, 2026
…es (llvm#174357)

The LLVM docs give a good description of [why `std::` containers are
slower than LLVM
alternatives](https://llvm.org/docs/ProgrammersManual.html#set). To see
what difference switching to the LLVM ones made, I [reused the
approach](llvm#174237 (comment))
of measuring how long it takes to run all checks over all standard
library headers (MSVC STL in my case). Using hyperfine (which basically
runs a program multiple times and computes how long it took):
```sh
hyperfine --shell=none './build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23'
```
...the results were:

Before:
```
Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23
  Time (mean ± σ):     53.253 s ±  0.089 s    [User: 46.480 s, System: 6.748 s]
  Range (min … max):   53.118 s … 53.440 s    10 runs
```
After:
```txt
Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23
  Time (mean ± σ):     51.798 s ±  0.126 s    [User: 45.194 s, System: 6.575 s]
  Range (min … max):   51.620 s … 51.995 s    10 runs
```

...which is a nice little speedup for just switching some containers. I
didn't investigate which checks in particular were the source of the
speedup though.
navaneethshan pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request Jan 14, 2026
…es (#174357)

The LLVM docs give a good description of [why `std::` containers are
slower than LLVM
alternatives](https://llvm.org/docs/ProgrammersManual.html#set). To see
what difference switching to the LLVM ones made, I [reused the
approach](llvm/llvm-project#174237 (comment))
of measuring how long it takes to run all checks over all standard
library headers (MSVC STL in my case). Using hyperfine (which basically
runs a program multiple times and computes how long it took):
```sh
hyperfine --shell=none './build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23'
```
...the results were:

Before:
```
Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23
  Time (mean ± σ):     53.253 s ±  0.089 s    [User: 46.480 s, System: 6.748 s]
  Range (min … max):   53.118 s … 53.440 s    10 runs
```
After:
```txt
Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23
  Time (mean ± σ):     51.798 s ±  0.126 s    [User: 45.194 s, System: 6.575 s]
  Range (min … max):   51.620 s … 51.995 s    10 runs
```

...which is a nice little speedup for just switching some containers. I
didn't investigate which checks in particular were the source of the
speedup though.

(cherry picked from commit 6506f92)
Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026
…es (llvm#174357)

The LLVM docs give a good description of [why `std::` containers are
slower than LLVM
alternatives](https://llvm.org/docs/ProgrammersManual.html#set). To see
what difference switching to the LLVM ones made, I [reused the
approach](llvm#174237 (comment))
of measuring how long it takes to run all checks over all standard
library headers (MSVC STL in my case). Using hyperfine (which basically
runs a program multiple times and computes how long it took):
```sh
hyperfine --shell=none './build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23'
```
...the results were:

Before:
```
Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23
  Time (mean ± σ):     53.253 s ±  0.089 s    [User: 46.480 s, System: 6.748 s]
  Range (min … max):   53.118 s … 53.440 s    10 runs
```
After:
```txt
Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23
  Time (mean ± σ):     51.798 s ±  0.126 s    [User: 45.194 s, System: 6.575 s]
  Range (min … max):   51.620 s … 51.995 s    10 runs
```

...which is a nice little speedup for just switching some containers. I
didn't investigate which checks in particular were the source of the
speedup though.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants