[clang-tidy] Speed up deduplicating warnings from alias checks#174237
[clang-tidy] Speed up deduplicating warnings from alias checks#174237localspook merged 2 commits intollvm:mainfrom
Conversation
|
@llvm/pr-subscribers-clang-tools-extra Author: Victor Chernyakin (localspook) ChangesRight now, the deduplication algorithm runs in O(n²) time, because whenever it finds a duplicate warning, it removes it using time ./build/release/bin/clang-tidy -p build/debug --checks=* clang/lib/Sema/Sema.cpp -header-filter=.* > /dev/null...takes 2m28s on my system before this change and 2m6s after. Now, this scenario is a bit artificial; I imagine runs with so many warnings are uncommon in practice. On the other hand, the change is quite small, so... thoughts? Full diff: https://github.com/llvm/llvm-project/pull/174237.diff 1 Files Affected:
diff --git a/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp b/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp
index 0355eccc397e5..5718b3f602da9 100644
--- a/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp
+++ b/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp
@@ -783,38 +783,33 @@ std::vector<ClangTidyError> ClangTidyDiagnosticConsumer::take() {
return std::move(Errors);
}
-namespace {
-struct LessClangTidyErrorWithoutDiagnosticName {
- bool operator()(const ClangTidyError *LHS, const ClangTidyError *RHS) const {
- const tooling::DiagnosticMessage &M1 = LHS->Message;
- const tooling::DiagnosticMessage &M2 = RHS->Message;
-
- return std::tie(M1.FilePath, M1.FileOffset, M1.Message) <
- std::tie(M2.FilePath, M2.FileOffset, M2.Message);
- }
-};
-} // end anonymous namespace
-
void ClangTidyDiagnosticConsumer::removeDuplicatedDiagnosticsOfAliasCheckers() {
- using UniqueErrorSet =
- std::set<ClangTidyError *, LessClangTidyErrorWithoutDiagnosticName>;
- UniqueErrorSet UniqueErrors;
+ if (Errors.size() <= 1)
+ return;
+
+ static constexpr auto Projection = [](const ClangTidyError &E) {
+ const tooling::DiagnosticMessage &M = E.Message;
+ return std::tie(M.FilePath, M.FileOffset, M.Message);
+ };
- auto IT = Errors.begin();
- while (IT != Errors.end()) {
- ClangTidyError &Error = *IT;
- const std::pair<UniqueErrorSet::iterator, bool> Inserted =
- UniqueErrors.insert(&Error);
+ // This orders any duplicates into consecutive runs.
+ llvm::stable_sort(Errors,
+ [](const ClangTidyError &LHS, const ClangTidyError &RHS) {
+ return Projection(LHS) < Projection(RHS);
+ });
+ auto LastUniqueError = Errors.begin();
+ for (ClangTidyError &Error : llvm::drop_begin(Errors, 1)) {
+ ClangTidyError &ExistingError = *LastUniqueError;
// Unique error, we keep it and move along.
- if (Inserted.second) {
- ++IT;
+ if (Projection(Error) != Projection(ExistingError)) {
+ ++LastUniqueError;
+ *LastUniqueError = std::move(Error);
} else {
- ClangTidyError &ExistingError = **Inserted.first;
const llvm::StringMap<tooling::Replacements> &CandidateFix =
Error.Message.Fix;
const llvm::StringMap<tooling::Replacements> &ExistingFix =
- (*Inserted.first)->Message.Fix;
+ ExistingError.Message.Fix;
if (CandidateFix != ExistingFix) {
// In case of a conflict, don't suggest any fix-it.
@@ -833,7 +828,7 @@ void ClangTidyDiagnosticConsumer::removeDuplicatedDiagnosticsOfAliasCheckers() {
// Since it is the same error, we should take it as alias and remove it.
ExistingError.EnabledDiagnosticAliases.emplace_back(Error.DiagnosticName);
- IT = Errors.erase(IT);
}
}
+ Errors.erase(LastUniqueError + 1, Errors.end());
}
|
|
@llvm/pr-subscribers-clang-tidy Author: Victor Chernyakin (localspook) ChangesRight now, the deduplication algorithm runs in O(n²) time, because whenever it finds a duplicate warning, it removes it using time ./build/release/bin/clang-tidy -p build/debug --checks=* clang/lib/Sema/Sema.cpp -header-filter=.* > /dev/null...takes 2m28s on my system before this change and 2m6s after. Now, this scenario is a bit artificial; I imagine runs with so many warnings are uncommon in practice. On the other hand, the change is quite small, so... thoughts? Full diff: https://github.com/llvm/llvm-project/pull/174237.diff 1 Files Affected:
diff --git a/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp b/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp
index 0355eccc397e5..5718b3f602da9 100644
--- a/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp
+++ b/clang-tools-extra/clang-tidy/ClangTidyDiagnosticConsumer.cpp
@@ -783,38 +783,33 @@ std::vector<ClangTidyError> ClangTidyDiagnosticConsumer::take() {
return std::move(Errors);
}
-namespace {
-struct LessClangTidyErrorWithoutDiagnosticName {
- bool operator()(const ClangTidyError *LHS, const ClangTidyError *RHS) const {
- const tooling::DiagnosticMessage &M1 = LHS->Message;
- const tooling::DiagnosticMessage &M2 = RHS->Message;
-
- return std::tie(M1.FilePath, M1.FileOffset, M1.Message) <
- std::tie(M2.FilePath, M2.FileOffset, M2.Message);
- }
-};
-} // end anonymous namespace
-
void ClangTidyDiagnosticConsumer::removeDuplicatedDiagnosticsOfAliasCheckers() {
- using UniqueErrorSet =
- std::set<ClangTidyError *, LessClangTidyErrorWithoutDiagnosticName>;
- UniqueErrorSet UniqueErrors;
+ if (Errors.size() <= 1)
+ return;
+
+ static constexpr auto Projection = [](const ClangTidyError &E) {
+ const tooling::DiagnosticMessage &M = E.Message;
+ return std::tie(M.FilePath, M.FileOffset, M.Message);
+ };
- auto IT = Errors.begin();
- while (IT != Errors.end()) {
- ClangTidyError &Error = *IT;
- const std::pair<UniqueErrorSet::iterator, bool> Inserted =
- UniqueErrors.insert(&Error);
+ // This orders any duplicates into consecutive runs.
+ llvm::stable_sort(Errors,
+ [](const ClangTidyError &LHS, const ClangTidyError &RHS) {
+ return Projection(LHS) < Projection(RHS);
+ });
+ auto LastUniqueError = Errors.begin();
+ for (ClangTidyError &Error : llvm::drop_begin(Errors, 1)) {
+ ClangTidyError &ExistingError = *LastUniqueError;
// Unique error, we keep it and move along.
- if (Inserted.second) {
- ++IT;
+ if (Projection(Error) != Projection(ExistingError)) {
+ ++LastUniqueError;
+ *LastUniqueError = std::move(Error);
} else {
- ClangTidyError &ExistingError = **Inserted.first;
const llvm::StringMap<tooling::Replacements> &CandidateFix =
Error.Message.Fix;
const llvm::StringMap<tooling::Replacements> &ExistingFix =
- (*Inserted.first)->Message.Fix;
+ ExistingError.Message.Fix;
if (CandidateFix != ExistingFix) {
// In case of a conflict, don't suggest any fix-it.
@@ -833,7 +828,7 @@ void ClangTidyDiagnosticConsumer::removeDuplicatedDiagnosticsOfAliasCheckers() {
// Since it is the same error, we should take it as alias and remove it.
ExistingError.EnabledDiagnosticAliases.emplace_back(Error.DiagnosticName);
- IT = Errors.erase(IT);
}
}
+ Errors.erase(LastUniqueError + 1, Errors.end());
}
|
22267ce to
ed8d0af
Compare
ed8d0af to
59b3946
Compare
🐧 Linux x64 Test Results
All executed tests passed, but another part of the build failed. Click on a failure below to see the details. libc/test/src/math/smoke/CMakeFiles/libc.test.src.math.smoke.log1p_test.__unit__ /home/gha/actions-runner/_work/llvm-project/llvm-project/build/runtimes/runtimes-bins/libc/test/src/math/smoke/CMakeFiles/libc.test.src.math.smoke.log1p_test.__unit__ (Likely Already Failing)This test is already failing at the base commit.libc/test/src/math/smoke/CMakeFiles/libc.test.src.math.smoke.log1p_test.__unit__.__NO_FMA_OPT /home/gha/actions-runner/_work/llvm-project/llvm-project/build/runtimes/runtimes-bins/libc/test/src/math/smoke/CMakeFiles/libc.test.src.math.smoke.log1p_test.__unit__.__NO_FMA_OPT (Likely Already Failing)This test is already failing at the base commit.If these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the |
|
One test that would be interesting is to run clang-tidy on a .cpp file that includes all C++ standard headers, and enable warnings in system headers, and enable all checks. Run that on trunk and on the patch and compare the results. |
|
Given this file #include <algorithm>
#include <any>
#include <array>
#include <atomic>
#include <barrier>
#include <bit>
#include <bitset>
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cfenv>
#include <cfloat>
#include <charconv>
#include <chrono>
#include <cinttypes>
#include <climits>
#include <clocale>
#include <cmath>
#include <codecvt>
#include <compare>
#include <complex>
#include <concepts>
#include <condition_variable>
#include <coroutine>
#include <csetjmp>
#include <csignal>
#include <cstdarg>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <cuchar>
#include <cwchar>
#include <cwctype>
#include <deque>
#include <exception>
#include <execution>
#include <expected>
#include <filesystem>
#include <format>
#include <forward_list>
#include <fstream>
#include <functional>
#include <future>
#include <generator>
#include <initializer_list>
#include <iomanip>
#include <ios>
#include <iosfwd>
#include <iostream>
#include <istream>
#include <iterator>
#include <latch>
#include <limits>
#include <list>
#include <locale>
#include <map>
#include <mdspan>
#include <memory>
#include <memory_resource>
#include <mutex>
#include <new>
#include <numbers>
#include <numeric>
#include <optional>
#include <ostream>
#include <print>
#include <queue>
#include <random>
#include <ranges>
#include <ratio>
#include <regex>
#include <scoped_allocator>
#include <semaphore>
#include <set>
#include <shared_mutex>
#include <source_location>
#include <span>
#include <spanstream>
#include <sstream>
#include <stack>
#include <stacktrace>
#include <stdexcept>
#include <stop_token>
#include <streambuf>
#include <string>
#include <string_view>
#include <strstream>
#include <syncstream>
#include <system_error>
#include <thread>
#include <tuple>
#include <type_traits>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <valarray>
#include <variant>
#include <vector>
#include <version>I ran time ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23 > /dev/nullwhich generated 134090 warnings and took 1m22s before this change and 51s after (this is with the MSVC STL). |
…174237) Right now, the deduplication algorithm runs in O(n²) time, because it goes warning-by-warning (an O(n) operation), removing duplicates using `std::vector::erase` (another O(n) operation). This starts taking up a noticeable amount of time as you start getting a lot of warnings. For example, running all checks over `clang/lib/Sema/Sema.cpp` and the headers it includes: ```sh time ./build/release/bin/clang-tidy -p build/debug --checks=* clang/lib/Sema/Sema.cpp -header-filter=.* > /dev/null ``` ...takes 2m9s on my system before this change and 1m52s after. Now, this scenario *is* a bit artificial; I imagine runs with so many warnings are uncommon in practice. On the other hand, the change is quite small, so we're not really going out of our way to improve it.
…es (#174357) The LLVM docs give a good description of [why `std::` containers are slower than LLVM alternatives](https://llvm.org/docs/ProgrammersManual.html#set). To see what difference switching to the LLVM ones made, I [reused the approach](#174237 (comment)) of measuring how long it takes to run all checks over all standard library headers (MSVC STL in my case). Using hyperfine (which basically runs a program multiple times and computes how long it took): ```sh hyperfine --shell=none './build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23' ``` ...the results were: Before: ``` Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23 Time (mean ± σ): 53.253 s ± 0.089 s [User: 46.480 s, System: 6.748 s] Range (min … max): 53.118 s … 53.440 s 10 runs ``` After: ```txt Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23 Time (mean ± σ): 51.798 s ± 0.126 s [User: 45.194 s, System: 6.575 s] Range (min … max): 51.620 s … 51.995 s 10 runs ``` ...which is a nice little speedup for just switching some containers. I didn't investigate which checks in particular were the source of the speedup though.
… `std::` ones (#174357) The LLVM docs give a good description of [why `std::` containers are slower than LLVM alternatives](https://llvm.org/docs/ProgrammersManual.html#set). To see what difference switching to the LLVM ones made, I [reused the approach](llvm/llvm-project#174237 (comment)) of measuring how long it takes to run all checks over all standard library headers (MSVC STL in my case). Using hyperfine (which basically runs a program multiple times and computes how long it took): ```sh hyperfine --shell=none './build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23' ``` ...the results were: Before: ``` Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23 Time (mean ± σ): 53.253 s ± 0.089 s [User: 46.480 s, System: 6.748 s] Range (min … max): 53.118 s … 53.440 s 10 runs ``` After: ```txt Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23 Time (mean ± σ): 51.798 s ± 0.126 s [User: 45.194 s, System: 6.575 s] Range (min … max): 51.620 s … 51.995 s 10 runs ``` ...which is a nice little speedup for just switching some containers. I didn't investigate which checks in particular were the source of the speedup though.
…es (llvm#174357) The LLVM docs give a good description of [why `std::` containers are slower than LLVM alternatives](https://llvm.org/docs/ProgrammersManual.html#set). To see what difference switching to the LLVM ones made, I [reused the approach](llvm#174237 (comment)) of measuring how long it takes to run all checks over all standard library headers (MSVC STL in my case). Using hyperfine (which basically runs a program multiple times and computes how long it took): ```sh hyperfine --shell=none './build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23' ``` ...the results were: Before: ``` Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23 Time (mean ± σ): 53.253 s ± 0.089 s [User: 46.480 s, System: 6.748 s] Range (min … max): 53.118 s … 53.440 s 10 runs ``` After: ```txt Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23 Time (mean ± σ): 51.798 s ± 0.126 s [User: 45.194 s, System: 6.575 s] Range (min … max): 51.620 s … 51.995 s 10 runs ``` ...which is a nice little speedup for just switching some containers. I didn't investigate which checks in particular were the source of the speedup though.
…es (#174357) The LLVM docs give a good description of [why `std::` containers are slower than LLVM alternatives](https://llvm.org/docs/ProgrammersManual.html#set). To see what difference switching to the LLVM ones made, I [reused the approach](llvm/llvm-project#174237 (comment)) of measuring how long it takes to run all checks over all standard library headers (MSVC STL in my case). Using hyperfine (which basically runs a program multiple times and computes how long it took): ```sh hyperfine --shell=none './build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23' ``` ...the results were: Before: ``` Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23 Time (mean ± σ): 53.253 s ± 0.089 s [User: 46.480 s, System: 6.748 s] Range (min … max): 53.118 s … 53.440 s 10 runs ``` After: ```txt Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23 Time (mean ± σ): 51.798 s ± 0.126 s [User: 45.194 s, System: 6.575 s] Range (min … max): 51.620 s … 51.995 s 10 runs ``` ...which is a nice little speedup for just switching some containers. I didn't investigate which checks in particular were the source of the speedup though. (cherry picked from commit 6506f92)
…es (llvm#174357) The LLVM docs give a good description of [why `std::` containers are slower than LLVM alternatives](https://llvm.org/docs/ProgrammersManual.html#set). To see what difference switching to the LLVM ones made, I [reused the approach](llvm#174237 (comment)) of measuring how long it takes to run all checks over all standard library headers (MSVC STL in my case). Using hyperfine (which basically runs a program multiple times and computes how long it took): ```sh hyperfine --shell=none './build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23' ``` ...the results were: Before: ``` Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23 Time (mean ± σ): 53.253 s ± 0.089 s [User: 46.480 s, System: 6.748 s] Range (min … max): 53.118 s … 53.440 s 10 runs ``` After: ```txt Benchmark 1: ./build/release/bin/clang-tidy --checks=* all_headers.cpp -header-filter=.* -system-headers -- -std=c++23 Time (mean ± σ): 51.798 s ± 0.126 s [User: 45.194 s, System: 6.575 s] Range (min … max): 51.620 s … 51.995 s 10 runs ``` ...which is a nice little speedup for just switching some containers. I didn't investigate which checks in particular were the source of the speedup though.
Right now, the deduplication algorithm runs in O(n²) time, because it goes warning-by-warning (an O(n) operation), removing duplicates using
std::vector::erase(another O(n) operation). This starts taking up a noticeable amount of time as you start getting a lot of warnings. For example, running all checks overclang/lib/Sema/Sema.cppand the headers it includes:...takes 2m9s on my system before this change and 1m52s after. Now, this scenario is a bit artificial; I imagine runs with so many warnings are uncommon in practice. On the other hand, the change is quite small, so... thoughts?