
Commit 88ea381

vitalybuka authored and dvbuka committed
[SpecialCaseList] Filtering Globs with matching prefix and suffix (llvm#164543)
This commit enhances the `SpecialCaseList::GlobMatcher` to filter globs more efficiently by considering both prefixes and suffixes.

Previously, the `GlobMatcher` used a `RadixTree` to store globs based on their prefixes. This allowed for quick lookup of potential matches by matching the query string's prefix against the stored prefixes. However, for globs with common prefixes but different suffixes, unnecessary glob matching attempts could still occur.

This change introduces a nested `RadixTree` structure: `PrefixSuffixToGlob: RadixTree<Prefix, RadixTree<Suffix, Globs>>`. Now, when a query string is matched, it first finds matching prefixes, and then within those prefix matches, it further filters by matching the reversed suffix of the query string against the reversed suffixes of the globs.

This significantly reduces the number of `Glob::match` calls, especially for large special case lists with many globs sharing common prefixes but differing in their suffixes.

According to SpecialCaseListBM:

Lookup benchmarks (significant improvements):
```
OVERALL_GEOMEAN  -0.5815
```

Lookup `*suffix` and `prefix*suffix` like benchmarks (huge improvements):
```
OVERALL_GEOMEAN  -0.9316
```

https://gist.github.com/vitalybuka/e586751902760ced6beefcdf0d7b26fd
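To make the two-level filtering concrete, here is a minimal standalone sketch of the idea. It is not the LLVM code: `SimpleGlob`, `buildIndex`, and `lookup` are hypothetical names, nested `std::map`s stand in for `RadixTree`, and the toy glob treats only `*` as a wildcard. Unlike the real `match`, it reports every matching glob instead of stopping at the first match per bucket, so that the number of match attempts is easy to observe.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a glob: only '*' is treated as a wildcard.
struct SimpleGlob {
  std::string Pattern;

  // Literal text before the first '*' (the whole pattern if there is none).
  std::string_view prefix() const {
    return std::string_view(Pattern).substr(0, Pattern.find('*'));
  }

  // Literal text after the last '*' (empty if there is no '*').
  std::string_view suffix() const {
    std::size_t Star = Pattern.rfind('*');
    if (Star == std::string::npos)
      return {};
    return std::string_view(Pattern).substr(Star + 1);
  }

  // Greedy '*' matching with backtracking to the most recent star.
  bool match(std::string_view S) const {
    std::string_view P = Pattern;
    std::size_t PI = 0, SI = 0, StarP = std::string::npos, StarS = 0;
    while (SI < S.size()) {
      if (PI < P.size() && P[PI] == '*') {
        StarP = PI++;
        StarS = SI;
      } else if (PI < P.size() && P[PI] == S[SI]) {
        ++PI;
        ++SI;
      } else if (StarP != std::string::npos) {
        PI = StarP + 1;
        SI = ++StarS;
      } else {
        return false;
      }
    }
    while (PI < P.size() && P[PI] == '*')
      ++PI;
    return PI == P.size();
  }
};

using Bucket = std::vector<const SimpleGlob *>;
using SuffixMap = std::map<std::string, Bucket>;    // key: reversed suffix
using PrefixMap = std::map<std::string, SuffixMap>; // key: prefix

// Group globs by literal prefix, then by reversed literal suffix.
// The index stores pointers, so Globs must outlive the returned map.
PrefixMap buildIndex(const std::vector<SimpleGlob> &Globs) {
  PrefixMap Index;
  for (const SimpleGlob &G : Globs) {
    std::string_view Suf = G.suffix();
    std::string RevSuffix(Suf.rbegin(), Suf.rend());
    Index[std::string(G.prefix())][RevSuffix].push_back(&G);
  }
  return Index;
}

struct LookupResult {
  std::vector<std::string> Matched;
  int MatchCalls = 0; // how many full glob matches were attempted
};

// Try every prefix of Query as an outer key, then every prefix of the
// reversed Query as an inner key; only surviving globs are fully matched.
LookupResult lookup(const PrefixMap &Index, std::string_view Query) {
  LookupResult R;
  std::string RevQuery(Query.rbegin(), Query.rend());
  for (std::size_t PL = 0; PL <= Query.size(); ++PL) {
    auto PIt = Index.find(std::string(Query.substr(0, PL)));
    if (PIt == Index.end())
      continue;
    for (std::size_t SL = 0; SL <= RevQuery.size(); ++SL) {
      auto SIt = PIt->second.find(RevQuery.substr(0, SL));
      if (SIt == PIt->second.end())
        continue;
      for (const SimpleGlob *G : SIt->second) {
        ++R.MatchCalls;
        if (G->match(Query))
          R.Matched.push_back(G->Pattern);
      }
    }
  }
  return R;
}
```

With the globs `src/*.cpp`, `src/*.h`, `src/*.td`, and `lib/*.cpp`, a lookup of `src/foo.cpp` passes the prefix filter for three globs, but only `src/*.cpp` also passes the suffix filter, so a single full match is attempted.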
1 parent c2e380d commit 88ea381

2 files changed: +18 -13 lines changed

llvm/include/llvm/Support/SpecialCaseList.h

Lines changed: 3 additions & 2 deletions
@@ -167,8 +167,9 @@ class SpecialCaseList {
     std::vector<GlobMatcher::Glob> Globs;

     RadixTree<iterator_range<StringRef::const_iterator>,
-              SmallVector<const GlobMatcher::Glob *, 1>>
-        PrefixToGlob;
+              RadixTree<iterator_range<StringRef::const_reverse_iterator>,
+                        SmallVector<const GlobMatcher::Glob *, 1>>>
+        PrefixSuffixToGlob;
   };

   /// Represents a set of patterns and their line numbers
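
The inner tree in this hunk is keyed by a `const_reverse_iterator` range. A plausible reading is that a prefix-oriented tree can then answer suffix questions: walking both strings from the back turns "query ends with suffix" into "reversed query starts with reversed suffix". The helper below is a hypothetical, standard-library-only illustration of that equivalence (`reversedPrefixMatch` is not an LLVM function):

```cpp
#include <algorithm>
#include <cassert>
#include <string_view>

// True if Suffix, read back to front, is a prefix of Query read back to
// front. This is equivalent to "Query ends with Suffix".
bool reversedPrefixMatch(std::string_view Query, std::string_view Suffix) {
  return Suffix.size() <= Query.size() &&
         std::equal(Suffix.rbegin(), Suffix.rend(), Query.rbegin());
}
```

For example, `reversedPrefixMatch("src/foo.cpp", ".cpp")` holds while `reversedPrefixMatch("src/foo.cpp", ".h")` does not, and an empty suffix matches any query.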

llvm/lib/Support/SpecialCaseList.cpp

Lines changed: 15 additions & 11 deletions
@@ -92,25 +92,29 @@ void SpecialCaseList::GlobMatcher::preprocess(bool BySize) {

   for (const auto &G : reverse(Globs)) {
     StringRef Prefix = G.Pattern.prefix();
+    StringRef Suffix = G.Pattern.suffix();

-    auto &V = PrefixToGlob.emplace(Prefix).first->second;
+    auto &SToGlob = PrefixSuffixToGlob.emplace(Prefix).first->second;
+    auto &V = SToGlob.emplace(reverse(Suffix)).first->second;
     V.emplace_back(&G);
   }
 }

 void SpecialCaseList::GlobMatcher::match(
     StringRef Query,
     llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const {
-  if (!PrefixToGlob.empty()) {
-    for (const auto &[_, V] : PrefixToGlob.find_prefixes(Query)) {
-      for (const auto *G : V) {
-        if (G->Pattern.match(Query)) {
-          Cb(G->Name, G->LineNo);
-          // As soon as we find a match in the vector, we can break for this
-          // vector, since the globs are already sorted by priority within the
-          // prefix group. However, we continue searching other prefix groups in
-          // the map, as they may contain a better match overall.
-          break;
+  if (!PrefixSuffixToGlob.empty()) {
+    for (const auto &[_, SToGlob] : PrefixSuffixToGlob.find_prefixes(Query)) {
+      for (const auto &[_, V] : SToGlob.find_prefixes(reverse(Query))) {
+        for (const auto *G : V) {
+          if (G->Pattern.match(Query)) {
+            Cb(G->Name, G->LineNo);
+            // As soon as we find a match in the vector, we can break for this
+            // vector, since the globs are already sorted by priority within the
+            // prefix group. However, we continue searching other prefix groups
+            // in the map, as they may contain a better match overall.
+            break;
+          }
         }
       }
     }
