[SpecialCaseList] Filtering Globs with matching prefix #164531
base: users/vitalybuka/spr/main.specialcaselist-filtering-globs-with-matching-prefix
Created using spr 1.3.7
@llvm/pr-subscribers-llvm-support
Author: Vitaly Buka (vitalybuka)
Full diff: https://github.com/llvm/llvm-project/pull/164531.diff
2 Files Affected:
 diff --git a/llvm/include/llvm/Support/SpecialCaseList.h b/llvm/include/llvm/Support/SpecialCaseList.h
index ead765562504d..16f309329a0b5 100644
--- a/llvm/include/llvm/Support/SpecialCaseList.h
+++ b/llvm/include/llvm/Support/SpecialCaseList.h
@@ -13,10 +13,13 @@
 #define LLVM_SUPPORT_SPECIALCASELIST_H
 
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringMap.h"
+#include "llvm/ADT/iterator_range.h"
 #include "llvm/Support/Allocator.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/GlobPattern.h"
+#include "llvm/Support/RadixTree.h"
 #include "llvm/Support/Regex.h"
 #include <memory>
 #include <string>
@@ -162,6 +165,10 @@ class SpecialCaseList {
     };
 
     std::vector<GlobMatcher::Glob> Globs;
+
+    RadixTree<iterator_range<StringRef::const_iterator>,
+              SmallVector<const GlobMatcher::Glob *, 1>>
+        PrefixToGlob;
   };
 
   /// Represents a set of patterns and their line numbers
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index f74e52a3a7fa9..2a86cc37b6000 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -89,14 +89,28 @@ void SpecialCaseList::GlobMatcher::preprocess(bool BySize) {
       return A.Name.size() < B.Name.size();
     });
   }
+
+  for (auto &G : Globs) {
+    StringRef Prefix = G.Pattern.prefix();
+
+    auto &V = PrefixToGlob.emplace(Prefix).first->second;
+    V.emplace_back(&G);
+  }
 }
 
 void SpecialCaseList::GlobMatcher::match(
     StringRef Query,
     llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const {
-  for (const auto &G : reverse(Globs))
-    if (G.Pattern.match(Query))
-      return Cb(G.Name, G.LineNo);
+  if (!PrefixToGlob.empty()) {
+    for (const auto &[_, V] : PrefixToGlob.find_prefixes(Query)) {
+      for (const auto *G : reverse(V)) {
+        if (G->Pattern.match(Query)) {
+          Cb(G->Name, G->LineNo);
+          break;
+        }
+      }
+    }
+  }
 }
 
 SpecialCaseList::Matcher::Matcher(bool UseGlobs, bool RemoveDotSlash)
Review comment on `for (const auto *G : reverse(V)) {` in llvm/lib/Support/SpecialCaseList.cpp:
please add a comment to explain the reverse
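What the reverse loop does can be shown in isolation. Scanning a bucket from the back and stopping at the first hit reports the matching rule that was stored last, which is also what the removed `for (const auto &G : reverse(Globs))` loop did. The sketch below is standalone C++17, not LLVM code: `Rule`, its `Prefix` field (a stand-in for a full glob), and `lastMatchingLine` are hypothetical names. That later rules are meant to override earlier ones is an assumption, not something stated in this thread.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical rule: a plain prefix stands in for a full glob pattern.
struct Rule {
  std::string Prefix;
  unsigned LineNo;
};

// Walks the rules back to front and returns on the first hit, so when
// several rules match, the one stored last in the vector is reported.
std::optional<unsigned> lastMatchingLine(const std::vector<Rule> &Rules,
                                         std::string_view Query) {
  for (auto It = Rules.rbegin(); It != Rules.rend(); ++It)
    if (Query.substr(0, It->Prefix.size()) == It->Prefix)
      return It->LineNo;
  return std::nullopt;
}
```

With rules `{"src/", 10}, {"lib/", 20}, {"src/", 30}`, a query of `src/a.cpp` reports line 30 rather than line 10.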
This commit introduces a RadixTree implementation to LLVM. RadixTree, as a Trie, is very efficient at searching for prefixes. A Radix Tree is a more efficient implementation of a Trie.
The tree will be used to optimize Glob matching in SpecialCaseList:
* #164531
* #164543
* #164545
---------
Co-authored-by: Kazu Hirata <[email protected]>
Co-authored-by: Copilot <[email protected]>
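To illustrate why a trie suits prefix search, here is a minimal sketch of a plain, uncompressed trie in standalone C++17. It is not LLVM's `RadixTree`: the class `Trie` and its methods `insert` and `findPrefixes` are hypothetical names, and a radix tree would additionally merge chains of single-child nodes into one edge. One walk along the query visits every stored key that is a prefix of it.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Plain trie with one node per character (no edge compression).
class Trie {
  struct Node {
    std::map<char, std::unique_ptr<Node>> Children;
    bool IsKey = false;
  };
  Node Root;

public:
  void insert(std::string_view Key) {
    Node *N = &Root;
    for (char C : Key) {
      std::unique_ptr<Node> &Child = N->Children[C];
      if (!Child)
        Child = std::make_unique<Node>();
      N = Child.get();
    }
    N->IsKey = true;
  }

  // Returns every stored key that is a prefix of Query, shortest first.
  // The walk stops as soon as the query leaves the tree.
  std::vector<std::string> findPrefixes(std::string_view Query) const {
    std::vector<std::string> Found;
    const Node *N = &Root;
    if (N->IsKey)
      Found.emplace_back(); // the empty key is a prefix of everything
    for (std::size_t I = 0; I < Query.size(); ++I) {
      auto It = N->Children.find(Query[I]);
      if (It == N->Children.end())
        break;
      N = It->second.get();
      if (N->IsKey)
        Found.emplace_back(Query.substr(0, I + 1));
    }
    return Found;
  }
};
```

The cost of a lookup depends on the length of the query, not on the number of stored keys.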
This commit introduces a RadixTree implementation to LLVM. RadixTree, as a Trie, is very efficient by searching for prefixes. A Radix Tree is more efficient implementation of Trie. The tree will be used to optimize Glob matching in SpecialCaseList: * llvm/llvm-project#164531 * llvm/llvm-project#164543 * llvm/llvm-project#164545 --------- Co-authored-by: Kazu Hirata <[email protected]> Co-authored-by: Copilot <[email protected]>
This commit optimizes `SpecialCaseList` by using a `RadixTree` to filter
glob patterns based on their prefixes. When matching a query, the
`RadixTree` quickly identifies all glob patterns whose prefixes match
the query's prefix. This significantly reduces the number of glob
patterns that need to be fully evaluated, leading to performance
improvements, especially when dealing with a large number of patterns.
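The filtering idea can be sketched in standalone C++17 under some simplifying assumptions. `PrefixFilteredMatcher`, `literalPrefix`, `globMatch`, and `MatchResult` are hypothetical names; `*` is the only metacharacter supported; and a `std::map` probed once per query prefix stands in for the `RadixTree`, which would find the same buckets in a single walk. As in the diff, each bucket is scanned in reverse and contributes at most one match.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical simplified glob: '*' is the only metacharacter.
struct Glob {
  std::string Pattern;
  unsigned LineNo;
};

// Literal text before the first '*' (stand-in for GlobPattern::prefix()).
std::string_view literalPrefix(std::string_view Pattern) {
  return Pattern.substr(0, Pattern.find('*'));
}

// Full glob evaluation, backtracking to the most recent '*'.
bool globMatch(std::string_view P, std::string_view S) {
  constexpr std::size_t NoStar = std::string_view::npos;
  std::size_t PI = 0, SI = 0, Star = NoStar, Mark = 0;
  while (SI < S.size()) {
    if (PI < P.size() && P[PI] == '*') {
      Star = PI++; // remember the star; first let it match nothing
      Mark = SI;
    } else if (PI < P.size() && P[PI] == S[SI]) {
      ++PI;
      ++SI;
    } else if (Star != NoStar) {
      PI = Star + 1; // let the last star absorb one more character
      SI = ++Mark;
    } else {
      return false;
    }
  }
  while (PI < P.size() && P[PI] == '*')
    ++PI;
  return PI == P.size();
}

struct MatchResult {
  std::vector<unsigned> Lines; // one line number per bucket that matched
  std::size_t Evaluated = 0;   // number of full glob evaluations performed
};

class PrefixFilteredMatcher {
  std::vector<Glob> Rules;
  // Literal prefix -> rules with that prefix; std::less<> enables
  // lookups by string_view without building a temporary string.
  std::map<std::string, std::vector<const Glob *>, std::less<>> Index;

public:
  void add(std::string Pattern, unsigned LineNo) {
    // The index stores pointers into Rules, so Rules must not grow later.
    assert(Index.empty() && "add all rules before preprocess()");
    Rules.push_back({std::move(Pattern), LineNo});
  }

  void preprocess() {
    for (const Glob &G : Rules)
      Index[std::string(literalPrefix(G.Pattern))].push_back(&G);
  }

  MatchResult match(std::string_view Query) const {
    MatchResult R;
    // Only buckets whose key is a prefix of Query can contain a match.
    for (std::size_t Len = 0; Len <= Query.size(); ++Len) {
      auto It = Index.find(Query.substr(0, Len));
      if (It == Index.end())
        continue;
      const std::vector<const Glob *> &Bucket = It->second;
      for (auto G = Bucket.rbegin(); G != Bucket.rend(); ++G) {
        ++R.Evaluated;
        if (globMatch((*G)->Pattern, Query)) {
          R.Lines.push_back((*G)->LineNo);
          break; // at most one match per bucket, scanning back to front
        }
      }
    }
    return R;
  }
};
```

Given the four rules `src/*.cpp`, `lib/*.h`, `*test*`, and `src/a*`, a query of `src/a.cpp` fully evaluates three globs instead of four, because the `lib/` bucket is never visited. Patterns that start with `*` have an empty prefix and are therefore always candidates.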
According to SpecialCaseListBM:
Lookup benchmarks (significant improvements):
Lookup like `prefix*` benchmarks (huge improvements):
https://gist.github.com/vitalybuka/824884bcbc1713e815068c279159dafe