Description
In #36017, we optimised index matching performance by splitting the patterns into two categories: exact match and non-exact (wildcard) match. `Set.contains` is used for exact matches and an `Automaton` is used for non-exact matches. The issue occurs when a pattern contains the escape char (`\`), which is handled specially (it basically gets dropped) when building the automaton, but is passed through unchanged when building the exact-match `Set`.
For example:
- The patterns `ab\c` and `ab\c*` do not both match `ab\c`. In fact, only the former, the exact match pattern `ab\c`, does. The pattern `ab\c*` results in an automaton that matches `abc*`.
- Similarly, the pattern `abc\` matches exactly `abc\` while `abc\*` matches exactly `abc*`.
- Also, between the patterns `abc\` and `a*c\`, only the former matches `abc\`, while the latter cannot match any string that ends with a `\`.
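The mismatch in the first two examples can be reproduced with a small toy model. This is only a sketch based on the description above, not the actual automaton-building code: `EscapeMismatchDemo` and `dropEscapes` are hypothetical names, and the handling of a trailing escape (dropped) is an assumption chosen to agree with the third example.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical demo class; not part of the Elasticsearch code base.
final class EscapeMismatchDemo {

    /**
     * Toy model of the automaton path as described in this issue: each escape
     * char is dropped and the char that follows it is kept as a literal.
     */
    static String dropEscapes(String pattern) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                // Assumption: a trailing escape with nothing after it is dropped.
                if (i + 1 < pattern.length()) {
                    sb.append(pattern.charAt(++i));
                }
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pattern = "ab\\c"; // the pattern ab\c
        // Exact-match path: the raw pattern, escape included, goes into the Set.
        Set<String> exactMatch = Collections.singleton(pattern);
        System.out.println(exactMatch.contains("ab\\c")); // true
        System.out.println(exactMatch.contains("abc"));   // false
        // Wildcard path (toy model): the escape is dropped before matching.
        System.out.println(dropEscapes(pattern));          // abc
    }
}
```

The two paths disagree on what the same escaped text means: the `Set` treats the escape char as a literal, while the wildcard path treats it as an escape.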
In addition to the above bug, there is also a tiny missed opportunity for optimisation:
Lines 93 to 97 in c7ad737

    if (pattern.startsWith("/") || pattern.contains("*") || pattern.contains("?")) {
        nonExactMatch.add(pattern);
    } else {
        exactMatch.add(pattern);
    }
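If a `*` or `?` char comes immediately after an escape char (which itself is not immediately after an escape), it is a literal rather than a wildcard. A pattern whose only `*` and `?` chars are escaped in this way is in fact an exact match instead of a non-exact match.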
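One way the classification could take escaping into account is sketched below. It is a minimal illustration, not the project's implementation: `PatternClassifier`, `hasUnescapedWildcard`, and `isExactMatch` are hypothetical names, and the sketch assumes `\` escapes whatever single char follows it.

```java
// Hypothetical helper; not part of the Elasticsearch code base.
final class PatternClassifier {

    private static final char ESCAPE = '\\';

    /** Returns true if the pattern contains a '*' or '?' that is not escaped. */
    static boolean hasUnescapedWildcard(String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == ESCAPE) {
                i++; // skip the escaped char, whatever it is (including another escape)
            } else if (c == '*' || c == '?') {
                return true;
            }
        }
        return false;
    }

    /** Mirrors the check quoted above, but ignores escaped wildcards. */
    static boolean isExactMatch(String pattern) {
        return pattern.startsWith("/") == false && hasUnescapedWildcard(pattern) == false;
    }

    public static void main(String[] args) {
        System.out.println(isExactMatch("abc\\*"));   // true: the '*' is escaped
        System.out.println(isExactMatch("abc\\\\*")); // false: the escape is itself escaped
        System.out.println(isExactMatch("abc*"));     // false: plain wildcard
    }
}
```

A pattern classified as exact this way would presumably still need its escapes resolved before being added to the exact-match `Set`, otherwise the inconsistency described earlier remains.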