Escape char not handled for patterns categorised as exact match #69851

@ywangd

In #36017, we optimised index matching performance by splitting the patterns into two categories: exact match and non-exact (wildcard) match. Set.contains is used for exact matches and an Automaton is used for non-exact matches. The issue occurs when a pattern contains the escape char (\), which is handled specially (basically gets dropped) when building the automaton, but is passed through unchanged when building the exact match Set.

For example:

  • The patterns ab\c and ab\c* do not both match ab\c. In fact, only the former, the exact match pattern ab\c, does. The pattern ab\c* results in an automaton that matches abc*.
  • Similarly, the pattern abc\ matches exactly abc\ while abc\* matches exactly abc*.
  • Also, between the patterns abc\ and a*c\, only the former matches abc\, while the latter cannot match any string that ends with a \.
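The first two bullets can be reproduced with a small stand-alone sketch. It does not use the real Automaton code: `wildcardToRegex` is a hypothetical stand-in built on java.util.regex that mimics the behaviour described above (the escape char is dropped and the next char is taken literally), and `exactStyleMatch` mimics the Set.contains path. The class and method names are illustrative only, and treating a trailing \ as a literal is an assumption of this sketch.

```java
import java.util.Set;
import java.util.regex.Pattern;

public class EscapeMismatchDemo {

    // Hypothetical stand-in for the automaton path: '*' matches any string,
    // '?' matches any single char, and '\' is dropped so that the char after
    // it is taken literally. A trailing '\' is kept as a literal (assumption).
    static Pattern wildcardToRegex(String pattern) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                // Consume the escape and emit the next char as a literal.
                regex.append(Pattern.quote(String.valueOf(pattern.charAt(++i))));
            } else if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Mimics the non-exact path: the whole name must match the pattern.
    static boolean automatonStyleMatch(String pattern, String name) {
        return wildcardToRegex(pattern).matcher(name).matches();
    }

    // Mimics the exact path: the raw pattern is compared as-is, escape included.
    static boolean exactStyleMatch(String pattern, String name) {
        return Set.of(pattern).contains(name);
    }

    public static void main(String[] args) {
        String name = "ab\\c"; // the four chars a, b, \, c
        System.out.println(exactStyleMatch("ab\\c", name));         // true
        System.out.println(automatonStyleMatch("ab\\c*", name));    // false
        System.out.println(automatonStyleMatch("ab\\c*", "abcd"));  // true
        System.out.println(automatonStyleMatch("abc\\*", "abc*"));  // true
    }
}
```

The two paths disagree on ab\c only because one keeps the escape char and the other drops it.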

In addition to the above bug, there is also a tiny missed opportunity for optimisation:

    if (pattern.startsWith("/") || pattern.contains("*") || pattern.contains("?")) {
        nonExactMatch.add(pattern);
    } else {
        exactMatch.add(pattern);
    }

If a * or ? char comes immediately after an escape char (which is not itself escaped), it is a literal rather than a wildcard. A pattern whose only * and ? chars are escaped in this way is in fact an exact match, not a non-exact match.
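One possible way to address both points is sketched below. It is not the actual fix: the class `PatternClassifier` and its methods are hypothetical names. The idea is to scan the pattern while skipping escaped chars, treat it as non-exact only if an unescaped * or ? is found, and unescape exact patterns before adding them to the Set so that both paths agree on the literal being matched. Treating a trailing \ as a literal is an assumption of this sketch.

```java
public class PatternClassifier {

    // Returns true if the pattern contains a '*' or '?' that is not escaped.
    static boolean hasUnescapedWildcard(String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped char, whatever it is
            } else if (c == '*' || c == '?') {
                return true;
            }
        }
        return false;
    }

    // A pattern is exact if it is not a "/"-prefixed pattern and has no
    // unescaped wildcard.
    static boolean isExactMatch(String pattern) {
        return pattern.startsWith("/") == false && hasUnescapedWildcard(pattern) == false;
    }

    // Drops escape chars so the exact-match Set would hold the literal string.
    // A trailing '\' with nothing after it is kept as-is (assumption).
    static String unescape(String pattern) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                sb.append(pattern.charAt(++i));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(isExactMatch("abc\\*"));    // true: the * is escaped
        System.out.println(isExactMatch("abc\\\\*"));  // false: the \ is escaped, the * is not
        System.out.println(unescape("abc\\*"));        // abc*
    }
}
```

With this classification, abc\* would go to the exact match Set as the literal abc*, while abc\\* (an escaped \ followed by a real wildcard) would still go to the automaton.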
