Skip to content

Improve LIKE performance#13479

Merged
martint merged 1 commit intotrinodb:masterfrom
martint:like
Aug 25, 2022
Merged

Improve LIKE performance#13479
martint merged 1 commit intotrinodb:masterfrom
martint:like

Conversation

@martint
Copy link
Member

@martint martint commented Aug 3, 2022

Implements a dedicated NFA/DFA-based matcher for LIKE expressions.

Before:

  (pattern)  Mode  Cnt    Score    Error  Units
          %  avgt   90  249.201 ±  1.761  ns/op
         _%  avgt   90  290.866 ±  0.947  ns/op
         %_  avgt   90  299.377 ±  2.079  ns/op
       abc%  avgt   90  244.724 ±  2.004  ns/op
       %abc  avgt   90  139.165 ±  7.407  ns/op
      _____  avgt   90   25.508 ±  0.161  ns/op
abc%def%ghi  avgt   90  209.570 ± 11.658  ns/op
  %abc%def%  avgt   90  447.848 ±  0.739  ns/op

After:

          %  avgt   90    3.989 ±  0.012  ns/op
         _%  avgt   90    3.969 ±  0.014  ns/op
         %_  avgt   90    3.958 ±  0.013  ns/op
       abc%  avgt   90    5.501 ±  0.019  ns/op
       %abc  avgt   90    5.861 ±  0.015  ns/op
      _____  avgt   90    7.096 ±  0.028  ns/op
abc%def%ghi  avgt   90   84.293 ±  0.623  ns/op
  %abc%def%  avgt   90  152.031 ±  0.247  ns/op

Documentation

(x) No documentation is needed.

Release notes

(x) Release notes entries required with the following suggested text:

# General
* Improve performance of `LIKE` expressions. ({issue}`13479`)

@cla-bot cla-bot bot added the cla-signed label Aug 3, 2022
@martint martint force-pushed the like branch 2 times, most recently from e2ebef6 to 8c3e370 Compare August 3, 2022 15:31
@martint martint marked this pull request as ready for review August 4, 2022 07:48
@martint martint requested review from dain and phd3 August 5, 2022 17:46
Copy link
Member

@jirassimok jirassimok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through how this PR works, I think there's something of an extra step here in the construction of the DFA. It looks like the DFA class is only used as a transitory state between NFA and DenseDfaMatcher, the latter of which actually functions as a DFA.

I wonder if replacing the current DFA with a DFA builder and renaming DenseDfaMatcher to DFA might be a better idea.

Either way, I think the part of LikeMatcher.compile that essentially compiles a DFA should be moved to its own method on the current DFA class.

@martint
Copy link
Member Author

martint commented Aug 22, 2022

Looking through how this PR works, I think there's something of an extra step here in the construction of the DFA. It looks like the DFA class is only used as a transitory state between NFA and DenseDfaMatcher, the latter of which actually functions as a DFA.

The DFA class is an abstract description of the DFA (useful for understanding the states and transitions). The one used by DfaMatcher is a physical version designed for efficient matching.

Either way, I think the part of LikeMatcher.compile that essentially compiles a DFA should be moved to its own method on the current DFA class.

You make a good point. That bit of code should actually go in DenseDfaMatcher, which should provide a method for constructing a instance of that class from the abstract representation of the DFA.

Copy link
Member

@phd3 phd3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still reviewing

Copy link
Member

@phd3 phd3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just one pending comment about likepattern serde

@martint martint force-pushed the like branch 2 times, most recently from ad2737e to f374887 Compare August 24, 2022 22:55
Implements a dedicated NFA/DFA-based matcher for LIKE expressions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants