martint commented Feb 18, 2023

When the pattern contains only literals and `%`, match it by running a
substring search for each literal token, using an implementation of the
FJS (Franek-Jennings-Smyth) algorithm: https://cgjennings.ca/articles/fjs/
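
The token-based approach above can be sketched roughly as follows. This is a hypothetical Java illustration, not the code from this PR or Trino's `LikeMatcher`: the class `LiteralLikeSketch` and its methods are made-up names, escape characters and `_` are not handled, and values are assumed to be compared as UTF-8 bytes (safe for literal tokens, since UTF-8 matches of whole characters can only start and end on character boundaries). Splitting the pattern on `%` yields an anchored prefix, an anchored suffix, and middle tokens that must appear in order. The search is a simplified FJS-style hybrid: a Sunday quick-search skip keyed on the pattern's last byte, then KMP-style resumption after a partial match. Unlike the published FJS, it uses the plain KMP border table rather than the optimized failure table and re-checks the last byte during the left-to-right pass.

```java
import static java.nio.charset.StandardCharsets.UTF_8;

import java.util.Arrays;

// Hypothetical sketch, not Trino's LikeMatcher: LIKE for patterns built only from
// literals and '%' (no '_', no escape character), matched on UTF-8 bytes.
final class LiteralLikeSketch
{
    static boolean matches(String value, String pattern)
    {
        byte[] text = value.getBytes(UTF_8);
        String[] parts = pattern.split("%", -1); // -1 keeps empty leading/trailing tokens
        if (parts.length == 1) {
            return Arrays.equals(text, parts[0].getBytes(UTF_8)); // no '%': exact match
        }
        byte[] prefix = parts[0].getBytes(UTF_8);
        byte[] suffix = parts[parts.length - 1].getBytes(UTF_8);
        if (text.length < prefix.length + suffix.length
                || !Arrays.equals(text, 0, prefix.length, prefix, 0, prefix.length)
                || !Arrays.equals(text, text.length - suffix.length, text.length, suffix, 0, suffix.length)) {
            return false;
        }
        // Middle tokens must occur in order between the anchored prefix and suffix;
        // taking the leftmost occurrence of each leaves the most room for the rest.
        int position = prefix.length;
        int end = text.length - suffix.length;
        for (int t = 1; t < parts.length - 1; t++) {
            byte[] token = parts[t].getBytes(UTF_8);
            int found = indexOf(text, position, end, token);
            if (found < 0) {
                return false;
            }
            position = found + token.length;
        }
        return true;
    }

    // Simplified FJS-style search: returns the first i in [from, to - p.length]
    // with text[i, i + p.length) == p, or -1.
    static int indexOf(byte[] text, int from, int to, byte[] p)
    {
        int m = p.length;
        if (m == 0) {
            return from <= to ? from : -1;
        }
        int[] border = new int[m + 1]; // border[k]: longest proper border of p[0, k)
        border[0] = -1;
        for (int q = 0, k = -1; q < m; q++) {
            while (k >= 0 && p[k] != p[q]) {
                k = border[k];
            }
            border[q + 1] = ++k;
        }
        int[] skip = new int[256]; // Sunday shift, keyed by the byte after the window
        Arrays.fill(skip, m + 1);
        for (int q = 0; q < m; q++) {
            skip[p[q] & 0xFF] = m - q;
        }
        int last = m - 1;
        int i = from; // next text position to compare
        int j = 0; // pattern bytes already matched at alignment i - j
        while (true) {
            if (j == 0) {
                // Sunday phase: skip alignments whose last byte does not match
                while (i + m <= to && text[i + last] != p[last]) {
                    if (i + m == to) {
                        return -1; // no byte after the window to drive the skip
                    }
                    i += skip[text[i + m] & 0xFF];
                }
            }
            // KMP phase: extend the j matched bytes of alignment i - j
            if (i - j + m > to) {
                return -1;
            }
            while (j < m && text[i] == p[j]) {
                i++;
                j++;
            }
            if (j == m) {
                return i - m;
            }
            if (j == 0) {
                i++; // text[i] != p[0]: try the next alignment
            }
            else {
                j = border[j]; // keep the longest border of the matched prefix
            }
        }
    }
}
```

For example, `LiteralLikeSketch.matches("hello world", "%o w%")` returns `true`, while `matches("hello world", "%world%hello%")` returns `false` because the tokens occur in the wrong order. Taking the leftmost occurrence of each middle token is safe for `%`-only patterns: any later placement only leaves less of the input for the remaining tokens.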

Benchmark results follow (time per operation in ns; lower is better):

  • dynamicXXX measures the end-to-end performance of compiling
    the matcher and calling it.
  • matchXXX measures the performance of the match call after
    the matcher has been compiled.
  • xxxNonOptimized and xxxOptimized measure the performance
    when LikeMatcher is constructed with optimize = false and
    optimize = true, respectively.

    Benchmark                                Pattern              Before                          After

    dynamicNonOptimized               SHORT_TOKENS_1       3206.181 ±    16.858  ns/op     1301.583 ±  6.762  ns/op
    dynamicNonOptimized               SHORT_TOKENS_2       3534.404 ±    20.939  ns/op     2073.400 ± 17.597  ns/op
    dynamicNonOptimized                  SHORT_TOKEN       2568.900 ±    24.562  ns/op      582.184 ±  2.452  ns/op
    dynamicNonOptimized                LONG_TOKENS_1      12055.974 ±    72.518  ns/op     1594.760 ±  8.006  ns/op
    dynamicNonOptimized                LONG_TOKENS_2      17133.678 ±   119.793  ns/op      700.485 ±  3.883  ns/op
    dynamicNonOptimized                 LONG_TOKEN_1       7152.323 ±    54.488  ns/op      451.341 ±  2.386  ns/op
    dynamicNonOptimized                 LONG_TOKEN_2       2852.432 ±    29.256  ns/op      342.418 ±  3.757  ns/op
    dynamicNonOptimized                 LONG_TOKEN_3       5238.197 ±    46.751  ns/op      933.180 ±  5.290  ns/op
    dynamicNonOptimized  SHORT_TOKENS_WITH_LONG_SKIP       3063.792 ±    37.088  ns/op      833.256 ± 26.775  ns/op
    dynamicOptimized                  SHORT_TOKENS_1     283428.816 ±  1611.467  ns/op     1305.750 ±  9.497  ns/op
    dynamicOptimized                  SHORT_TOKENS_2   10059684.325 ± 44593.208  ns/op     2013.463 ± 15.444  ns/op
    dynamicOptimized                     SHORT_TOKEN      81244.561 ±   339.620  ns/op      586.187 ±  2.540  ns/op
    dynamicOptimized                   LONG_TOKENS_1    4733209.512 ± 30825.948  ns/op     1603.712 ± 15.636  ns/op
    dynamicOptimized                   LONG_TOKENS_2    6875531.823 ± 33728.556  ns/op      707.062 ±  3.214  ns/op
    dynamicOptimized                    LONG_TOKEN_1     665877.955 ± 30123.355  ns/op      453.508 ±  2.343  ns/op
    dynamicOptimized                    LONG_TOKEN_2     370405.576 ±  2891.106  ns/op      342.558 ±  2.781  ns/op
    dynamicOptimized                    LONG_TOKEN_3     402514.307 ±  1920.966  ns/op      932.587 ±  4.264  ns/op
    dynamicOptimized     SHORT_TOKENS_WITH_LONG_SKIP     254232.154 ±  1114.968  ns/op      821.808 ±  4.116  ns/op

    matchNonOptimized                 SHORT_TOKENS_1       2833.111 ±    13.485  ns/op      701.785 ±  3.181  ns/op
    matchNonOptimized                 SHORT_TOKENS_2       3221.687 ±    20.231  ns/op      543.724 ±  2.822  ns/op
    matchNonOptimized                    SHORT_TOKEN       2311.488 ±    11.088  ns/op      458.462 ±  1.643  ns/op
    matchNonOptimized                  LONG_TOKENS_1      11778.521 ±    52.387  ns/op      865.535 ±  3.973  ns/op
    matchNonOptimized                  LONG_TOKENS_2      16922.399 ±    72.356  ns/op      193.247 ±  0.574  ns/op
    matchNonOptimized                   LONG_TOKEN_1       6871.454 ±    35.185  ns/op      259.938 ±  1.161  ns/op
    matchNonOptimized                   LONG_TOKEN_2       2517.248 ±    13.335  ns/op      151.030 ±  0.579  ns/op
    matchNonOptimized                   LONG_TOKEN_3       5021.075 ±    39.784  ns/op      709.089 ±  3.854  ns/op
    matchNonOptimized    SHORT_TOKENS_WITH_LONG_SKIP       2757.342 ±    16.299  ns/op      504.451 ±  1.964  ns/op
    matchOptimized                    SHORT_TOKENS_1        783.268 ±     3.646  ns/op      702.478 ±  3.716  ns/op
    matchOptimized                    SHORT_TOKENS_2       1147.895 ±     4.307  ns/op      543.043 ±  2.447  ns/op
    matchOptimized                       SHORT_TOKEN       1044.000 ±     4.159  ns/op      458.934 ±  2.049  ns/op
    matchOptimized                     LONG_TOKENS_1       1044.809 ±     5.375  ns/op      867.075 ±  4.226  ns/op
    matchOptimized                     LONG_TOKENS_2       1062.192 ±     5.323  ns/op      193.253 ±  0.678  ns/op
    matchOptimized                      LONG_TOKEN_1       1045.351 ±     4.702  ns/op      259.962 ±  1.199  ns/op
    matchOptimized                      LONG_TOKEN_2       1084.966 ±     3.921  ns/op      150.928 ±  0.652  ns/op
    matchOptimized                      LONG_TOKEN_3       1061.450 ±     3.678  ns/op      707.735 ±  3.565  ns/op
    matchOptimized       SHORT_TOKENS_WITH_LONG_SKIP       1148.827 ±     8.071  ns/op      504.854 ±  2.521  ns/op
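
One way to read the table: dynamicXXX includes compiling the matcher while matchXXX does not, so the difference between the two roughly approximates compile time, assuming both benchmarks run against the same inputs. For SHORT_TOKENS_2 with optimize = true, for example:

```latex
\begin{aligned}
\text{before: } & 10059684.325 - 1147.895 = 10058536.430\ \text{ns} \approx 10.06\ \text{ms} \\
\text{after: }  & 2013.463 - 543.043 = 1470.420\ \text{ns} \approx 1.47\ \mu\text{s}
\end{aligned}
```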


Includes commits from #15999. Only the last two commits are new.

Release notes

(x) Release notes are required, with the following suggested text:

# General
* Improve performance of `LIKE` expressions that contain `%`. ({issue}``)

cla-bot added the cla-signed label Feb 18, 2023
martint force-pushed the like-fjs branch 3 times, most recently from 7d05135 to 5bd7d4a, February 23, 2023 23:41
martint force-pushed the like-fjs branch 2 times, most recently from f1c4371 to 90dde8d, February 25, 2023 21:42
phd3 self-requested a review March 3, 2023 19:57
martint added 3 commits April 27, 2023 17:54

martint commented Apr 28, 2023

@phd3, addressed comments in a fixup commit.

martint requested a review from phd3 April 28, 2023 00:55
martint merged this pull request into trinodb:master May 1, 2023
github-actions added this to the 416 milestone May 1, 2023
martint deleted the like-fjs branch May 1, 2023 02:41