Skip to content

string-matcher: introduce OO string-matcher prior to refactor#37324

Closed
adisuissa wants to merge 10 commits intoenvoyproxy:mainfrom
adisuissa:string_matcher_template_method_part1
Closed

string-matcher: introduce OO string-matcher prior to refactor#37324
adisuissa wants to merge 10 commits intoenvoyproxy:mainfrom
adisuissa:string_matcher_template_method_part1

Conversation

@adisuissa
Copy link
Copy Markdown
Contributor

Commit Message: string-matcher: introduce OO string-matcher prior to refactor
Additional Description:
This is a first part in a refactor that replaces the current StringMatcherImpl class.
These are the goals for this new class:

  1. improve encapsulation and use polymorphism that will make the code a bit more readable.
  2. reduce the amount of memory used by the StringMatcher (e.g., an object of type ExactStringMatcher doesn't need to hold a pointer to a regex object).
  3. reduce the amount of memory that is essentially repeated (removing the kept protobuf matcher object that holds redundant information).
  4. replace the uses of the matcher() that are needed to find the contents of the string-matcher by introducing the stringRepresentation() method.

The followup PR(s) will change the many locations that use StringMatcherImpl to use the new matchers.

Risk Level: low - not being used at the moment (but future PRs will).
Testing: duplicated the tests needed
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@repokitteh-read-only
Copy link
Copy Markdown

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #37324 was opened by adisuissa.

see: more, trace.

@adisuissa
Copy link
Copy Markdown
Contributor Author

adisuissa commented Nov 22, 2024

In my draft branch (that includes this one and the future PRs), I've executed the router_config_impl_speed_test, using the following:
bazel build -c opt //test/common/router:config_impl_speed_test --config=libc++
and then executing:
bazel-bin/test/common/router/config_impl_speed_test --benchmark_repetitions=10 --benchmark_out=output.csv --benchmark_out_format=csv

and got the following results - showing only the mean of each 10 runs, and the second column is "diff%" (negative number implies that the new version is better - less time, positive number implies that the old/HEAD version is better):

name diff% ( (NEW-HEAD)/HEAD)*100 ) HEAD real_time HEAD cpu_time NEW real_time NEW cpu_time diff (NEW - HEAD)
bmRouteTableSizeWithPathPrefixMatch/1_mean -0.00257327 855.687 854.944 855.186 854.922 -0.022
bmRouteTableSizeWithPathPrefixMatch/2_mean -2.67257 936.822 936.027 911.421 911.011 -25.016
bmRouteTableSizeWithPathPrefixMatch/4_mean -1.30542 1074.67 1073.98 1060.81 1059.96 -14.02
bmRouteTableSizeWithPathPrefixMatch/8_mean -0.0569082 1301.35 1300.34 1300.36 1299.6 -0.74
bmRouteTableSizeWithPathPrefixMatch/16_mean -0.147567 1824.33 1822.9 1821.46 1820.21 -2.69
bmRouteTableSizeWithPathPrefixMatch/32_mean -3.26951 2965.97 2964.05 2869.14 2867.14 -96.91
bmRouteTableSizeWithPathPrefixMatch/64_mean -3.82015 5089.23 5084.88 4893.88 4890.63 -194.25
bmRouteTableSizeWithPathPrefixMatch/128_mean -6.04889 9688.22 9679.62 9101.47 9094.11 -585.51
bmRouteTableSizeWithPathPrefixMatch/256_mean -5.47031 18361.3 18346.3 17358.8 17342.7 -1003.6
bmRouteTableSizeWithPathPrefixMatch/512_mean -7.50744 36239.3 36209.4 33522.1 33491 -2718.4
bmRouteTableSizeWithPathPrefixMatch/1024_mean -7.59155 80719.4 80656.8 74600.7 74533.7 -6123.1
bmRouteTableSizeWithPathPrefixMatch/2048_mean -4.44011 169841 169703 162288 162168 -7535
bmRouteTableSizeWithPathPrefixMatch/4096_mean -5.50092 328432 328236 310418 310180 -18056
bmRouteTableSizeWithPathPrefixMatch/8192_mean -2.9208 619777 619248 601659 601161 -18087
bmRouteTableSizeWithPathPrefixMatch/16384_mean -2.37 1.25e+06 1.25e+06 1.22e+06 1.22e+06 -29600
bmRouteTableSizeWithExactPathMatch/1_mean 1.5072 852.689 852.045 865.504 864.887 12.842
bmRouteTableSizeWithExactPathMatch/2_mean -0.259019 929.656 928.891 927.309 926.485 -2.406
bmRouteTableSizeWithExactPathMatch/4_mean 2.38012 1062.2 1061.29 1087.47 1086.55 25.26
bmRouteTableSizeWithExactPathMatch/8_mean 1.99948 1306.42 1305.34 1332.52 1331.44 26.1
bmRouteTableSizeWithExactPathMatch/16_mean 0.743863 1816.09 1814.85 1829.8 1828.35 13.5
bmRouteTableSizeWithExactPathMatch/32_mean -2.64556 2936.02 2934.35 2859.19 2856.72 -77.63
bmRouteTableSizeWithExactPathMatch/64_mean -5.31975 5161.36 5157.01 4886.03 4882.67 -274.34
bmRouteTableSizeWithExactPathMatch/128_mean -5.52981 9644.87 9635.59 9108.9 9102.76 -532.83
bmRouteTableSizeWithExactPathMatch/256_mean -4.1457 18275 18259.9 17514.3 17502.9 -757
bmRouteTableSizeWithExactPathMatch/512_mean -2.3331 34815.9 34790.6 34005.4 33978.9 -811.7
bmRouteTableSizeWithExactPathMatch/1024_mean 2.94396 75013.2 74939.8 77213.3 77146 2206.2
bmRouteTableSizeWithExactPathMatch/2048_mean 4.40531 165018 164892 172282 172156 7264
bmRouteTableSizeWithExactPathMatch/4096_mean 0.558162 306931 306721 308675 308433 1712
bmRouteTableSizeWithExactPathMatch/8192_mean 0.183376 601368 600950 602667 602052 1102
bmRouteTableSizeWithExactPathMatch/16384_mean -1.72 1.25e+06 1.25e+06 1.23e+06 1.23e+06 -21400
bmRouteTableSizeWithRegexMatch/1_mean -0.293389 989.236 988.448 986.303 985.548 -2.9
bmRouteTableSizeWithRegexMatch/2_mean 1.18258 1190.07 1188.93 1203.82 1202.99 14.06
bmRouteTableSizeWithRegexMatch/4_mean -1.46386 1548.59 1547.28 1526.06 1524.63 -22.65
bmRouteTableSizeWithRegexMatch/8_mean -1.10764 2303.97 2302.2 2278.87 2276.7 -25.5
bmRouteTableSizeWithRegexMatch/16_mean 0.830106 4186.04 4182.6 4221.65 4217.32 34.72
bmRouteTableSizeWithRegexMatch/32_mean -1.90378 7460.86 7455.69 7319.4 7313.75 -141.94
bmRouteTableSizeWithRegexMatch/64_mean -3.63235 14238.9 14227.7 13722.4 13710.9 -516.8
bmRouteTableSizeWithRegexMatch/128_mean -3.04811 28384.9 28365.1 27524.2 27500.5 -864.6
bmRouteTableSizeWithRegexMatch/256_mean -3.21886 57147.6 57110.3 55312.4 55272 -1838.3
bmRouteTableSizeWithRegexMatch/512_mean 10.6715 172286 172141 190676 190511 18370
bmRouteTableSizeWithRegexMatch/1024_mean 11.2265 385876 385579 429347 428866 43287
bmRouteTableSizeWithRegexMatch/2048_mean 7.62185 538986 538426 580040 579464 41038
bmRouteTableSizeWithRegexMatch/4096_mean 3.88 1.06e+06 1.06e+06 1.1e+06 1.1e+06 41100
bmRouteTableSizeWithRegexMatch/8192_mean 8.01 2.56e+06 2.56e+06 2.77e+06 2.76e+06 205000
bmRouteTableSizeWithRegexMatch/16384_mean 23.9 8.96e+06 8.96e+06 1.11e+07 1.11e+07 2.14e+06
bmRouteTableSizeWithExactMatcherTree/1_mean -0.0757665 1109.56 1108.67 1108.59 1107.83 -0.84
bmRouteTableSizeWithExactMatcherTree/2_mean 0.137118 1094.79 1093.95 1096.24 1095.45 1.5
bmRouteTableSizeWithExactMatcherTree/4_mean -1.5512 1119.44 1118.49 1102.06 1101.14 -17.35
bmRouteTableSizeWithExactMatcherTree/8_mean -1.54459 1109.82 1109.03 1092.81 1091.9 -17.13
bmRouteTableSizeWithExactMatcherTree/16_mean 1.21957 1107.81 1106.95 1121.37 1120.45 13.5
bmRouteTableSizeWithExactMatcherTree/32_mean 1.84042 1106.08 1105.18 1126.52 1125.52 20.34
bmRouteTableSizeWithExactMatcherTree/64_mean 1.2884 1106.89 1106.02 1121.1 1120.27 14.25
bmRouteTableSizeWithExactMatcherTree/128_mean 0.934613 1119.01 1118.11 1129.5 1128.56 10.45
bmRouteTableSizeWithExactMatcherTree/256_mean -2.74594 1157.74 1156.98 1126.01 1125.21 -31.77
bmRouteTableSizeWithExactMatcherTree/512_mean -0.500833 1134.93 1134.11 1129.24 1128.43 -5.68
bmRouteTableSizeWithExactMatcherTree/1024_mean 1.9973 1104.32 1103.49 1126.42 1125.53 22.04
bmRouteTableSizeWithExactMatcherTree/2048_mean 1.16643 1100.17 1099.08 1113.17 1111.9 12.82
bmRouteTableSizeWithExactMatcherTree/4096_mean 0.504024 1098.01 1097.17 1103.73 1102.7 5.53
bmRouteTableSizeWithExactMatcherTree/8192_mean 0.286322 1100.88 1100.16 1104.26 1103.31 3.15
bmRouteTableSizeWithExactMatcherTree/16384_mean -2.63722 1138.46 1137.94 1108.82 1107.93 -30.01
bmRouteTableSizeWithPrefixMatcherTree/1_mean -2.73251 1132.12 1131.56 1101.64 1100.64 -30.92
bmRouteTableSizeWithPrefixMatcherTree/2_mean -4.49468 1166.29 1165.6 1113.95 1113.21 -52.39
bmRouteTableSizeWithPrefixMatcherTree/4_mean 0.825694 1110.27 1109.37 1119.38 1118.53 9.16
bmRouteTableSizeWithPrefixMatcherTree/8_mean 0.425643 1114.52 1113.61 1119.34 1118.35 4.74
bmRouteTableSizeWithPrefixMatcherTree/16_mean 0.887086 1110.26 1109.25 1120.18 1119.09 9.84
bmRouteTableSizeWithPrefixMatcherTree/32_mean 1.523 1106.66 1105.71 1123.6 1122.55 16.84
bmRouteTableSizeWithPrefixMatcherTree/64_mean 1.39178 1112.37 1111.53 1127.92 1127 15.47
bmRouteTableSizeWithPrefixMatcherTree/128_mean 0.754815 1120.2 1119.48 1128.81 1127.93 8.45
bmRouteTableSizeWithPrefixMatcherTree/256_mean 0.966175 1118.55 1117.81 1129.6 1128.61 10.8
bmRouteTableSizeWithPrefixMatcherTree/512_mean -0.440828 1134.78 1134.23 1129.9 1129.23 -5
bmRouteTableSizeWithPrefixMatcherTree/1024_mean 0.495403 1121.28 1120.3 1126.6 1125.85 5.55
bmRouteTableSizeWithPrefixMatcherTree/2048_mean 0.669579 1118.04 1117.12 1125.56 1124.6 7.48
bmRouteTableSizeWithPrefixMatcherTree/4096_mean 1.18465 1115.2 1114.25 1128.55 1127.45 13.2
bmRouteTableSizeWithPrefixMatcherTree/8192_mean 3.86798 1111.31 1110.4 1154.56 1153.35 42.95
bmRouteTableSizeWithPrefixMatcherTree/16384_mean 2.25985 1102.26 1101.4 1127.24 1126.29 24.89

Here are the full results in the files:
orig_router_config_impl_speed_test_10.csv
new_router_config_impl_speed_test_10.csv

Looking a bit closer a some of these differences, it seems that there is a wide variation, which makes me think that the above diff% are not highly reliable.

I've also executed the stats_matcher_impl_benchamrk which showed even less difference between the HEAD and NEW versions.

@adisuissa adisuissa marked this pull request as ready for review November 22, 2024 22:30
@adisuissa
Copy link
Copy Markdown
Contributor Author

/assign @ggreenway

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@adisuissa adisuissa requested a review from soulxu as a code owner November 25, 2024 15:45
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@adisuissa adisuissa requested a review from ggreenway as a code owner November 25, 2024 20:28
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
Signed-off-by: Adi Suissa-Peleg <adip@google.com>
…late_method_part1

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@adisuissa
Copy link
Copy Markdown
Contributor Author

@ggreenway PTAL

@ggreenway
Copy link
Copy Markdown
Member

There are a few rows in the performance table that are large degradations. Do you know why those got so much worse?

Do we change from values to pointers in some cases? That could conceivably hurt performance, especially if there's cache pressure (large working set).

/wait-any

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jan 4, 2025

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions Bot added the stale stalebot believes this issue/PR has not been touched recently label Jan 4, 2025
@adisuissa
Copy link
Copy Markdown
Contributor Author

This PR will probably be superseded by #37782. Keeping it open until an agreement is reached.
/waiting-any

@github-actions github-actions Bot removed the stale stalebot believes this issue/PR has not been touched recently label Jan 6, 2025
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 5, 2025

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions Bot added the stale stalebot believes this issue/PR has not been touched recently label Feb 5, 2025
yanavlasov pushed a commit that referenced this pull request Feb 7, 2025
This is similar to #37324 but doesn't use virtual functions (uses
std::variant instead).
These are the goals for this new StringMatcherImpl class:

1. improve encapsulation and use polymorphism that will make the code a
bit more readable.
2. reduce the amount of memory used by the StringMatcher (e.g., an
object of type ExactStringMatcher doesn't need to also hold a pointer to
a regex object).
3. reduce the amount of memory that is essentially repeated (removing
the kept protobuf matcher object that holds redundant information).
4. replace the uses of the
[matcher()](https://github.com/envoyproxy/envoy/blob/9aad3320cf58a19cb4bbbbbaa357dbedf8715900/source/common/common/matchers.h#L157)
that are needed to find the contents of the string-matcher by
introducing the stringRepresentation() method.

A [simple
test](#37782 (comment))
showed that for 1000 perfix (non-regex) matchers, the new approach
reduced the amount of memory from 570368 to 530176 bytes (~40KiB).
For a case of 1000 regex-matchers with a varying length of prefix
(1-1001), the new approach reduced the amount of memory from 15671616 to
15021632 bytes (~635KiB).

Compared to #37324, this version enjoys the inlining compiler
optimizations that can be made when virtual tables aren't used. The
downside is the use of std::variant, which may require a bit more
memory, and results in a more coupled code for the string matchers, but
IMHO it improves upon the current implementation.

Risk Level: low - not being used at the moment (but future PRs will).
Testing: duplicated the tests needed
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A

---------

Signed-off-by: Adi Suissa-Peleg <adip@google.com>
@ggreenway
Copy link
Copy Markdown
Member

@adisuissa should this be closed now in favor of the other one that just merged?

@github-actions github-actions Bot removed the stale stalebot believes this issue/PR has not been touched recently label Feb 8, 2025
@adisuissa
Copy link
Copy Markdown
Contributor Author

@adisuissa should this be closed now in favor of the other one that just merged?

Yes, this can be closed now as #37782 supersedes it.

@adisuissa adisuissa closed this Feb 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants