Improve performance of `ListResources` by rosstimothy · Pull Request #23534 · gravitational/teleport

rosstimothy · 2023-03-23T19:31:17Z

Reduces the latency of `proto.AuthService/ListResources` in several ways:

- 6f9ac02: Moves RBAC logging from DEBUG to TRACE
- c71ef7b: Stores compiled `*regexp.Regexp` values in an LRU cache so that they can be reused during RBAC
- ec0860c: Adds `GetLabel(key string) (value string, ok bool)` to `types.ResourceWithLabels` to avoid copying the entire label set when only a single key needs to be looked up
- 78d0bef: Prevents loading an extra page from the cache to determine the next key in `auth.ServerWithRoles.ListResources`
- 81b08ed: Modifies `services.UnmarshalServer` to unmarshal directly into a `types.ServerV2` instead of first into a `types.ResourceHeader` to check that the version is `types.V2`

Comparison of BenchmarkListNodes from b1715a5 to ec0860c:

benchstat b1715a5.txt ec0860c.txt
goos: darwin
goarch: amd64
pkg: github.com/gravitational/teleport/lib/auth
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
             │ b1715a5.txt │             ec0860c.txt             │
             │   sec/op    │   sec/op     vs base                │
ListNodes-16   19.307 ± 5%   1.033 ± 14%  -94.65% (p=0.000 n=10)

             │  b1715a5.txt   │             ec0860c.txt              │
             │      B/op      │     B/op      vs base                │
ListNodes-16   11154.5Mi ± 0%   493.0Mi ± 0%  -95.58% (p=0.000 n=10)

             │  b1715a5.txt  │             ec0860c.txt             │
             │   allocs/op   │  allocs/op   vs base                │
ListNodes-16   112.067M ± 0%   8.341M ± 0%  -92.56% (p=0.000 n=10)

fspmarshall

nit: the old `CombineLabels` strategy combined labels such that command labels took precedence over static labels (i.e. if a command label and a static label exist for the same key, only the command label would be observed). As implemented, these `GetLabel` methods give precedence to static labels. I actually prefer this strategy, but it would technically be a breaking change in RBAC behavior, so it's probably best to change it.

zmb3

nice work!

strideynet

Great work, awesome to see benchmarks being used to ensure improvements are worthwhile.

`BenchmarkListNodes` is twice as slow when RBAC logging is enabled. By switching RBAC logging from debug to trace, we can eliminate the performance hit while still letting users opt in to the logging if they need to debug RBAC.

Profiles of the benchmark revealed that the `regexp.Compile` call within `utils.matchString` was the most CPU- and memory-intensive part of the test. Using an `lru.Cache` to intern the compiled regular expressions yields a substantial performance improvement.
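The interning idea can be sketched with a small LRU cache built on `container/list`. This is an illustrative stand-in for the `lru.Cache` named above, not Teleport's code; `regexpCache`, `cacheEntry`, and `newRegexpCache` are hypothetical. Repeated RBAC checks against the same pattern then reuse one `*regexp.Regexp` (which is safe for concurrent use) instead of recompiling it each time.

```go
package main

import (
	"container/list"
	"fmt"
	"regexp"
	"sync"
)

// regexpCache is a minimal, hypothetical LRU cache of compiled regular
// expressions keyed by their source pattern.
type regexpCache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List               // front = most recently used
	entries  map[string]*list.Element // pattern -> element holding *cacheEntry
}

type cacheEntry struct {
	pattern string
	re      *regexp.Regexp
}

func newRegexpCache(capacity int) *regexpCache {
	return &regexpCache{
		capacity: capacity,
		order:    list.New(),
		entries:  make(map[string]*list.Element),
	}
}

// Get returns the compiled form of pattern, compiling it only on a cache miss.
func (c *regexpCache) Get(pattern string) (*regexp.Regexp, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if elem, ok := c.entries[pattern]; ok {
		c.order.MoveToFront(elem)
		return elem.Value.(*cacheEntry).re, nil
	}

	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err // invalid patterns are not cached
	}
	c.entries[pattern] = c.order.PushFront(&cacheEntry{pattern: pattern, re: re})

	// Evict the least recently used pattern once over capacity.
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.entries, oldest.Value.(*cacheEntry).pattern)
	}
	return re, nil
}

func main() {
	cache := newRegexpCache(2)
	a, _ := cache.Get(`^env:prod-.*$`)
	b, _ := cache.Get(`^env:prod-.*$`)
	fmt.Println(a == b, a.MatchString("env:prod-east")) // true true
}
```

Bounding the cache keeps memory flat even if role definitions contain many distinct patterns; holding the mutex during `regexp.Compile` keeps the sketch simple at the cost of serializing cache misses.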

Increases the request limit prior to loading the resources from the cache so that we load enough items in a single page to determine the start key of the next page.
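One common way to learn the next page's start key without loading a second page is to read `limit+1` items and return only the first `limit`. The sketch below assumes that pattern and uses a sorted in-memory slice in place of the cache; `item` and `listPage` are hypothetical, and the exact limit adjustment in the commit may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// item is a hypothetical stored resource keyed by a sortable string.
type item struct{ Key string }

// listPage returns at most limit items whose keys are >= startKey, plus the
// key at which the next page should begin ("" when there is no next page).
// It reads limit+1 items in one pass: the extra item is not returned, but its
// key becomes the next start key, so no second page has to be loaded.
func listPage(sorted []item, startKey string, limit int) ([]item, string) {
	// Find the first item at or after startKey.
	i := sort.Search(len(sorted), func(j int) bool { return sorted[j].Key >= startKey })

	// Request one more item than the caller asked for.
	end := i + limit + 1
	if end > len(sorted) {
		end = len(sorted)
	}
	fetched := sorted[i:end]

	if len(fetched) > limit {
		return fetched[:limit], fetched[limit].Key
	}
	return fetched, ""
}

func main() {
	nodes := []item{{"a"}, {"b"}, {"c"}, {"d"}, {"e"}}
	page, next := listPage(nodes, "", 2)
	fmt.Println(page, next) // [{a} {b}] c
	page, next = listPage(nodes, next, 2)
	fmt.Println(page, next) // [{c} {d}] e
	page, next = listPage(nodes, next, 2)
	fmt.Println(page, next) // [{e}] followed by an empty next key
}
```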

Unmarshal directly to a `types.ServerV2` instead of first creating a `types.ResourceHeader` to inspect the version. There is only a single version for `types.ServerV2`, making the check unnecessary.
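The difference between the two decoding strategies can be sketched with `encoding/json` and trimmed, hypothetical stand-ins for `types.ServerV2` and `types.ResourceHeader` (`serverV2`, `resourceHeader`). Teleport's real types and its JSON decoding path are richer than this; the point is only that the old path parses the same bytes twice while the new one parses them once.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serverV2 is a hypothetical, heavily trimmed stand-in for types.ServerV2.
type serverV2 struct {
	Kind     string `json:"kind"`
	Version  string `json:"version"`
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec struct {
		Addr     string `json:"addr"`
		Hostname string `json:"hostname"`
	} `json:"spec"`
}

// resourceHeader is a hypothetical stand-in for types.ResourceHeader.
type resourceHeader struct {
	Kind    string `json:"kind"`
	Version string `json:"version"`
}

// unmarshalTwoPass mirrors the old approach: decode a header to inspect the
// version, then decode the same bytes again into the full struct.
func unmarshalTwoPass(data []byte) (*serverV2, error) {
	var h resourceHeader
	if err := json.Unmarshal(data, &h); err != nil {
		return nil, err
	}
	if h.Version != "v2" {
		return nil, fmt.Errorf("unsupported server version %q", h.Version)
	}
	var s serverV2
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

// unmarshalOnePass mirrors the new approach: since v2 is the only version,
// decode straight into the full struct and parse the input only once.
func unmarshalOnePass(data []byte) (*serverV2, error) {
	var s serverV2
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

func main() {
	raw := []byte(`{"kind":"node","version":"v2","metadata":{"name":"n1"},"spec":{"addr":"10.0.0.1:3022","hostname":"web"}}`)
	a, _ := unmarshalTwoPass(raw)
	b, _ := unmarshalOnePass(raw)
	fmt.Println(*a == *b, b.Spec.Hostname) // true web
}
```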

`GetAllLabels` can be overkill if one simply needs to look up the value for a particular label. It creates a new `map[string]string` and copies all of a resource's existing labels. RBAC decisions driven by labels incurred the cost of this copy each time access was checked, and the impact is much more noticeable when a resource has many labels or very long label keys or values. By using `GetLabel`, RBAC can avoid copying the labels altogether and simply look up each label key when required.
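The allocation difference between the two accessors can be sketched with a hypothetical `server` type holding static and command labels (command label values are reduced to plain strings). The sketch gives command labels precedence over static ones, following the behavior fspmarshall describes for the old `CombineLabels` strategy; that is an assumption about the desired semantics, not a copy of the merged code. `matchLabels` is likewise hypothetical.

```go
package main

import "fmt"

// server is a hypothetical resource with static labels and command
// (dynamic) labels. Command label values are simplified to plain strings.
type server struct {
	staticLabels  map[string]string
	commandLabels map[string]string
}

// GetAllLabels copies every label into a fresh map on each call. Command
// labels overwrite static labels that share the same key.
func (s *server) GetAllLabels() map[string]string {
	all := make(map[string]string, len(s.staticLabels)+len(s.commandLabels))
	for k, v := range s.staticLabels {
		all[k] = v
	}
	for k, v := range s.commandLabels {
		all[k] = v
	}
	return all
}

// GetLabel looks up a single key without allocating. Command labels are
// checked first so the result agrees with GetAllLabels.
func (s *server) GetLabel(key string) (string, bool) {
	if v, ok := s.commandLabels[key]; ok {
		return v, true
	}
	v, ok := s.staticLabels[key]
	return v, ok
}

// matchLabels reports whether every required label is present with the
// expected value, using per-key lookups instead of a full copy.
func matchLabels(s *server, required map[string]string) bool {
	for k, want := range required {
		if got, ok := s.GetLabel(k); !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	s := &server{
		staticLabels:  map[string]string{"env": "prod", "arch": "amd64"},
		commandLabels: map[string]string{"arch": "arm64"},
	}
	v, _ := s.GetLabel("arch")
	fmt.Println(v, s.GetAllLabels()["arch"])                      // arm64 arm64
	fmt.Println(matchLabels(s, map[string]string{"env": "prod"})) // true
}
```

Keeping both accessors on the same precedence rule is what avoids the RBAC behavior change raised in review.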

* Add benchmark for ListNodes
* Move RBAC logging to trace level
  BenchmarkListNodes is twice as slow when RBAC logging is enabled. By switching RBAC logging from debug to trace, we can eliminate the performance hit while still providing a way for users to opt in to the behavior if they need to debug RBAC.
* Intern compiled regular expressions
  Profiles of the benchmark test revealed that the `regexp.Compile` done within `utils.matchString` was the most CPU- and memory-intensive portion of the tests. By leveraging an `lru.Cache` to intern the compiled regular expressions, we get quite a performance improvement.
* Only fetch a single page of resources
  Increases the request limit prior to loading the resources from the cache so that we load enough items in a single page to determine the start key of the next page.
* Remove version checking from `services.UnmarshalServer`
  Unmarshal directly to a `types.ServerV2` instead of first creating a `types.ResourceHeader` to inspect the version. There is only a single version for `types.ServerV2`, making the check unnecessary.
* Add `GetLabel` to `types.ResourceWithLabels`
  `GetAllLabels` can be overkill if one simply needs to look up the value for a particular label. It creates a new `map[string]string` and copies all of a resource's existing labels. RBAC decisions driven by labels incurred the penalty of the copy each time access was checked. The impact of the copy is much more noticeable when a resource has several labels or really long strings in the key or value. By leveraging `GetLabel`, RBAC can avoid copying the labels altogether and simply look up each label key when required.

The main changes here are:

- Using an LRU cache to store compiled regular expressions.
- Removing stack traces captured by trace.NotFound/trace.Wrap when there are no matches.

This was heavily inspired by #23534, which made similar changes to improve the performance of utils.MatchString. The main beneficiaries of this change are services.MapRoles and services.TraitsToRoles, which rely heavily on utils.ReplaceRegexp, utils.RegexpWithConfig, or ReplaceRegexpWith.

BenchmarkReplaceRegexp was added to validate the improvements in this change and prevent regressions in the future. Results from before and after this change:

benchstat old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/gravitational/teleport/lib/utils
cpu: Apple M2 Pro
                                    │    old.txt    │               new.txt               │
                                    │    sec/op     │   sec/op     vs base                │
ReplaceRegexp/same_expression-12      29.527µ ± 12%   6.837µ ± 1%  -76.84% (p=0.000 n=10)
ReplaceRegexp/unique_expressions-12    22.94µ ±  1%   25.10µ ± 1%   +9.41% (p=0.002 n=10)
ReplaceRegexp/no_matches-12           24.692µ ±  3%   2.861µ ± 1%  -88.41% (p=0.000 n=10)
geomean                                25.57µ         7.889µ       -69.15%

                                    │    old.txt    │               new.txt               │
                                    │     B/op      │     B/op      vs base               │
ReplaceRegexp/same_expression-12       22071.5 ± 1%    164.0 ± 0%  -99.26% (p=0.000 n=10)
ReplaceRegexp/unique_expressions-12    12.90Ki ± 1%  12.87Ki ± 0%        ~ (p=0.165 n=10)
ReplaceRegexp/no_matches-12           14257.00 ± 1%    15.00 ± 0%  -99.89% (p=0.000 n=10)
geomean                                15.70Ki          318.8      -98.02%

                                    │   old.txt   │                new.txt                 │
                                    │  allocs/op  │  allocs/op   vs base                   │
ReplaceRegexp/same_expression-12      58.000 ± 0%   6.000 ± 0%   -89.66% (p=0.000 n=10)
ReplaceRegexp/unique_expressions-12    55.00 ± 0%   55.00 ± 0%         ~ (p=1.000 n=10) ¹
ReplaceRegexp/no_matches-12            58.00 ± 0%    0.00 ± 0%  -100.00% (p=0.000 n=10)
geomean                                56.98                   ?                    ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

fspmarshall reviewed Mar 23, 2023

rosstimothy force-pushed the tross/ls_bench branch 5 times, most recently from 64cf613 to ec0860c March 24, 2023 13:00

rosstimothy marked this pull request as ready for review March 24, 2023 13:52

rosstimothy requested a review from fspmarshall March 24, 2023 13:52

github-actions Bot requested review from EdwardDowling and strideynet March 24, 2023 13:53

github-actions Bot added the size/md label Mar 24, 2023

rosstimothy added backport/branch/v10 labels Mar 24, 2023

zmb3 approved these changes Mar 24, 2023

strideynet approved these changes Mar 24, 2023

Comment thread lib/auth/auth_with_roles_test.go Outdated

public-teleport-github-review-bot Bot removed the request for review from EdwardDowling March 24, 2023 15:54

fspmarshall approved these changes Mar 24, 2023

rosstimothy force-pushed the tross/ls_bench branch 2 times, most recently from fc4dc2b to 623322a March 24, 2023 17:43

rosstimothy added 6 commits March 24, 2023 13:51

Add benchmark for ListNodes

aa5d5f8

Move RBAC logging to trace level

bcb590c

Only fetch a single page of resources

07c085d

Remove version checking from services.UnmarshalServer

ff0f5cf

rosstimothy force-pushed the tross/ls_bench branch from 623322a to 7f74697 March 24, 2023 17:51

rosstimothy enabled auto-merge March 24, 2023 18:02

rosstimothy added this pull request to the merge queue Mar 24, 2023

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Mar 24, 2023

This was referenced Mar 24, 2023

[v12] Improve performance of ListResources #23596

Merged

[v11] Improve performance of ListResources #23597

Merged

rosstimothy mentioned this pull request Mar 25, 2023

[v10] Improve performance of ListResources #23606

Merged

rosstimothy mentioned this pull request Feb 5, 2025

Improve performance of utils.ReplaceRegexp #51872

Merged

rosstimothy merged 6 commits into master from tross/ls_bench


4 participants
