Skip to content

Fix the performance problem of InternalResourceGroup.getActiveWorkCount#20269

Merged
rschlussel merged 1 commit intoprestodb:masterfrom
dongshengbc:scratch/get_active_worker_count
Jul 12, 2023
Merged

Fix the performance problem of InternalResourceGroup.getActiveWorkCount#20269
rschlussel merged 1 commit intoprestodb:masterfrom
dongshengbc:scratch/get_active_worker_count

Conversation

@dongshengbc
Copy link

@dongshengbc dongshengbc commented Jul 10, 2023

During an investigation of "missing metrics" problem, we traced it back to a performance problem with InternalResourceGroup.getActiveWorkCount.

The evidences:

  1. We took a JFR recording and we can see all the fb303 threads are blocked.
  2. By checking the thread dump, the lock is often held by this call InternalResourceGroup.getActiveWorkCount
  3. We did a small benchmark, and it shows a simple Sets.diff(Sets.diff) is 30 times slower than HashSet.size(), which should be O(1).

Change

To fix the problem, we calculate the value in AllNodes' contr once, and then read many times. Since all the node sets in AllNodes are immutable, this is safe to do.

Problem

The benchmark result is here. avgt = average time.

https://gist.github.com/dongshengbc/be7eee51a4d98b0dc3363b294266424b

Benchmark Mode Samples Score Score error Units
o.s.ImmutableSetBenchmark.testImmutableDiffSize avgt 5 8.605 0.241 ns/op
o.s.ImmutableSetBenchmark.testImmutableSize avgt 5 0.865 0.012 ns/op
o.s.ImmutableSetBenchmark.testTwoImmutableDiffSize avgt 5 22.483 0.923 ns/op

== RELEASE NOTES ==

General Changes
* Improve performance of getting resource group metrics

To fix the performance problem, we calculate the value in AllNode's constrctor. It is only calculated once,
and then read many times. Since all the node sets are immutable, this is safe to do.
@dongshengbc dongshengbc requested a review from a team as a code owner July 10, 2023 20:44
@dongshengbc dongshengbc requested a review from presto-oss July 10, 2023 20:44
@patzar
Copy link
Contributor

patzar commented Jul 11, 2023

Overall looks good, thanks for the fix.

Copy link
Contributor

@swapsmagic swapsmagic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for fixing it.

Copy link
Contributor

@ajaygeorge ajaygeorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % a comment

Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes because: (a) Does this optimization actually help in terms of reducing the cost of the expensive call or the num of its invocations. (b) If it can be done in NodeManager proper

Copy link
Contributor

@agrawaldevesh agrawaldevesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to request changes in the prev round of reviews

@rschlussel
Copy link
Contributor

@dongshengbc can you add a release note with a user facing description? Maybe something like Improve performance of fetching resource group metrics?

@rschlussel rschlussel merged commit 8f66352 into prestodb:master Jul 12, 2023
@dongshengbc
Copy link
Author

@dongshengbc can you add a release note with a user facing description? Maybe something like Improve performance of fetching resource group metrics?

Sure, how to do that? link to the instruction?

@rschlussel
Copy link
Contributor

should have been in the PR template when you created the PR. But basically, add it in a code block at the end of the description following the format in this example: #19814 (comment)

@wanglinsong wanglinsong mentioned this pull request Jul 27, 2023
28 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants