
Optimize how we merge multiple operatorStats #24414

Merged
arhimondr merged 1 commit into prestodb:master from shangm2:optimize_merging_operator_stats
Jan 28, 2025

Conversation

@shangm2
Contributor

@shangm2 shangm2 commented Jan 22, 2025

Description

  1. Thanks to @arhimondr, who observed that GC can consume too much CPU cleaning up memory under heavy load.
     [Screenshot 2025-01-21 at 21 55 30]
  2. This PR is the first in a series of optimizations to reduce object creation along the critical path.
  3. This PR optimizes how we merge multiple OperatorStats by avoiding temporary/intermediate objects.

Motivation and Context

  1. The original code creates a temporary object every time we add two OperatorStats with the same id using v.add(operatorStats), and that intermediate object is discarded as soon as it is merged with the next OperatorStats object. This PR groups all OperatorStats by id and then merges each group in one go.
  2. Refactor the code by moving local variables into a dedicated class, so that a single loop within the create method can aggregate all necessary metrics.
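
To make the allocation difference concrete, here is a minimal sketch of the two merge shapes described above. `SimpleStats`, `MergeDemo`, and the `instancesCreated` counter are hypothetical stand-ins invented for illustration; they are not Presto's `OperatorStats` API, and the real class carries many more fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for OperatorStats: one id and one counter.
final class SimpleStats
{
    // Counts constructor calls so the two strategies can be compared.
    static int instancesCreated;

    final int operatorId;
    final long inputPositions;

    SimpleStats(int operatorId, long inputPositions)
    {
        this.operatorId = operatorId;
        this.inputPositions = inputPositions;
        instancesCreated++;
    }

    // Pairwise merge: every call allocates a new intermediate object.
    SimpleStats add(SimpleStats other)
    {
        return new SimpleStats(operatorId, inputPositions + other.inputPositions);
    }

    // Batch merge: accumulate into a primitive and allocate once per group.
    static SimpleStats merge(List<SimpleStats> group)
    {
        if (group.isEmpty()) {
            throw new IllegalArgumentException("group is empty");
        }
        long total = 0;
        for (SimpleStats stats : group) {
            total += stats.inputPositions;
        }
        // The id has no meaningful "zero" value, so it is taken from the first element.
        return new SimpleStats(group.get(0).operatorId, total);
    }
}

final class MergeDemo
{
    // Old shape: fold with add(), discarding one intermediate per step.
    static Map<Integer, SimpleStats> mergePairwise(List<SimpleStats> all)
    {
        Map<Integer, SimpleStats> result = new LinkedHashMap<>();
        for (SimpleStats stats : all) {
            result.merge(stats.operatorId, stats, SimpleStats::add);
        }
        return result;
    }

    // New shape: group by id first, then merge each group in one pass.
    static Map<Integer, SimpleStats> mergeGrouped(List<SimpleStats> all)
    {
        Map<Integer, List<SimpleStats>> groups = new LinkedHashMap<>();
        for (SimpleStats stats : all) {
            groups.computeIfAbsent(stats.operatorId, id -> new ArrayList<>()).add(stats);
        }
        Map<Integer, SimpleStats> result = new LinkedHashMap<>();
        groups.forEach((id, group) -> result.put(id, SimpleStats.merge(group)));
        return result;
    }
}
```

For n stats sharing one id, the pairwise fold creates n - 1 `SimpleStats` objects while the grouped merge creates one. The sketch only counts stats objects; the per-id lists are allocations too, so the grouped shape pays off mainly when the stats object is large or the lists already exist.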

Impact

Test Plan

  1. local hiveQueryRunner works fine
  2. Internal verifier tests passed

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Optimizations
* Improve how we merge multiple operator stats together. :pr:`24414`
* Improve metrics creation by refactoring local variables to a dedicated class. :pr:`24414`


@shangm2 shangm2 requested a review from a team as a code owner January 22, 2025 04:59
@shangm2 shangm2 requested a review from presto-oss January 22, 2025 04:59
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Jan 22, 2025
@shangm2 shangm2 force-pushed the optimize_merging_operator_stats branch from dfcc4ca to a8ecf10 Compare January 22, 2025 05:04
@shangm2 shangm2 force-pushed the optimize_merging_operator_stats branch 2 times, most recently from 727cf06 to 6808005 Compare January 24, 2025 07:38
@steveburnett
Contributor

Thanks for the release note! Rephrasing suggestions to follow the Order of changes in the Release Notes Guidelines:

== RELEASE NOTES ==

General Changes
* Improve how we merge multiple operator stats together. :pr:`24414`
* Improve metrics creation by refactoring local variables to a dedicated class. :pr:`24414`

@shangm2
Contributor Author

shangm2 commented Jan 24, 2025

@arhimondr feel free to take another look. Thank you so much for all the awesome suggestions.

Member

@arhimondr arhimondr left a comment

Thank you for following up. Looks good to me % comments

@shangm2 shangm2 force-pushed the optimize_merging_operator_stats branch 2 times, most recently from 1d78017 to a3c6221 Compare January 27, 2025 21:41
for (OperatorStats operator : operators) {
operatorSummaries.compute(operator.getOperatorId(), (operatorId, summaryStats) -> summaryStats == null ? operator : summaryStats.add(operator));
}
operators.stream().collect(Collectors.groupingBy(OperatorStats::getOperatorId))
Member

@arhimondr arhimondr Jan 27, 2025

This will only have one OperatorStats per operator. Probably we can keep the loop and avoid groupingBy.

Contributor Author

Yep. I should have double checked on that. Thanks for catching it.

}
operatorSummaries.put(operatorId, combined);
}
runningOperators.asMap().forEach((operatorId, runningStats) -> {
Member

@arhimondr arhimondr Jan 27, 2025

Consider using the same approach with merge + toImmutableList.

Contributor Author

Good point. Did not realize this part is rather similar to StageExecutionStats

@shangm2 shangm2 force-pushed the optimize_merging_operator_stats branch 2 times, most recently from 1cedaea to 749d60b Compare January 28, 2025 02:07
int stageExecutionId = first.getStageExecutionId();
int pipelineId = first.getPipelineId();
PlanNodeId planNodeId = first.getPlanNodeId();
String operatorType = first.getOperatorType();
Contributor Author

@arhimondr one issue with a "0"-based initial value is that we don't have a starting value for those ids; we need to grab them from the first item of the collection passed in. This is safe since we check for emptiness right above. Let me know what you think.

@shangm2 shangm2 force-pushed the optimize_merging_operator_stats branch 3 times, most recently from 6ea95c3 to cec07cd Compare January 28, 2025 04:51
arhimondr previously approved these changes Jan 28, 2025
Member

@arhimondr arhimondr left a comment

Very good. Thank you for the follow up.

Looks great to me % two small questions

ListMultimap<Integer, OperatorStats> runningOperators = ArrayListMultimap.create();
ImmutableList.Builder<DriverStats> drivers = ImmutableList.builderWithExpectedSize(driverContexts.size());
// Make deep copy of each list
ConcurrentMap<Integer, List<OperatorStats>> operatorStatsById = this.operatorStatsById.entrySet().stream()
Member

Does it have to be ConcurrentMap? Have you considered a simple Map and the toMap collector?

Contributor Author

You are absolutely correct. It does not need to be concurrentMap. Regular map will do the job.
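
As a rough illustration of the plain-map alternative discussed here, the sketch below copies a map of arrays into an ordinary `HashMap` of resizable lists. `SnapshotDemo`, `snapshot`, and the `String` element type are hypothetical and chosen only to keep the example self-contained; the PR itself works with `OperatorStats`. The `Collectors.toMap` documentation makes no guarantee about the mutability of the returned map, so this sketch passes `HashMap::new` explicitly because entries are added to the copy later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SnapshotDemo
{
    // Copies each array value into its own resizable list, keyed by the same id.
    // A non-concurrent map is enough when only the calling thread uses the result.
    static Map<Integer, List<String>> snapshot(Map<Integer, String[]> source)
    {
        return source.entrySet().stream()
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        entry -> new ArrayList<String>(Arrays.asList(entry.getValue())),
                        // Keys come from a map and are unique, so this is never called.
                        (left, right) -> left,
                        HashMap::new));
    }
}
```

Because every value is copied into a fresh `ArrayList`, appending to the snapshot (for example with `computeIfAbsent(id, k -> new ArrayList<>()).add(...)`) leaves the source arrays untouched.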

ImmutableList.Builder<DriverStats> drivers = ImmutableList.builderWithExpectedSize(driverContexts.size());
// Make deep copy of each list
ConcurrentMap<Integer, List<OperatorStats>> operatorStatsById = this.operatorStatsById.entrySet().stream()
.collect(toConcurrentMap(Map.Entry::getKey, e -> new ArrayList<>(Arrays.asList(e.getValue()))));
Member

Do you need to have an extra new ArrayList<>(...)? Would Arrays.asList(e.getValue()) alone work?

Contributor Author

@shangm2 shangm2 Jan 28, 2025

Yeah, it needs to be new ArrayList<>, since Arrays.asList(e.getValue()) will create an immutable, non-resizable list, but we need it to be mutable so we can add items to it later on, here:

for (OperatorStats operatorStats : driverStats.getOperatorStats()) {
    operatorStatsById.computeIfAbsent(operatorStats.getOperatorId(), k -> new ArrayList<>()).add(operatorStats);
}

Member

The standard JDK Arrays.asList returns a mutable ArrayList: https://github.com/openjdk/jdk21/blob/master/src/java.base/share/classes/java/util/Arrays.java#L4222

Is this the JDK Arrays.asList used here? Or is there one in Guava?

Contributor Author

@shangm2 shangm2 Jan 28, 2025

It is from the standard JDK. It throws the following exception because the Arrays.asList call "returns a fixed-size list backed by the specified array" (so basically a view of the array). We can modify existing elements but cannot resize the list. I should have said it is mutable but not resizable. Sorry about the confusion, and let me know if there is a better way. Really appreciate all the discussion. Learned a lot!
[Screenshot 2025-01-28 at 12 49 15]

Member

I didn't know that. Thank you for the explanation
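
The behaviour discussed in this thread can be reproduced with a few lines of plain JDK code. `AsListDemo` and `addIsRejected` are hypothetical names used only for this sketch; the `java.util` calls are standard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AsListDemo
{
    // Returns true if adding to the list throws UnsupportedOperationException.
    static boolean addIsRejected(List<String> list)
    {
        try {
            list.add("extra");
            return false;
        }
        catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args)
    {
        String[] backing = {"a", "b"};
        List<String> view = Arrays.asList(backing);

        // set() is allowed and writes through to the backing array.
        view.set(0, "z");
        System.out.println(backing[0]); // prints "z"

        // add() would change the size, which a fixed-size view cannot do.
        System.out.println(addIsRejected(view)); // prints "true"

        // Copying into an ArrayList gives an independent, resizable list.
        List<String> copy = new ArrayList<>(view);
        System.out.println(addIsRejected(copy)); // prints "false"
    }
}
```

This is why the extra `new ArrayList<>(...)` wrapper is needed whenever elements are appended to the list afterwards.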

@arhimondr arhimondr merged commit 510024d into prestodb:master Jan 28, 2025
52 checks passed
@prestodb-ci prestodb-ci mentioned this pull request Mar 28, 2025
30 tasks
@kewang1024 kewang1024 mentioned this pull request Apr 1, 2025
7 tasks
shangm2 added a commit that referenced this pull request Apr 16, 2025
## Description
1. This PR re-introduces #24414, which caused a SEV where the written
partition was not logged. The bug is a corner case: when merging
only a single non-mergeable operatorInfo, the old code does NOT
perform any merge operation (since the add operation is only
invoked when a second operator stats shows up) and returns the
operator info itself, while #24414 actually kicks off a merge and
returns a null result.
2. This PR re-introduces #24414, handles this corner case, and
adds specific unit tests for this scenario.
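
The corner case can be sketched as follows, based only on the description above. `Info`, `InfoMerger`, `mergeAlways`, and `mergeGuarded` are hypothetical names, and the single-element guard is one possible way to restore the old behaviour; it is not necessarily how the actual fix is written.

```java
import java.util.List;

// Hypothetical stand-in for an operator info object; not Presto's actual type.
final class Info
{
    final String payload;
    final boolean mergeable;

    Info(String payload, boolean mergeable)
    {
        this.payload = payload;
        this.mergeable = mergeable;
    }
}

final class InfoMerger
{
    // Batch merge that always runs the merge logic, even for one element.
    // A non-mergeable info yields null, mirroring the regression described above.
    static Info mergeAlways(List<Info> infos)
    {
        StringBuilder combined = new StringBuilder();
        for (Info info : infos) {
            if (!info.mergeable) {
                return null;
            }
            combined.append(info.payload);
        }
        return new Info(combined.toString(), true);
    }

    // Guarded variant: a single element is returned untouched, which is what
    // the pairwise code did implicitly because add() was never invoked.
    static Info mergeGuarded(List<Info> infos)
    {
        if (infos.size() == 1) {
            return infos.get(0);
        }
        return mergeAlways(infos);
    }
}
```

With a single non-mergeable `Info`, `mergeAlways` returns null while `mergeGuarded` returns the original object, which is the difference a dedicated unit test for this scenario would pin down.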

## Motivation and Context
1. re-introduce #24414 

## Impact
<!---Describe any public API or user-facing feature change or any
performance impact-->

## Test Plan
1.  verifier runs log written partition correctly:
<img width="1469" alt="Screenshot 2025-04-15 at 17 13 08"
src="https://github.com/user-attachments/assets/f7c84a8f-7381-411a-95d1-15b075870b83"
/>


## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.

## Release Notes
Please follow [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines)
and fill in the release notes below.

```
== RELEASE NOTES ==

General Changes
* Improve how we merge multiple operator stats together.
* Improve metrics creation by refactoring local variables to a dedicated class.

```
AnuragKDwivedi pushed a commit to AnuragKDwivedi/presto-1 that referenced this pull request Apr 21, 2025