
Implement flushing for partial TopNOperator#10166

Closed
JunhyungSong wants to merge 1 commit into trinodb:master from JunhyungSong:flushing-partial-topnoperator

Conversation

@JunhyungSong
Member

It prevents accumulating too many rows in the partial TopNOperator. This mitigates memory pressure in the cluster and results in fewer exceeded-memory-limit errors.

@cla-bot cla-bot bot added the cla-signed label Dec 3, 2021
@findepi findepi requested a review from sopel39 December 3, 2021 10:43
@sopel39 sopel39 requested a review from erichwang December 3, 2021 11:16
@JunhyungSong
Member Author

Any updates?

@sopel39
Member

sopel39 commented Dec 14, 2021

@erichwang can you help here?

@JunhyungSong
Member Author

JunhyungSong commented Jan 4, 2022

Any updates? I will fix the merge conflict after the review.

@erichwang
Contributor

I took a quick look through this PR, but I'm a bit confused about the unfinished semantics and why we need them. Can you explain a bit more about the intended behavior?

Contributor

Isn't "topNProcessor.isBuilderFull()" guaranteed to be false due to the checkState above?

Member Author

It can be full after "addInput".

Comment on lines 102 to 105
Contributor

This will break how processors and state in general are intended to work in Trino. Once a Processor moves to a finished state, it should never be able to move out of it; finished is expected to be a terminal state.

Can you explain a bit more about what you are trying to do with this?

Member Author

This is due to the way WorkProcessor works. TopNPages.process will never be called until TopNOperator is finishing (i.e., pageBuffer.finish has been called), so flushing is not possible until TopNOperator's state changes to finishing. If the state is not changed back from finishing (pageBuffer.finished == true) after the flush, TopNOperator cannot receive the remaining input pages, and they would be discarded. TopNRowNumberOperator and HashAggregationOperator haven't been migrated to WorkProcessor, so their flushing implementations don't have this issue. I know this might be a bit tricky. Please let me know if you have a better idea. Feel free to ping me through Slack as well.
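The constraint described above can be modeled with a tiny, self-contained sketch. All names here (PageBufferModel, add, poll) are hypothetical illustrations, not Trino's actual classes: the consumer only drains the buffer once finish() has been called, so triggering a flush by finishing the buffer means later pages would be rejected unless the buffer is "un-finished".

```java
// Toy model of the constraint described above. PageBufferModel and its
// methods are hypothetical, not Trino's actual classes.
import java.util.ArrayDeque;
import java.util.Deque;

class PageBufferModel {
    private final Deque<String> pages = new ArrayDeque<>();
    private boolean finished;

    void add(String page) {
        if (finished) {
            // Once finished, new pages would be discarded by the operator.
            throw new IllegalStateException("buffer is finished");
        }
        pages.add(page);
    }

    void finish() {
        finished = true;
    }

    // Models output being produced only while the buffer is finishing:
    // nothing is drained until finish() has been called.
    String poll() {
        return finished ? pages.poll() : null;
    }
}
```

In this model, flushing requires finish(), but add() then fails for the remaining input, which is why the original version of this PR moved the state back out of finished.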

Contributor

The problem here is that when you call finish() on the PageBuffer, the WorkProcessor can then report a finished() state to downstream WorkProcessors, which indicates that no more data will appear in the future. This is not correct for partials. I think you can get this to work without finishing the PageBuffer; I believe an unfinished WorkProcessor can still yield data.

Member Author

I changed the logic to utilize PageBuffer, instead.

@JunhyungSong
Member Author

Any updates?

Comment on lines 102 to 105

Contributor

Instead of requiring the pageBuffer to be not finished, can we use the partial and isFlushing flags here to indicate whether input is needed? That way, we may not need the unfinish call.

Member Author

No need to change this since pageBuffer.isEmpty() == false when flushing.

Comment on lines 238 to 253
Contributor

@sopel39, can you take a look at the WorkProcessor and double-check this logic and state machine? It's been a while since I've looked at these parts.

@JunhyungSong JunhyungSong force-pushed the flushing-partial-topnoperator branch from 0371e77 to 14d2b5c Compare January 20, 2022 19:43
Contributor

@erichwang erichwang left a comment

Thanks @JunhyungSong, this is much easier for me to read. A few more comments for you, but I also want @sopel39 to take a look at some comments I left for him. The usage of the WorkProcessor feels kind of weird, and he would be able to best advise on how to do this properly.

Contributor

After you build the result on line 90, it's not clear that any further side effects on topNBuilder will be reflected in the output (even though that may not occur in practice). It looks more correct to an average programmer to clear topNBuilder immediately after calling buildResult, so we should do that (unless it makes this code incorrect somehow).

Member Author

Even though buildResult() is called, topNBuilder is still referenced by outputIterator, so it needs to be memory-accounted until outputIterator is nullified. This is why other operators like HashAggregationOperator and TopNRankingOperator keep their builder until flushing is complete.

Contributor

@sopel39 this looks weird, what is the right way to do this?

Member Author

@JunhyungSong JunhyungSong Jan 20, 2022

I agree that this approach is not ideal. This is like picking the lesser evil. @sopel39 if you have a better way to accomplish this, please let me know.

Member Author

I changed the implementation to add one more method to the Transformation interface. With this, operators implemented with WorkProcessor can handle the input-buffer-full situation.

Contributor

@sopel39, this looks like an awkward way to do this. Do you have any suggestions on the right structure for this pattern?

Member Author

@JunhyungSong JunhyungSong Jan 20, 2022

I agree that this approach is not ideal. This is like picking the lesser evil. @sopel39 if you have a better way to accomplish this, please let me know.

Member Author

@JunhyungSong JunhyungSong Jan 22, 2022

I changed the implementation to add one more method to the Transformation interface. With this, operators implemented with WorkProcessor can handle the input-buffer-full situation. BTW, I found that when late materialization is on, input pages are conveyed through the TopNPages (WorkProcessor.Transformation) process function by WorkProcessorPipelineSourceOperator. Since the pageBuffer approach cannot cover this scenario, this approach is the best option I can think of.

@JunhyungSong JunhyungSong force-pushed the flushing-partial-topnoperator branch 4 times, most recently from 19001bb to 13ed3f0 Compare January 22, 2022 06:28
@JunhyungSong
Member Author

Thanks @JunhyungSong, this is much easier for me to read. A few more comments for you, but I also want @sopel39 to take a look at some comments I left for him.

I changed the implementation to add one more method to the Transformation interface. With this, operators implemented with WorkProcessor can handle the input-buffer-full situation.
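A minimal sketch of this idea, with hypothetical names throughout (this Transformation interface and the needsInput hook are illustrative, not Trino's actual API): the extra method lets the driver ask a transformation whether it can accept more input, so the transformation can signal "flush first" when its buffer fills up.

```java
// Illustrative sketch only; Trino's real Transformation interface differs.
interface Transformation<T, R> {
    // Existing processing entry point (simplified).
    R process(T input);

    // Hypothetical extra method: lets the driver check whether the
    // transformation can accept more input, or should flush first.
    default boolean needsInput() {
        return true;
    }
}

class BufferingTransformation implements Transformation<Integer, String> {
    private static final int MAX_BUFFERED = 2;
    private int buffered;

    @Override
    public String process(Integer input) {
        buffered++;
        return "row-" + input;
    }

    @Override
    public boolean needsInput() {
        // Report "buffer full" once the limit is reached.
        return buffered < MAX_BUFFERED;
    }
}
```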

Member

@sopel39 sopel39 Jan 25, 2022

You can manage flushing entirely within the process method, e.g.:

if (!isFlushing && inputPage != null) {
  addPage(inputPage);
  if (partial && isBuilderFull()) {
    isFlushing = true;
  } else {
    // accumulate more data
    return needsMoreData();
  }
}

// flushing or finishing (inputPage == null)
Page page = null;
while (page == null && !topNProcessor.noMoreOutput()) {
  page = topNProcessor.getOutput();
}

if (page != null) {
  return TransformationState.ofResult(page, false);
}

// all accumulated data has been output (topNProcessor.noMoreOutput() == true)
// and there will be no more input data
if (inputPage == null) {
  return finished();
}

// all accumulated data has been output; resume consuming pages
isFlushing = false;
return needsMoreData();
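The control flow above can be exercised with a self-contained model (all names here, such as PartialTopNModel and maxBufferedRows, are hypothetical and independent of Trino's classes): accumulate rows until the builder is full, flush everything, then resume consuming input; a null input page stands for finishing.

```java
// Self-contained model of the flushing state machine sketched above.
// Names are illustrative; this is not Trino's TopNOperator.
import java.util.ArrayList;
import java.util.List;

class PartialTopNModel {
    private final int topN;
    private final int maxBufferedRows; // "builder full" threshold
    private final List<Integer> builder = new ArrayList<>();

    PartialTopNModel(int topN, int maxBufferedRows) {
        this.topN = topN;
        this.maxBufferedRows = maxBufferedRows;
    }

    // Returns the rows flushed for this input; an empty list means
    // "needs more data". A null input page means the operator is finishing.
    List<Integer> process(List<Integer> inputPage) {
        if (inputPage != null) {
            builder.addAll(inputPage);
            builder.sort(null);
            if (builder.size() > topN) {
                builder.subList(topN, builder.size()).clear(); // keep top N only
            }
            if (builder.size() < maxBufferedRows) {
                return List.of(); // accumulate more data
            }
            // builder is full: fall through, flush, then resume consuming input
        }
        List<Integer> flushed = List.copyOf(builder);
        builder.clear();
        return flushed;
    }
}
```

The key property, matching the sketch above, is that a flush empties the builder and the model then accepts input again instead of treating the flush as a terminal finished state.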

Member Author

@JunhyungSong JunhyungSong Jan 25, 2022

The problem is that the process method will never be called until the finish method is called in non-late-materialization mode. Even in late materialization mode, it will keep sending input pages even if TopNOperator is in partial flushing mode.

Member

@sopel39 sopel39 Jan 25, 2022

The problem is that the process method will never be called until finish

We should just make TopNOperator a WorkProcessorOperator (as, for example, FilterAndProjectOperator is).
Please run BenchmarkTopNOperator afterwards.

Member Author

I implemented a new WorkProcessor.Process for TopNOperator (similar to FilterAndProjectOperator). BenchmarkTopNOperator showed almost no discrepancy.

@JunhyungSong JunhyungSong force-pushed the flushing-partial-topnoperator branch from 13ed3f0 to adc1a90 Compare January 27, 2022 06:26
@JunhyungSong
Member Author

@pettyjamesm please review it

Member

We don't need TopNTransformingProcess. Just modifying https://github.com/trinodb/trino/pull/10166/files#r791829469 as suggested should be sufficient.

Member Author

This is the benchmark result.

AdaptiveWorkProcessorOperator

Benchmark                   (positionsPerPage)  (topN)   Mode  Cnt   Score    Error  Units
BenchmarkTopNOperator.topN                  32       1  thrpt   60  16.917  ± 1.182  ops/s
BenchmarkTopNOperator.topN                  32     100  thrpt   60  17.369  ± 0.517  ops/s
BenchmarkTopNOperator.topN                  32   10000  thrpt   60   5.286  ± 0.058  ops/s
BenchmarkTopNOperator.topN                1024       1  thrpt   60  23.479  ± 0.387  ops/s
BenchmarkTopNOperator.topN                1024     100  thrpt   60  22.266  ± 0.267  ops/s
BenchmarkTopNOperator.topN                1024   10000  thrpt   60   7.306  ± 0.087  ops/s

WorkProcessorOperator

Benchmark                   (positionsPerPage)  (topN)   Mode  Cnt   Score    Error  Units
BenchmarkTopNOperator.topN                  32       1  thrpt   60  18.258  ± 0.512  ops/s
BenchmarkTopNOperator.topN                  32     100  thrpt   60  17.205  ± 0.331  ops/s
BenchmarkTopNOperator.topN                  32   10000  thrpt   60   5.377  ± 0.082  ops/s
BenchmarkTopNOperator.topN                1024       1  thrpt   60  22.251  ± 1.348  ops/s
BenchmarkTopNOperator.topN                1024     100  thrpt   60  21.964  ± 0.295  ops/s
BenchmarkTopNOperator.topN                1024   10000  thrpt   60   7.349  ± 0.118  ops/s

WorkProcessorOperator showed a little performance degradation when positionsPerPage is larger. Do you think we can disregard this?

Member

Looking at the results, they seem to be within the margin of error. Even if you ignore the error, my eyes say this is a few percent difference, so my gut says: do the simplest thing.

Member Author

@JunhyungSong JunhyungSong Jan 29, 2022

I already submitted a separate PR for this: #10832. I just didn't update the comments here.

@JunhyungSong
Member Author

Migrate TopNOperator to WorkProcessorOperator
#10832

@sopel39
Member

sopel39 commented Jan 31, 2022

Please rebase on top of #10832

@JunhyungSong JunhyungSong force-pushed the flushing-partial-topnoperator branch from adc1a90 to 3a4a9aa Compare February 2, 2022 01:05
@JunhyungSong
Member Author

Please rebase on top of #10832

Done.

Member

Nit: you can inline addPage; it's only used once.

Member Author

I would prefer to have it as a separate method.

@JunhyungSong JunhyungSong force-pushed the flushing-partial-topnoperator branch from 3a4a9aa to be65e2d Compare February 9, 2022 19:02
It will prevent accumulating too many rows in partial TopNOperator.
So, it mitigates memory pressures in a cluster and results in less
exceeding memory limit errors.
@JunhyungSong JunhyungSong force-pushed the flushing-partial-topnoperator branch from be65e2d to 3e0526f Compare February 10, 2022 01:54
@JunhyungSong
Member Author

Any updates?

@JunhyungSong
Member Author

Any updates?

@erichwang
Contributor

@sopel39

@JunhyungSong
Member Author

JunhyungSong commented Mar 17, 2022

Can you clarify whether this PR can be merged or not?

It has been so long since addressing all the comments.

@sopel39
Member

sopel39 commented Mar 17, 2022

Can you clarify whether this PR can be merged or not?

I would like to learn about a real use case for this PR.

@bitsondatadev
Member

👋 @JunhyungSong - this PR has become inactive. If you're still interested in working on it, please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

@colebow
Member

colebow commented Nov 30, 2022

Closing this one out due to inactivity, but please reopen if you would like to pick this back up.

@colebow colebow closed this Nov 30, 2022
