[TopNOperator-Spilling] [Part1] Move GroupByHash management inside GroupedTopNBuilder#18379
shrinidhijoshi wants to merge 1 commit into prestodb:master
highker left a comment:
If you want, feel free to schedule a design review. If you are only focusing on TopNOperator, the workflow should be fairly simple. If you want to attack TopNRowNumberOperator, things will be a lot more complicated.
For TopNOperator, ideally, the process would be:
- have a spiller in the operator
- once memory hits the limit, drain the GroupedTopNBuilder and spill
- create a new GroupedTopNBuilder and repeat
- once the operator finishes, do an external merge.
The process would be fairly similar to OrderByOperator.
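The steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not Presto code: `SpillingTopNSketch`, `maxRowsInMemory`, and `spilledRuns` are hypothetical names, a `PriorityQueue` stands in for GroupedTopNBuilder, a row count stands in for the memory limit, and in-memory lists stand in for spill files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of "drain, spill, recreate, external merge" for an ungrouped top-N.
class SpillingTopNSketch {
    private final int n;                  // number of smallest values to return
    private final int maxRowsInMemory;    // assumed stand-in for the memory limit (must be >= 1)
    private final List<List<Long>> spilledRuns = new ArrayList<>(); // stand-in for spill files
    private PriorityQueue<Long> builder = newBuilder();             // stand-in for GroupedTopNBuilder

    SpillingTopNSketch(int n, int maxRowsInMemory) {
        this.n = n;
        this.maxRowsInMemory = maxRowsInMemory;
    }

    // Max-heap, so the largest retained value is the first to be evicted.
    private static PriorityQueue<Long> newBuilder() {
        return new PriorityQueue<>(Collections.reverseOrder());
    }

    void addInput(long value) {
        builder.add(value);
        if (builder.size() > n) {
            builder.poll(); // keep only the n smallest values seen by this builder
        }
        if (builder.size() >= maxRowsInMemory) {
            spill(); // the "memory limit" was hit
        }
    }

    // Drain the current builder into a sorted run, then start a fresh builder.
    private void spill() {
        List<Long> run = new ArrayList<>(builder);
        Collections.sort(run);
        spilledRuns.add(run);
        builder = newBuilder();
    }

    // On finish, spill what is left and do a k-way merge over the sorted runs.
    List<Long> finish() {
        if (!builder.isEmpty()) {
            spill();
        }
        // Each cursor is {runIndex, positionInRun}, ordered by the value it points at.
        PriorityQueue<int[]> cursors = new PriorityQueue<>((a, b) -> Long.compare(
                spilledRuns.get(a[0]).get(a[1]),
                spilledRuns.get(b[0]).get(b[1])));
        for (int i = 0; i < spilledRuns.size(); i++) {
            cursors.add(new int[] {i, 0});
        }
        List<Long> result = new ArrayList<>();
        while (!cursors.isEmpty() && result.size() < n) {
            int[] cursor = cursors.poll();
            List<Long> run = spilledRuns.get(cursor[0]);
            result.add(run.get(cursor[1]));
            if (cursor[1] + 1 < run.size()) {
                cursors.add(new int[] {cursor[0], cursor[1] + 1});
            }
        }
        return result;
    }

    int spillCount() {
        return spilledRuns.size();
    }
}
```

The merge is correct because every run holds the top-N of the rows its builder saw, so the union of the runs always contains the global top-N.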
    OperatorContext operatorContext,
    List<Type> sourceTypes,
    List<Type> partitionTypes,
    List<Integer> partitionChannels,
    Optional<Integer> hashChannel,
    int expectedPositions,
    boolean isDictionaryAggregationEnabled,
    JoinCompiler joinCompiler,
It's an anti-pattern to pass operator-related fields into a non-operator class. We probably don't need to change anything in this class. It's up to the calling operators to spill. OrderByOperator has the closest logic.
Ah, that's a good point. I borrowed this pattern from SpillableHashAggregationBuilder because I was also implementing spilling for TopNRowNumberOperator, which does grouped TopN; in that respect SpillableHashAggregationBuilder is closer to this problem. I can reference OrderByOperator for code patterns.
Update from offline discussions:
Instead of moving GroupByHash into the GroupedTopNBuilder, we can create a GroupByHash Supplier/Factory in the Operator and pass it to the TopNBuilder.
This will avoid:
- leaking operator-related fields (operatorContext, user/revocableMemoryContext, etc.) and logic into the TopNBuilder
- functions with long argument lists
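A rough sketch of the supplier approach, under stated assumptions: `GroupIdAssigner`, `MapGroupIdAssigner`, `TopNBuilderSketch`, and `OperatorSketch` are hypothetical stand-ins invented for illustration and do not mirror the real GroupByHash or GroupedTopNBuilder signatures. The point is only that the operator captures its own configuration in a `Supplier`, and the builder sees nothing but that supplier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for GroupByHash: maps a partition key to a dense group id.
interface GroupIdAssigner {
    int getGroupId(String partitionKey);

    int getGroupCount();
}

// Hypothetical map-backed implementation.
class MapGroupIdAssigner implements GroupIdAssigner {
    private final Map<String, Integer> ids;

    MapGroupIdAssigner(int expectedPositions) {
        this.ids = new HashMap<>(expectedPositions);
    }

    @Override
    public int getGroupId(String partitionKey) {
        Integer id = ids.get(partitionKey);
        if (id == null) {
            id = ids.size(); // next dense id
            ids.put(partitionKey, id);
        }
        return id;
    }

    @Override
    public int getGroupCount() {
        return ids.size();
    }
}

// Hypothetical stand-in for the top-N builder: it receives only a supplier,
// not the operator-related fields needed to construct the hash.
class TopNBuilderSketch {
    private final GroupIdAssigner groups;

    TopNBuilderSketch(Supplier<GroupIdAssigner> groupsSupplier) {
        this.groups = groupsSupplier.get();
    }

    int addRow(String partitionKey) {
        return groups.getGroupId(partitionKey);
    }

    int groupCount() {
        return groups.getGroupCount();
    }
}

// Hypothetical operator: construction details stay here, captured by the lambda.
class OperatorSketch {
    private final Supplier<GroupIdAssigner> groupsSupplier;

    OperatorSketch(int expectedPositions) {
        this.groupsSupplier = () -> new MapGroupIdAssigner(expectedPositions);
    }

    // Each builder gets a fresh, independent hash, e.g. after a spill.
    TopNBuilderSketch newBuilder() {
        return new TopNBuilderSketch(groupsSupplier);
    }
}
```

Because the supplier hands out a new instance on every call, the operator can create several builders over its lifetime without widening the builder's constructor.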
As part of the design review, we also discussed that we don't need to split the PR into 3 parts, so I will be closing this PR.
This PR is Part 1 of a 3-part PR series to implement spilling in TopNOperator/TopNRowNumberOperator (Part2, Part3).
In this PR we encapsulate the GroupByHash inside the InMemoryGroupedTopNBuilder so that it is easier to have multiple instances of GroupedTopNBuilder in the same class.
Test plan - Unit tests show no failures
== NO RELEASE NOTE ==