Skip to content

[TopNOperator-Spilling] [Part1] Move GroupByHash management inside GroupedTopNBuilder#18379

Closed
shrinidhijoshi wants to merge 1 commit intoprestodb:masterfrom
shrinidhijoshi:SpillableTopNOperator-A
Closed

[TopNOperator-Spilling] [Part1] Move GroupByHash management inside GroupedTopNBuilder#18379
shrinidhijoshi wants to merge 1 commit intoprestodb:masterfrom
shrinidhijoshi:SpillableTopNOperator-A

Conversation

@shrinidhijoshi
Copy link
Collaborator

@shrinidhijoshi shrinidhijoshi commented Sep 21, 2022

This PR is Part 1 of 3 part PR to implement Spilling in TopNOperator/TopNRowNumberOperator Part2, Part3)

In this PR we encapsulate the GroupByHash inside the InMemoryGroupedTopNBuilder so that it is easier to have multiple instances of GroupedTopNBuilder in the same class

Test plan - Unit tests show now failures

== NO RELEASE NOTE ==

@shrinidhijoshi shrinidhijoshi requested a review from a team as a code owner September 21, 2022 06:02
@shrinidhijoshi shrinidhijoshi force-pushed the SpillableTopNOperator-A branch 4 times, most recently from d0819f0 to d7ec9ac Compare September 21, 2022 06:36
@shrinidhijoshi shrinidhijoshi changed the title [TopNOperator-Spilling] [Pre-work] Move GroupByHash management inside InMemoryGroupedTopNBuilder [TopNOperator-Spilling] [Pre-work] Move GroupByHash management inside GroupedTopNBuilder Sep 21, 2022
@shrinidhijoshi shrinidhijoshi marked this pull request as draft September 21, 2022 06:38
@shrinidhijoshi shrinidhijoshi force-pushed the SpillableTopNOperator-A branch 3 times, most recently from a5ab438 to ba11a85 Compare September 24, 2022 19:48
@shrinidhijoshi shrinidhijoshi changed the title [TopNOperator-Spilling] [Pre-work] Move GroupByHash management inside GroupedTopNBuilder [TopNOperator-Spilling] [Part1] Move GroupByHash management inside GroupedTopNBuilder Sep 25, 2022
@shrinidhijoshi shrinidhijoshi force-pushed the SpillableTopNOperator-A branch 2 times, most recently from 2247387 to 2ab875f Compare September 25, 2022 22:07
Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want, feel free to schedule a design review. If you are only focusing on TopNOperator, the workflow should be fairly simple. If you want to attack TopNRowNumberOperator, things will be a lot more complicated.

For TopNOperator, ideally, the process would be:

  1. having a spiller in the operator
  2. once the memory hits the limit, drain the GroupedTopNBuilder and spill
  3. create a new GroupedTopNBuilder and repeat
  4. once the operator finishes, do external merge.

The process would be fairly similar to OrderByOperator

Comment on lines +110 to +117
OperatorContext operatorContext,
List<Type> sourceTypes,
List<Type> partitionTypes,
List<Integer> partitionChannels,
Optional<Integer> hashChannel,
int expectedPositions,
boolean isDictionaryAggregationEnabled,
JoinCompiler joinCompiler,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's anti pattern to pass operator-related fields into a non-operator class.... We probably don't need to change anything in this class. It's up to the calling operators to spill. OrderByOperator has the closest logic.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's a good point. I borrowed this pattern from SpillableHashAggregationBuilder as I was implementing for TopNRowNumberOperator as well, which does grouped TopN and in that aspect SpillableHashAggregationBuilder is closer to this problem. I can reference OrderByOperator for code patterns.

Copy link
Collaborator Author

@shrinidhijoshi shrinidhijoshi Sep 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update from offline discussions:
Instead of moving GroupByHash into the GroupedTopNBuilder, we can create a GroupByHash Supplier/Factory and in the Operator pass it to the TopNBuilder.
This will avoid

  1. leaking operator related fields (operatorContext, user/revocableMemoryContext, etc) and logic into TopNBuilder
  2. Avoid functions with long list of argument fields

@shrinidhijoshi
Copy link
Collaborator Author

@highker Thanks for the review. Yes I can schedule a design review meeting. I agree that for TopNOperator its fairly simple. But I am solving for both TopNOperator and TopnRowNumber operator, as can be seen in the #18400

@shrinidhijoshi
Copy link
Collaborator Author

As a part of the design review. We also discussed that we don't need to split the PR into 3 parts. So will be closing this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants