Skip to content

Implement ORDER BY spilling#14836

Merged
highker merged 10 commits intoprestodb:masterfrom
sachdevs:orderbyspilling
Jul 24, 2020
Merged

Implement ORDER BY spilling#14836
highker merged 10 commits intoprestodb:masterfrom
sachdevs:orderbyspilling

Conversation

@sachdevs
Copy link
Contributor

@sachdevs sachdevs commented Jul 13, 2020

PrestoSQL PR trinodb/trino#228

TODO

  • Add docs for ORDER BY spilling
== RELEASE NOTES ==

General Changes
* Add local disk spilling support for `ORDER BY` syntax.

@sachdevs sachdevs requested a review from highker July 13, 2020 22:39
@sachdevs sachdevs force-pushed the orderbyspilling branch 2 times, most recently from b5ba144 to fbb4976 Compare July 13, 2020 23:31
@tdcmeehan
Copy link
Contributor

Looks like the Co-authored-by tags need to be on separate lines in order for the attribution to show up in Github.

Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % nits

  • We need to fix the authoring as suggested by @tdcmeehan

@highker highker self-assigned this Jul 18, 2020
@sachdevs sachdevs force-pushed the orderbyspilling branch 2 times, most recently from 155efc3 to 11e84ae Compare July 22, 2020 21:39
@sachdevs sachdevs requested a review from highker July 22, 2020 21:44
Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from a visual scan. Not an expert in the details of spilling however.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be squashed with Add Spill To Disk for ORDER BY?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this as a separate commit since it was a bug fixed in a later commit. However, that later commit also fixes the equivalent bug for window spilling. So, to make it explicit I just created a new commit. Can either squash this or leave it, no opinion on this.

Copy link

@highker highker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is style check failure

Copy link
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Revoke memory after initial output page has been produced in tests".

The commit message says "Cherry-pick of trinodb/trino@5fef5aa". But looks very different to trinodb/trino@5fef5aa ? :)

Copy link
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Allow memory revoke only during operator finish phase" . Code LGTM. Just curious if you understand the motivation of this commit? @sachdevs -- so essentially we no longer revoke memory during operator execution phase, even some data is spilled to disk? -- Maybe the answer gets obvious when I look into the code :)

@sachdevs
Copy link
Contributor Author

Good question Wenlei - the first 3 commits are pulled from a separate dependent PR for ORDER BY spilling. That commit makes much more sense in the context of that PR https://github.com/prestosql/presto/pull/164/files. Essentially there used to be a bug in Aggregation spilling that I also caught independently that is fixed in our trunk. The bug is as follows: HashAggregation memory revoking can be requested at any time - including after the operator has already run finish(). The commit outlined here basically serves to enable tests to check that the feature works correctly regardless of if you call revoke before/after Operator#finish().

To further clarify, this is only for improving testing. I will be cherry picking their changes in to HashAggregation as well in a follow up PR.

@sachdevs
Copy link
Contributor Author

"Revoke memory after initial output page has been produced in tests".

The commit message says "Cherry-pick of prestosql/presto@5fef5aa". But looks very different to prestosql/presto@5fef5aa ? :)

This was a typo - fixed in latest iteration

@sachdevs sachdevs requested a review from wenleix July 23, 2020 18:17
Saksham Sachdev and others added 9 commits July 23, 2020 13:24
Cherry-pick of trinodb/trino@0a9b804

Co-authored-by: Karol Sobczak <karol.sobczak@karolsobczak.com>
Cherry-pick of trinodb/trino@5fef5aa

Co-authored-by: Karol Sobczak <karol.sobczak@karolsobczak.com>
Cherry-pick of trinodb/trino@e69b668

Co-authored-by: Karol Sobczak <karol.sobczak@karolsobczak.com>
Cherry-pick of trinodb/trino@f01067b

Co-authored-by: Atri Sharma <atri@linux.com>
Cherry-pick of trinodb/trino@e7d8dd5

Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
ORDER BY currently will error out if the data being processed exceeds
query memory limit. This commit introduces paging from disk and ensures
that ORDER BY is limited only by the amount of disk present.

Cherry-pick of trinodb/trino@24868b3

Co-authored-by: Piotr Findeisen <piotr.findeisen@gmail.com>
Cherry-pick of trinodb/trino@472fb5a

Co-authored-by: Karol Sobczak <karol.sobczak@karolsobczak.com>
Cherry-pick of trinodb/trino@d35e5fb

Co-authored-by: Karol Sobczak <karol.sobczak@karolsobczak.com>
Cherry-pick of trinodb/trino@94b4d0a

Co-authored-by: Karol Sobczak <karol.sobczak@karolsobczak.com>
Copy link
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Make sure Travis turn green.

@highker
Copy link

highker commented Jul 24, 2020

Test failure is unrelated (#14882). Other PRs also fail on the same test.

@highker highker merged commit 392ec92 into prestodb:master Jul 24, 2020
@caithagoras caithagoras mentioned this pull request Jul 28, 2020
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants