Add reservoir_sample aggregation function #21296
Codenotify: Notifying subscribers in CODENOTIFY files for diff 24aec09...6f47a4c.
Updated branch from cb91ef5 to 5aac275.
steveburnett left a comment:
Thanks for the doc! I made some suggestions for conciseness and readability. As always, if my suggestions change your intended meaning, let me know and I'm happy to discuss.
...n/java/com/facebook/presto/operator/aggregation/reservoirsample/ReservoirSampleFunction.java (outdated, resolved)
steveburnett left a comment:
LGTM! (docs)
I tested the latest changes in a new local build, and the docs look fine. Thanks!
steveburnett left a comment:
Two small nits rechecking the doc in a new local build.
Updated branch from e758d07 to 1c58a85.
@ClarenceThreepwood FYI, I found the cause of the issue I was having earlier. I needed the function to be called on null inputs, but I have to put the undocumented …
aaneja left a comment:
Can you add a pointer, in the docs or as a code comment, to the error message a user would see if a very large sample needed to cross an exchange boundary? IIRC it is the same error as for any other non-scalar type that exceeds the max-width limit we have for non-scalars.
.../src/main/java/com/facebook/presto/operator/aggregation/reservoirsample/ReservoirSample.java (3 threads, outdated, resolved)
...n/java/com/facebook/presto/operator/aggregation/reservoirsample/ReservoirSampleFunction.java (2 threads, outdated, resolved)
...com/facebook/presto/operator/aggregation/reservoirsample/TestReservoirSampleAggregation.java (2 threads, outdated, resolved)
steveburnett left a comment:
I like the additions! I've suggested some minor punctuation and phrasing changes for you to consider.
Updated branch from fd4f489 to 56145b5.
steveburnett left a comment:
LGTM! (docs)
New local build, everything looks great. Thanks!
Updated branch from 56145b5 to 7d4a024, then from 7d4a024 to 7df7367.
This commit introduces a new `reservoir_sample` aggregate function which, unlike the existing `TABLESAMPLE` operator, lets users pick a fixed sample size. A fixed sample size lets users create samples of a known total size while guaranteeing that every record has an equal probability of being chosen.

Co-authored-by: xiz675 <32505316+xiz675@users.noreply.github.com>
Updated branch from 7df7367 to 6f47a4c.
Description
This commit introduces a new `reservoir_sample` aggregate function which, unlike the existing `TABLESAMPLE` operator, lets users pick a fixed sample size. A fixed sample size lets users create samples of a known total size while guaranteeing that every record has an equal probability of being chosen.
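The equal-probability guarantee described above is what the classic reservoir-sampling algorithm (Algorithm R) provides. The sketch below is a minimal illustration under stated assumptions: `ReservoirSketch`, `add`, `merge`, `samples`, and `seenCount` are hypothetical names, not the PR's `ReservoirSample` class, and the `merge` step is one possible way to combine partial samples from different workers, not necessarily what this PR does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SplittableRandom;

// Hypothetical illustration of a fixed-size reservoir; not Presto's ReservoirSample class.
final class ReservoirSketch<T> {
    private final int capacity;
    private final SplittableRandom random;
    private final List<T> samples = new ArrayList<>();
    private long seenCount; // number of values offered so far

    ReservoirSketch(int capacity, SplittableRandom random) {
        if (capacity < 0) {
            throw new IllegalArgumentException("capacity must be non-negative");
        }
        this.capacity = capacity;
        this.random = random;
    }

    // Algorithm R: after n offers, each value is retained with probability min(1, capacity / n).
    void add(T value) {
        seenCount++;
        if (samples.size() < capacity) {
            samples.add(value); // the first `capacity` values fill the reservoir
            return;
        }
        long index = random.nextLong(seenCount); // uniform in [0, seenCount)
        if (index < capacity) {
            samples.set((int) index, value); // evict a uniformly chosen current sample
        }
    }

    // Assumed combine step: merges two partial reservoirs into a uniform sample of the union of their inputs.
    static <T> ReservoirSketch<T> merge(ReservoirSketch<T> a, ReservoirSketch<T> b, SplittableRandom random) {
        if (a.capacity != b.capacity) {
            throw new IllegalArgumentException("capacities must match");
        }
        List<T> left = new ArrayList<>(a.samples);
        List<T> right = new ArrayList<>(b.samples);
        long leftRemaining = a.seenCount;
        long rightRemaining = b.seenCount;
        long total = leftRemaining + rightRemaining;
        ReservoirSketch<T> result = new ReservoirSketch<>(a.capacity, random);
        long target = Math.min(a.capacity, total);
        while (result.samples.size() < target) {
            // Pick a side with probability proportional to its not-yet-drawn input count,
            // so the number of values taken from each side is hypergeometrically distributed.
            boolean fromLeft = random.nextLong(leftRemaining + rightRemaining) < leftRemaining;
            List<T> source = fromLeft ? left : right;
            // Remove a uniformly chosen element: swap it with the last element, then drop the tail.
            int pick = random.nextInt(source.size());
            T value = source.get(pick);
            source.set(pick, source.get(source.size() - 1));
            source.remove(source.size() - 1);
            result.samples.add(value);
            if (fromLeft) {
                leftRemaining--;
            } else {
                rightRemaining--;
            }
        }
        result.seenCount = total;
        return result;
    }

    List<T> samples() {
        return Collections.unmodifiableList(new ArrayList<>(samples));
    }

    long seenCount() {
        return seenCount;
    }
}
```

The merge relies on each input already being a uniform sample of `min(capacity, seenCount)` values from its own stream. Tracking `seenCount` alongside the sample is what lets a partial state be combined correctly, because the sample alone does not reveal how many rows it stands for.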
Motivation and Context
Useful for generating fixed-size samples.
Impact
New `reservoir_sample` aggregation function.
Test Plan
Unit tests for the function are included.
Contributor checklist
Release Notes