[SPARK-37165][SQL] Add REPEATABLE in TABLESAMPLE to specify seed #34442
Kubernetes integration test starting
Kubernetes integration test status failure
cc @viirya
Test build #144761 has finished for PR 34442 at commit
retest this please
Kubernetes integration test starting
Kubernetes integration test status failure
   * are defined as a number between 0 and 100.
   * - TABLESAMPLE(BUCKET x OUT OF y): Sample the table down to a 'x' divided by 'y' fraction.
   */
  private def withSample(ctx: SampleContext, query: LogicalPlan): LogicalPlan = withOrigin(ctx) {
We should also update method doc.
  test("TABLE SAMPLE") {
    withTable("test") {
      sql("CREATE TABLE test(c int) USING PARQUET")
      for( i <- 0 to 20) {
nit: for (i <- 0 to 20)
viirya
left a comment
Looks okay. Just a few minor comments.
Test build #144769 has finished for PR 34442 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #144775 has finished for PR 34442 at commit
Merged to master. Thanks a lot for reviewing! @viirya
FYI @cloud-fan
late LGTM
Was this merged in the end? I can only see the PR as closed (not merged) and I can't see the functionality in the spark SQL docs or the codebase - but aware I may be missing something!
@RossKen This PR was merged.
What changes were proposed in this pull request?
Add REPEATABLE to the TABLESAMPLE SQL syntax so that users can specify a seed.
Why are the changes needed?
Current syntax for TABLESAMPLE:
Dataset.sample has a parameter to specify the seed, so SQL should offer a way to specify a seed too. Most DBMSs use REPEATABLE to let the user specify a seed (e.g. DB2), so we follow the same approach.
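As a rough illustration of why a user-specified seed matters, here is a minimal sketch in plain Python. It is not Spark's sampler: `bernoulli_sample` is a hypothetical helper that keeps each row with a fixed probability, and the seed values are arbitrary. The point is only that the same seed reproduces the same sample, which is the property REPEATABLE is meant to expose in SQL.

```python
import random


def bernoulli_sample(rows, fraction, seed):
    """Keep each row independently with probability `fraction`.

    Hypothetical helper for illustration only; not Spark's implementation.
    """
    rng = random.Random(seed)  # seeded generator: same seed, same sequence of draws
    return [row for row in rows if rng.random() < fraction]


rows = list(range(21))  # 21 rows, like the `0 to 20` loop in the PR's test

# Two runs with the same seed pick exactly the same rows.
first = bernoulli_sample(rows, 0.5, seed=12345)
second = bernoulli_sample(rows, 0.5, seed=12345)

print(first == second)  # prints True
```

Without a fixed seed, each run would draw a fresh random sequence, so two queries over the same table could return different samples.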
Does this PR introduce any user-facing change?
Yes
new SQL syntax
How was this patch tested?
new UT