@cloud-fan
Contributor

What changes were proposed in this pull request?

For queries like `t1d in (SELECT t2d FROM t2 ORDER BY t2c LIMIT 2)`, the result can be non-deterministic: the subquery is sorted by `t2c` rather than by its output column `t2d`, and it involves a shuffle, so the rows it returns may differ between runs.

This PR makes the test more robust by sorting the output column.
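
The effect described above can be reproduced outside Spark with a small simulation. This is only a sketch under stated assumptions: the rows in `T2` are made up, the shuffle is modeled as an arbitrary arrival order of rows, and using `t2d` as a tie-breaker is one reading of "sorting the output column", not necessarily the exact change made to `in-limit.sql`.

```python
from itertools import permutations

# Hypothetical rows of t2 as (t2c, t2d); three rows tie on the sort key t2c.
T2 = [(1, 10), (1, 20), (1, 30), (2, 40)]


def subquery(rows, key, limit=2):
    """Model SELECT t2d FROM t2 ORDER BY <key> LIMIT <limit>.

    sorted() is stable, so rows that tie on the key keep their arrival order.
    The result is a set because IN only tests membership.
    """
    return frozenset(t2d for _, t2d in sorted(rows, key=key)[:limit])


def distinct_results(key):
    """Run the subquery for every arrival order (a stand-in for the shuffle)."""
    return {subquery(list(order), key) for order in permutations(T2)}


# Sorting by t2c alone: which tied rows survive LIMIT 2 depends on arrival order.
by_t2c = distinct_results(lambda row: row[0])

# Sorting by t2c, then t2d: ties are broken, so every arrival order agrees.
by_t2c_t2d = distinct_results(lambda row: (row[0], row[1]))

print(len(by_t2c))      # number of different IN-lists seen
print(len(by_t2c_t2d))  # number of different IN-lists seen with the tie-breaker
```

With these made-up rows, sorting on `t2c` alone yields several different IN-lists, while the tie-broken sort always yields the same one.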

Why are the changes needed?

To avoid a flaky test.

Does this PR introduce any user-facing change?

no

How was this patch tested?

N/A

@cloud-fan
Contributor Author

cc @HyukjinKwon @maropu

@HyukjinKwon changed the title from "[MINOR][TEST][SQL] make in-limit.sql more robust" to "[MINOR][TEST][SQL] Make in-limit.sql more robust" on Jul 2, 2020
Member

@HyukjinKwon left a comment

LGTM

Member

@maropu left a comment

LGTM, too.

HyukjinKwon pushed a commit that referenced this pull request Jul 2, 2020

Closes #28976 from cloud-fan/small.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit f834156)
Signed-off-by: HyukjinKwon <[email protected]>
@HyukjinKwon
Member

The tests this PR fixes passed.

Merged to master, branch-3.0 and branch-2.4.

HyukjinKwon pushed a commit that referenced this pull request Jul 2, 2020
@SparkQA

SparkQA commented Jul 2, 2020

Test build #124874 has finished for PR 28976 at commit 750522b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
