@cloud-fan
Contributor

What changes were proposed in this pull request?

This PR fixes a regression caused by #38511. For `FROM t WHERE rand() > 0.5 AND col = 1`, we can still push down `col = 1` because the evaluation order of predicates within a `Filter` is not guaranteed.

This PR updates `ScanOperation` to consider this case and bring back the previous pushdown behavior.
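
To illustrate the idea, here is a minimal, self-contained sketch of splitting a filter condition into conjuncts and pushing down only the deterministic ones. It is a toy model, not Spark's `ScanOperation` code: the `Expr`, `Pred`, `And`, and `PushdownSketch` names are hypothetical and exist only for this example.

```scala
// Toy model of a filter condition. These types are illustrative assumptions,
// not Spark's Catalyst classes.
sealed trait Expr { def deterministic: Boolean }

// A leaf predicate, described by its SQL text and whether it is deterministic.
final case class Pred(sql: String, deterministic: Boolean) extends Expr

// A conjunction is deterministic only if both sides are.
final case class And(left: Expr, right: Expr) extends Expr {
  def deterministic: Boolean = left.deterministic && right.deterministic
}

object PushdownSketch {
  // Flatten nested ANDs into the individual conjuncts.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  // Returns (pushable, retained). Because the evaluation order of conjuncts
  // within a single filter is not guaranteed, a deterministic conjunct can be
  // pushed down regardless of where it appears relative to a
  // nondeterministic one.
  def partitionForPushdown(cond: Expr): (Seq[Expr], Seq[Expr]) =
    splitConjuncts(cond).partition(_.deterministic)
}
```

In this model, `rand() > 0.5 AND col = 1` splits into one pushable conjunct (`col = 1`) and one retained conjunct (`rand() > 0.5`), which stays in the filter above the scan.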

Why are the changes needed?

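Fixes a performance regression: after #38511, a deterministic predicate such as `col = 1` in the example above was no longer pushed down.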

Does this PR introduce any user-facing change?

no

How was this patch tested?

new tests

@github-actions bot added the SQL label Nov 21, 2022
@cloud-fan
Contributor Author

cc @wangyum @viirya

Seq("parquet", "").foreach { useV1SourceList =>
  withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> useV1SourceList) {
    val scan = spark.read.parquet(pathStr)
    val df = scan.where($"id" > 5 && rand() > 0.5)
Member

@viirya viirya Nov 21, 2022

Based on the description, I think the case should be `WHERE rand() > 0.5 AND id = 5`?

@cloud-fan
Contributor Author

Thanks for the review, merging to master!

@cloud-fan cloud-fan closed this in 0cdbda1 Nov 22, 2022
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…nondeterministic predicates

### What changes were proposed in this pull request?

This PR fixes a regression caused by apache#38511. For `FROM t WHERE rand() > 0.5 AND col = 1`, we can still push down `col = 1` because the evaluation order of predicates within a `Filter` is not guaranteed.

This PR updates `ScanOperation` to consider this case and bring back the previous pushdown behavior.

### Why are the changes needed?

fix perf regression

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new tests

Closes apache#38746 from cloud-fan/filter.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022