Skip to content

Conversation

@aokolnychyi
Copy link
Contributor

This PR fixes predicate pushdown in row-level operations in Spark 3.2. Previously, we would not extract filters and MERGE conditions such as t.id = s.id and t.dep IN ('hr') would not be pushed down.

@github-actions github-actions bot added the spark label Feb 1, 2022
@aokolnychyi
Copy link
Contributor Author

@aokolnychyi aokolnychyi force-pushed the fix-predicate-pushdown-merge branch from 76317e5 to 383d2d6 Compare February 1, 2022 19:08
Copy link
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, one small comment.

And just for my understanding (as not super familiar), this issue seems to affects only 'conjunctive predicates' , ie with AND. Was reading method 'splitCojunctivePredicate' and we never push down OR's in any case?


val (scan, output) = PushDownUtils.pruneColumns(
scanBuilder, relation, relation.output, Seq.empty)
val (scan, output) = PushDownUtils.pruneColumns(scanBuilder, relation, relation.output, Nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, is it necessary change , Seq.empty => Nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it so that it can fit on one line just like the new filter pushdown logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aokolnychyi @szehon-ho any reason for not passing pushedFilters here instead of Nil?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually nvm, this is only to prune columns.

Copy link
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks @aokolnychyi !!!

tableAttrs: Seq[AttributeReference]): (Seq[Filter], Seq[Expression]) = {

val tableAttrSet = AttributeSet(tableAttrs)
val filters = splitConjunctivePredicates(cond).filter(_.references.subsetOf(tableAttrSet))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this what was preventing pushdown before? We weren't filtering out expressions that referenced columns outside of the table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we did not split the condition before into parts and did not remove filters that referenced both tables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szehon-ho ˆˆ

This comment provides a little bit more info to answer your question above.
We treated t.id = s.id and t.dep IN ('hr') as a single predicate that couldn't be converted as it referenced both tables. Instead, we now split it into parts and convert whatever we can (i.e. t.dep IN ('hr') in this case).


Snapshot mergeSnapshot = table.currentSnapshot();
String deletedDataFilesCount = mergeSnapshot.summary().get(SnapshotSummary.DELETED_FILES_PROP);
Assert.assertEquals("Must overwrite only 1 file", "1", deletedDataFilesCount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other tests use the listener to check the expressions that were pushed down directly. Should we do that in this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I missed that. Could you point me to an example?

@aokolnychyi aokolnychyi merged commit 5edf20e into apache:master Feb 1, 2022
@aokolnychyi
Copy link
Contributor Author

Thanks for reviewing, @singhpk234 @szehon-ho @rdblue!

@jackye1995 jackye1995 added this to the Iceberg 0.13.1 Release milestone Feb 10, 2022
amogh-jahagirdar pushed a commit to amogh-jahagirdar/iceberg that referenced this pull request Feb 10, 2022
samarthjain pushed a commit to samarthjain/incubator-iceberg that referenced this pull request Apr 6, 2022
vanliu-tx pushed a commit to BKBASE-Plugin/iceberg that referenced this pull request May 11, 2022
sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants