[SPARK-38182][SQL] Fix NoSuchElementException if pushed filter does not contain any references #35487
Conversation
Yaohua628
left a comment
Thanks for the fix! LGTM
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
Outdated
HyukjinKwon
left a comment
LGTM otherwise.
Thank you @HyukjinKwon, addressed the comment.
If we use rand() > 0.5, can we trigger the bug without disabling the optimizer rule?
It cannot trigger this bug, because nondeterministic filters are never pushed down. See the code in FileSourceStrategy:
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
Lines 157 to 173 in e0bc977
val normalizedFilters = DataSourceStrategy.normalizeExprs(
  filters.filter(_.deterministic), l.output)
val partitionColumns =
  l.resolve(
    fsRelation.partitionSchema, fsRelation.sparkSession.sessionState.analyzer.resolver)
val partitionSet = AttributeSet(partitionColumns)
// this partitionKeyFilters should be the same with the ones being executed in
// PruneFileSourcePartitions
val partitionKeyFilters = DataSourceStrategy.getPushedDownFilters(partitionColumns,
  normalizedFilters)
// subquery expressions are filtered out because they can't be used to prune buckets or pushed
// down as data filters, yet they would be executed
val normalizedFiltersWithoutSubqueries =
  normalizedFilters.filterNot(SubqueryExpression.hasSubquery)
dongjoon-hyun
left a comment
Could you resolve the conflict, @ulysses-you ?
And, if you don't mind, could you add a minimal test coverage at FileIndexSuite.scala instead of SQLQuerySuite, @ulysses-you ?
thank you @dongjoon-hyun for the reminder !
befe064 to
7607033
Compare
dongjoon-hyun
left a comment
+1, LGTM. Thank you, @ulysses-you and all.
Merged to master.
Thank you all!
What changes were proposed in this pull request?
Skip filters that contain no references when binding metadata-based filters.
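To illustrate why a filter without references needs special handling, here is a minimal sketch in plain Scala. It is a toy model, not the code changed in this PR: `Filter`, `metadataColumns`, `naive`, and `guarded` are hypothetical names, and a filter is reduced to a label plus the set of column names it references. The point it shows is that `forall` is vacuously true on an empty collection, so a check of the form "all references are metadata columns" also accepts a filter that references nothing, such as a constant predicate.

```scala
// Toy model only: these are NOT Spark classes.
// A filter is a label plus the set of column names it references.
final case class Filter(sql: String, references: Set[String])

object MetadataFilterSketch {
  // Hypothetical metadata column names, chosen for illustration.
  val metadataColumns: Set[String] = Set("file_name", "file_size")

  // Naive selection: forall is vacuously true for an empty reference set,
  // so a filter with no references is treated as a metadata-only filter.
  def naive(filters: Seq[Filter]): Seq[Filter] =
    filters.filter(_.references.forall(metadataColumns.contains))

  // Guarded selection: additionally require at least one reference,
  // which skips filters that reference no columns at all.
  def guarded(filters: Seq[Filter]): Seq[Filter] =
    filters.filter { f =>
      f.references.nonEmpty && f.references.forall(metadataColumns.contains)
    }
}
```

With the inputs `Filter("file_name = 'a'", Set("file_name"))`, `Filter("1 = 1", Set.empty)`, and `Filter("id > 0", Set("id"))`, `naive` keeps the first two while `guarded` keeps only the first. Whether the actual patch uses a `nonEmpty` guard or another mechanism is not stated here, so treat this only as one way to express "skip filters without references".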
Why are the changes needed?
this issue is from #35055.
reproduce:
and the error msg:
Does this PR introduce any user-facing change?
yes, a bug fix
How was this patch tested?
Added a new test.