
Conversation

@ulysses-you
Contributor

@ulysses-you ulysses-you commented Feb 11, 2022

What changes were proposed in this pull request?

Skip filters with no attribute references when binding metadata-based filters.

Why are the changes needed?

This issue is from #35055.

To reproduce:

CREATE TABLE t (c1 int) USING PARQUET;

SET spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.BooleanSimplification;

SELECT * FROM t WHERE c1 = 1 AND 2 > 1;

The error message:

java.util.NoSuchElementException: next on empty iterator
	at scala.collection.Iterator$$anon$2.next(Iterator.scala:41)
	at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
	at scala.collection.mutable.LinkedHashSet$$anon$1.next(LinkedHashSet.scala:89)
	at scala.collection.IterableLike.head(IterableLike.scala:109)
	at scala.collection.IterableLike.head$(IterableLike.scala:108)
	at org.apache.spark.sql.catalyst.expressions.AttributeSet.head(AttributeSet.scala:69)
	at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.$anonfun$listFiles$3(PartitioningAwareFileIndex.scala:85)
	at scala.Option.map(Option.scala:230)
	at org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.listFiles(PartitioningAwareFileIndex.scala:84)
	at org.apache.spark.sql.execution.FileSourceScanExec.selectedPartitions$lzycompute(DataSourceScanExec.scala:249)
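The crash happens because the literal-only predicate `2 > 1` (left intact since BooleanSimplification is excluded) has an empty reference set, and PartitioningAwareFileIndex calls `.head` on that empty AttributeSet. A minimal self-contained sketch of the idea behind the fix, using toy case classes that stand in for Spark's real Expression/AttributeSet (they are illustrative only, not Spark's actual classes):

```scala
// Toy model of expressions and their attribute references
// (illustrative only, not Spark's real Expression/AttributeSet).
case class Attribute(name: String)
sealed trait Expr { def references: Set[Attribute] }
case class AttrRef(a: Attribute) extends Expr { def references = Set(a) }
case class Literal(v: Int) extends Expr { def references = Set.empty[Attribute] }
case class EqualTo(l: Expr, r: Expr) extends Expr {
  def references = l.references ++ r.references
}
case class GreaterThan(l: Expr, r: Expr) extends Expr {
  def references = l.references ++ r.references
}

// WHERE c1 = 1 AND 2 > 1, with BooleanSimplification disabled
// so the literal-only predicate `2 > 1` survives optimization.
val filters: Seq[Expr] = Seq(
  EqualTo(AttrRef(Attribute("c1")), Literal(1)),
  GreaterThan(Literal(2), Literal(1)))

// Before the fix, taking `.references.head` of every filter throws
// NoSuchElementException on the literal-only predicate.
// The fix skips filters that reference no attributes:
val bindable = filters.filter(_.references.nonEmpty)
```

Only the `c1 = 1` predicate remains a candidate for metadata-based binding; the reference-free predicate is skipped instead of crashing the scan.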

Does this PR introduce any user-facing change?

Yes, a bug fix.

How was this patch tested?

Add a new test.

@ulysses-you
Contributor Author

cc @Yaohua628 @cloud-fan

@github-actions github-actions bot added the SQL label Feb 11, 2022
Contributor

@Yaohua628 Yaohua628 left a comment


Thanks for the fix! LGTM

Member

@HyukjinKwon HyukjinKwon left a comment


LGTM otherwise.

@ulysses-you
Contributor Author

thank you @HyukjinKwon, addressed the comment

Contributor


If we use rand() > 0.5, can we trigger the bug without disabling the optimizer rule?

Contributor Author


It cannot trigger this bug, since we cannot push down non-deterministic filters. See the code in FileSourceStrategy:

val normalizedFilters = DataSourceStrategy.normalizeExprs(
  filters.filter(_.deterministic), l.output)
val partitionColumns =
  l.resolve(
    fsRelation.partitionSchema, fsRelation.sparkSession.sessionState.analyzer.resolver)
val partitionSet = AttributeSet(partitionColumns)
// this partitionKeyFilters should be the same with the ones being executed in
// PruneFileSourcePartitions
val partitionKeyFilters = DataSourceStrategy.getPushedDownFilters(partitionColumns,
  normalizedFilters)
// subquery expressions are filtered out because they can't be used to prune buckets or pushed
// down as data filters, yet they would be executed
val normalizedFiltersWithoutSubqueries =
  normalizedFilters.filterNot(SubqueryExpression.hasSubquery)
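The first line above is the relevant one: non-deterministic predicates are dropped before push-down ever happens. A toy sketch of just that step (the `Filter` case class here is made up for the example; it only models the `deterministic` flag of a Catalyst expression):

```scala
// Toy stand-in for Catalyst expressions: only the `deterministic`
// flag matters for this illustration.
case class Filter(sql: String, deterministic: Boolean)

val pushDownCandidates = Seq(
  Filter("c1 = 1", deterministic = true),
  Filter("rand() > 0.5", deterministic = false))

// Mirrors `filters.filter(_.deterministic)` in FileSourceStrategy:
// non-deterministic predicates never reach partition pruning,
// so `rand() > 0.5` alone cannot reproduce the bug.
val normalized = pushDownCandidates.filter(_.deterministic)
```

This is why excluding BooleanSimplification is needed in the reproduction: it is the only way to get a deterministic, reference-free predicate past the optimizer and into the push-down path.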

Member

@dongjoon-hyun dongjoon-hyun left a comment


Could you resolve the conflict, @ulysses-you ?

Member


And, if you don't mind, could you add a minimal test coverage at FileIndexSuite.scala instead of SQLQuerySuite, @ulysses-you ?

Contributor Author


thank you @dongjoon-hyun for the reminder!

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM. Thank you, @ulysses-you and all.
Merged to master.

@ulysses-you
Contributor Author

thank you all !

@ulysses-you ulysses-you deleted the SPARK-38182 branch February 18, 2022 01:26