MaxGekk commented Apr 20, 2020

What changes were proposed in this pull request?

Always translate date values of pushed down filters to java.sql.Date, regardless of the SQL config spark.sql.datetime.java8API.enabled.
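
A minimal sketch of the idea, using only JDK classes. The names `toExternalDate` and `toFilterValue` are hypothetical and are not Spark APIs; the sketch assumes the internal date is a day count since 1970-01-01 and ignores any calendar or time zone handling the real conversion may need.

```scala
import java.sql.Date
import java.time.LocalDate

object DateFilterValueSketch {
  // Hypothetical helper (not Spark's API): converts an internal day count
  // to the external value, whose class depends on the Java 8 API flag.
  def toExternalDate(daysSinceEpoch: Int, java8ApiEnabled: Boolean): Any = {
    val localDate = LocalDate.ofEpochDay(daysSinceEpoch.toLong)
    if (java8ApiEnabled) localDate else Date.valueOf(localDate)
  }

  // The behaviour proposed for pushed down filters: ignore the flag and
  // always hand java.sql.Date to the data source.
  def toFilterValue(daysSinceEpoch: Int): Date =
    toExternalDate(daysSinceEpoch, java8ApiEnabled = false).asInstanceOf[Date]

  def main(args: Array[String]): Unit = {
    val days = LocalDate.parse("2020-04-20").toEpochDay.toInt
    println(toExternalDate(days, java8ApiEnabled = true).getClass.getName) // java.time.LocalDate
    println(toFilterValue(days).getClass.getName)                          // java.sql.Date
  }
}
```

With the flag forced to `false` on the filter path, a data source sees the same value class whatever the session config is.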

Why are the changes needed?

  1. For backward compatibility with existing implementations of datasources. For example, the following exception is thrown by ORC datasource when spark.sql.datetime.java8API.enabled is set to true:
Wrong value class java.time.LocalDate for DATE.EQUALS leaf
java.lang.IllegalArgumentException: Wrong value class java.time.LocalDate for DATE.EQUALS leaf
	at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.checkLiteralType(SearchArgumentImpl.java:192)
	at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.<init>(SearchArgumentImpl.java:75)
	at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$BuilderImpl.equals(SearchArgumentImpl.java:352)
	at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:229)
  2. Before the changes, date filters were not pushed down to the Parquet datasource when spark.sql.datetime.java8API.enabled was true.

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Added a test to ParquetFilterSuite and to OrcFilterSuite.

SparkQA commented Apr 20, 2020

Test build #121516 has finished for PR 28272 at commit 3dca84d.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 20, 2020

Test build #121537 has finished for PR 28272 at commit 9ce9a34.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk changed the title from "[SPARK-31489][SPARK-31488][SQL] Translate date values of pushed down filters to java.sql.Date" to "[SPARK-31489][SPARK-31488][SQL][test-hive1.2] Translate date values of pushed down filters to java.sql.Date" on Apr 20, 2020

MaxGekk commented Apr 20, 2020

jenkins, retest this, please

SparkQA commented Apr 20, 2020

Test build #121538 has finished for PR 28272 at commit 32fb0ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 21, 2020

Test build #121543 has finished for PR 28272 at commit d52fe37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 21, 2020

Test build #121544 has finished for PR 28272 at commit d52fe37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 21, 2020

Test build #121545 has finished for PR 28272 at commit ff2ca3f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk changed the title from "[SPARK-31489][SPARK-31488][SQL][test-hive1.2] Translate date values of pushed down filters to java.sql.Date" to "[SPARK-31489][SPARK-31488][SQL] Translate date values of pushed down filters to java.sql.Date" on Apr 21, 2020

MaxGekk commented Apr 21, 2020

jenkins, retest this, please

SparkQA commented Apr 21, 2020

Test build #121569 has finished for PR 28272 at commit ff2ca3f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk commented Apr 21, 2020

jenkins, retest this, please

}
}

test("filter pushdown - local date") {
Contributor

can we just update test("filter pushdown - date") to test with DATETIME_JAVA8API_ENABLED on and off, so that we have less duplicated code?

Member Author

done
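
The suggestion in this thread, one test body run with the flag off and on, could be sketched as follows. This is an illustration only: `withConf`, `current`, and the in-memory `conf` map are hypothetical stand-ins for the suite's real config handling, and the body only reports the input class instead of checking pushed down filters.

```scala
import java.sql.Date
import java.time.LocalDate
import scala.collection.mutable

object Java8ApiToggleSketch {
  val Java8ApiKey = "spark.sql.datetime.java8API.enabled"

  // Hypothetical in-memory config store; a real suite would use the session's SQL config.
  private val conf = mutable.Map(Java8ApiKey -> "false")

  def current(key: String): Option[String] = conf.get(key)

  // Hypothetical helper: set `key` while `body` runs, then restore the previous value.
  def withConf[T](key: String, value: String)(body: => T): T = {
    val previous = conf.get(key)
    conf(key) = value
    try body
    finally {
      previous match {
        case Some(v) => conf(key) = v
        case None    => conf -= key
      }
    }
  }

  // One test body, executed once per flag value; only the input type differs.
  def inputClassPerFlag(): Seq[String] =
    Seq(false, true).map { java8Api =>
      withConf(Java8ApiKey, java8Api.toString) {
        val input: Any =
          if (java8Api) LocalDate.parse("2018-03-18") else Date.valueOf("2018-03-18")
        input.getClass.getName
      }
    }

  def main(args: Array[String]): Unit =
    inputClassPerFlag().foreach(println)
}
```

Looping over both flag values keeps the assertions in one place, which is the reduction in duplicated code the reviewer asks for.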

}
}

test("filter pushdown - local date") {
Contributor

ditto

Member Author

done

}
}

test("filter pushdown - local date") {
Contributor

ditto

Member Author

done

private def translateLeafNodeFilter(predicate: Expression): Option[Filter] = predicate match {
  case expressions.EqualTo(PushableColumn(name), Literal(v, t)) =>
-   Some(sources.EqualTo(name, convertToScala(v, t)))
+   Some(sources.EqualTo(name, convertToScala(v, t, false)))
Member

I guess we're treating this as a temp fix for Spark 3.0?
Ideally, this interface should also support Java 8 datetime instances when spark.sql.datetime.java8API.enabled is enabled; not doing so could cause more confusion. That said, spark.sql.datetime.java8API.enabled is disabled by default.

Contributor

I think it's problematic to let the java8 config also control the value type inside Filter, as it can break existing DS v1 implementations. It's a bit unfortunate that we don't document clearly what the value type can be for Filter, but if we do, it's not user-friendly to say "the value type depends on xxx config". This just makes it harder to implement data source filter pushdown.
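
To illustrate the compatibility concern, here is a sketch of a source written against the older contract. The `EqualTo` case class and `toPredicate` are simplified, hypothetical stand-ins, not the real `org.apache.spark.sql.sources` classes or any actual datasource code.

```scala
import java.sql.Date
import java.time.LocalDate

object LegacySourceSketch {
  // Simplified stand-in for a pushed down equality filter (hypothetical).
  final case class EqualTo(attribute: String, value: Any)

  // A source that assumes date values always arrive as java.sql.Date.
  def toPredicate(filter: EqualTo): String = filter.value match {
    case d: Date => s"${filter.attribute} = DATE '$d'"
    case other =>
      throw new IllegalArgumentException(
        s"Wrong value class ${other.getClass.getName} for ${filter.attribute}")
  }

  def main(args: Array[String]): Unit = {
    // Works: the value class matches what the source expects.
    println(toPredicate(EqualTo("d", Date.valueOf("2020-04-20"))))
    // Fails: the same logical date arrives as java.time.LocalDate.
    try println(toPredicate(EqualTo("d", LocalDate.parse("2020-04-20"))))
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```

If the value class followed the session config, every such source would need a second branch for `LocalDate`; fixing the class at `java.sql.Date` keeps existing implementations working unchanged.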

Member Author

@HyukjinKwon Taking into account #23811 (comment), the flag won't be enabled by default in the near future.

SparkQA commented Apr 21, 2020

Test build #121574 has finished for PR 28272 at commit ff2ca3f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk changed the title from "[SPARK-31489][SPARK-31488][SQL] Translate date values of pushed down filters to java.sql.Date" to "[SPARK-31489][SPARK-31488][SQL][test-hive1.2] Translate date values of pushed down filters to java.sql.Date" on Apr 21, 2020

MaxGekk commented Apr 21, 2020

jenkins, retest this, please

SparkQA commented Apr 22, 2020

Test build #121594 has finished for PR 28272 at commit 2973dd7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 22, 2020

Test build #121593 has finished for PR 28272 at commit d7b2ece.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

MaxGekk changed the title from "[SPARK-31489][SPARK-31488][SQL][test-hive1.2] Translate date values of pushed down filters to java.sql.Date" to "[SPARK-31489][SPARK-31488][SQL] Translate date values of pushed down filters to java.sql.Date" on Apr 22, 2020

MaxGekk commented Apr 22, 2020

@cloud-fan @HyukjinKwon @dongjoon-hyun Please, take a look at the PR if you have time.

dongjoon-hyun pushed a commit that referenced this pull request Apr 26, 2020
…` values in ORC

### What changes were proposed in this pull request?
Convert `java.time.LocalDate` to `java.sql.Date` in filters pushed down to the ORC datasource when the Java 8 time API is enabled.

Closes #28272

### Why are the changes needed?
The changes fix the exception raised while pushing date filters when `spark.sql.datetime.java8API.enabled` is set to `true`:
```
Wrong value class java.time.LocalDate for DATE.EQUALS leaf
java.lang.IllegalArgumentException: Wrong value class java.time.LocalDate for DATE.EQUALS leaf
	at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.checkLiteralType(SearchArgumentImpl.java:192)
	at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$PredicateLeafImpl.<init>(SearchArgumentImpl.java:75)
	at org.apache.hadoop.hive.ql.io.sarg.SearchArgumentImpl$BuilderImpl.equals(SearchArgumentImpl.java:352)
	at org.apache.spark.sql.execution.datasources.orc.OrcFilters$.buildLeafSearchArgument(OrcFilters.scala:229)
```

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
Added tests to `OrcFilterSuite`.

Closes #28261 from MaxGekk/orc-date-filter-pushdown.

Authored-by: Max Gekk <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit bd139bd)
Signed-off-by: Dongjoon Hyun <[email protected]>
@MaxGekk MaxGekk deleted the fix-translateLeafNodeFilter branch June 5, 2020 19:48