-
Notifications
You must be signed in to change notification settings - Fork 3k
Hive: Fix predicate pushdown for Date #2254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The manifest and residual evaluator expect the following java types for each data type
I think we will need to do something very similar to wrapping the record into an
That should handle conversion between types used by iceberg-data v/s the ones used by the evaluators.
|
That works! Thanks for the pointer. Updated the PR |
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerLocalScan.java
Outdated
Show resolved
Hide resolved
mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java
Show resolved
Hide resolved
…they were correct in the first place Also addressed Marton's comment
|
@shardulm94: Could you please review as a committer, so I can merge? |
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerLocalScan.java
Outdated
Show resolved
Hide resolved
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerLocalScan.java
Outdated
Show resolved
Hide resolved
mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergFilterFactory.java
Show resolved
Hide resolved
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerTimezone.java
Outdated
Show resolved
Hide resolved
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerTimezone.java
Outdated
Show resolved
Hide resolved
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerTimezone.java
Outdated
Show resolved
Hide resolved
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerTimezone.java
Outdated
Show resolved
Hide resolved
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerTimezone.java
Outdated
Show resolved
Hide resolved
…ly. Not doing anything helps to highlihgt issues faster if there is a problem with it
mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergFilterFactory.java
Outdated
Show resolved
Hide resolved
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerTimezone.java
Outdated
Show resolved
Hide resolved
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerTimezone.java
Outdated
Show resolved
Hide resolved
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerTimezone.java
Outdated
Show resolved
Hide resolved
mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerTimezone.java
Outdated
Show resolved
Hide resolved
shardulm94
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pvary LGTM! Thanks for fixing this!
|
Thanks @shardulm94 and @marton-bod for the review! |
|
@shardulm94, @marton-bod: Added the fix for timestamp.withZone in #2278 |
| // We have to use the LocalDateTime to get the micros. See the comment above. | ||
| private static long microsFromTimestamp(Timestamp timestamp) { | ||
| return DateTimeUtil.microsFromInstant(timestamp.toInstant()); | ||
| return DateTimeUtil.microsFromTimestamp(timestamp.toLocalDateTime()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pvary This change breaks test:
./gradlew clean :iceberg-mr:test --tests org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory
...
> Task :iceberg-mr:test FAILED
org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory > testTimestampType FAILED
java.lang.AssertionError: expected:<1349154977123456> but was:<1349129777123456>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory.assertPredicatesMatch(TestHiveIcebergFilterFactory.java:268)
at org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory.testTimestampType(TestHiveIcebergFilterFactory.java:248)
16 tests completed, 1 failed
When run in non-UTC environments. I assume the test may need to change to adjust to the changes being made in #2278 to handle predicate pushdown for Timestamp.withZone().
I'm surprised this is not caught by the CI checks, but maybe the CI runs in UTC - is there a way that we can run the tests in a few additional Timezones to validate?
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even more surprised to see that, since I run my tests in CET and they are running without problem. What timezone are you using? Are you using stock Hive?
Thanks,
Peter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in PST, if you replace https://github.com/apache/iceberg/blob/master/mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergFilterFactory.java#L240-L242 for:
TimeZone defaultTz = TimeZone.getDefault();
try {
TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"));
UnboundPredicate actual = (UnboundPredicate) HiveIcebergFilterFactory.generateFilterExpression(arg);
assertPredicatesMatch(expected, actual);
} finally {
TimeZone.setDefault(defaultTz);
}
to set the TimeZone, you should be able to repro - conversely if I use "UTC" instead of "America/Los_Angeles" the test pass.
I'm running the unit test out of the master branch, with:
./gradlew clean :iceberg-mr:test --tests org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, if I use "CET" it also fails:
org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory > testTimestampType FAILED
java.lang.AssertionError: expected:<1349154977123456> but was:<1349162177123456>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory.assertPredicatesMatch(TestHiveIcebergFilterFactory.java:266)
at org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory.testTimestampType(TestHiveIcebergFilterFactory.java:246)
16 tests completed, 1 failed
For CET, you can see the difference between the expected value and the actual value is exactly 2 hrs in microseconds (7.2 10^9) - the actual value is ahead of the expected one (which is in UTC)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe intellij made fun of me, or I messed up with my settings. Will definitely check this out soon.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have some time, could you please check that with #2278 the tests run correctly on you side?
In the meantime I try to find out what is the process of reverting changes.
Thanks,
Peter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix @pvary - test works now.
This reverts commit 23735d1.
Hive: Fix predicate pushdown for Date (apache#2254)
When Hive query
SELECT * from date_test WHERE d_date='1998-02-19'contains a date literal as a predicated Iceberg filter fails with the following exception:There are 2 issue:
GenericRecord.get()expect the value to be as an Integer, but it gets LocalDate instead.