@pvary pvary commented Feb 19, 2021

When a Hive query such as SELECT * FROM date_test WHERE d_date='1998-02-19' contains a date literal as a predicate, the Iceberg filter fails with the following exception:

Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.IllegalStateException: Not an instance of java.lang.Integer: 2130-01-20
        at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:499)
        at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:307)
        at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:878)
        at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:559)
        at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:551)
        at org.apache.iceberg.mr.hive.TestHiveShell.executeStatement(TestHiveShell.java:143)
        ... 62 more
Caused by: java.io.IOException: java.lang.IllegalStateException: Not an instance of java.lang.Integer: 2130-01-20
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208)
        at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:494)
        ... 67 more
Caused by: java.lang.IllegalStateException: Not an instance of java.lang.Integer: 2130-01-20
        at org.apache.iceberg.data.GenericRecord.get(GenericRecord.java:123)
        at org.apache.iceberg.Accessors$PositionAccessor.get(Accessors.java:71)
        at org.apache.iceberg.Accessors$PositionAccessor.get(Accessors.java:58)
        at org.apache.iceberg.expressions.BoundReference.eval(BoundReference.java:39)
        at org.apache.iceberg.expressions.Evaluator$EvalVisitor.eq(Evaluator.java:132)
        at org.apache.iceberg.expressions.Evaluator$EvalVisitor.eq(Evaluator.java:52)
        at org.apache.iceberg.expressions.ExpressionVisitors$BoundVisitor.predicate(ExpressionVisitors.java:249)
        at org.apache.iceberg.expressions.ExpressionVisitors.visitEvaluator(ExpressionVisitors.java:346)
        at org.apache.iceberg.expressions.Evaluator$EvalVisitor.eval(Evaluator.java:57)
        at org.apache.iceberg.expressions.Evaluator$EvalVisitor.access$100(Evaluator.java:52)
        at org.apache.iceberg.expressions.Evaluator.eval(Evaluator.java:49)
        at org.apache.iceberg.mr.mapreduce.IcebergInputFormat$IcebergRecordReader.lambda$applyResidualFiltering$0(IcebergInputFormat.java:288)
        at org.apache.iceberg.io.CloseableIterable$3.shouldKeep(CloseableIterable.java:82)
        at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:67)
        at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:50)
        at org.apache.iceberg.mr.mapreduce.IcebergInputFormat$IcebergRecordReader.nextKeyValue(IcebergInputFormat.java:202)
        at org.apache.iceberg.mr.mapred.MapredIcebergInputFormat$MapredIcebergRecordReader.next(MapredIcebergInputFormat.java:104)
        at org.apache.iceberg.mr.mapred.MapredIcebergInputFormat$MapredIcebergRecordReader.next(MapredIcebergInputFormat.java:81)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:488)
        ... 71 more

There are 2 issues:

  1. Literal conversion is problematic because Hive uses Date with the local timezone, so we have to fix the conversion.
  2. When comparing the Record with the literal, GenericRecord.get() expects the value to be an Integer, but it gets a LocalDate instead.
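To illustrate the first issue, here is a minimal sketch using only the JDK. It assumes the literal arrives as a java.sql.Date created at local midnight; the class and helper names (DateLiteralDemo, daysFromMillis, daysFromLocalDate, convertInZone) are hypothetical and are not the code in this PR. It shows how dividing the epoch millis by the day length can land on the previous day in a zone east of UTC, while going through the calendar fields does not depend on the zone.

```java
import java.sql.Date;
import java.util.TimeZone;

class DateLiteralDemo {
  private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

  // Zone-dependent: the millis of a local-midnight Date are relative to UTC,
  // so east of UTC the division falls on the previous day.
  static int daysFromMillis(Date date) {
    return (int) Math.floorDiv(date.getTime(), MILLIS_PER_DAY);
  }

  // Zone-independent: uses only the calendar fields (year, month, day).
  static int daysFromLocalDate(Date date) {
    return (int) date.toLocalDate().toEpochDay();
  }

  // Runs both conversions with the given JVM default time zone, then restores it.
  static int[] convertInZone(String zoneId, String literal) {
    TimeZone saved = TimeZone.getDefault();
    try {
      TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
      Date date = Date.valueOf(literal);
      return new int[] {daysFromMillis(date), daysFromLocalDate(date)};
    } finally {
      TimeZone.setDefault(saved);
    }
  }

  public static void main(String[] args) {
    int[] budapest = convertInZone("Europe/Budapest", "1998-02-19");
    System.out.println(budapest[0] + " vs " + budapest[1]); // 10275 vs 10276
  }
}
```

In a zone west of UTC, both helpers happen to agree, which is why this kind of bug can go unnoticed depending on where the tests run.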

@shardulm94
Contributor

The manifest and residual evaluator expect the following Java types for each data type

I think we will need to do something very similar to wrapping the record into an InternalRecordWrapper like here

InternalRecordWrapper wrapper = new InternalRecordWrapper(recordSchema.asStruct());

That should handle conversion between the types used by iceberg-data versus the ones used by the evaluators.
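As a rough illustration of why the wrapping is needed, the sketch below uses only the JDK and mimics the kind of conversion such a wrapper performs for date values. The names DateAsDaysDemo, daysFromDate and toInternal are hypothetical; this is not the InternalRecordWrapper implementation, only an assumption about the shape of the mismatch: the evaluator compares against days since epoch as an Integer, while the record holds a LocalDate.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class DateAsDaysDemo {
  private static final LocalDate EPOCH_DAY = LocalDate.of(1970, 1, 1);

  // Hypothetical helper: days since 1970-01-01, the Integer form the evaluator compares against.
  static int daysFromDate(LocalDate date) {
    return (int) ChronoUnit.DAYS.between(EPOCH_DAY, date);
  }

  // Hypothetical stand-in for the wrapper: converts LocalDate values, passes others through.
  static Object toInternal(Object value) {
    return value instanceof LocalDate ? daysFromDate((LocalDate) value) : value;
  }

  public static void main(String[] args) {
    Object fromRecord = LocalDate.of(1998, 2, 19);            // what the record holds
    Integer literal = daysFromDate(LocalDate.of(1998, 2, 19)); // what the filter literal holds

    System.out.println(literal.equals(fromRecord));             // false: Integer vs LocalDate
    System.out.println(literal.equals(toInternal(fromRecord))); // true after conversion
  }
}
```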

@pvary
Contributor Author

pvary commented Feb 19, 2021

I think we will need to do something very similar to wrapping the record into an InternalRecordWrapper like here

InternalRecordWrapper wrapper = new InternalRecordWrapper(recordSchema.asStruct());

That works! Thanks for the pointer. Updated the PR.

…they were correct in the first place

Also addressed Marton's comment
@pvary
Contributor Author

pvary commented Feb 22, 2021

@shardulm94: Could you please review as a committer, so I can merge?
Thanks,
Peter

Peter Vary added 2 commits February 25, 2021 12:07
…ly. Not doing anything helps to highlight issues faster if there is a problem with it
Contributor

@shardulm94 shardulm94 left a comment

@pvary LGTM! Thanks for fixing this!

@pvary pvary merged commit 23735d1 into apache:master Feb 26, 2021
@pvary
Contributor Author

pvary commented Feb 26, 2021

Thanks @shardulm94 and @marton-bod for the review!

@pvary pvary deleted the fixdate branch February 26, 2021 11:51
@pvary
Contributor Author

pvary commented Feb 26, 2021

@shardulm94, @marton-bod: Added the fix for timestamp.withZone in #2278

  // We have to use the LocalDateTime to get the micros. See the comment above.
  private static long microsFromTimestamp(Timestamp timestamp) {
-   return DateTimeUtil.microsFromInstant(timestamp.toInstant());
+   return DateTimeUtil.microsFromTimestamp(timestamp.toLocalDateTime());
Contributor

@pvary This change breaks test:

./gradlew clean :iceberg-mr:test --tests org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory
...
> Task :iceberg-mr:test FAILED

org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory > testTimestampType FAILED
    java.lang.AssertionError: expected:<1349154977123456> but was:<1349129777123456>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:834)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:144)
        at org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory.assertPredicatesMatch(TestHiveIcebergFilterFactory.java:268)
        at org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory.testTimestampType(TestHiveIcebergFilterFactory.java:248)

16 tests completed, 1 failed

This happens when the test is run in non-UTC environments. I assume the test may need to change to adjust to the changes being made in #2278 to handle predicate pushdown for Timestamp.withZone().

I'm surprised this is not caught by the CI checks, but maybe the CI runs in UTC. Is there a way that we can run the tests in a few additional time zones to validate?

Thanks!

Contributor Author

Even more surprised to see that, since I run my tests in CET and they are running without problem. What timezone are you using? Are you using stock Hive?

Thanks,
Peter

Contributor

@edgarRd edgarRd Feb 27, 2021

I'm in PST. If you replace https://github.com/apache/iceberg/blob/master/mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergFilterFactory.java#L240-L242 with:

    TimeZone defaultTz = TimeZone.getDefault();
    try {
      TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"));
      UnboundPredicate actual = (UnboundPredicate) HiveIcebergFilterFactory.generateFilterExpression(arg);
      assertPredicatesMatch(expected, actual);
    } finally {
      TimeZone.setDefault(defaultTz);
    }

to set the TimeZone, you should be able to reproduce the failure. Conversely, if I use "UTC" instead of "America/Los_Angeles", the test passes.

I'm running the unit test out of the master branch, with:

./gradlew clean :iceberg-mr:test --tests org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory
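For anyone who wants to see the effect outside the test suite, here is a minimal JDK-only sketch. It assumes the Timestamp under test represents the fixed instant behind the expected value 1349154977123456 (an inference from the failure output, not something taken from the test source); the class and method names are hypothetical. It compares micros derived from toInstant() with micros derived from toLocalDateTime() under different JVM default time zones.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.util.TimeZone;

class TimestampMicrosDemo {
  // Expected value from the failing assertion (micros since epoch, UTC).
  static final long EXPECTED_MICROS = 1349154977123456L;

  // Zone-independent: based on the instant the Timestamp represents.
  static long microsFromInstant(Timestamp ts) {
    return ChronoUnit.MICROS.between(Instant.EPOCH, ts.toInstant());
  }

  // Zone-dependent: toLocalDateTime() reads the fields in the JVM default zone.
  static long microsFromLocalDateTime(Timestamp ts) {
    LocalDateTime epoch = LocalDateTime.ofEpochSecond(0, 0, ZoneOffset.UTC);
    return ChronoUnit.MICROS.between(epoch, ts.toLocalDateTime());
  }

  // Returns {instant-based micros, local-date-time-based micros} for the given default zone.
  static long[] microsInZone(String zoneId) {
    TimeZone saved = TimeZone.getDefault();
    try {
      TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
      Timestamp ts = Timestamp.from(Instant.EPOCH.plus(EXPECTED_MICROS, ChronoUnit.MICROS));
      return new long[] {microsFromInstant(ts), microsFromLocalDateTime(ts)};
    } finally {
      TimeZone.setDefault(saved);
    }
  }

  public static void main(String[] args) {
    for (String zone : new String[] {"UTC", "America/Los_Angeles", "CET"}) {
      long[] micros = microsInZone(zone);
      System.out.println(zone + ": instant=" + micros[0] + " localDateTime=" + micros[1]);
    }
  }
}
```

Under these assumptions the instant-based value stays the same in every zone, while the LocalDateTime-based value moves with the zone offset, which matches the two failure outputs in this thread.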

Contributor

Interestingly, if I use "CET" it also fails:

org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory > testTimestampType FAILED
    java.lang.AssertionError: expected:<1349154977123456> but was:<1349162177123456>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:834)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:144)
        at org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory.assertPredicatesMatch(TestHiveIcebergFilterFactory.java:266)
        at org.apache.iceberg.mr.hive.TestHiveIcebergFilterFactory.testTimestampType(TestHiveIcebergFilterFactory.java:246)

16 tests completed, 1 failed

For CET, the difference between the expected value and the actual value is exactly 2 hours in microseconds (7.2 × 10^9): the actual value is ahead of the expected one (which is in UTC).
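The arithmetic can be checked with a small sketch that turns the expected and actual values from the two failure outputs into an hour offset. The class and method names are hypothetical; only the four numbers come from the logs in this thread.

```java
import java.time.Duration;
import java.time.temporal.ChronoUnit;

class OffsetFromFailureDemo {
  // Shift of the actual value relative to the expected one, both in micros since epoch.
  static Duration shift(long expectedMicros, long actualMicros) {
    return Duration.of(actualMicros - expectedMicros, ChronoUnit.MICROS);
  }

  public static void main(String[] args) {
    long expected = 1349154977123456L;

    // CET run: the actual value is ahead of the expected one.
    System.out.println(shift(expected, 1349162177123456L).toHours()); // 2

    // PST run: the actual value is behind the expected one.
    System.out.println(shift(expected, 1349129777123456L).toHours()); // -7
  }
}
```

Both shifts equal the zone's UTC offset on that date (daylight saving time was in effect), which points at a local-time interpretation of the timestamp rather than a random test failure.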

Contributor Author

Maybe IntelliJ made fun of me, or I messed up my settings. Will definitely check this out soon.

Thanks!

Contributor Author

If you have some time, could you please check that with #2278 the tests run correctly on your side?
In the meantime, I will try to find out what the process for reverting changes is.

Thanks,
Peter

Contributor Author

@edgarRd: Pushed the fix here: #2283. Could you please confirm that this issue is fixed?

Thanks,
Peter

Contributor

Thanks for the fix @pvary, the test works now.

pvary pushed a commit to pvary/iceberg that referenced this pull request Feb 27, 2021
bkahloon added a commit to bkahloon/iceberg that referenced this pull request Feb 27, 2021
Hive: Fix predicate pushdown for Date (apache#2254)
aokolnychyi pushed a commit that referenced this pull request Mar 24, 2021
@aokolnychyi aokolnychyi added this to the Java 0.11.1 Release milestone Mar 30, 2021
coolderli pushed a commit to coolderli/iceberg that referenced this pull request Apr 26, 2021
stevenzwu pushed a commit to stevenzwu/iceberg that referenced this pull request Jul 28, 2021
5 participants