-
Notifications
You must be signed in to change notification settings - Fork 3k
API: Fix timestamp(9) with identity partitioning. #13746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Fix timestamp(9) with identity partitioning. #13746
Conversation
|
@ebyhr, I think this fixes your timestamp nanos issue. |
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great find, and thanks for the minimal repro test. It makes sense that the unbound predicate that's produced needs to be based off of the literal to preserve the fact that it's a timestamp nano; with extracting the value, the previous logic would surface a predicate based on a long which would then be interpreted as micros and incorrectly undergo a conversion to nanos.
ebyhr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdblue Thank you for opening this PR! I internally verified that this change fixes our issue.
| org.apache.iceberg.Schema schema = | ||
| new org.apache.iceberg.Schema( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The package org.apache.iceberg looks redundant as this class already imports Schema.
This fixes the bug described in #11775 (comment), which was caused by incorrectly constructing a new predicate using a literal value instead of the literal itself.
Here's a test that reproduces the problem:
What's happening is the projection will first bind the original predicate because it needs a bound ID reference rather than a name reference. That ID is used to find the partition fields that can project the predicate. The binding process produces the expected TimestampNanoLiteral(1658837594123456789L). The bug is in the Identity transform's projection code, which needs to produce a new predicate that is unbound and uses a reference for the partition field's name (the partition name does not have to match). When it constructs the new unbound predicate, it passes the underlying value rather than the unchanged literal.
Updating that line to pass the literal instead of the value fixes the problem because it doesn't lose the context that the value was already a nanosecond timestamp value.