@rdblue
Contributor

@rdblue rdblue commented Aug 5, 2025

This fixes the bug described in #11775 (comment), which was caused by constructing a new predicate from a literal's underlying value instead of from the literal itself.

Here's a test that reproduces the problem:

  @Test
  public void testStringToTimestampNanosLiteral() {
    Schema schema = new Schema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.optional(2, "ts", Types.TimestampNanoType.withoutZone()));

    PartitionSpec spec = PartitionSpec.builderFor(schema).identity("ts").build();

    Expression expr = Expressions.equal("ts", "2022-07-26T12:13:14.123456789");
    Expression projected = Projections.inclusive(spec).project(expr);

    Binder.bind(schema.asStruct(), projected);
  }

The projection first binds the original predicate because it needs a bound ID reference rather than a name reference; that ID is used to find the partition fields that can project the predicate. Binding produces the expected TimestampNanoLiteral(1658837594123456789L). The bug is in the Identity transform's projection code, which must produce a new unbound predicate that references the partition field by name (the partition field's name does not have to match the source column's name). When it constructs that unbound predicate, it passes the literal's underlying value rather than the literal itself.
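For reference, the nanosecond value in that literal can be reproduced with plain java.time, independent of Iceberg. This is only a sketch: the class and method names are made up for illustration, and it assumes the string is read as a zone-less timestamp counted from 1970-01-01T00:00:00, in line with the withoutZone() type used in the test.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of Iceberg: converts a zone-less ISO-8601
// timestamp string into nanoseconds since 1970-01-01T00:00:00.
final class NanosFromString {
  static long epochNanos(String isoTimestamp) {
    LocalDateTime ts = LocalDateTime.parse(isoTimestamp);
    LocalDateTime epoch = LocalDateTime.of(1970, 1, 1, 0, 0);
    // Counts whole nanoseconds between the two local date-times.
    return ChronoUnit.NANOS.between(epoch, ts);
  }

  public static void main(String[] args) {
    // prints 1658837594123456789
    System.out.println(epochNanos("2022-07-26T12:13:14.123456789"));
  }
}
```

The result is about 1.66e18, which fits in a long with less than one order of magnitude to spare. Any further multiplication by 1000 cannot fit.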

Passing the literal instead of the value on that line fixes the problem, because the literal keeps the context that the value is already a nanosecond timestamp.
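The difference between the two code paths can be shown with a toy model. None of the classes below are Iceberg's real ones; the names mirror the discussion only for readability. The sketch assumes that a bare long bound to a nanosecond timestamp column is treated as microseconds and scaled by 1000 with overflow checking, which is an assumption about the failure mode and not a copy of the library's code.

```java
// Toy model (NOT Iceberg's classes) of losing type context when a literal
// is unwrapped to its raw value before being bound a second time.
final class LiteralContextSketch {

  // Hypothetical stand-in for a typed literal. toNanos() plays the role of
  // "bind this literal to a nanosecond timestamp column".
  interface Literal {
    long toNanos();
  }

  // Already nanoseconds since the epoch: binding leaves it untouched.
  static final class TimestampNanoLiteral implements Literal {
    final long nanos;

    TimestampNanoLiteral(long nanos) {
      this.nanos = nanos;
    }

    @Override
    public long toNanos() {
      return nanos;
    }
  }

  // A bare long with no unit. Assumption for this sketch: it is read as
  // microseconds and scaled up, throwing on overflow.
  static final class LongLiteral implements Literal {
    final long value;

    LongLiteral(long value) {
      this.value = value;
    }

    @Override
    public long toNanos() {
      return Math.multiplyExact(value, 1000L);
    }
  }

  // Buggy projection: unwraps the literal, so only a bare long survives.
  static Literal projectByValue(TimestampNanoLiteral bound) {
    return new LongLiteral(bound.nanos);
  }

  // Fixed projection: passes the literal through unchanged.
  static Literal projectByLiteral(TimestampNanoLiteral bound) {
    return bound;
  }

  public static void main(String[] args) {
    TimestampNanoLiteral bound = new TimestampNanoLiteral(1658837594123456789L);
    System.out.println(projectByLiteral(bound).toNanos());
    try {
      System.out.println(projectByValue(bound).toNanos());
    } catch (ArithmeticException e) {
      System.out.println("rebinding the bare long failed: " + e.getMessage());
    }
  }
}
```

In this model the fixed path returns the original nanosecond value, while the buggy path tries to scale an already-nanosecond value by 1000 and overflows. For smaller values the buggy path would not throw; it would silently produce a timestamp that is off by a factor of 1000.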

@rdblue rdblue added this to the Iceberg 1.10.0 milestone Aug 5, 2025
@github-actions github-actions bot added the API label Aug 5, 2025
@rdblue rdblue requested a review from stevenzwu August 5, 2025 21:44
@rdblue
Contributor Author

rdblue commented Aug 5, 2025

@ebyhr, I think this fixes your timestamp nanos issue.

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Great find, and thanks for the minimal repro test. It makes sense that the unbound predicate produced by the projection needs to be based on the literal to preserve the fact that it's a timestamp nano. By extracting the value, the previous logic would surface a predicate based on a long, which would then be interpreted as micros and incorrectly converted to nanos.

Contributor

@ebyhr ebyhr left a comment

@rdblue Thank you for opening this PR! I internally verified that this change fixes our issue.

Comment on lines +96 to +97
    org.apache.iceberg.Schema schema =
        new org.apache.iceberg.Schema(
Contributor

@ebyhr ebyhr Aug 6, 2025

nit: The package org.apache.iceberg looks redundant as this class already imports Schema.

@danielcweeks danielcweeks merged commit 64a7ca5 into apache:main Aug 6, 2025
43 checks passed
@rdblue rdblue deleted the fix-timestamp-nano-with-identity-partition branch August 6, 2025 21:05