Core: Fix numeric overflow of timestamp nano literal #11775
Conversation
```diff
 case TIMESTAMP_NANO:
-  // assume micros and convert to nanos to match the behavior in the timestamp case above
-  return new TimestampLiteral(value()).to(type);
+  return (Literal<T>) new TimestampNanoLiteral(value());
```
This change seems correct to me. The previous behavior was to assume the value was in microseconds and pass it through to TimestampLiteral, but that can overflow and does not actually represent a nanosecond timestamp!
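A minimal illustration of the overflow, with made-up numbers rather than anything from the PR: if the incoming long already holds epoch nanoseconds, the old micros-to-nanos conversion exceeds the long range.

```java
// Illustrative only: a long that already holds epoch *nanoseconds*,
// roughly 2023-11-14T22:13:20 UTC.
long nanos = 1_700_000_000_000_000_000L;

// The removed code path assumed the value was microseconds and converted
// micros -> nanos by multiplying by 1000. Here that product exceeds
// Long.MAX_VALUE (~9.22e18) and silently wraps to an unrelated timestamp.
long wrapped = nanos * 1000L;
```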
FWIW I still think this is correct, but it's worth getting others' perspective on this since we are changing one of the assumptions about how the value is interpreted when the target type is nanoseconds.
CC @nastra @epgif @jacobmarble @rdblue thoughts?
I think the behavior both before and after this change is correct. In Iceberg we had the assumption that everything is in microseconds, but that no longer holds now that we have nanos. I do think the version after the change is more correct and more closely aligns with my expectations. If we can make sure that folks are not using this yet, I think this change is a good one 👍
I had a chat with @rdblue, who reviewed the PR that introduced this, and it is actually on purpose: Spark always passes in microseconds, and changing this would break that assumption for Spark. So I think we have to revert this line. That said, I do think we need to check (and raise an error) when it overflows. The easiest way to do this is to convert the value to nanos, convert it back to micros, and check that it is still the same value.
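A minimal sketch of that round-trip check, assuming the incoming long is in microseconds as Spark expects (variable names are illustrative, not the actual PR code):

```java
long micros = 1_700_000_000_000_000L;  // example input, assumed to be epoch microseconds
long nanos = micros * 1000L;           // may silently overflow Long.MAX_VALUE

// Round-trip back to micros; a mismatch means the multiplication overflowed.
if (nanos / 1000L != micros) {
  throw new ArithmeticException(
      "Timestamp micros value overflows when converted to nanos: " + micros);
}
```

Math.multiplyExact(micros, 1000L) would achieve the same effect by throwing ArithmeticException on overflow instead of requiring the explicit round trip.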
Thank you for sharing the context. What about other query engines? I actually found this issue when trying to support nanosecond precision in the Trino Iceberg connector. As you may know, the max timestamp precision in Trino is picoseconds (12 digits).
@stevenzwu I don't think we want to discuss partitioning best practices here. The Iceberg spec allows such a partition spec, right? It's just one of our test scenarios at Starburst, not a customer's table definition.
Sure. But at least the broken behavior of using a string literal with the identity partition transform on a timestamp_ns column has minimal/no impact in practice, since it is not really a viable setup for a production workload. The string literal problem can be tracked separately from the long literal issue.
Coming back to this PR/issue, we need to decide what to do with long literals for timestamp_ns. There are two options: (1) derive/imply the precision (micros or nanos) from the timestamp column type, which is the current approach of this PR; (2) always assume microsecond precision for a consistent interpretation regardless of the column's timestamp precision, which is the current behavior and where Ryan is coming from.
It seems that there is no consensus on adopting the behavior change of option 1. To move forward without changing the current behavior and confusing users, we can take these two steps:
- For the immediate step, fail the conversion of a long literal (assumed micro precision) to a nano timestamp with an explicit and friendlier error message (suggested by Russell). For now, users should use a string literal with an hour/day/week/month transform on a timestamp_nano partition column.
- For the longer term, we can expose public methods for engines/clients to construct TimestampNanoLiteral directly. Then a long literal can be used correctly with the timestamp nano type, e.g. ts < timestamp_nano(1230219000123123).
What about these two options for a new public API for literal construction in the Expressions class? With the new public methods, would that be enough for engines to support nano timestamps?

```java
public static Literal<TimestampNanoLiteral> litTimestampNano(long value)
```

With this API, clients/engines could construct literals like this when they know the long value contains a timestamp in nanos:

```java
Literal<?> columnLiteral;
if (icebergType.typeId() == TypeID.TIMESTAMP_NANO && columnValue instanceof Long) {
  columnLiteral = Expressions.litTimestampNano(columnValue);
} else {
  columnLiteral = Expressions.lit(columnValue);
}
```

```java
public static <T> Literal<T> lit(T value, TypeID type)
```

Basically, move the above if-else inside this method, which assumes the long value for the nano timestamp type has nanosecond precision.
Adding such a method is insufficient, if I understand correctly. For instance, the problematic place in Trino depends on Expressions#in(java.lang.String, T...) taking values, not literals.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
We faced this at Dremio too: #13160. Looking forward to this fix.
Following up on the bug described in this comment here instead of in that thread, because the thread discusses solutions to handling nanos, not just the overflow problem. Here's a test that reproduces the problem:

```java
@Test
public void testStringToTimestampNanosLiteral() {
  Schema schema = new Schema(
      Types.NestedField.required(1, "id", Types.LongType.get()),
      Types.NestedField.optional(2, "ts", Types.TimestampNanoType.withoutZone()));
  PartitionSpec spec = PartitionSpec.builderFor(schema).identity("ts").build();
  Expression expr = Expressions.equal("ts", "2022-07-26T12:13:14.123456789");
  Expression projected = Projections.inclusive(spec).project(expr);
  Binder.bind(schema.asStruct(), projected);
}
```

What's happening is that the projection will first bind the original predicate because it needs a bound ID reference rather than a name reference. That ID is used to find the partition fields that can project the predicate. The binding process produces the expected […]. Updating that line to pass the literal instead of the value fixes the problem because it doesn't lose the context that the value was already a nanosecond timestamp value. I'll open a PR to fix this. Hopefully it can make it into 1.10.
Opened #13746.
The previous logic leads to overflow when we pass a timestamp nanos long value.