Skip to content

Conversation

@jacobmarble
Copy link
Contributor

Helps #8657

@github-actions github-actions bot added the API label Sep 26, 2023
@jacobmarble
Copy link
Contributor Author

I'm a new contributor to the project. It appears that the API component is a dependency of all other components, so I'm starting the feature there. My next PR would similarly modify the Core component.

@jacobmarble
Copy link
Contributor Author

Maintainers: I've fixed the CI check in my first commit; please run the workflows again.

.put(TimeType.get().toString(), TimeType.get())
.put(TimestampType.withZone().toString(), TimestampType.withZone())
.put(TimestampType.withoutZone().toString(), TimestampType.withoutZone())
.put(TimestampnsType.withZone().toString(), TimestampnsType.withZone())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General point: Should this be a new type, or should Timestamp instead have a precision field?

One thing that would speak for the latter is that with /without time zone is also not an own type but instead a field of timestamp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JFinis we discussed timestamp[MILLIS|MICROS|NANOS] in the proposal/design doc, but decided against it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jacobmarble I don't see any discussion or any rationale in the document.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could argue that so far the design was to separate different timestamps through fields, not new types. This deviates from the design so far. I would like to understand the rationale for this deviation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JFinis there are resolved comments in the doc, visible without being the document owner, but you'll have to click the "open comment history" button to see them.

The document history is not visible to anyone but the owner, so I've attached PDFs at two stages: my original draft (7-sep-2023) and the doc as understood at the last community sync (19-sep-2023).

At the very end of the last community sync, I understood that we had enough consensus to start writing code.

Nanosecond Timestamps in Apache Iceberg 7-sep-2023.pdf
Nanosecond Timestamps in Apache Iceberg 19-sep-2023.pdf

the design was to separate different timestamps through fields, not new types

Help me out, where do you read that?

Copy link
Contributor

@JFinis JFinis Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, now I see the discussions! That being said, I don't see a discussion that discusses whether a new type should be used vs. a new field in the timestamp type. Can you point me to that discussion? Maybe I'm missing the forest for the trees here. Or the discussion was removed due to the text to which it refers to being removed?

I see that it was a suggestion in your old versions. Therefore, I would like to understand the rationale of the person who was against it. Can you maybe tag them here? Then they can chime in with their reasoning.

the design was to separate different timestamps through fields, not new types

Help me out, where do you read that?

I read that in the code itself: There is only one timestamp type with an adjust-to-utc flag. Instead, there could have been two types, timestamp and timestamptz. Thus, the person doing the initial design decided for a flag instead of two different types.

And now you add another dimension in which timestamps can be different. Thus, the design closest to the initial design would be to have yet another (enum) field on the timestamp type instead of a new type. This would also help with the clumsy backward compatible naming of timestamp (without any suffix) and now timestamp_ns, where one type has a suffix while the other one hasn't. With a new field, everything would just stay timestamp and the field would implicitly be Millis, if it's not set.

So at least for me, this points towards having the field. Though, maybe there is a disadvantage of the field approach I don't see here.

Copy link
Contributor Author

@jacobmarble jacobmarble Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a discussion that discusses whether a new type should be used vs. a new field in the timestamp type

There is only one timestamp type with an adjust-to-utc flag. Instead, there could have been two types, timestamp and timestamptz.

clumsy backward compatible naming of timestamp (without any suffix) and now timestamp_ns, where one type has a suffix while the other one hasn't.

maybe there is a disadvantage of the field approach I don't see here.

We may be talking past each other wrt the Iceberg types vs the Java implementation of those types.

  • The design considers existing Iceberg types timestamp and timestamptz (microsecond units), then adds new types timestamp_ns and timestamptz_ns (nanosecond units).

    • @rdblue guided the design decision away from (1) parameterized type names timestamp[MILLIS|MICROS|NANOS] and (2) adding milliseconds at the same time we add nanoseconds.
  • This proposal for Java implementation of the design considers existing Java type TimestampType (including field private final boolean adjustToUTC), then adds new type TimestampnsType (including field private final boolean adjustToUTC).

    • The concern @JFinis raises is that Java TimestampType already represents two Iceberg types, so it would make sense that this be further extended to represent four timestamp types.
    • Do I understand you correctly?
    • If so: I did start this change by extending Java type TimestampType with a new field, but backed out and started over with a new Java type. I was overwhelmed by the variety of cases where the micro/nano branch would be needed. In contrast, adding a new type made it easier for me to self-audit and test, and removed any confusion to a user of the API.
    • As a side note, I also considered refactoring the Java implementation of Iceberg types date and time to be variants of TimestampType. Again, readability of this PR won in my mind.
    • The approach could certainly be revisited.

With a new field, everything would just stay timestamp and the field would implicitly be Millis, if it's not set.

nit: Currently, timestamp and timestamptz represent microseconds, not milliseconds. And yes, I had proposed that timestamp continue to refer to microseconds for backward compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JFinis PTAL

It was decided in the spec PR that we should add milliseconds, after all.

In this PR, I've split the work into three commits:

  • add class Timestampns
  • add class Timestampms
  • remove the above classes, replace them with a new field in Timestamp

@Fokko
Copy link
Contributor

Fokko commented Sep 29, 2023

@jacobmarble Could you create a PR to update the specification first? https://github.com/apache/iceberg/blob/master/format/spec.md

@jacobmarble
Copy link
Contributor Author

@jacobmarble Could you create a PR to update the specification first? https://github.com/apache/iceberg/blob/master/format/spec.md

Done: #8683

@jacobmarble jacobmarble force-pushed the jgm-timestamp-nanos branch 2 times, most recently from 5c9e95f to 8603594 Compare October 9, 2023 17:22
@jacobmarble jacobmarble changed the title API: add type Timestampns API: add nanosecond variant of Timestamp type Oct 16, 2023
Helps apache#8657

This change adds field `ChronoUnit unit` to `TimestampType`, such that
`TimestampType` now represents four specified types:
- `timestamp` (existing)
- `timestamptz` (existing)
- `timestamp_ns` (new apache#8683)
- `timestamptz_ns` (new apache#8683)
@jacobmarble jacobmarble deleted the jgm-timestamp-nanos branch October 31, 2023 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants