-
Notifications
You must be signed in to change notification settings - Fork 3k
API: add nanosecond variant of Timestamp type #8658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I'm a new contributor to the project. It appears that the API component is a dependency of all other components, so I'm starting the feature there. My next PR would similarly modify the Core component. |
|
Maintainers: I've fixed the CI check in my first commit; please run the workflows again. |
| .put(TimeType.get().toString(), TimeType.get()) | ||
| .put(TimestampType.withZone().toString(), TimestampType.withZone()) | ||
| .put(TimestampType.withoutZone().toString(), TimestampType.withoutZone()) | ||
| .put(TimestampnsType.withZone().toString(), TimestampnsType.withZone()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General point: Should this be a new type, or should Timestamp instead have a precision field?
One thing that would speak for the latter is that with /without time zone is also not an own type but instead a field of timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JFinis we discussed timestamp[MILLIS|MICROS|NANOS] in the proposal/design doc, but decided against it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jacobmarble I don't see any discussion or any rationale in the document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could argue that so far the design was to separate different timestamps through fields, not new types. This deviates from the design so far. I would like to understand the rationale for this deviation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JFinis there are resolved comments in the doc, visible without being the document owner, but you'll have to click the "open comment history" button to see them.
The document history is not visible to anyone but the owner, so I've attached PDFs at two stages: my original draft (7-sep-2023) and the doc as understood at the last community sync (19-sep-2023).
At the very end of the last community sync, I understood that we had enough consensus to start writing code.
Nanosecond Timestamps in Apache Iceberg 7-sep-2023.pdf
Nanosecond Timestamps in Apache Iceberg 19-sep-2023.pdf
the design was to separate different timestamps through fields, not new types
Help me out, where do you read that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, now I see the discussions! That being said, I don't see a discussion that discusses whether a new type should be used vs. a new field in the timestamp type. Can you point me to that discussion? Maybe I'm missing the forest for the trees here. Or the discussion was removed due to the text to which it refers to being removed?
I see that it was a suggestion in your old versions. Therefore, I would like to understand the rationale of the person who was against it. Can you maybe tag them here? Then they can chime in with their reasoning.
the design was to separate different timestamps through fields, not new types
Help me out, where do you read that?
I read that in the code itself: There is only one timestamp type with an adjust-to-utc flag. Instead, there could have been two types, timestamp and timestamptz. Thus, the person doing the initial design decided for a flag instead of two different types.
And now you add another dimension in which timestamps can be different. Thus, the design closest to the initial design would be to have yet another (enum) field on the timestamp type instead of a new type. This would also help with the clumsy backward compatible naming of timestamp (without any suffix) and now timestamp_ns, where one type has a suffix while the other one hasn't. With a new field, everything would just stay timestamp and the field would implicitly be Millis, if it's not set.
So at least for me, this points towards having the field. Though, maybe there is a disadvantage of the field approach I don't see here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a discussion that discusses whether a new type should be used vs. a new field in the timestamp type
There is only one timestamp type with an adjust-to-utc flag. Instead, there could have been two types, timestamp and timestamptz.
clumsy backward compatible naming of timestamp (without any suffix) and now timestamp_ns, where one type has a suffix while the other one hasn't.
maybe there is a disadvantage of the field approach I don't see here.
We may be talking past each other wrt the Iceberg types vs the Java implementation of those types.
-
The design considers existing Iceberg types
timestampandtimestamptz(microsecond units), then adds new typestimestamp_nsandtimestamptz_ns(nanosecond units).- @rdblue guided the design decision away from (1) parameterized type names
timestamp[MILLIS|MICROS|NANOS]and (2) adding milliseconds at the same time we add nanoseconds.
- @rdblue guided the design decision away from (1) parameterized type names
-
This proposal for Java implementation of the design considers existing Java type
TimestampType(including fieldprivate final boolean adjustToUTC), then adds new typeTimestampnsType(including fieldprivate final boolean adjustToUTC).- The concern @JFinis raises is that Java
TimestampTypealready represents two Iceberg types, so it would make sense that this be further extended to represent four timestamp types. - Do I understand you correctly?
- If so: I did start this change by extending Java type
TimestampTypewith a new field, but backed out and started over with a new Java type. I was overwhelmed by the variety of cases where the micro/nano branch would be needed. In contrast, adding a new type made it easier for me to self-audit and test, and removed any confusion to a user of the API. - As a side note, I also considered refactoring the Java implementation of Iceberg types
dateandtimeto be variants ofTimestampType. Again, readability of this PR won in my mind. - The approach could certainly be revisited.
- The concern @JFinis raises is that Java
With a new field, everything would just stay timestamp and the field would implicitly be Millis, if it's not set.
nit: Currently, timestamp and timestamptz represent microseconds, not milliseconds. And yes, I had proposed that timestamp continue to refer to microseconds for backward compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JFinis PTAL
It was decided in the spec PR that we should add milliseconds, after all.
In this PR, I've split the work into three commits:
- add class
Timestampns - add class
Timestampms - remove the above classes, replace them with a new field in
Timestamp
|
@jacobmarble Could you create a PR to update the specification first? https://github.com/apache/iceberg/blob/master/format/spec.md |
Done: #8683 |
5c9e95f to
8603594
Compare
Helps apache#8657 This change adds field `ChronoUnit unit` to `TimestampType`, such that `TimestampType` now represents four specified types: - `timestamp` (existing) - `timestamptz` (existing) - `timestamp_ns` (new apache#8683) - `timestamptz_ns` (new apache#8683)
56debfb to
6d3d4cd
Compare
Helps #8657