Add support for writing timestamps without timezone. #1
Conversation
This also adds AssertJ to testCompile in all modules so assertions can be used elsewhere.
* Spec: Add identifier-field-ids to schema.
* Spec: Add section for partition evolution.
* Spec: Add schemas list and current-schema-id to table metadata.
* Spec: Add key_metadata to manifest list.
* Spec: Add schema-id to Snapshot metadata.
`.withFailMessage(..)` was mistakenly used and was therefore overriding the actual error reporting, making debugging difficult.
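For illustration, a minimal sketch of the pitfall (the asserted values here are hypothetical, not code from this PR): `withFailMessage(..)` replaces AssertJ's default error entirely, so a failure loses the actual/expected details, while `as(..)` only prefixes a description.

```java
import static org.assertj.core.api.Assertions.assertThat;

// withFailMessage replaces the default error entirely, so a failure prints only
// "types differ" instead of the expected/actual comparison:
assertThat("timestamp").withFailMessage("types differ").isEqualTo("timestamptz");

// as() merely prefixes a description and keeps AssertJ's detailed output:
assertThat("timestamp").as("converted type name").isEqualTo("timestamptz");
```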
…#2689)
* support custom target name in partition spec builder
* address the comments.
…org (apache#2709) Co-authored-by: tgooch <[email protected]>
Param scanAllFiles is used to check whether all the data files should be processed, or only added files. Here we should replace scanAllFiles with !scanAllFiles.
Co-authored-by: Ryan Blue <[email protected]>
Co-authored-by: tgooch <[email protected]>
public static final String HANDLE_TIMESTAMP_WITHOUT_TIMEZONE_SESSION_PROPERTY =
    "spark.sql.iceberg.handle-timestamp-without-timezone";

public static final String STORE_TIMESTAMP_WITHOUT_TIMEZONE_SESSION_PROPERTY =
I think it's fine to just use one property for both reading and writing
Actually, the name of this property was incorrect; I changed it to READ_TIMESTAMP_AS_TIMESTAMP_WITHOUT_TIMEZONE.
This property controls how the Spark TimestampType is represented in Iceberg. By default, Spark's TimestampType is converted to the Types.TimestampType.withZone() Iceberg type, but if READ_TIMESTAMP_AS_TIMESTAMP_WITHOUT_TIMEZONE is set to true, it is converted to Types.TimestampType.withoutZone().
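A minimal sketch of that mapping (the helper name and flag plumbing are hypothetical; the Types API is Iceberg's):

```java
import org.apache.iceberg.types.Type;
import org.apache.iceberg.types.Types;

// Hypothetical helper: pick the Iceberg type used for Spark's TimestampType.
static Type icebergTimestampFor(boolean readAsTimestampWithoutZone) {
  return readAsTimestampWithoutZone
      ? Types.TimestampType.withoutZone()  // Iceberg 'timestamp'
      : Types.TimestampType.withZone();    // Iceberg 'timestamptz' (the default)
}
```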
 */
public static boolean hasTimestampWithoutZone(Schema schema) {
  return TypeUtil.find(schema, t ->
      t.typeId().equals(Type.TypeID.TIMESTAMP) && !((Types.TimestampType) t).shouldAdjustToUTC()
could just check against TimestampType.withoutZone() which returns the singleton instance
good point, changed the code
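A sketch of the check after that change, assuming the helper keeps its signature and compares against the withoutZone() singleton:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.TypeUtil;
import org.apache.iceberg.types.Types;

public static boolean hasTimestampWithoutZone(Schema schema) {
  // withoutZone() returns a singleton, so a direct equality check is enough
  return TypeUtil.find(schema, t -> Types.TimestampType.withoutZone().equals(t)) != null;
}
```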
class SparkTypeToType extends SparkTypeVisitor<Type> {
  private final StructType root;
  private int nextId = 0;
  private final boolean useTimestampWithoutZone;
I'm not sure what this flag is for; I think it's probably safer to always return TimestampType.withZone from here and handle the mismatch outside of this code.
Moved this logic to SparkFixupTimestampType.java
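Roughly, such a fixup pass swaps timestamptz for timestamp after the Spark-to-Iceberg conversion. A simplified illustration for a single primitive type (the real SparkFixupTimestampType presumably handles nested types via a type visitor):

```java
import org.apache.iceberg.types.Type;
import org.apache.iceberg.types.Types;

// Illustrative only: replace timestamptz with timestamp for a primitive type.
static Type fixupTimestamp(Type type) {
  if (Types.TimestampType.withZone().equals(type)) {
    return Types.TimestampType.withoutZone();
  }
  return type;
}
```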
public class SparkUtil {

  public static final String HANDLE_TIMESTAMP_WITHOUT_TIMEZONE_FLAG = "spark-handle-timestamp-without-timezone";
Actually we could probably just use the same session property here, so we just have one property:
spark.sql.iceberg.convert-timestamp-without-timezone
Or something like that.
done
        return SparkOrcValueWriters.decimal(primitive.getPrecision(), primitive.getScale());
      case TIMESTAMP_INSTANT:
      case TIMESTAMP:
        return SparkOrcValueWriters.timestampTz();
Do we need corresponding code in the ParquetWriter as well?
I don't think so.
private StructType lazyType() {
  if (type == null) {
    Preconditions.checkArgument(readTimestampWithoutZone || !SparkUtil.hasTimestampWithoutZone(lazySchema()),
Actually, cancel what I said about those other spots; this seems like a great place to check whether we are allowed to do the TZ conversion.
ok, so we don't need any changes here?
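For reference, a hypothetical completion of the truncated check above (the message text is illustrative, not the PR's actual wording; Preconditions is Guava's):

```java
// Reject reads of timestamp-without-zone columns unless the session
// property explicitly allows them.
Preconditions.checkArgument(
    readTimestampWithoutZone || !SparkUtil.hasTimestampWithoutZone(lazySchema()),
    "Cannot handle timestamp without timezone fields in Spark; set '%s' to true to allow reads",
    SparkUtil.HANDLE_TIMESTAMP_WITHOUT_TIMEZONE_FLAG);
```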
…ted level should be one of the following: 6, 8.
…comments. Fixed code formatting
Force-pushed from e869a15 to bc316c4.
I think I need to close this PR; the main PR is apache#2757.
Add spark.sql.iceberg.store-timestamp-without-zone Spark config to indicate which Iceberg type (Types.TimestampType.withZone() or Types.TimestampType.withoutZone()) will be used for Spark's TimestampType.
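For example, the property would presumably be set on the Spark session before writing (a hypothetical usage sketch; `spark` is an existing SparkSession):

```java
// Hypothetical usage: store Spark timestamps as Iceberg timestamp (without zone)
spark.conf().set("spark.sql.iceberg.store-timestamp-without-zone", "true");
```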