Core: Add a util to read and write partition stats #9170
import org.apache.iceberg.types.Types;

public class PartitionEntry implements IndexedRecord {
  private PartitionData partitionData;
Fields are based on the spec.
https://github.com/apache/iceberg/blob/main/format/spec.md#partition-statistics-file
    return new PartitionEntry();
  }

  public PartitionEntry build() {
Using a builder instead of Immutables because these objects will be mutable in a partition stats map. The update functions will be added during implementation.
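The reasoning in this comment can be illustrated with a minimal sketch. Everything here is hypothetical: `PartitionEntrySketch`, its three fields, the `update` method, and the `accumulate` helper are stand-ins chosen for illustration and are not the PR's actual class or API. The sketch only shows why a builder that produces a mutable object suits entries that are aggregated in place inside a map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for PartitionEntry (not the PR's class).
class PartitionEntrySketch {
  private final int specId;
  private long dataRecordCount; // mutable: updated while stats are aggregated
  private int dataFileCount;    // mutable: updated while stats are aggregated

  private PartitionEntrySketch(int specId, long dataRecordCount, int dataFileCount) {
    this.specId = specId;
    this.dataRecordCount = dataRecordCount;
    this.dataFileCount = dataFileCount;
  }

  static Builder builder() {
    return new Builder();
  }

  // Assumed shape of an "update function": folds another entry's counters
  // into this entry in place instead of allocating a new object.
  void update(PartitionEntrySketch other) {
    this.dataRecordCount += other.dataRecordCount;
    this.dataFileCount += other.dataFileCount;
  }

  int specId() {
    return specId;
  }

  long dataRecordCount() {
    return dataRecordCount;
  }

  int dataFileCount() {
    return dataFileCount;
  }

  // Adds an entry to the stats map, merging into the existing entry if the
  // partition key is already present.
  static void accumulate(
      Map<String, PartitionEntrySketch> stats, String partitionKey, PartitionEntrySketch entry) {
    stats.merge(
        partitionKey,
        entry,
        (existing, incoming) -> {
          existing.update(incoming);
          return existing;
        });
  }

  static class Builder {
    private int specId;
    private long dataRecordCount;
    private int dataFileCount;

    Builder specId(int newSpecId) {
      this.specId = newSpecId;
      return this;
    }

    Builder dataRecordCount(long newDataRecordCount) {
      this.dataRecordCount = newDataRecordCount;
      return this;
    }

    Builder dataFileCount(int newDataFileCount) {
      this.dataFileCount = newDataFileCount;
      return this;
    }

    PartitionEntrySketch build() {
      return new PartitionEntrySketch(specId, dataRecordCount, dataFileCount);
    }
  }
}
```

With an immutable value class, every merge would have to build a replacement object; with a mutable entry, the map keeps one object per partition and updates its counters as more files are scanned.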
      throw new IllegalArgumentException("getting schema for an unpartitioned table");
    }

    return new org.apache.iceberg.Schema(
Fields are based on the spec.
https://github.com/apache/iceberg/blob/main/format/spec.md#partition-statistics-file
Note: Which fields are optional and which are required is also based on the spec.
import org.apache.iceberg.avro.AvroSchemaUtil;
import org.apache.iceberg.types.Types;

public class PartitionEntry implements IndexedRecord {
Implements IndexedRecord to reuse the existing Parquet/ORC Avro readers and writers from the iceberg-parquet and iceberg-orc modules.
  }

  private static void validateFormat(String filePath) {
    if (!filePath.toLowerCase().endsWith(PARQUET_SUFFIX)) {
Other formats will be supported in a follow-up PR.
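For readers who want to see the suffix check in isolation, here is a minimal, self-contained sketch. Several details are assumptions, not taken from the PR: the class name `PartitionStatsFormatCheck`, the value `".parquet"` for `PARQUET_SUFFIX` (the diff shows only the constant's name), the exception type, and the message. The sketch also passes `Locale.ROOT` to `toLowerCase`, which the diff does not do, so that the comparison does not depend on the JVM's default locale.

```java
import java.util.Locale;

// Hypothetical holder class; the PR's enclosing class is not shown in the diff.
class PartitionStatsFormatCheck {
  // Assumption: the diff shows only the name PARQUET_SUFFIX, not its value.
  private static final String PARQUET_SUFFIX = ".parquet";

  // Rejects any path that does not end in the Parquet suffix (case-insensitive).
  // Assumption: the exception type and message are illustrative only.
  static void validateFormat(String filePath) {
    if (!filePath.toLowerCase(Locale.ROOT).endsWith(PARQUET_SUFFIX)) {
      throw new UnsupportedOperationException(
          "Unsupported partition stats file format: " + filePath);
    }
  }
}
```

Supporting ORC and Avro later would likely mean replacing the single suffix check with a lookup over the accepted formats, but that design is outside what this PR shows.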
@aokolnychyi: Can this PR be reviewed? I know I still need to rework or analyze the final Spark action that collects the partition stats, so please take a look.
Closing this in favour of #10176, which has an end-to-end solution.
PartitionEntry to hold the entries of partition stats
iceberg-data module. Engines will use these generic writers and readers.
TODO: Support ORC and Avro as partition stats file formats in a follow-up PR.
Fixes: #8455, #8456