@ajantha-bhat (Member) commented Nov 28, 2023

  • Introduce a class PartitionEntry to hold the entries of partition stats
  • Add a utility to read and write Parquet partition stats files in the iceberg-data module.
    Engines will use these generic writers and readers.

TODO: Support ORC and Avro as partition stats file formats in a follow-up PR.

Fixes: #8455, #8456

import org.apache.iceberg.types.Types;

public class PartitionEntry implements IndexedRecord {
  private PartitionData partitionData;

    return new PartitionEntry();
  }

  public PartitionEntry build() {
@ajantha-bhat (Member, Author) commented:

Using a builder instead of Immutables because these objects will be mutable in a partition stats map. The update functions will be added during implementation.
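
A minimal sketch of the pattern described in this comment: a builder that produces an entry which can later be updated in place inside a stats map. Everything here is hypothetical and simplified for illustration. The class name `PartitionEntrySketch`, the `String` partition key (a stand-in for `PartitionData`), the two counters, and the `update` and `accumulate` methods are assumptions, not code from this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for PartitionEntry. Field names and the
// update/accumulate methods are assumptions for illustration only.
final class PartitionEntrySketch {
  private final String partition; // stand-in for PartitionData
  private long dataRecordCount;
  private int dataFileCount;

  private PartitionEntrySketch(String partition, long dataRecordCount, int dataFileCount) {
    this.partition = partition;
    this.dataRecordCount = dataRecordCount;
    this.dataFileCount = dataFileCount;
  }

  static Builder builder() {
    return new Builder();
  }

  String partition() {
    return partition;
  }

  long dataRecordCount() {
    return dataRecordCount;
  }

  int dataFileCount() {
    return dataFileCount;
  }

  // Mutates this entry in place; an immutable value object would have to be replaced instead.
  void update(PartitionEntrySketch other) {
    this.dataRecordCount += other.dataRecordCount;
    this.dataFileCount += other.dataFileCount;
  }

  // Adds the entry to the map, or merges it into the entry already stored for that partition.
  static void accumulate(Map<String, PartitionEntrySketch> stats, PartitionEntrySketch entry) {
    PartitionEntrySketch existing = stats.get(entry.partition());
    if (existing == null) {
      stats.put(entry.partition(), entry);
    } else {
      existing.update(entry);
    }
  }

  static final class Builder {
    private String partition;
    private long dataRecordCount;
    private int dataFileCount;

    Builder partition(String value) {
      this.partition = value;
      return this;
    }

    Builder dataRecordCount(long value) {
      this.dataRecordCount = value;
      return this;
    }

    Builder dataFileCount(int value) {
      this.dataFileCount = value;
      return this;
    }

    PartitionEntrySketch build() {
      if (partition == null) {
        throw new IllegalStateException("partition is required");
      }
      return new PartitionEntrySketch(partition, dataRecordCount, dataFileCount);
    }
  }

  public static void main(String[] args) {
    Map<String, PartitionEntrySketch> stats = new HashMap<>();
    accumulate(stats, builder().partition("day=2023-11-28").dataRecordCount(100).dataFileCount(1).build());
    accumulate(stats, builder().partition("day=2023-11-28").dataRecordCount(50).dataFileCount(2).build());
    System.out.println(stats.get("day=2023-11-28").dataRecordCount()); // prints 150
  }
}
```

The builder keeps construction readable, while the entry itself stays mutable so that a running total per partition can be kept without reallocating a new object for every merged file.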

      throw new IllegalArgumentException("getting schema for an unpartitioned table");
    }

    return new org.apache.iceberg.Schema(

@ajantha-bhat (Member, Author) commented:

Note: Even the choice of which fields are optional and which are required is based on the spec.

import org.apache.iceberg.avro.AvroSchemaUtil;
import org.apache.iceberg.types.Types;

public class PartitionEntry implements IndexedRecord {
@ajantha-bhat (Member, Author) commented:

Implements IndexedRecord so that the existing Parquet/ORC Avro readers and writers from the iceberg-parquet and iceberg-orc modules can be reused.
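
A rough sketch of the positional access pattern that this relies on. `PositionalRecord` below is a local stand-in that mirrors the `get(int)` and `put(int, Object)` methods of Avro's `IndexedRecord` (the real interface additionally exposes `getSchema()`), so the example compiles without Avro on the classpath. The class name, field names, and field positions are hypothetical and not taken from this PR.

```java
// Hypothetical entry that exposes its fields by position, the way a generic
// Avro-based reader or writer would access them.
final class PositionalEntrySketch implements PositionalRecord {
  private int specId;           // assumed to sit at position 0
  private long dataRecordCount; // assumed to sit at position 1

  @Override
  public void put(int pos, Object value) {
    // A reader populates the record field by field, using the schema position.
    switch (pos) {
      case 0:
        specId = (Integer) value;
        return;
      case 1:
        dataRecordCount = (Long) value;
        return;
      default:
        throw new UnsupportedOperationException("Unknown field position: " + pos);
    }
  }

  @Override
  public Object get(int pos) {
    // A writer pulls each field out by the same position.
    switch (pos) {
      case 0:
        return specId;
      case 1:
        return dataRecordCount;
      default:
        throw new UnsupportedOperationException("Unknown field position: " + pos);
    }
  }

  public static void main(String[] args) {
    PositionalEntrySketch entry = new PositionalEntrySketch();
    entry.put(0, 3);
    entry.put(1, 42L);
    System.out.println(entry.get(0) + " " + entry.get(1)); // prints "3 42"
  }
}

// Local stand-in for the positional part of Avro's IndexedRecord.
interface PositionalRecord {
  void put(int pos, Object value);

  Object get(int pos);
}
```

Because readers and writers only need positional access, one record class can serve every file format whose reader speaks this interface.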

  }

  private static void validateFormat(String filePath) {
    if (!filePath.toLowerCase().endsWith(PARQUET_SUFFIX)) {
@ajantha-bhat (Member, Author) commented:

Other formats will be supported in a follow-up PR.
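
For illustration, a self-contained sketch of a suffix check along these lines. The value of `PARQUET_SUFFIX`, the exception type, and the message are assumptions, since the excerpt above does not show them. The sketch also passes `Locale.ROOT` to `toLowerCase`, which the quoted snippet does not do, so that the check behaves the same regardless of the JVM's default locale.

```java
import java.util.Locale;

final class FormatCheckSketch {
  // Assumed value; the PR's actual constant is not shown in this excerpt.
  private static final String PARQUET_SUFFIX = ".parquet";

  // Rejects any path that does not end with the Parquet suffix (case-insensitive).
  static void validateFormat(String filePath) {
    if (!filePath.toLowerCase(Locale.ROOT).endsWith(PARQUET_SUFFIX)) {
      // The exception type and message here are hypothetical.
      throw new UnsupportedOperationException(
          "Unsupported partition stats file format: " + filePath);
    }
  }

  public static void main(String[] args) {
    validateFormat("/warehouse/db/tbl/metadata/stats.PARQUET"); // accepted
    try {
      validateFormat("/warehouse/db/tbl/metadata/stats.orc");
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```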

@ajantha-bhat (Member, Author) commented:

@aokolnychyi: Can this PR be reviewed?

I know I still need to rework or analyze the final Spark action that collects the partition stats, but this PR is independent of that and follows the spec.

So please take a look.

@ajantha-bhat (Member, Author) commented Jul 31, 2024

Closing this in favour of #10176, which has an end-to-end solution.

Linked issue: Introduce PartitionEntry class to represent stats per partition