Core: Enhance PartitionsTable.Partition #8501

ajantha-bhat · 2023-09-05T17:06:02Z

PartitionsTable.Partition will be shared between the Partitions metadata table and the partition stats reader/writer.
Hence, move it to a separate class and make it implement Avro's IndexedRecord (needed for writing partition stats).

Derived from #8488

Fixes #8455
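For readers unfamiliar with the pattern, here is a minimal sketch of what implementing Avro's `IndexedRecord` gives a writer: it can read and set fields purely by position. To stay self-contained without an Avro dependency, the sketch declares a hypothetical local interface `PositionalRecord` with the same `put(int, Object)` / `get(int)` methods (the real Avro interface also exposes `getSchema()` through `GenericContainer`). The field set is a simplified, hypothetical one, not the actual `Partition` schema from this PR.

```java
import java.util.Collections;

public class PartitionRecordSketch {

  // Hypothetical local stand-in for the positional accessors of Avro's
  // org.apache.avro.generic.IndexedRecord, so this sketch compiles without Avro.
  interface PositionalRecord {
    void put(int pos, Object value);

    Object get(int pos);
  }

  // Hypothetical, simplified partition-stats entry; the field set is illustrative only.
  static class Partition implements PositionalRecord {
    private Object partitionData;
    private int specId;
    private long dataRecordCount;
    private int dataFileCount;

    @Override
    public void put(int pos, Object value) {
      switch (pos) {
        case 0:
          partitionData = value;
          break;
        case 1:
          specId = (Integer) value;
          break;
        case 2:
          dataRecordCount = (Long) value;
          break;
        case 3:
          dataFileCount = (Integer) value;
          break;
        default:
          throw new IndexOutOfBoundsException("Unknown field position: " + pos);
      }
    }

    @Override
    public Object get(int pos) {
      switch (pos) {
        case 0:
          return partitionData;
        case 1:
          return specId;
        case 2:
          return dataRecordCount;
        case 3:
          return dataFileCount;
        default:
          throw new IndexOutOfBoundsException("Unknown field position: " + pos);
      }
    }
  }

  public static void main(String[] args) {
    Partition partition = new Partition();
    partition.put(0, Collections.singletonList("2023-09-05"));
    partition.put(1, 0);
    partition.put(2, 1_000L);
    partition.put(3, 4);

    // A generic writer only needs the field count and positional access.
    for (int pos = 0; pos < 4; pos++) {
      System.out.println(pos + " -> " + partition.get(pos));
    }
  }
}
```

Positional access is what lets a generic Avro writer serialize the object without knowing its concrete class, which is why the same class can back both the metadata table and the stats files.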

amogh-jahagirdar

Thanks @ajantha-bhat, very exciting to see all the work being done on this front! Just had a few comments/questions.

Also, if I'm not mistaken, the major pending aspect spec-wise is a conclusion on whether we maintain accurate record counts as opposed to eq/position delete counts, right?

amogh-jahagirdar · 2023-09-06T01:59:51Z

core/src/main/java/org/apache/iceberg/Partition.java

+import org.apache.iceberg.avro.AvroSchemaUtil;
+import org.apache.iceberg.types.Types;
+
+public class Partition implements IndexedRecord {


Curious, does Partition actually need to be public? I was going through the overall PR and, if I'm not mistaken, this would only be used in core?

If you look at PR #8503, the partition stats reader and writer in the util work directly with Partition as input and output.

So, when a query engine wants to use partition stats for query planning, the partition stats reader returns an iterator of Partition. Hence, it has to be public so that engine integration modules can use these stats.
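As a hypothetical illustration of that engine-side use, the sketch below consumes an iterator of partition-stats entries once and estimates the row count for the partitions a filter keeps. `PartitionEntry` and `estimateRowCount` are illustrative names, not Iceberg APIs; a real integration would iterate the `Partition` objects returned by the reader from #8503.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Predicate;

public class PartitionStatsPlanningSketch {

  // Hypothetical, read-only shape of one partition-stats entry as an engine might see it.
  static final class PartitionEntry {
    final String partition;
    final long dataRecordCount;

    PartitionEntry(String partition, long dataRecordCount) {
      this.partition = partition;
      this.dataRecordCount = dataRecordCount;
    }
  }

  // Sums record counts of the partitions the filter keeps, consuming the iterator once
  // (the way a streaming stats reader would hand entries to a planner).
  static long estimateRowCount(Iterator<PartitionEntry> stats, Predicate<String> partitionFilter) {
    long rows = 0;
    while (stats.hasNext()) {
      PartitionEntry entry = stats.next();
      if (partitionFilter.test(entry.partition)) {
        rows += entry.dataRecordCount;
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    Iterator<PartitionEntry> stats =
        Arrays.asList(
                new PartitionEntry("dt=2023-09-04", 1_200L),
                new PartitionEntry("dt=2023-09-05", 800L),
                new PartitionEntry("dt=2023-09-06", 0L))
            .iterator();

    long rows = estimateRowCount(stats, p -> p.compareTo("dt=2023-09-05") >= 0);
    System.out.println("estimated rows: " + rows); // prints "estimated rows: 800"
  }
}
```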

I see, in the end engine integrations would use Partition, so it has to be public.


amogh-jahagirdar · 2023-09-06T02:19:12Z

core/src/main/java/org/apache/iceberg/Partition.java

+    return lastUpdatedSnapshotId;
+  }
+
+  synchronized void update(ContentFile<?> file, Snapshot snapshot) {


Nit: why the change to make it synchronized? Are we expecting to update the same Partition reference from multiple threads?

It depends on the engine integration. Some engines, like Dremio, write manifest files and data files concurrently for a single snapshot operation. In that case we need synchronization, since multiple data files can belong to the same partition.

Sure, I was thinking that we would only update the partition stats after writing the data files (even if the data files are written concurrently), going through each of the written data files serially. But as you said, this is really an engine implementation detail. I think it is fine to keep this synchronized.
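A minimal sketch of why `synchronized` matters when several writer threads report files for the same partition. `PartitionStats` and its `update(long, long)` signature are hypothetical stand-ins for `Partition.update(ContentFile<?>, Snapshot)`, using plain counts instead of Iceberg types; without the `synchronized` modifier, the `+=` updates could race and lose increments.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedUpdateSketch {

  // Hypothetical simplified stats holder; plain values replace ContentFile/Snapshot.
  static class PartitionStats {
    private long dataRecordCount;
    private int dataFileCount;
    private Long lastUpdatedSnapshotId;

    // synchronized: several writer threads may add files that land in the same partition.
    synchronized void update(long fileRecordCount, long snapshotId) {
      dataRecordCount += fileRecordCount;
      dataFileCount += 1;
      lastUpdatedSnapshotId = snapshotId;
    }

    // Getters are synchronized too, so readers see the latest committed values.
    synchronized long dataRecordCount() {
      return dataRecordCount;
    }

    synchronized int dataFileCount() {
      return dataFileCount;
    }

    synchronized Long lastUpdatedSnapshotId() {
      return lastUpdatedSnapshotId;
    }
  }

  // Simulates an engine writing data files concurrently for one snapshot, all in one partition.
  static PartitionStats writeConcurrently(
      int threads, int filesPerThread, long recordsPerFile, long snapshotId)
      throws InterruptedException {
    PartitionStats stats = new PartitionStats();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.submit(
          () -> {
            for (int f = 0; f < filesPerThread; f++) {
              stats.update(recordsPerFile, snapshotId);
            }
          });
    }
    pool.shutdown();
    if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
      throw new IllegalStateException("writers did not finish in time");
    }
    return stats;
  }

  public static void main(String[] args) throws InterruptedException {
    PartitionStats stats = writeConcurrently(8, 1_000, 10L, 42L);
    // With synchronized update(), no increments are lost: 8 * 1000 files, 10 records each.
    System.out.println(stats.dataFileCount() + " files, " + stats.dataRecordCount() + " records");
  }
}
```

Applying the updates serially after all files are written, as suggested in the review, would avoid the lock entirely; keeping `synchronized` is the conservative choice when the engine's write pattern is not known in advance.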

ajantha-bhat · 2023-09-06T04:58:11Z

Also if I'm not mistaken the major pending aspect spec-wise is a conclusion on whether we maintain accurate record counts as opposed to eq/position delete counts right?

Yes, we will conclude that soon and get things moving.


github-actions bot added the core label Sep 5, 2023

This was referenced Sep 5, 2023

Core: Add a util to read and write partition stats #8503

Closed

Core: Write partition stats during write operation #8488

Closed

amogh-jahagirdar reviewed Sep 6, 2023


Core: Enhance PartitionsTable.Partition

34470b1


ajantha-bhat force-pushed the p_enhance branch from 1b8d4ac to 34470b1 on September 6, 2023 05:31

ajantha-bhat closed this Nov 28, 2023
