Adding specId for partitions metadata table #4516

szlta · 2022-04-06T15:47:59Z

Per #4292, PartitionsTable should call Partitioning.partitionType() to build the partition struct in its schema. The result should also include the specId, because in the V2 format the partition key alone may not be a unique key without the spec ID (e.g. when a partition column is removed and then re-added along with another column).
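To show why the partition key alone may not be unique, here is a minimal, self-contained sketch that aggregates per-file statistics into partitions-table rows. It does not use Iceberg's API: `FileStats`, `PartitionRow`, `PartitionKey`, and `PartitionsAggregator.aggregate` are hypothetical, and the partition value is modelled as a plain `List<Object>`. Grouping by the partition value alone would merge files written under different specs that happen to produce the same value. Keying on `(specId, partition)` keeps them apart.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-data-file statistics (not an Iceberg class).
record FileStats(int specId, List<Object> partition, long recordCount) {}

// Hypothetical row mirroring the proposed columns:
// spec_id, partition, record_count, file_count.
record PartitionRow(int specId, List<Object> partition, long recordCount, int fileCount) {}

// Composite grouping key: the partition value alone is not enough.
record PartitionKey(int specId, List<Object> partition) {}

class PartitionsAggregator {
  static List<PartitionRow> aggregate(List<FileStats> files) {
    // LinkedHashMap keeps first-seen order so the output is deterministic.
    Map<PartitionKey, long[]> totals = new LinkedHashMap<>();
    for (FileStats f : files) {
      long[] acc = totals.computeIfAbsent(
          new PartitionKey(f.specId(), f.partition()), k -> new long[2]);
      acc[0] += f.recordCount(); // accumulates record_count
      acc[1] += 1;               // accumulates file_count
    }
    List<PartitionRow> rows = new ArrayList<>();
    for (Map.Entry<PartitionKey, long[]> e : totals.entrySet()) {
      rows.add(new PartitionRow(e.getKey().specId(), e.getKey().partition(),
          e.getValue()[0], (int) e.getValue()[1]));
    }
    return rows;
  }

  public static void main(String[] args) {
    // The same partition value written under two different specs.
    List<FileStats> files = List.of(
        new FileStats(0, List.of("2022-04-06"), 10),
        new FileStats(1, List.of("2022-04-06"), 5),
        new FileStats(0, List.of("2022-04-06"), 7));
    aggregate(files).forEach(System.out::println);
  }
}
```

With the composite key this yields two rows (spec 0 with 17 records in 2 files, spec 1 with 5 records in 1 file); a map keyed on the partition value alone would have collapsed them into one.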

This is just an initial change, so that I can see which tests need fixing and check with the community whether the new schema looks okay:

spec_id
partition
record_count
file_count

I think in the other metadata tables spec_id precedes the partition key in the output, so I prepended it to the existing schema here. I'm not sure whether this counts as a backward-incompatible change, though. What do you think, @aokolnychyi?

szehon-ho

Had one doubt on this change, looks good otherwise.

Yeah, when adding spec_id to the files table (#3015), @rdblue mentioned it's OK to insert a new field in the middle, near partition, since nobody should depend on the field order, so I'm guessing it's OK in this table too?

szehon-ho · 2022-04-07T17:27:24Z

core/src/main/java/org/apache/iceberg/PartitionsTable.java

    private int fileCount;

    Partition(StructLike key) {
+      this.specId = 0;


I'm wondering about one thing, though I didn't test it myself.

If we have two partitions with different spec id but different key, it will show up with a random one. Could this case happen (i.e., adding and removing a field in the partition spec should give a new spec ID but could still produce the same partition value)? If so, we may need to add the spec ID to the key?

I believe you meant "different spec id but same key"?
Normally, if you remove a field from the partition spec and then re-add it, you end up with a former spec: Iceberg does not create a new spec but just points to the older one. In some cases, though, if you re-add the column along with some other columns (forming a new spec together), you could end up with the problem described here: #3411 (comment). That's something we have to solve too, but in another PR, since its scope also covers the other metadata tables that use Partitioning.partitionType().

I see, yeah, you are right: it seems Iceberg reuses the spec ID if we revert the spec to a former one.
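The reuse behaviour discussed above can be sketched with a deliberately simplified registry. This is an assumption-laden model, not Iceberg's code: `SpecRegistry` and `addOrReuse` are hypothetical names, and a spec is reduced to a list of field descriptions, whereas real spec equivalence involves source IDs, transforms, and partition field IDs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of spec-ID assignment as described in this
// thread; not Iceberg's API. The index in `specs` is the spec ID.
class SpecRegistry {
  private final List<List<String>> specs = new ArrayList<>();

  // Returns the ID of an identical existing spec, or registers a new one.
  int addOrReuse(List<String> fields) {
    for (int id = 0; id < specs.size(); id++) {
      if (specs.get(id).equals(fields)) {
        return id; // reverting to a former spec reuses its ID
      }
    }
    specs.add(List.copyOf(fields));
    return specs.size() - 1; // any other spec gets a fresh ID
  }

  public static void main(String[] args) {
    SpecRegistry registry = new SpecRegistry();
    System.out.println(registry.addOrReuse(List.of("day(ts)")));             // 0
    System.out.println(registry.addOrReuse(List.of("day(ts)", "category"))); // 1
    System.out.println(registry.addOrReuse(List.of("day(ts)")));             // 0 again
    System.out.println(registry.addOrReuse(List.of("day(ts)", "region")));   // 2
  }
}
```

Reverting to `[day(ts)]` maps back to spec 0, while a combination that was never seen before gets a new ID. That new-ID path is where the ambiguity from #3411 can arise.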

rdblue · 2022-04-10T21:19:47Z

core/src/main/java/org/apache/iceberg/PartitionsTable.java

-        Types.NestedField.required(1, "partition", table.spec().partitionType()),
-        Types.NestedField.required(2, "record_count", Types.LongType.get()),
-        Types.NestedField.required(3, "file_count", Types.IntegerType.get())
+        Types.NestedField.required(1, "spec_id", Types.IntegerType.get()),


It is fine to add a new field, but you should not change the IDs of the other fields.

Okay, in this case spec_id will be appended to the schema then.

Hm, I thought it meant we just need to keep the IDs of the other fields, but could add the new field in any position? E.g.,

        Types.NestedField.required(4, "spec_id", Types.IntegerType.get()),
        Types.NestedField.required(1, "partition", table.spec().partitionType()),
        Types.NestedField.required(2, "record_count", Types.LongType.get()),
        Types.NestedField.required(3, "file_count", Types.IntegerType.get())

Though I don't think it's a big deal; we can put it at the end.
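rdblue's rule (add fields freely, but never change existing IDs) is independent of field position, which is why both placements above would be acceptable. A minimal sketch of that rule as a check follows. `Field` and `FieldIdCheck.preservesIds` are hypothetical stand-ins, not Iceberg's `Types.NestedField`, and the "renumbered" example is a hypothetical variant where spec_id takes ID 1 and the existing fields shift up by one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a schema field: only ID and name matter here.
record Field(int id, String name) {}

class FieldIdCheck {
  // True if every field of oldSchema keeps its ID in newSchema (position is
  // ignored) and no newly added field reuses an ID from oldSchema.
  static boolean preservesIds(List<Field> oldSchema, List<Field> newSchema) {
    Map<String, Integer> oldIds = new HashMap<>();
    for (Field f : oldSchema) {
      oldIds.put(f.name(), f.id());
    }
    int matched = 0;
    for (Field f : newSchema) {
      Integer oldId = oldIds.get(f.name());
      if (oldId != null) {
        if (oldId != f.id()) {
          return false; // an existing field was renumbered
        }
        matched++;
      } else if (oldIds.containsValue(f.id())) {
        return false; // a new field took an existing field's ID
      }
    }
    return matched == oldIds.size(); // false if an existing field was dropped
  }

  public static void main(String[] args) {
    List<Field> before = List.of(
        new Field(1, "partition"), new Field(2, "record_count"), new Field(3, "file_count"));
    List<Field> appended = List.of(new Field(1, "partition"), new Field(2, "record_count"),
        new Field(3, "file_count"), new Field(4, "spec_id"));
    List<Field> prepended = List.of(new Field(4, "spec_id"), new Field(1, "partition"),
        new Field(2, "record_count"), new Field(3, "file_count"));
    // Hypothetical renumbering: spec_id takes ID 1 and the others shift.
    List<Field> renumbered = List.of(new Field(1, "spec_id"), new Field(2, "partition"),
        new Field(3, "record_count"), new Field(4, "file_count"));
    System.out.println(preservesIds(before, appended));   // true
    System.out.println(preservesIds(before, prepended));  // true
    System.out.println(preservesIds(before, renumbered)); // false
  }
}
```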

szehon-ho

It looks ok to me.

szehon-ho · 2022-04-12T19:00:30Z

Thanks @szlta for the change and @rdblue for taking a look. Let's work on the subsequent issue.

szlta · 2022-04-13T08:06:59Z

Thanks a lot for the review @szehon-ho and @rdblue

szehon-ho · 2022-04-14T16:29:47Z

I tried testing these cases. While this fixes the PartitionsTable schema, some of the partition entries may still have problems, because the current spec is still used in a few other places. I opened PR #4560 as a follow-up (@szlta, sorry, I wasn't sure whether you were working on, or planning to work on, this or something else).


initial change

a5a92a9

github-actions bot added the core label Apr 6, 2022

Spark test fixes

fc40183

github-actions bot added the spark label Apr 7, 2022

szehon-ho reviewed Apr 7, 2022


rdblue reviewed Apr 10, 2022


reordering spec_id

db6e8a0

szlta force-pushed the specIdToPartitionsTable branch from 44cf239 to db6e8a0 April 11, 2022 13:31

szehon-ho approved these changes Apr 12, 2022


szehon-ho merged commit b3a27c6 into apache:master Apr 12, 2022

szehon-ho mentioned this pull request Apr 14, 2022

Core: Fix Partitions table for evolved partition specs #4560

Merged

szehon-ho mentioned this pull request Jun 21, 2022

StructLikeWrapper equals method is broken #5064

Closed

lvyanquan mentioned this pull request Aug 29, 2022

Doc: Update doc to display the results of the table partitions query #5662

Merged

sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023

Core: Add specId for partitions metadata table (apache#4516)

8de651c

(cherry picked from commit b3a27c6)
