Skip to content

Conversation

@karuppayya
Copy link
Contributor

@karuppayya karuppayya commented Jul 31, 2022

Starting with schema:

{A: 1, B: 2, C:3}

And then a user drops C, resulting in:

{A: 1, B: 2} Highest Field ID = 3

Then do a mergeSchema with

{A, B, D}

This will end up assigning (highestId({A:1, B:2}) + 1) to D (3)
The table would then have schemas :

Schemas {
{A: 1, B: 2, C: 3}
{A: 1, B: 2}
{A: 1, B :2, D: 3}
}

@github-actions github-actions bot added the API label Jul 31, 2022
@rdblue rdblue changed the title [ICEBERG-5394] Assign the right field ids when merging schema API: Assign the right field ids when merging schema, #5394 Jul 31, 2022
@rdblue
Copy link
Contributor

rdblue commented Jul 31, 2022

@karuppayya, can you add a description to this PR that documents the problem and how this fixes it?

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me.

@RussellSpitzer
Copy link
Member

RussellSpitzer commented Aug 1, 2022

@rdblue Just to note this is still actually incorrect as we are using the table.currentSchema.highestId and not table.allschemas.highestId. But that fix is going to take a bit more refactoring. I originally guessed the issue was related to dropped fields when this got reported to us.

Imagine I have
{A: 1, B: 2, C:3}

And then a user drops C so I have
{A: 1, B: 2} Highest Field ID = 3

Then we do a mergeSchema with
{A, B, D}

I will end up assigning (highestId({A:1, B:2}) + 1) to D (3)
The table would then have schemas :

Schemas {
{A: 1, B: 2, C: 3}
{A: 1, B: 2}
{A: 1, B :2, D: 3}
}

I still think we should merge this fix first as it corrects monotonically changing tables

@RussellSpitzer
Copy link
Member

@rdblue
Copy link
Contributor

rdblue commented Aug 1, 2022

Re-running failed CI jobs

@rdblue rdblue merged commit e05f2bb into apache:master Aug 1, 2022
@rdblue rdblue added this to the Iceberg 0.14.1 Release milestone Aug 1, 2022
@rdblue
Copy link
Contributor

rdblue commented Aug 1, 2022

Merged. Thanks, @karuppayya! I've also marked this for inclusion in the next patch release.

abmo-x pushed a commit to abmo-x/iceberg that referenced this pull request Aug 2, 2022
rdblue pushed a commit to rdblue/iceberg that referenced this pull request Sep 2, 2022
rdblue pushed a commit that referenced this pull request Sep 3, 2022
sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

mergeSchema fails with java.lang.IllegalArgumentException: Multiple entries with same key: 1=col1 and 7=col2

3 participants