Skip to content

Conversation

@ebyhr
Copy link
Contributor

@ebyhr ebyhr commented Apr 9, 2025

#12593 made next-row-id field required in my understanding.

@github-actions github-actions bot added the Specification Issues that may introduce spec changes. label Apr 9, 2025
@manuzhang
Copy link
Member

Do we need to notify the dev list?

@manuzhang manuzhang changed the title Doc: Make next-row-id required in v3 Spec: Make next-row-id required in v3 Apr 10, 2025
@ebyhr
Copy link
Contributor Author

ebyhr commented Apr 10, 2025

@manuzhang
Copy link
Member

Okay, this is a follow-up of #12580.

@danielcweeks
Copy link
Contributor

Hold on, I'm not certain this is required. When a table is first converted to v3 this can be null until the row-ids are assigned. @rdblue can you weigh in here as well?

@rdblue
Copy link
Contributor

rdblue commented Apr 10, 2025

I would hold off on this for a little bit. I'm looking more deeply into v1/v2 conversion to v3 and I think Dan is probably right. We will very likely have a case where the next-row-id of the table is not yet known just after the conversion. So even though the metadata file is v3, before the next version writes a new snapshot (and manifest list) the number of existing rows is not precisely known.

I'll update this when I know more since I'm actively working on it.

@rdblue
Copy link
Contributor

rdblue commented Apr 14, 2025

After working on the implementation and posting a PR to update the spec for how we handle upgrades, I think that this PR is correct and that next-row-id can be required in v3. Rather than filling in null until row IDs are assigned, the strategy I think is correct for upgrading tables is to assign IDs for an entire branch in the first snapshot written after the upgrade to v3. When upgrading, all existing snapshots have no IDs so next-version-id starts at 0. Then after the upgrade the next commit assigns IDs by rewriting the manifest list and then updates next-version-id by the size of the table: sum(record_count for all existing or added data files).

+1 to making the field required.

@manuzhang
Copy link
Member

I'm wondering whether we have a test to verify the upgrade path.

@rdblue
Copy link
Contributor

rdblue commented Apr 17, 2025

@ebyhr, I cherry-picked this commit into #12781 with the other spec changes that I just raised a vote for on the dev list so that we have this specifically included in a vote. When we go to merge, I can either merge this PR and then mine or make sure you're a co-author of the other PR. Whatever you'd prefer.

@rdblue
Copy link
Contributor

rdblue commented Apr 17, 2025

I'm wondering whether we have a test to verify the upgrade path.

There are extensive tests in #12672. See TestRowLineageAssignment.

@ebyhr
Copy link
Contributor Author

ebyhr commented Apr 21, 2025

Closing as #12781 has been merged. Thank you for the cherry-pick.

@ebyhr ebyhr closed this Apr 21, 2025
@ebyhr ebyhr deleted the ebi/docs-row-lineage branch April 21, 2025 03:53
@rdblue
Copy link
Contributor

rdblue commented Apr 21, 2025

Merged as part of #12781.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Specification Issues that may introduce spec changes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants