Spec: Update row lineage requirements for upgrading tables #12781
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -404,8 +404,6 @@ On read, if `_last_updated_sequence_number` is `null` it is assigned the `sequen | |||||
|
|
||||||
| When `null`, a row's `_row_id` field is assigned to the `first_row_id` from its containing data file plus the row position in that data file (`_pos`). A data file's `first_row_id` field is assigned using inheritance and is documented in [First Row ID Inheritance](#first-row-id-inheritance). A manifest's `first_row_id` is assigned when writing the manifest list for a snapshot and is documented in [First Row ID Assignment](#first-row-id-assignment). A snapshot's `first-row-id` is set to the table's `next-row-id` and is documented in [Snapshot Row IDs](#snapshot-row-ids). | ||||||
|
|
||||||
| Values for `_row_id` and `_last_updated_sequence_number` are either read from the data file or assigned at read time. As a result on read, rows in a table always have non-null values for these fields when lineage is enabled. | ||||||
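The read-time rule described above can be sketched roughly as follows. This is a minimal illustration, not code from any Iceberg implementation: the function name `resolve_lineage` and its parameters are hypothetical, and it assumes the data file's `first_row_id` and sequence number have already been resolved through inheritance.

```python
from typing import Optional, Tuple


def resolve_lineage(
    stored_row_id: Optional[int],
    stored_last_updated_seq: Optional[int],
    file_first_row_id: Optional[int],
    pos: int,
    file_sequence_number: int,
) -> Tuple[Optional[int], int]:
    """Hypothetical helper: resolve lineage fields for one row at read time."""
    if stored_row_id is not None:
        # A value written into the data file always wins.
        row_id = stored_row_id
    elif file_first_row_id is not None:
        # Null in the file: assign first_row_id of the data file plus _pos.
        row_id = file_first_row_id + pos
    else:
        # No first_row_id available (for example, a snapshot written before
        # the upgrade to v3): the row ID stays null.
        row_id = None

    # A null _last_updated_sequence_number inherits the data file's sequence number.
    if stored_last_updated_seq is not None:
        last_updated = stored_last_updated_seq
    else:
        last_updated = file_sequence_number
    return row_id, last_updated
```

For example, a row at position 5 in a data file whose `first_row_id` is 1000 and whose sequence number is 7 would resolve to `_row_id` 1005 and `_last_updated_sequence_number` 7 when both stored values are null.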
|
|
||||||
| When an existing row is moved to a different data file for any reason, writers should write `_row_id` and `_last_updated_sequence_number` according to the following rules: | ||||||
|
|
||||||
| 1. The row's existing non-null `_row_id` must be copied into the new data file | ||||||
|
|
@@ -418,7 +416,7 @@ Engines may model operations as deleting/inserting rows or as modifications to r | |||||
|
|
||||||
| This example demonstrates how `_row_id` and `_last_updated_sequence_number` are assigned for a snapshot. This starts with a table with a `next-row-id` of 1000. | ||||||
|
|
||||||
| Writing a new append snapshot would create snapshot metadata with `first-row-id` assigned to the table's `next-row-id`: | ||||||
| Writing a new append snapshot creates snapshot metadata with `first-row-id` assigned to the table's `next-row-id`: | ||||||
|
|
||||||
| ```json | ||||||
| { | ||||||
|
|
@@ -428,17 +426,21 @@ Writing a new append snapshot would create snapshot metadata with `first-row-id` | |||||
| } | ||||||
| ``` | ||||||
|
|
||||||
| The snapshot's manifest list would contain existing manifests, plus new manifests with an assigned `first_row_id` based on the `added_rows_count` of previously listed added manifests: | ||||||
| The snapshot's manifest list will contain existing manifests, plus new manifests that are each assigned a `first_row_id` based on the `added_rows_count` and `existing_rows_count` of preceding new manifests: | ||||||
|
|
||||||
| | `manifest_path` | `added_rows_count` | `existing_rows_count` | `first_row_id` | | ||||||
| |-----------------|--------------------|-----------------------|--------------------| | ||||||
| | ... | ... | ... | ... | | ||||||
| | existing | 75 | 0 | 925 | | ||||||
| | added1 | 100 | 25 | 1000 | | ||||||
| | added2 | 0 | 100 | 1100 | | ||||||
| | added3 | 125 | 25 | 1100 | | ||||||
| | added2 | 0 | 100 | 1125 | | ||||||
| | added3 | 125 | 25 | 1225 | | ||||||
|
|
||||||
| The existing manifests are written with the `first_row_id` assigned when the manifests were added to the table. | ||||||
|
|
||||||
| The first added manifest, `added1`, is assigned the same `first_row_id` as the snapshot, and each of the remaining added manifests is assigned a `first_row_id` based on the number of rows in preceding manifests that were assigned a `first_row_id`. | ||||||
|
|
||||||
| The first added file, `added1`, is assigned the same `first_row_id` as the snapshot and the following manifests are assigned `first_row_id` based on the number of rows added by the previously listed manifests. The second file, `added2`, does not change the `first_row_id` of the next manifest because it contains no added data files. | ||||||
| Note that the second file, `added2`, changes the `first_row_id` of the next manifest even though it contains no added data files because any data file without a `first_row_id` could be assigned one, even if it has existing status. This is optional if the writer knows that existing data files in the manifest have assigned `first_row_id` values. | ||||||
|
|
||||||
| Within `added1`, the first added manifest, each data file's `first_row_id` follows a similar pattern: | ||||||
|
|
||||||
|
|
@@ -450,21 +452,24 @@ Within `added1`, the first added manifest, each data file's `first_row_id` follo | |||||
|
|
||||||
| The `first_row_id` of the EXISTING file `data1` was already assigned, so the file metadata was copied into manifest `added1`. | ||||||
|
|
||||||
| Files `data2` and `data3` are written with `null` for `first_row_id` and are assigned `first_row_id` at read time based on the manifest's `first_row_id` and the `record_count` of previously listed ADDED files in this manifest: (1,000 + 0) and (1,000 + 50). | ||||||
| Files `data2` and `data3` are written with `null` for `first_row_id` and are assigned `first_row_id` at read time based on the manifest's `first_row_id` and the `record_count` of previous files without `first_row_id` in this manifest: (1,000 + 0) and (1,000 + 50). | ||||||
|
|
||||||
| The snapshot then populates the total number of `added-rows` based on the sum of all added rows in the manifests: 100 (50 + 50) | ||||||
|
|
||||||
| When the new snapshot is committed, the table's `next-row-id` must also be updated (even if the new snapshot is not in the main branch). Because 225 rows were added (`added1`: 100 + `added2`: 0 + `added3`: 125), the new value is 1,000 + 225 = 1,225: | ||||||
| When the new snapshot is committed, the table's `next-row-id` must also be updated (even if the new snapshot is not in the main branch). Because 375 rows were in data files in manifests that were assigned a `first_row_id` (`added1` 100+25, `added2` 0+100, `added3` 125+25) the new value is 1,000 + 375 = 1,375. | ||||||
|
|
||||||
|
|
||||||
| ##### Row Lineage for Upgraded Tables | ||||||
|
|
||||||
| Any snapshot without the field `first-row-id` does not have any lineage information and values for `_row_id` and `_last_updated_sequence_number` cannot be assigned accurately. | ||||||
| When a table is upgraded to v3, its `next-row-id` is initialized to 0 and existing snapshots are not modified (that is, `first-row-id` remains unset or null). For such snapshots without `first-row-id`, `first_row_id` values for data files and data manifests are null, and values for `_row_id` are read as null for all rows. When `first_row_id` is null, inherited row ID values are also null. | ||||||
|
|
||||||
| Snapshots that are created after upgrading to v3 must set the snapshot's `first-row-id` and assign row IDs to existing and added files in the snapshot. When writing the manifest list, all data manifests must be assigned a `first_row_id`, which assigns a `first_row_id` to all data files via inheritance. | ||||||
|
|
||||||
| Note that: | ||||||
|
|
||||||
| All files that were added before upgrading to v3 must propagate null for all row-lineage related | ||||||
| fields. The values for `_row_id` and `_last_updated_sequence_number` must always return null and when these rows are copied, | ||||||
| null must be explicitly written. After this point, rows are treated as if they were just created | ||||||
| and assigned `row_id` and `_last_updated_sequence_number` as if they were new rows. | ||||||
| * Snapshots created before upgrading to v3 do not have row IDs. | ||||||
|
Member
I think we should add a note that, when reading from these snapshots all rows should return
Contributor
Author
I think I did above in the first paragraph because it is part of the specification rather than a consequence of the rules:
Member
I think we've had some confusion here (at least amongst folks I've talked to) about when "reading as null" and "reading null as something else".
||||||
| * After upgrading, new snapshots in different branches will assign disjoint ID ranges to existing data files, based on the table's `next-row-id` when the snapshot is committed. For a data file in multiple branches, a writer may write the `first_row_id` from another branch or may assign a new `first_row_id` to the data file (to avoid large metadata rewrites). | ||||||
|
Member
if i understand, this means if same data file exists on different branch, can have same first_row_id? It seems the two sentence contradict (first sentence specifies disjoint Id ranges). Would it be more clear:
Member
During upgrade it is possible that an existing row can exist on two different branches with different row ids, after upgrade this will not be possible for new rows.
Contributor
Author
What this is saying is that when another branch is updated, all of the files in that branch must be assigned IDs by the v3 snapshot, and unless the writer does some additional work to find and write the row IDs for the same data file in other branches, the IDs will be assigned for the branch. This also says (the last sentence) that the writer can choose to do that extra work, find the
Member
I think makes sense. I still read the two sentence as contradict (as the first sentence specifies 'disjoint', but second sentence says we can optionally re-use the other branch first_row_id), hence my suggestion if it makes sense
||||||
| * Existing rows will inherit `_last_updated_sequence_number` from their containing data file. | ||||||
|
|
||||||
|
|
||||||
| ### Partitioning | ||||||
|
|
@@ -689,9 +694,11 @@ When reading v1 manifests with no sequence number column, sequence numbers for a | |||||
|
|
||||||
| When adding a new data file, its `first_row_id` field is set to `null` because it is not assigned until the snapshot is successfully committed. | ||||||
|
|
||||||
| When reading, the `first_row_id` is assigned by replacing `null` with the manifest's `first_row_id` plus the sum of `record_count` for all added data files that preceded the file in the manifest. | ||||||
| When reading, the `first_row_id` is assigned by replacing `null` with the manifest's `first_row_id` plus the sum of `record_count` for all data files that preceded the file in the manifest that also had a null `first_row_id`. | ||||||
|
|
||||||
| The `first_row_id` is only inherited for added data files. The inherited value must be written into the data file metadata for existing and deleted entries. The value of `first_row_id` for delete files is always `null`. | ||||||
| The inherited value of `first_row_id` must be written into data file metadata when creating existing and deleted entries. The value of `first_row_id` for delete files is always `null`. | ||||||
|
|
||||||
| Any null (unassigned) `first_row_id` must be assigned via inheritance, even if the data file is existing. This ensures that row IDs are assigned to existing data files in upgraded tables in the first commit after upgrading to v3. | ||||||
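The inheritance rule for data files within one manifest can be sketched as below. This is an illustration under stated assumptions rather than the reference implementation: `inherit_first_row_ids` is a hypothetical name, each data file is modeled as a `(first_row_id, record_count)` pair, and delete files (whose `first_row_id` is always `null`) are left out.

```python
from typing import List, Optional, Tuple


def inherit_first_row_ids(
    manifest_first_row_id: Optional[int],
    files: List[Tuple[Optional[int], int]],
) -> List[Optional[int]]:
    """Hypothetical helper: resolve first_row_id for data files in manifest order."""
    if manifest_first_row_id is None:
        # A manifest without a first_row_id cannot assign one to its files.
        return [first_row_id for first_row_id, _ in files]

    next_id = manifest_first_row_id
    resolved: List[Optional[int]] = []
    for first_row_id, record_count in files:
        if first_row_id is None:
            # Null is replaced, and only files that had a null first_row_id
            # advance the running offset.
            resolved.append(next_id)
            next_id += record_count
        else:
            # An already assigned value is kept and consumes no ID space.
            resolved.append(first_row_id)
    return resolved
```

With a manifest `first_row_id` of 1,000, a file that already carries an assigned value (800 here, a made-up number) followed by two files of 50 rows each with `null` resolves to 800, 1,000 and 1,050, which matches the (1,000 + 0) and (1,000 + 50) arithmetic in the Row Lineage example.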
|
|
||||||
| ### Snapshots | ||||||
|
|
||||||
|
|
@@ -708,7 +715,6 @@ A snapshot consists of the following fields: | |||||
| | _optional_ | _required_ | _required_ | **`summary`** | A string map that summarizes the snapshot changes, including `operation` as a _required_ field (see below) | | ||||||
| | _optional_ | _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created | | ||||||
| | | | _required_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) | | ||||||
| | | | _required_ | **`added-rows`** | Sum of the [`added_rows_count`](#manifest-lists) from all manifests added in this snapshot. | | ||||||
|
|
||||||
|
|
||||||
| The snapshot summary's `operation` field is used by some operations, like snapshot expiration, to skip processing certain snapshots. Possible `operation` values are: | ||||||
|
|
@@ -732,12 +738,10 @@ Valid snapshots are stored as a list in table metadata. For serialization, see A | |||||
|
|
||||||
| #### Snapshot Row IDs | ||||||
|
|
||||||
| A snapshot's `first-row-id` is assigned to the table's current `next-row-id` on each commit attempt. If a commit is retried, the `first-row-id` must be reassigned. If a commit contains no new rows, `first-row-id` should be omitted. | ||||||
| A snapshot's `first-row-id` is assigned to the table's current `next-row-id` on each commit attempt. If a commit is retried, the `first-row-id` must be reassigned based on the table's current `next-row-id`. The `first-row-id` field is required even if a commit does not assign any ID space. | ||||||
|
Contributor
Author
Always writing
Member
Note: this seems a good clarification, if i understand this correctly, the first-row-id could be the same value even on the next attempt (if next-row-id is not changed), before it seemed like it implied it needed to be different.
||||||
|
|
||||||
| The snapshot's `first-row-id` is the starting `first_row_id` assigned to manifests in the snapshot's manifest list. | ||||||
|
|
||||||
| The snapshot's `added-rows` is the sum of all the [`added_rows_count`](#manifest-lists) in all added manifests. | ||||||
|
Contributor
Author
@RussellSpitzer, I've removed In this PR, row ID range assignment is based on the total number of existing or added rows in new manifests. That leaves room for any data files that are missing
Member
That's fine, we only had this added in order to pass the information from the snapshot into the table metadata. I believe now that logic has moved we don't have that issue.
||||||
|
|
||||||
|
|
||||||
| ### Manifest Lists | ||||||
|
|
||||||
|
|
@@ -786,9 +790,11 @@ Notes: | |||||
|
|
||||||
| #### First Row ID Assignment | ||||||
|
|
||||||
| When adding a new data manifest file, its `first_row_id` field is assigned the value of the snapshot's `first_row_id` plus the sum of `added_rows_count` for all data manifests that preceded the manifest in the manifest list. | ||||||
| The `first_row_id` for existing manifests must be preserved when writing a new manifest list. The value of `first_row_id` for delete manifests is always `null`. The `first_row_id` is only assigned for data manifests that do not have a `first_row_id`. Assignment must account for data files that will be assigned `first_row_id` values when the manifest is read. | ||||||
|
|
||||||
| The `first_row_id` is only assigned for new data manifests. Values for existing manifests must be preserved when writing a new manifest list. The value of `first_row_id` for delete manifests is always `null`. | ||||||
| The first manifest without a `first_row_id` is assigned a value that is greater than or equal to the `first_row_id` of the snapshot. Subsequent manifests without a `first_row_id` are assigned one based on the previous manifest to be assigned a `first_row_id`. Each assigned `first_row_id` must increase by the row count of all files that will be assigned a `first_row_id` via inheritance in the last assigned manifest. That is, each `first_row_id` must be greater than or equal to the last assigned `first_row_id` plus the total record count of data files with a null `first_row_id` in the last assigned manifest. | ||||||
|
Member
Suggested change
Tried to simplify this a bit, not sure if I succeeded
Contributor
Author
The second sentence is intended to clarify how to interpret the requirement (the "must be") by stating that the The complication I'm trying to avoid is the distinction between the manifest that precedes the one being assigned a
Member
Yeah I think the Last sentence is clear, but the second sentence just sounds more complicated to me. I'm fine with this as is though, i think the examples make this clear
||||||
|
|
||||||
| A simple and valid approach is to estimate the number of rows in data files that will be assigned a `first_row_id` using the manifest's `added_rows_count` and `existing_rows_count`: `first_row_id = last_assigned.first_row_id + last_assigned.added_rows_count + last_assigned.existing_rows_count`. | ||||||
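The simple approach above might look like the following sketch when writing a manifest list. The function name `assign_manifest_first_row_ids` and the dict-based manifest model are assumptions made for illustration, and only data manifests are modeled, since delete manifests always keep a `null` `first_row_id`.

```python
from typing import Dict, List, Optional, Tuple

Manifest = Dict[str, Optional[int]]


def assign_manifest_first_row_ids(
    snapshot_first_row_id: int,
    manifests: List[Manifest],
) -> Tuple[List[int], int]:
    """Hypothetical helper: return assigned first_row_id values and the new next-row-id."""
    next_id = snapshot_first_row_id
    assigned: List[int] = []
    for manifest in manifests:
        if manifest["first_row_id"] is not None:
            # Existing values must be preserved and reserve no new ID space.
            assigned.append(manifest["first_row_id"])
        else:
            assigned.append(next_id)
            # Reserve space for added and existing rows, because any data file
            # without a first_row_id may be assigned one on read.
            next_id += manifest["added_rows_count"] + manifest["existing_rows_count"]
    return assigned, next_id


# Manifest list from the Row Lineage example (snapshot first-row-id of 1000).
example = [
    {"first_row_id": 925, "added_rows_count": 75, "existing_rows_count": 0},
    {"first_row_id": None, "added_rows_count": 100, "existing_rows_count": 25},
    {"first_row_id": None, "added_rows_count": 0, "existing_rows_count": 100},
    {"first_row_id": None, "added_rows_count": 125, "existing_rows_count": 25},
]
ids, next_row_id = assign_manifest_first_row_ids(1000, example)
```

Here `ids` comes out as 925, 1000, 1125 and 1225, and `next_row_id` as 1375, in line with the updated example table. The estimate can reserve more IDs than are strictly needed, which is acceptable because each assigned value only has to be greater than or equal to the minimum the rule requires.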
|
|
||||||
| ### Scan Planning | ||||||
|
|
||||||
|
|
@@ -907,11 +913,13 @@ Table metadata consists of the following fields: | |||||
| | | _optional_ | _optional_ | **`refs`** | A map of snapshot references. The map keys are the unique snapshot reference names in the table, and the map values are snapshot reference objects. There is always a `main` branch reference pointing to the `current-snapshot-id` even if the `refs` map is null. | | ||||||
| | _optional_ | _optional_ | _optional_ | **`statistics`** | A list (optional) of [table statistics](#table-statistics). | | ||||||
| | _optional_ | _optional_ | _optional_ | **`partition-statistics`** | A list (optional) of [partition statistics](#partition-statistics). | | ||||||
| | | | _optional_ | **`next-row-id`** | A `long` higher than all assigned row IDs; the next snapshot's `first-row-id`. See [Row Lineage](#row-lineage). | | ||||||
| | | | _required_ | **`next-row-id`** | A `long` higher than all assigned row IDs; the next snapshot's `first-row-id`. See [Row Lineage](#row-lineage). | | ||||||
|
|
||||||
| For serialization details, see Appendix C. | ||||||
|
|
||||||
| When a new snapshot is added, the table's `next-row-id` should be updated to the previous `next-row-id` plus the sum of `record_count` for all data files added in the snapshot (this is also equal to the sum of `added_rows_count` for all manifests added in the snapshot). This ensures that `next-row-id` is always higher than any assigned row ID in the table. | ||||||
| When a new snapshot is added, the table's `next-row-id` should be increased by the sum of `record_count` for all data files that will be assigned a `first_row_id` via inheritance in the snapshot. The `next-row-id` must always be higher than any assigned row ID in the table. | ||||||
|
|
||||||
| A simple and valid approach is to estimate the number of rows in data files that will be assigned a `first_row_id` using the manifests' `added_rows_count` and `existing_rows_count`. Using the last assigned manifest, this is `next-row-id = last_assigned.first_row_id + last_assigned.added_rows_count + last_assigned.existing_rows_count`. | ||||||
|
|
||||||
| #### Table Statistics | ||||||
|
|
||||||
|
|
||||||
Nit, but we have spaces before and after + in all the other examples