From f4ba2b89cdaccdaabf2ce7f6d05c6244f99fe5ed Mon Sep 17 00:00:00 2001 From: Russell Spitzer Date: Wed, 15 Jan 2025 10:50:07 -0600 Subject: [PATCH 1/4] Spec: Add added-rows field to Snapshot --- format/spec.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/format/spec.md b/format/spec.md index 6b80e876ed43..2a966d5b7059 100644 --- a/format/spec.md +++ b/format/spec.md @@ -406,6 +406,8 @@ The `first_row_id` of the EXISTING file `data1` was already assigned, so the fil Files `data2` and `data3` are written with `null` for `first_row_id` and are assigned `first_row_id` at read time based on the manifest's `first_row_id` and the `record_count` of previously listed ADDED files in this manifest: (1,000 + 0) and (1,000 + 50). +The snapshot then populates the total number of `added-rows` based on the sum of all added rows in the manifests: 100 (50 + 50) + When the new snapshot is committed, the table's `next-row-id` must also be updated (even if the new snapshot is not in the main branch). Because 225 rows were added (`added1`: 100 + `added2`: 0 + `added3`: 125), the new value is 1,000 + 225 = 1,225: @@ -654,6 +656,7 @@ A snapshot consists of the following fields: | _optional_ | _required_ | _required_ | **`summary`** | A string map that summarizes the snapshot changes, including `operation` (see below) | | _optional_ | _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created | | | | _optional_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) | +| | | _optional | **`added-rows`** | The number of newly added rows in this snapshot. Required if [Row Lineage](#row-lineage) is enabled | The snapshot summary's `operation` field is used by some operations, like snapshot expiration, to skip processing certain snapshots. 
Possible `operation` values are: @@ -681,6 +684,8 @@ A snapshot's `first-row-id` is assigned to the table's current `next-row-id` on The snapshot's `first-row-id` is the starting `first_row_id` assigned to manifests in the snapshot's manifest list. +The snapshot's `added-rows` is the sum of all the `added_rows_count` in all added manifests. + ### Manifest Lists From 4fd9f489b4c0a1e8ec4e9b5f950c5dbf3276d539 Mon Sep 17 00:00:00 2001 From: Russell Spitzer Date: Thu, 16 Jan 2025 10:09:56 -0600 Subject: [PATCH 2/4] Tidy up formatting --- format/spec.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/format/spec.md b/format/spec.md index 2a966d5b7059..39fcbde3e14b 100644 --- a/format/spec.md +++ b/format/spec.md @@ -655,8 +655,8 @@ A snapshot consists of the following fields: | _optional_ | | | **`manifests`** | A list of manifest file locations. Must be omitted if `manifest-list` is present | | _optional_ | _required_ | _required_ | **`summary`** | A string map that summarizes the snapshot changes, including `operation` (see below) | | _optional_ | _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created | -| | | _optional_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) | -| | | _optional | **`added-rows`** | The number of newly added rows in this snapshot. Required if [Row Lineage](#row-lineage) is enabled | +| | | _optional_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) | +| | | _optional_ | **`added-rows`** | The number of newly added rows in this snapshot. Required if [Row Lineage](#row-lineage) is enabled | The snapshot summary's `operation` field is used by some operations, like snapshot expiration, to skip processing certain snapshots. 
Possible `operation` values are: From 776029b8f1a62c76ed3fed79c939c0391ba098e4 Mon Sep 17 00:00:00 2001 From: Russell Spitzer Date: Thu, 16 Jan 2025 10:11:45 -0600 Subject: [PATCH 3/4] Fix Tidy --- format/spec.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/format/spec.md b/format/spec.md index 39fcbde3e14b..3f2fe659cab2 100644 --- a/format/spec.md +++ b/format/spec.md @@ -656,7 +656,7 @@ A snapshot consists of the following fields: | _optional_ | _required_ | _required_ | **`summary`** | A string map that summarizes the snapshot changes, including `operation` (see below) | | _optional_ | _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created | | | | _optional_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) | -| | | _optional_ | **`added-rows`** | The number of newly added rows in this snapshot. Required if [Row Lineage](#row-lineage) is enabled | +| | | _optional_ | **`added-rows`** | The number of newly added rows in this snapshot. Required if [Row Lineage](#row-lineage) is enabled | The snapshot summary's `operation` field is used by some operations, like snapshot expiration, to skip processing certain snapshots. 
Possible `operation` values are: From f23516a92754ac43d02aa4e443dac02c0da0f68f Mon Sep 17 00:00:00 2001 From: Russell Spitzer Date: Fri, 17 Jan 2025 15:05:20 -0600 Subject: [PATCH 4/4] Add Hyperlinks to Manifest Lists --- format/spec.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/format/spec.md b/format/spec.md index 3f2fe659cab2..5549e70013d5 100644 --- a/format/spec.md +++ b/format/spec.md @@ -656,7 +656,8 @@ A snapshot consists of the following fields: | _optional_ | _required_ | _required_ | **`summary`** | A string map that summarizes the snapshot changes, including `operation` (see below) | | _optional_ | _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created | | | | _optional_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) | -| | | _optional_ | **`added-rows`** | The number of newly added rows in this snapshot. Required if [Row Lineage](#row-lineage) is enabled | +| | | _optional_ | **`added-rows`** | Sum of the [`added_rows_count`](#manifest-lists) from all manifests added in this snapshot. Required if [Row Lineage](#row-lineage) is enabled | + The snapshot summary's `operation` field is used by some operations, like snapshot expiration, to skip processing certain snapshots. Possible `operation` values are: @@ -684,7 +685,7 @@ A snapshot's `first-row-id` is assigned to the table's current `next-row-id` on The snapshot's `first-row-id` is the starting `first_row_id` assigned to manifests in the snapshot's manifest list. -The snapshot's `added-rows` is the sum of all the `added_rows_count` in all added manifests. +The snapshot's `added-rows` is the sum of all the [`added_rows_count`](#manifest-lists) in all added manifests. ### Manifest Lists
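As a minimal sketch of the `added-rows` rule introduced by this series, the snippet below sums `added_rows_count` over the manifests added in one snapshot and then advances `next-row-id` by that amount. The `ManifestEntry` class, the `added_rows` helper, the snapshot IDs, and the `existing1` entry are hypothetical names and values used only for illustration. Using `added_snapshot_id` to decide which manifests count as "added" is an assumption, not something this patch states. The 100 / 0 / 125 counts and the starting `next-row-id` of 1,000 are the figures quoted in the hunk context above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ManifestEntry:
    """Hypothetical stand-in for one entry in a snapshot's manifest list."""
    path: str
    added_snapshot_id: int   # snapshot in which this manifest was added
    added_rows_count: int    # rows in ADDED data files tracked by this manifest


def added_rows(manifests: List[ManifestEntry], snapshot_id: int) -> int:
    """Sum added_rows_count over the manifests added in `snapshot_id`.

    Assumption: a manifest counts as "added" when its added_snapshot_id
    matches the snapshot being committed.
    """
    return sum(
        m.added_rows_count
        for m in manifests
        if m.added_snapshot_id == snapshot_id
    )


# Illustrative manifest list: one manifest carried over from an earlier
# snapshot (hypothetical) plus the three added manifests from the example.
manifests = [
    ManifestEntry("existing1", added_snapshot_id=1, added_rows_count=75),
    ManifestEntry("added1", added_snapshot_id=2, added_rows_count=100),
    ManifestEntry("added2", added_snapshot_id=2, added_rows_count=0),
    ManifestEntry("added3", added_snapshot_id=2, added_rows_count=125),
]

snapshot_added_rows = added_rows(manifests, snapshot_id=2)

# The table's next-row-id advances by the number of rows the snapshot added.
next_row_id = 1000 + snapshot_added_rows
```

With these inputs the carried-over manifest is ignored, so `snapshot_added_rows` is 225 and `next_row_id` is 1,225, matching the arithmetic in the quoted context.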