From 981634eeb14b57209fa7c01c6f26b4bcae27ceb6 Mon Sep 17 00:00:00 2001
From: Steven Wu
Date: Wed, 10 Sep 2025 20:16:08 -0700
Subject: [PATCH 1/5] Spec: bring back added-rows in snapshot fields

---
 format/spec.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/format/spec.md b/format/spec.md
index 029b4f3821ef..bbe30275e063 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -754,9 +754,9 @@ A snapshot consists of the following fields:
 | _optional_ | _required_ | _required_ | **`summary`** | A string map that summarizes the snapshot changes, including `operation` as a _required_ field (see below) |
 | _optional_ | _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created |
 | | | _required_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) |
+| | | _required_ | **`added-rows`** | The upper bound of the number of rows with assigned row IDs, see [Row Lineage](#row-lineage) |
 | | | _optional_ | **`key-id`** | ID of the encryption key that encrypts the manifest list key metadata |

-
 The snapshot summary's `operation` field is used by some operations, like snapshot expiration, to skip processing certain snapshots. Possible `operation` values are:

 * `append` -- Only data files were added and no files were removed.
@@ -782,6 +782,10 @@ A snapshot's `first-row-id` is assigned to the table's current `next-row-id` on

 The snapshot's `first-row-id` is the starting `first_row_id` assigned to manifests in the snapshot's manifest list.

+The snapshot's `added-rows` captures the upper bound of the number of rows with assigned row IDs.
+It can be used safely to increment the table's `next-row-id` during a commit.
+It can be more than the number of rows added in this snapshot and include some existing rows,
+see [Row Lieange Example](#row-lineage-example).
 ### Manifest Lists

From ddc9050aeea3e4f31c00a0376721805c177a47be Mon Sep 17 00:00:00 2001
From: Steven Zhen Wu
Date: Thu, 11 Sep 2025 07:44:33 -0700
Subject: [PATCH 2/5] Update format/spec.md

Co-authored-by: Yuya Ebihara
---
 format/spec.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/format/spec.md b/format/spec.md
index bbe30275e063..aef94a4344bf 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -785,7 +785,7 @@ The snapshot's `first-row-id` is the starting `first_row_id` assigned to manifes
 The snapshot's `added-rows` captures the upper bound of the number of rows with assigned row IDs.
 It can be used safely to increment the table's `next-row-id` during a commit.
 It can be more than the number of rows added in this snapshot and include some existing rows,
-see [Row Lieange Example](#row-lineage-example).
+see [Row Lineage Example](#row-lineage-example).

 ### Manifest Lists

From e1e4ed3e9f15a20b0394a5234f5b9d8848266866 Mon Sep 17 00:00:00 2001
From: Steven Wu
Date: Thu, 11 Sep 2025 10:23:06 -0700
Subject: [PATCH 3/5] Update Javadoc for Snapshot#addedRows() to clarify its purpose

---
 api/src/main/java/org/apache/iceberg/Snapshot.java | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/api/src/main/java/org/apache/iceberg/Snapshot.java b/api/src/main/java/org/apache/iceberg/Snapshot.java
index 097806639b24..9bbb57ed4824 100644
--- a/api/src/main/java/org/apache/iceberg/Snapshot.java
+++ b/api/src/main/java/org/apache/iceberg/Snapshot.java
@@ -185,12 +185,14 @@ default Long firstRowId() {
   }

   /**
-   * The total number of newly added rows in this snapshot. It should be the summation of {@link
-   * ManifestFile#ADDED_ROWS_COUNT} for every manifest added in this snapshot.
+   * The upper bound of number of rows with assigned row IDs in this snapshot. It can be used safely
+   * to increment the table's `next-row-id` during a commit. It can be more than the number of rows
+   * added in this snapshot and include some existing rows.
    *
    * <p>This field is optional but is required when the table version supports row lineage.
    *
-   * @return the total number of new rows in this snapshot or null if the value was not stored.
+   * @return the upper bound of number of rows with assigned row IDs in this snapshot or null if the
+   *     value was not stored.
    */
   default Long addedRows() {
     return null;

From 06b55799cae0f322c07e39a1dd05197edf6d6f96 Mon Sep 17 00:00:00 2001
From: Steven Wu
Date: Fri, 12 Sep 2025 22:14:11 -0700
Subject: [PATCH 4/5] align the table rows

---
 format/spec.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/format/spec.md b/format/spec.md
index aef94a4344bf..4bb6265f7c86 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -754,7 +754,7 @@ A snapshot consists of the following fields:
 | _optional_ | _required_ | _required_ | **`summary`** | A string map that summarizes the snapshot changes, including `operation` as a _required_ field (see below) |
 | _optional_ | _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created |
 | | | _required_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) |
-| | | _required_ | **`added-rows`** | The upper bound of the number of rows with assigned row IDs, see [Row Lineage](#row-lineage) |
+| | | _required_ | **`added-rows`** | The upper bound of the number of rows with assigned row IDs, see [Row Lineage](#row-lineage) |
 | | | _optional_ | **`key-id`** | ID of the encryption key that encrypts the manifest list key metadata |

 The snapshot summary's `operation` field is used by some operations, like snapshot expiration, to skip processing certain snapshots. Possible `operation` values are:

From 02893f7d1d7b1648f7f763187cb03fa43c56154b Mon Sep 17 00:00:00 2001
From: Steven Wu
Date: Fri, 12 Sep 2025 22:15:42 -0700
Subject: [PATCH 5/5] whitespaces

---
 format/spec.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/format/spec.md b/format/spec.md
index 4bb6265f7c86..cc048dd56a21 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -754,7 +754,7 @@ A snapshot consists of the following fields:
 | _optional_ | _required_ | _required_ | **`summary`** | A string map that summarizes the snapshot changes, including `operation` as a _required_ field (see below) |
 | _optional_ | _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created |
 | | | _required_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) |
-| | | _required_ | **`added-rows`** | The upper bound of the number of rows with assigned row IDs, see [Row Lineage](#row-lineage) |
+| | | _required_ | **`added-rows`** | The upper bound of the number of rows with assigned row IDs, see [Row Lineage](#row-lineage) |
 | | | _optional_ | **`key-id`** | ID of the encryption key that encrypts the manifest list key metadata |

 The snapshot summary's `operation` field is used by some operations, like snapshot expiration, to skip processing certain snapshots. Possible `operation` values are:
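
Reviewer note: the spec text in this series says `added-rows` "can be used safely to increment the table's `next-row-id` during a commit". The sketch below is one possible reading of that sentence, not code from this PR or from the Iceberg library. The class `RowLineageSketch`, both method names, and the sample numbers are hypothetical and exist only for illustration.

```java
// Hypothetical sketch; none of these names are part of the Iceberg API.
final class RowLineageSketch {
  private RowLineageSketch() {}

  /** Per the spec text, a snapshot's first-row-id is the table's current next-row-id. */
  public static long snapshotFirstRowId(long tableNextRowId) {
    return tableNextRowId;
  }

  /**
   * Advances next-row-id by the snapshot's added-rows. Because added-rows is an
   * upper bound on the rows that received IDs, the result should not fall inside
   * the ID range used by this snapshot (assumption: that is what "safely" means).
   */
  public static long advanceNextRowId(long tableNextRowId, long addedRows) {
    if (tableNextRowId < 0 || addedRows < 0) {
      throw new IllegalArgumentException("row ID counters must be non-negative");
    }
    // addExact throws ArithmeticException on overflow instead of wrapping around.
    return Math.addExact(tableNextRowId, addedRows);
  }

  public static void main(String[] args) {
    long nextRowId = 1_000L;
    long firstRowId = snapshotFirstRowId(nextRowId);
    // Assumed scenario: 25 newly written rows plus 5 existing rows that were
    // also assigned row IDs, so the upper bound is 30 rather than 25.
    long addedRows = 30L;
    long newNextRowId = advanceNextRowId(nextRowId, addedRows);
    // prints "first-row-id=1000, next-row-id=1030"
    System.out.println("first-row-id=" + firstRowId + ", next-row-id=" + newNextRowId);
  }
}
```

Under this reading, over-counting is harmless to uniqueness: the next commit simply starts from a higher `next-row-id`.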