diff --git a/api/src/main/java/org/apache/iceberg/Snapshot.java b/api/src/main/java/org/apache/iceberg/Snapshot.java
index 097806639b24..9bbb57ed4824 100644
--- a/api/src/main/java/org/apache/iceberg/Snapshot.java
+++ b/api/src/main/java/org/apache/iceberg/Snapshot.java
@@ -185,12 +185,14 @@ default Long firstRowId() {
   }
 
   /**
-   * The total number of newly added rows in this snapshot. It should be the summation of {@link
-   * ManifestFile#ADDED_ROWS_COUNT} for every manifest added in this snapshot.
+   * The upper bound of number of rows with assigned row IDs in this snapshot. It can be used safely
+   * to increment the table's `next-row-id` during a commit. It can be more than the number of rows
+   * added in this snapshot and include some existing rows.
    *
    * <p>This field is optional but is required when the table version supports row lineage.
    *
-   * @return the total number of new rows in this snapshot or null if the value was not stored.
+   * @return the upper bound of number of rows with assigned row IDs in this snapshot or null if the
+   *     value was not stored.
    */
   default Long addedRows() {
     return null;
diff --git a/format/spec.md b/format/spec.md
index 029b4f3821ef..cc048dd56a21 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -754,9 +754,9 @@ A snapshot consists of the following fields:
 | _optional_ | _required_ | _required_ | **`summary`** | A string map that summarizes the snapshot changes, including `operation` as a _required_ field (see below) |
 | _optional_ | _optional_ | _optional_ | **`schema-id`** | ID of the table's current schema when the snapshot was created |
 | | | _required_ | **`first-row-id`** | The first `_row_id` assigned to the first row in the first data file in the first manifest, see [Row Lineage](#row-lineage) |
+| | | _required_ | **`added-rows`** | The upper bound of the number of rows with assigned row IDs, see [Row Lineage](#row-lineage) |
 | | | _optional_ | **`key-id`** | ID of the encryption key that encrypts the manifest list key metadata |
 
-
 The snapshot summary's `operation` field is used by some operations, like snapshot expiration, to skip processing certain snapshots. Possible `operation` values are:
 
 * `append` -- Only data files were added and no files were removed.
@@ -782,6 +782,10 @@ A snapshot's `first-row-id` is assigned to the table's current `next-row-id` on
 
 The snapshot's `first-row-id` is the starting `first_row_id` assigned to manifests in the snapshot's manifest list.
 
+The snapshot's `added-rows` captures the upper bound of the number of rows with assigned row IDs.
+It can be used safely to increment the table's `next-row-id` during a commit.
+It can be more than the number of rows added in this snapshot and include some existing rows,
+see [Row Lineage Example](#row-lineage-example).
 
 ### Manifest Lists
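
Reviewer note: the new wording says `added-rows` is an upper bound that can safely be used to advance the table's `next-row-id`. A minimal sketch of that arithmetic follows, under stated assumptions: `NextRowIdSketch` and `advanceNextRowId` are hypothetical names for illustration, not part of the Iceberg API, and the numbers are invented.

```java
// Hypothetical illustration only; this class is not part of Apache Iceberg.
final class NextRowIdSketch {

  /**
   * Returns the table's next-row-id after a commit, given the value before the
   * commit and the snapshot's added-rows. Because added-rows is an upper bound on
   * the rows that received row IDs, the result never falls inside the ID range
   * the snapshot may have used.
   */
  static long advanceNextRowId(long nextRowId, long addedRows) {
    if (nextRowId < 0 || addedRows < 0) {
      throw new IllegalArgumentException("row ID values must be non-negative");
    }
    // addExact throws ArithmeticException on overflow instead of wrapping.
    return Math.addExact(nextRowId, addedRows);
  }

  public static void main(String[] args) {
    long nextRowId = 1000L;      // table's next-row-id before the commit
    long firstRowId = nextRowId; // the snapshot's first-row-id takes this value
    // Assumed scenario: 100 genuinely new rows plus 50 existing rows that were
    // also covered by the ID assignment, so added-rows is 150 rather than 100.
    long addedRows = 150L;

    long updated = advanceNextRowId(nextRowId, addedRows);
    System.out.println("first-row-id=" + firstRowId + ", next-row-id=" + updated);
  }
}
```

Over-counting only leaves unused gaps in the ID space, whereas under-counting could hand out the same row ID twice, which is presumably why the field is described as an upper bound rather than an exact count.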