diff --git a/site/docs/develop/kernel.md b/site/docs/develop/kernel.md
index 6758b8af36a..3472f604aae 100644
--- a/site/docs/develop/kernel.md
+++ b/site/docs/develop/kernel.md
@@ -28,9 +28,10 @@ a so-called commit operation instructs Nessie to record the new state in a Nessi
 carries the `Content` object(s).
 
 `IcebergTable` contains the _current_ and _global_ pointer to Iceberg's table metadata plus the
-ID of the snapshot defined in the table metadata. Since Iceberg's table metadata manages information
-that must be consistent across all branches in Nessie, it is stored as so-called _global state_.
-The value of the snapshot-ID is stored per Nessie named reference (branch or tag).
+IDs of the snapshot, schema, partition spec, and sort order defined in the table metadata.
+- Since Iceberg's table metadata manages information that must be consistent across all branches in Nessie, it is stored as so-called _global state_.
+- The values of the snapshot-ID, schema-ID, partition-spec-ID, and sort-order-ID are stored per Nessie named reference (branch or tag).
+For more information, please refer to the spec [On Reference State vs Global State](spec.md#on-reference-state-vs-global-state).
 
 Updating _global-state_ and _on-reference-state_ are technically operations against two different
 entities in Nessie's backend database. Classic, relational databases (usually) come with a
@@ -191,7 +192,8 @@ conditional updates to multiple rows/records is either not supported at all or e
 
 Nessie differentiates between content types that do require so called _global-state_ and those
 that do not. Apache Iceberg is currently the only content type that supports global state:
-the pointer to the Iceberg "Table Metadata" is tracked as "global state" and the Iceberg snapshot ID
+the pointer to the Iceberg "Table Metadata" is tracked as "global state" and
+the Iceberg snapshot ID, schema ID, partition spec ID, sort order ID
 is tracker per _Nessie named reference_.
 
 For _Nessie commits_, which are atomic, this means that Nessie has to update both the global-state and the on-reference-state for the Iceberg table. While this is not an issue with a relational/transactional database, it is an issue in a key-value store.
diff --git a/site/docs/develop/spec.md b/site/docs/develop/spec.md
index 86ea63ddad8..0abde8012bb 100644
--- a/site/docs/develop/spec.md
+++ b/site/docs/develop/spec.md
@@ -43,6 +43,9 @@ same physical table but with different state of the data and potentially differe
 table formats like Apache Iceberg require Nessie to refer to a single _Global State_, in case of
 Iceberg the _table metadata_. This _Global State_ is not versioned in Nessie, because it has to
 contain enough information to resolve all information in all Nessie commits.
+The IDs of the _Iceberg snapshot_, _Iceberg schema_, _Iceberg partition spec_, and _Iceberg sort order_
+within the Iceberg _table metadata_ are stored per Nessie named reference (branch or tag)
+as so-called _on-reference-state_.
 
 !!! note
     The term _all information in all Nessie commits_ used above precisely means all information
@@ -103,16 +106,14 @@ a new Iceberg snapshot. Any Nessie commit refers to a particular Iceberg snapsho
 table, which translates to the state of an Iceberg table for a particular Nessie commit.
 
 Nessie needs to track Iceberg's _table metadata_ as so called _Global State_ within Nessie to
-ensure that schema evolution works as expected.
+ensure that table evolution and other operations like delete work as expected.
 
 The Nessie `IcebergTable` object passed to Nessie in a [_Put operation_](#put-operation) therefore
 consists of
 
-1. the pointer to the Iceberg _table metadata_ and
-2. the ID of the _Iceberg snapshot_ within the Iceberg _table metadata_.
-
-The pointer to the Iceberg table is recorded as _Global State_ and the ID of the Iceberg snapshot
-is recorded within the _Put operation_ in a Nessie commit.
+1. the pointer to the Iceberg _table metadata_ (so called _Global State_) and
+2. the IDs of the _Iceberg snapshot_, _Iceberg schema_, _Iceberg partition spec_, and _Iceberg sort order_
+within the Iceberg _table metadata_ (so called _On Reference State_).
 
 !!! note
     This model puts a strong restriction on the Iceberg table. All metadata JSON documents must be