Merged
10 changes: 6 additions & 4 deletions site/docs/develop/kernel.md
@@ -28,9 +28,10 @@ a so-called commit operation instructs Nessie to record the new state in a Nessi
carries the `Content` object(s).

`IcebergTable` contains the _current_ and _global_ pointer to Iceberg's table metadata plus the
ID of the snapshot defined in the table metadata. Since Iceberg's table metadata manages information
that must be consistent across all branches in Nessie, it is stored as so-called _global state_.
The value of the snapshot-ID is stored per Nessie named reference (branch or tag).
IDs of the snapshot, schema, partition spec, and sort order defined in the table metadata.
- Since Iceberg's table metadata manages information that must be consistent across all branches in Nessie, it is stored as so-called _global state_.
- The values of the snapshot-ID, schema-ID, partition-spec-ID, and sort-order-ID are stored per Nessie named reference (branch or tag).
For more information, please refer to the spec section [On Reference State vs Global State](spec.md#on-reference-state-vs-global-state).

Updating _global-state_ and _on-reference-state_ are technically operations against two different
entities in Nessie's backend database. Classic, relational databases (usually) come with a
@@ -191,7 +192,8 @@ conditional updates to multiple rows/records is either not supported at all or e

Nessie differentiates between content types that do require so called _global-state_ and those
that do not. Apache Iceberg is currently the only content type that supports global state:
the pointer to the Iceberg "Table Metadata" is tracked as "global state" and the Iceberg snapshot ID
the pointer to the Iceberg "Table Metadata" is tracked as "global state" and
the Iceberg snapshot ID, schema ID, partition spec ID, sort order ID
is tracked per _Nessie named reference_. For _Nessie commits_, which are atomic, this means that
Nessie has to update both the global-state and the on-reference-state for the Iceberg table. While
this is not an issue with a relational/transactional database, it is an issue in a key-value store.
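
The difficulty described above can be illustrated with a minimal sketch. It assumes a hypothetical key-value store that offers only a single-key compare-and-set; the class `KeyValueStore`, the function `commit`, and the key layout are illustrative names and not Nessie's actual implementation. The sketch shows why two conditional updates need some form of compensation when the second one fails.

```python
class KeyValueStore:
    """Hypothetical store that supports only single-key conditional updates."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def compare_and_set(self, key, expected, new):
        # Update the key only if its current value matches `expected`.
        if self._data.get(key) != expected:
            return False
        self._data[key] = new
        return True


def commit(store, table, ref, expected_global, new_global, expected_ref, new_ref):
    """Try to update global state and on-reference state together.

    Assumption: this two-step approach with a compensating write is only
    an illustration of the problem, not the algorithm Nessie uses.
    """
    global_key = ("global", table)
    ref_key = ("ref", ref, table)
    if not store.compare_and_set(global_key, expected_global, new_global):
        return False
    if not store.compare_and_set(ref_key, expected_ref, new_ref):
        # The second update failed: try to undo the first one.
        store.compare_and_set(global_key, new_global, expected_global)
        return False
    return True
```

Between the two conditional updates, other readers can observe a global state that does not match any on-reference state, and the compensating write can itself fail under concurrency. A relational database avoids both problems by wrapping the two updates in one transaction.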
13 changes: 7 additions & 6 deletions site/docs/develop/spec.md
@@ -43,6 +43,9 @@ same physical table but with different state of the data and potentially differe
table formats like Apache Iceberg require Nessie to refer to a single _Global State_, in case of
Iceberg the _table metadata_. This _Global State_ is not versioned in Nessie, because it has to
contain enough information to resolve all information in all Nessie commits.
The IDs of the _Iceberg snapshot_, _Iceberg schema_, _Iceberg partition spec_, and _Iceberg sort order_
within the Iceberg _table metadata_ are stored per Nessie named reference (branch or tag)
as so-called _on-reference-state_.

!!! note
The term _all information in all Nessie commits_ used above precisely means all information
@@ -103,16 +106,14 @@ a new Iceberg snapshot. Any Nessie commit refers to a particular Iceberg snapsho
table, which translates to the state of an Iceberg table for a particular Nessie commit.

Nessie needs to track Iceberg's _table metadata_ as so called _Global State_ within Nessie to
ensure that schema evolution works as expected.
ensure that table evolution and other operations like delete work as expected.

The Nessie `IcebergTable` object passed to Nessie in a [_Put operation_](#put-operation) therefore
consists of

1. the pointer to the Iceberg _table metadata_ and
2. the ID of the _Iceberg snapshot_ within the Iceberg _table metadata_.

The pointer to the Iceberg table is recorded as _Global State_ and the ID of the Iceberg snapshot
is recorded within the _Put operation_ in a Nessie commit.
1. the pointer to the Iceberg _table metadata_ (so-called _Global State_) and
2. the IDs of the _Iceberg snapshot_, _Iceberg schema_, _Iceberg partition spec_, and _Iceberg sort order_
   within the Iceberg _table metadata_ (so-called _On Reference State_).
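
The split between the two kinds of state can be sketched as a toy model. The names `OnReferenceState` and `ToyStore`, the field names, and the metadata locations are assumptions made for illustration; they are not Nessie's API. The sketch only shows that the metadata pointer is kept once per table, while the four IDs are kept per named reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnReferenceState:
    """IDs recorded per Nessie named reference (hypothetical field names)."""
    snapshot_id: int
    schema_id: int
    spec_id: int
    sort_order_id: int


class ToyStore:
    """Toy model: one unversioned global pointer, per-reference IDs."""

    def __init__(self):
        self.global_state = {}  # content key -> table metadata pointer
        self.refs = {}          # reference name -> {content key -> OnReferenceState}

    def put(self, ref, key, metadata_location, state):
        # The global pointer is shared by all references and gets overwritten.
        self.global_state[key] = metadata_location
        # The IDs are recorded only on the reference that is committed to.
        self.refs.setdefault(ref, {})[key] = state

    def get(self, ref, key):
        return self.global_state[key], self.refs[ref][key]
```

In this model, a put on one branch changes the metadata pointer seen from every branch, but leaves the IDs recorded on other branches untouched.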

!!! note
This model puts a strong restriction on the Iceberg table. All metadata JSON documents must be