diff --git a/site/docs/metadata.md b/site/docs/metadata.md
new file mode 100644
index 000000000000..c83a7858d8e1
--- /dev/null
+++ b/site/docs/metadata.md
@@ -0,0 +1,134 @@
+# Metadata Tables
+
+This page describes the internal metadata tables maintained by Iceberg. Please refer to the [definitions page](terms.md)
+for more information on terms and definitions, and to the [specifications page](spec.md) for more information on Iceberg's
+table specification. The complete metadata table schemas can be found on the [Spark Queries page](spark-queries.md#metadata-table-schema).
+
+| Name                                                 | Description |
+| -----------------------------------------------------| ------------|
+| [`AllDataFilesTable`](#1-alldatafilestable)          | Contains rows representing all of the data files in the table. Each row contains metadata as well as path information stored by Iceberg. This differs from the `DataFilesTable` because it contains all files currently referenced by any existing snapshot of the table, rather than just the current one.
+| [`AllEntriesTable`](#2-allentriestable)              | Contains a table's manifest entries as rows, for both delete and data files, taken from the manifests of every snapshot currently tracked by the table. Please note that this table exposes internal details, like files that have been deleted. For a table of the live data files, please use `DataFilesTable`.
+| [`AllManifestsTable`](#3-allmanifeststable)          | Contains a table's valid manifest files as rows. A valid manifest file is one that is referenced by any snapshot currently tracked by the table. This table may contain duplicate rows.
+| [`DataFilesTable`](#4-datafilestable)                | Contains a table's data files as rows.
+| [`HistoryTable`](#5-historytable)                    | Contains a table's history as rows. History is based on the table's snapshot log, which logs each update to the table's current snapshot.
+| [`ManifestEntriesTable`](#6-manifestentriestable)    | Contains a table's manifest entries as rows, for both delete and data files. Please note that this table exposes internal details, like files that have been deleted. For a table of the live data files, please use `DataFilesTable`.
+| [`ManifestsTable`](#7-manifeststable)                | Contains a table's manifest files as rows.
+| [`PartitionsTable`](#8-partitionstable)              | Contains a table's partitions as rows.
+| [`SnapshotsTable`](#9-snapshotstable)                | Contains a table's known snapshots as rows. This does not include snapshots that have been expired using [`ExpireSnapshots`](https://iceberg.apache.org/javadoc/master/org/apache/iceberg/ExpireSnapshots.html).
+
+
+## Table Schema
+
+### 1. `AllDataFilesTable`
+
+| Column name           | Required  | Data type           | Description |
+|-----------------------|-----------|---------------------|-------------|
+| content               |           | int                 | Contents of the file: 0=data, 1=position deletes, 2=equality deletes
+| file_path             | ✔️        | string              | Location URI with FS scheme
+| file_format           | ✔️        | string              | File format name: avro, orc, or parquet
+| partition             | ✔️        | `struct<...>`       | Partition data tuple, schema based on the partition spec
+| record_count          | ✔️        | long                | Number of records in the file
+| file_size_in_bytes    | ✔️        | long                | Total file size in bytes
+| column_sizes          |           | `map<int, long>`    | Map of column ID to total size on disk
+| value_counts          |           | `map<int, long>`    | Map of column ID to total count, including null and NaN
+| null_value_counts     |           | `map<int, long>`    | Map of column ID to null value count
+| nan_value_counts      |           | `map<int, long>`    | Map of column ID to number of NaN values in the column
+| lower_bounds          |           | `map<int, binary>`  | Map of column ID to lower bound
+| upper_bounds          |           | `map<int, binary>`  | Map of column ID to upper bound
+| key_metadata          |           | binary              | Encryption key metadata blob
+| split_offsets         |           | `list<long>`        | Splittable offsets
+| equality_ids          |           | `list<int>`         | Equality comparison field IDs
+| sort_order_id         |           | int                 | Sort order ID
+
+### 2. `AllEntriesTable`
+
+| Column name       | Required | Data type              | Description |
+|-------------------|----------|------------------------|-------------|
+| status            | ✔️       | int                    | Used to track additions and deletions: `0: EXISTING` `1: ADDED` `2: DELETED`
+| snapshot_id       |          | long                   | Snapshot ID where the file was added, or deleted if status is 2. Inherited when null.
+| sequence_number   |          | long                   | Sequence number when the file was added. Inherited when null.
+| data_file         | ✔️       | `data_file` `struct`   | File path, partition tuple, metrics, ...
+
+### 3. `AllManifestsTable`
+
+| Column name               | Required | Data type                | Description |
+|---------------------------|----------|--------------------------|-------------|
+| path                      | ✔️       | string                   | Location of the manifest file
+| length                    | ✔️       | long                     | Length of the manifest file
+| partition_spec_id         |          | int                      | ID of the partition spec used to write the manifest; must be listed in table metadata `partition-specs`
+| added_snapshot_id         |          | long                     | ID of the snapshot where the manifest file was added
+| added_data_files_count    |          | int                      | Number of entries in the manifest that have status `ADDED` (1); when `null`, this is assumed to be non-zero
+| existing_data_files_count |          | int                      | Number of entries in the manifest that have status `EXISTING` (0); when `null`, this is assumed to be non-zero
+| deleted_data_files_count  |          | int                      | Number of entries in the manifest that have status `DELETED` (2); when `null`, this is assumed to be non-zero
+| partition_summaries       |          | `list<struct<...>>`      | Partition summary information: whether the partition field contains null or NaN values, plus optional lower and upper bounds
+
+### 4. `DataFilesTable`
+
+| Column name           | Required  | Data type           | Description |
+|-----------------------|-----------|---------------------|-------------|
+| content               |           | int                 | Contents of the file: 0=data, 1=position deletes, 2=equality deletes
+| file_path             | ✔️        | string              | Location URI with FS scheme
+| file_format           | ✔️        | string              | File format name: avro, orc, or parquet
+| partition             | ✔️        | `struct<...>`       | Partition data tuple, schema based on the partition spec
+| record_count          | ✔️        | long                | Number of records in the file
+| file_size_in_bytes    | ✔️        | long                | Total file size in bytes
+| column_sizes          |           | `map<int, long>`    | Map of column ID to total size on disk
+| value_counts          |           | `map<int, long>`    | Map of column ID to total count, including null and NaN
+| null_value_counts     |           | `map<int, long>`    | Map of column ID to null value count
+| nan_value_counts      |           | `map<int, long>`    | Map of column ID to number of NaN values in the column
+| lower_bounds          |           | `map<int, binary>`  | Map of column ID to lower bound
+| upper_bounds          |           | `map<int, binary>`  | Map of column ID to upper bound
+| key_metadata          |           | binary              | Encryption key metadata blob
+| split_offsets         |           | `list<long>`        | Splittable offsets
+| equality_ids          |           | `list<int>`         | Equality comparison field IDs
+| sort_order_id         |           | int                 | Sort order ID
+
+### 5. `HistoryTable`
+
+| Column name           | Required  | Data type   | Description |
+|-----------------------|-----------|-------------|-------------|
+| made_current_at       | ✔️        | timestamptz | Timestamp (with time zone) when this snapshot was made the table's current snapshot
+| snapshot_id           | ✔️        | long        | A unique snapshot ID
+| parent_id             |           | long        | ID of the parent snapshot
+| is_current_ancestor   | ✔️        | boolean     | True if this snapshot is an ancestor of the current snapshot; false otherwise
+
+### 6. `ManifestEntriesTable`
+
+| Column name       | Required | Data type              | Description |
+|-------------------|----------|------------------------|-------------|
+| status            | ✔️       | int                    | Used to track additions and deletions: `0: EXISTING` `1: ADDED` `2: DELETED`
+| snapshot_id       |          | long                   | Snapshot ID where the file was added, or deleted if status is 2. Inherited when null.
+| sequence_number   |          | long                   | Sequence number when the file was added. Inherited when null.
+| data_file         | ✔️       | `data_file` `struct`   | File path, partition tuple, metrics, ...
+
+### 7. `ManifestsTable`
+
+| Column name               | Required | Data type                | Description |
+|---------------------------|----------|--------------------------|-------------|
+| path                      | ✔️       | string                   | Location of the manifest file
+| length                    | ✔️       | long                     | Length of the manifest file
+| partition_spec_id         | ✔️       | int                      | ID of the partition spec used to write the manifest; must be listed in table metadata `partition-specs`
+| added_snapshot_id         | ✔️       | long                     | ID of the snapshot where the manifest file was added
+| added_data_files_count    | ✔️       | int                      | Number of entries in the manifest that have status `ADDED` (1); when `null`, this is assumed to be non-zero
+| existing_data_files_count | ✔️       | int                      | Number of entries in the manifest that have status `EXISTING` (0); when `null`, this is assumed to be non-zero
+| deleted_data_files_count  | ✔️       | int                      | Number of entries in the manifest that have status `DELETED` (2); when `null`, this is assumed to be non-zero
+| partition_summaries       | ✔️       | `list<struct<...>>`      | Partition summary information: whether the partition field contains null or NaN values, plus optional lower and upper bounds
+
+### 8. `PartitionsTable`
+
+| Column name   | Required | Data type      | Description |
+|---------------|----------|----------------|-------------|
+| partition     | ✔️       | `struct<...>`  | Partition data tuple; its schema is the table's partition type
+| record_count  | ✔️       | long           | Aggregated number of records in this partition
+| file_count    | ✔️       | int            | Total number of data files in this partition
+
+### 9. `SnapshotsTable`
+
+| Column name               | Required | Data type             | Description |
+|---------------------------|----------|-----------------------|-------------|
+| committed_at              | ✔️       | timestamptz           | Commit timestamp with time zone
+| snapshot_id               | ✔️       | long                  | A unique snapshot ID
+| parent_id                 |          | long                  | The snapshot ID of the snapshot's parent; null for any snapshot with no parent
+| operation                 |          | string                | Used by some operations, like snapshot expiration, to skip processing certain snapshots. Possible `operation` values are: `append`, `replace`, `overwrite`, `delete`
+| manifest_list             |          | string                | The location of a manifest list for this snapshot that tracks manifest files with additional metadata
+| summary                   |          | `map<string, string>` | A string map that summarizes the snapshot changes
diff --git a/site/docs/spark-queries.md b/site/docs/spark-queries.md
index f7a78b566beb..644a15e7f2f5 100644
--- a/site/docs/spark-queries.md
+++ b/site/docs/spark-queries.md
@@ -234,6 +234,189 @@ SELECT * FROM prod.db.table.manifests
 +----------------------------------------------------------------------+--------+-------------------+---------------------+------------------------+---------------------------+--------------------------+---------------------------------+
 ```
 
+### Metadata Table Schema
+
+1. `AllDataFilesTable`
+
+```
+table {
+  134: content: optional int (Contents of the file: 0=data, 1=position deletes, 2=equality deletes)
+  100: file_path: required string (Location URI with FS scheme)
+  101: file_format: required string (File format name: avro, orc, or parquet)
+  102: partition: required struct<1000: data_bucket: optional int> (Partition data tuple, schema based on the partition spec)
+  103: record_count: required long (Number of records in the file)
+  104: file_size_in_bytes: required long (Total file size in bytes)
+  108: column_sizes: optional map<int, long> (Map of column id to total size on disk)
+  109: value_counts: optional map<int, long> (Map of column id to total count, including null and NaN)
+  110: null_value_counts: optional map<int, long> (Map of column id to null value count)
+  137: nan_value_counts: optional map<int, long> (Map of column id to number of NaN values in the column)
+  125: lower_bounds: optional map<int, binary> (Map of column id to lower bound)
+  128: upper_bounds: optional map<int, binary> (Map of column id to upper bound)
+  131: key_metadata: optional binary (Encryption key metadata blob)
+  132: split_offsets: optional list<long> (Splittable offsets)
+  135: equality_ids: optional list<int> (Equality comparison field IDs)
+  140: sort_order_id: optional int (Sort order ID)
+}
+```
+
+2. `AllEntriesTable`
+
+```
+table {
+  0: status: required int
+  1: snapshot_id: optional long
+  3: sequence_number: optional long
+  2: data_file: required struct<
+    134: content: optional int (Contents of the file: 0=data, 1=position deletes, 2=equality deletes),
+    100: file_path: required string (Location URI with FS scheme),
+    101: file_format: required string (File format name: avro, orc, or parquet),
+    102: partition: required struct<1000: data_bucket: optional int> (Partition data tuple, schema based on the partition spec),
+    103: record_count: required long (Number of records in the file),
+    104: file_size_in_bytes: required long (Total file size in bytes),
+    108: column_sizes: optional map<int, long> (Map of column id to total size on disk),
+    109: value_counts: optional map<int, long> (Map of column id to total count, including null and NaN),
+    110: null_value_counts: optional map<int, long> (Map of column id to null value count),
+    137: nan_value_counts: optional map<int, long> (Map of column id to number of NaN values in the column),
+    125: lower_bounds: optional map<int, binary> (Map of column id to lower bound),
+    128: upper_bounds: optional map<int, binary> (Map of column id to upper bound),
+    131: key_metadata: optional binary (Encryption key metadata blob),
+    132: split_offsets: optional list<long> (Splittable offsets),
+    135: equality_ids: optional list<int> (Equality comparison field IDs),
+    140: sort_order_id: optional int (Sort order ID)
+  >
+}
+```
+
+3. `AllManifestsTable`
+
+```
+table {
+  1: path: required string
+  2: length: required long
+  3: partition_spec_id: optional int
+  4: added_snapshot_id: optional long
+  5: added_data_files_count: optional int
+  6: existing_data_files_count: optional int
+  7: deleted_data_files_count: optional int
+  8: partition_summaries: optional list<
+    struct<
+      10: contains_null: required boolean,
+      11: contains_nan: required boolean,
+      12: lower_bound: optional string,
+      13: upper_bound: optional string
+    >
+  >
+}
+```
+
+4. `DataFilesTable`
+
+```
+table {
+  134: content: optional int (Contents of the file: 0=data, 1=position deletes, 2=equality deletes)
+  100: file_path: required string (Location URI with FS scheme)
+  101: file_format: required string (File format name: avro, orc, or parquet)
+  102: partition: required struct<1000: data_bucket: optional int> (Partition data tuple, schema based on the partition spec)
+  103: record_count: required long (Number of records in the file)
+  104: file_size_in_bytes: required long (Total file size in bytes)
+  108: column_sizes: optional map<int, long> (Map of column id to total size on disk)
+  109: value_counts: optional map<int, long> (Map of column id to total count, including null and NaN)
+  110: null_value_counts: optional map<int, long> (Map of column id to null value count)
+  137: nan_value_counts: optional map<int, long> (Map of column id to number of NaN values in the column)
+  125: lower_bounds: optional map<int, binary> (Map of column id to lower bound)
+  128: upper_bounds: optional map<int, binary> (Map of column id to upper bound)
+  131: key_metadata: optional binary (Encryption key metadata blob)
+  132: split_offsets: optional list<long> (Splittable offsets)
+  135: equality_ids: optional list<int> (Equality comparison field IDs)
+  140: sort_order_id: optional int (Sort order ID)
+}
+```
+
+5. `HistoryTable`
+
+```java
+private static final Schema HISTORY_SCHEMA = new Schema(
+    Types.NestedField.required(1, "made_current_at", Types.TimestampType.withZone()),
+    Types.NestedField.required(2, "snapshot_id", Types.LongType.get()),
+    Types.NestedField.optional(3, "parent_id", Types.LongType.get()),
+    Types.NestedField.required(4, "is_current_ancestor", Types.BooleanType.get())
+);
+```
+
+6. `ManifestEntriesTable`
+
+```
+table {
+  0: status: required int
+  1: snapshot_id: optional long
+  3: sequence_number: optional long
+  2: data_file: required struct<
+    134: content: optional int (Contents of the file: 0=data, 1=position deletes, 2=equality deletes),
+    100: file_path: required string (Location URI with FS scheme),
+    101: file_format: required string (File format name: avro, orc, or parquet),
+    102: partition: required struct<1000: data_bucket: optional int> (Partition data tuple, schema based on the partition spec),
+    103: record_count: required long (Number of records in the file),
+    104: file_size_in_bytes: required long (Total file size in bytes),
+    108: column_sizes: optional map<int, long> (Map of column id to total size on disk),
+    109: value_counts: optional map<int, long> (Map of column id to total count, including null and NaN),
+    110: null_value_counts: optional map<int, long> (Map of column id to null value count),
+    137: nan_value_counts: optional map<int, long> (Map of column id to number of NaN values in the column),
+    125: lower_bounds: optional map<int, binary> (Map of column id to lower bound),
+    128: upper_bounds: optional map<int, binary> (Map of column id to upper bound),
+    131: key_metadata: optional binary (Encryption key metadata blob),
+    132: split_offsets: optional list<long> (Splittable offsets),
+    135: equality_ids: optional list<int> (Equality comparison field IDs),
+    140: sort_order_id: optional int (Sort order ID)
+  >
+}
+```
+
+7. `ManifestsTable`
+
+```
+table {
+  1: path: required string
+  2: length: required long
+  3: partition_spec_id: required int
+  4: added_snapshot_id: required long
+  5: added_data_files_count: required int
+  6: existing_data_files_count: required int
+  7: deleted_data_files_count: required int
+  8: partition_summaries: required list<
+    struct<
+      10: contains_null: required boolean,
+      11: contains_nan: required boolean,
+      12: lower_bound: optional string,
+      13: upper_bound: optional string
+    >
+  >
+}
+```
+
+8. `PartitionsTable`
+
+```java
+this.schema = new Schema(
+    Types.NestedField.required(1, "partition", table.spec().partitionType()),
+    Types.NestedField.required(2, "record_count", Types.LongType.get()),
+    Types.NestedField.required(3, "file_count", Types.IntegerType.get())
+);
+```
+
+9. `SnapshotsTable`
+
+```java
+private static final Schema SNAPSHOT_SCHEMA = new Schema(
+    Types.NestedField.required(1, "committed_at", Types.TimestampType.withZone()),
+    Types.NestedField.required(2, "snapshot_id", Types.LongType.get()),
+    Types.NestedField.optional(3, "parent_id", Types.LongType.get()),
+    Types.NestedField.optional(4, "operation", Types.StringType.get()),
+    Types.NestedField.optional(5, "manifest_list", Types.StringType.get()),
+    Types.NestedField.optional(6, "summary",
+        Types.MapType.ofRequired(7, 8, Types.StringType.get(), Types.StringType.get()))
+);
+```
+
 ## Inspecting with DataFrames
 
 Metadata tables can be loaded in Spark 2.4 or Spark 3 using the DataFrameReader API:
diff --git a/site/docs/spec.md b/site/docs/spec.md
index 6bcfd379d7f8..f6b9321d0efa 100644
--- a/site/docs/spec.md
+++ b/site/docs/spec.md
@@ -375,7 +375,7 @@ A snapshot consists of the following fields:
 | _optional_ | _optional_ | **`parent-snapshot-id`** | The snapshot ID of the snapshot's parent. Omitted for any snapshot with no parent |
 |            | _required_ | **`sequence-number`**    | A monotonically increasing long that tracks the order of changes to a table |
 | _required_ | _required_ | **`timestamp-ms`**       | A timestamp when the snapshot was created, used for garbage collection and table inspection |
-| _optional_ | _required_ | **`manifest-list`**      | The location of a manifest list for this snapshot that tracks manifest files with additional meadata |
+| _optional_ | _required_ | **`manifest-list`**      | The location of a manifest list for this snapshot that tracks manifest files with additional metadata |
 | _optional_ |            | **`manifests`**          | A list of manifest file locations. Must be omitted if `manifest-list` is present |
 | _optional_ | _required_ | **`summary`**            | A string map that summarizes the snapshot changes, including `operation` (see below) |
diff --git a/site/mkdocs.yml b/site/mkdocs.yml
index f7746bb50501..eaf27fd0e463 100644
--- a/site/mkdocs.yml
+++ b/site/mkdocs.yml
@@ -49,12 +49,13 @@ nav:
     - How to Release: how-to-release.md
   - Tables:
     - Configuration: configuration.md
-    - Schemas: schemas.md
-    - Partitioning: partitioning.md
-    - Table evolution: evolution.md
     - Maintenance: maintenance.md
+    - Metadata: metadata.md
+    - Partitioning: partitioning.md
     - Performance: performance.md
     - Reliability: reliability.md
+    - Schemas: schemas.md
+    - Table evolution: evolution.md
   - Spark:
     - Getting Started: getting-started.md
     - Configuration: spark-configuration.md
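The `is_current_ancestor` column of `HistoryTable` in the diff above can, in principle, be derived from the `snapshot_id` and `parent_id` columns alone by following parent pointers back from the current snapshot. The sketch below illustrates that walk under stated assumptions; it is not Iceberg's implementation. The class `CurrentAncestors` and method `ancestorIds` are hypothetical names. The sketch assumes the current snapshot counts as its own ancestor, and that the walk stops at a null parent or at a parent that is no longer tracked (for example, one removed by `ExpireSnapshots`).

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of Iceberg: derives HistoryTable's
// is_current_ancestor flag from SnapshotsTable's snapshot_id/parent_id columns.
class CurrentAncestors {

  // Follows parent_id pointers from the current snapshot and collects every
  // snapshot ID visited, including the current snapshot itself (an assumption).
  // The walk stops at a null parent, at a parent that is no longer tracked
  // (e.g. removed by snapshot expiration), or if an ID repeats (malformed input).
  static Set<Long> ancestorIds(Long currentSnapshotId, Map<Long, Long> parentById) {
    Set<Long> ancestors = new LinkedHashSet<>();
    Long id = currentSnapshotId;
    while (id != null && parentById.containsKey(id) && ancestors.add(id)) {
      id = parentById.get(id);
    }
    return ancestors;
  }

  public static void main(String[] args) {
    // Hypothetical lineage: 1 <- 2 <- 3 (current); 4 also has parent 2 but is
    // not on the current chain (e.g. a snapshot that was rolled back).
    Map<Long, Long> parentById = new TreeMap<>();
    parentById.put(1L, null);
    parentById.put(2L, 1L);
    parentById.put(3L, 2L);
    parentById.put(4L, 2L);

    Set<Long> ancestors = ancestorIds(3L, parentById);
    for (Map.Entry<Long, Long> row : parentById.entrySet()) {
      long snapshotId = row.getKey();
      System.out.println(snapshotId + " is_current_ancestor=" + ancestors.contains(snapshotId));
    }
  }
}
```

Running `main` prints `is_current_ancestor=true` for snapshots 1, 2, and 3 and `false` for snapshot 4, which branched off the current lineage. A real query would read these pairs from the `snapshots` metadata table rather than a hard-coded map.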