Adding documentation for metadata tables #3159
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,134 @@ | ||
| # Metadata Tables | ||
|
|
||
| This page describes the internal metadata tables maintained by Iceberg. Please refer to [definitions page](terms.md) | ||
|
Contributor
nit: prefer to change line on full sentence.
Contributor
nit: the definitions page
||
| for more information on terms and definitions and the [specifications page](spec.md) for more information on Iceberg's | ||
| table specification. Complete metadata table schema can be found on the [Spark Queries page](spark-queries.md#metadata-table-schema). | ||
|
|
||
| | Name | Description | | ||
| | --------------------------------------------------| ------------| | ||
| | [`AllDataFilesTable`](#AllDataFilesTable) | Contains rows representing all of the data files in the table. Each row contains metadata as well as path information stored by Iceberg. This differs from the `DataFilesTable` because it contains all files currently referenced by any existing snapshot of this table rather than just the current one. | ||
| | [`AllEntriesTable`](#AllEntriesTable) | Contains a table's manifest entries as rows, for both delete and data files. Please note that this table exposes internal details, like files that have been deleted. For a table of the live data files, please use `DataFilesTable`. | ||
| | [`AllManifestsTable`](#AllManifestsTable) | Contains a table's valid manifest files as rows. A valid manifest file is referenced from any snapshot currently tracked by the table. This table may contain duplicate rows. | ||
| | [`DataFilesTable`](#DataFilesTable) | Contains a table's data files as rows. | ||
| | [`HistoryTable`](#HistoryTable) | Contains a table's history as rows. History is based on the table's snapshot log, which logs each update to the table's current snapshot. | ||
| | [`ManifestEntriesTable`](#ManifestEntriesTable) | Contains a table's manifest entries as rows, for both delete and data files. Please note that this table exposes internal details, like files that have been deleted. For a table of the live data files, please use `DataFilesTable`. | ||
| | [`ManifestsTable`](#ManifestsTable) | Contains a table's manifest files as rows. | ||
| | [`PartitionsTable`](#PartitionsTable) | Contains a table's partitions as rows. | ||
| | [`SnapshotsTable`](#SnapshotsTable) | Contains a table's known snapshots as rows. This does not include snapshots that have been expired using [`ExpireSnapshots`](https://iceberg.apache.org/javadoc/master/org/apache/iceberg/ExpireSnapshots.html). | ||
|
|
||
|
|
||
| ## Table Schema | ||
|
|
||
| ### <a id="AllDataFilesTable"></a> 1. `AllDataFilesTable` | ||
|
Contributor
I think we should use the actual names of the tables, instead of the class name, like files, manifests, entries, etc.
Contributor
What is the use of this HTML tag?
Contributor
I agree with Jack on the section names.
||
|
|
||
| | Column name | Required | Data type | Description | | ||
| |-----------------------|-----------|-------------------|-------------| | ||
|
Contributor
missing a
Contributor
Do we need IDs? I'm not sure those are valuable to users.
||
| | content | | int | Contents of the file: 0=data, 1=position deletes, 2=equality deletes | ||
| | file_path | ✔️ | string | Location URI with FS scheme | ||
| | file_format | ✔️ | string | File format name: avro, orc, or parquet | ||
| | partition | ✔️ | `struct<...>` | Partition data tuple, schema based on the partition spec | ||
| | record_count | ✔️ | long | Number of records in the file | ||
| | file_size_in_bytes | ✔️ | long | Total file size in bytes | ||
| | column_sizes | | `map<int, long>` | Map of column id to total size on disk | ||
| | value_counts | | `map<int, long>` | Map of column id to total count, including null and NaN | ||
| | null_value_counts | | `map<int, long>` | Map of column id to null value count | ||
| | nan_value_counts | | `map<int, long>` | Map of column id to number of NaN values in the column | ||
| | lower_bounds | | `map<int, binary>`| Map of column id to lower bound | ||
| | upper_bounds | | `map<int, binary>`| Map of column id to upper bound | ||
| | key_metadata | | binary | Encryption key metadata blob | ||
| | split_offsets | | `list<long>` | Splittable offsets | ||
| | equality_ids | | `list<int>` | Equality comparison field IDs | ||
| | sort_order_id | | int | Sort order ID | ||
|
|
||
| ### <a id="AllEntriesTable"></a> 2. `AllEntriesTable` | ||
|
Contributor
Anchors are generated automatically, so no need to add them in the markdown.
||
|
|
||
| | Column name | Required | Data type | Description | | ||
| |-------------------|----------|------------------------|-------------| | ||
| | status | ✔️ | int | Used to track additions and deletions: `0: EXISTING` `1: ADDED` `2: DELETED` | ||
| | snapshot_id | | long | Snapshot id where the file was added, or deleted if status is 2. Inherited when null. | ||
| | sequence_number | | long | Sequence number when the file was added. Inherited when null. | ||
| | data_file | ✔️ | `data_file` `struct` | File path, partition tuple, metrics, ... | ||
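The `status` codes in the table above can be illustrated with a small Python sketch. This is not Iceberg's API: `EntryStatus`, `Entry`, and `live_paths` are hypothetical names, and `Entry` is a simplified stand-in for a manifest entry row. It only shows how rows with status `2` (deleted) would be filtered out to get the live files.

```python
from enum import IntEnum
from typing import List, NamedTuple


class EntryStatus(IntEnum):
    """Status codes as listed in the entries table schema."""
    EXISTING = 0
    ADDED = 1
    DELETED = 2


class Entry(NamedTuple):
    """Hypothetical, simplified stand-in for one manifest entry row."""
    status: int
    file_path: str


def live_paths(entries: List[Entry]) -> List[str]:
    """Return the paths of entries whose status is not DELETED."""
    return [
        e.file_path
        for e in entries
        if EntryStatus(e.status) != EntryStatus.DELETED
    ]
```

For example, given one entry per status code, only the `EXISTING` and `ADDED` entries would be returned.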
|
|
||
| ### <a id="AllManifestsTable"></a> 3. `AllManifestsTable` | ||
|
|
||
| | Column name | Required | Data type | Description | | ||
| |---------------------------|----------|--------------------|-------------| | ||
| | path | ✔️ | string | Location of the manifest file | ||
| | length | ✔️ | long | Length of the manifest file | ||
| | partition_spec_id | | int | ID of a partition spec used to write the manifest; must be listed in table metadata `partition-specs` | ||
| | added_snapshot_id | | long | ID of the snapshot where the manifest file was added | ||
| | added_data_files_count | | int | Number of entries in the manifest that have status `ADDED` (1), when `null` this is assumed to be non-zero | ||
| | existing_data_files_count | | int | Number of entries in the manifest that have status `EXISTING` (0), when `null` this is assumed to be non-zero | ||
| | deleted_data_files_count | | int | Number of entries in the manifest that have status `DELETED` (2), when `null` this is assumed to be non-zero | ||
| | partition_summaries | | `list<struct<...>>`| Partition summary information: contains null/nan, optional lower and upper bounds | ||
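The three count columns above share one rule: a `null` count is assumed to be non-zero. A minimal Python sketch of that rule follows; the helper name `may_have_entries` is hypothetical and only illustrates how a reader of this table might decide whether a manifest can be skipped.

```python
from typing import Optional


def may_have_entries(count: Optional[int]) -> bool:
    """Treat a missing (null) count as 'possibly non-zero'.

    Only an explicit count of 0 proves that the manifest has no
    entries with the given status.
    """
    return count is None or count > 0
```

With this rule, a manifest whose `added_data_files_count` is `null` still has to be read when looking for added files, while one with an explicit `0` can be skipped.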
|
|
||
| ### <a id="DataFilesTable"></a> 4. `DataFilesTable` | ||
|
|
||
| | Column name | Required | Data type | Description | | ||
| |-----------------------|-------|-------------------|-------------| | ||
| | content | | int | Contents of the file: 0=data, 1=position deletes, 2=equality deletes | ||
| | file_path | ✔️ | string | Location URI with FS scheme | ||
| | file_format | ✔️ | string | File format name: avro, orc, or parquet | ||
| | partition | ✔️ | `struct<...>` | Partition data tuple, schema based on the partition spec | ||
| | record_count | ✔️ | long | Number of records in the file | ||
| | file_size_in_bytes | ✔️ | long | Total file size in bytes | ||
| | column_sizes | | `map<int, long>` | Map of column id to total size on disk | ||
| | value_counts | | `map<int, long>` | Map of column id to total count, including null and NaN | ||
| | null_value_counts | | `map<int, long>` | Map of column id to null value count | ||
| | nan_value_counts | | `map<int, long>` | Map of column id to number of NaN values in the column | ||
| | lower_bounds | | `map<int, binary>`| Map of column id to lower bound | ||
| | upper_bounds | | `map<int, binary>`| Map of column id to upper bound | ||
| | key_metadata | | binary | Encryption key metadata blob | ||
| | split_offsets | | `list<long>` | Splittable offsets | ||
| | equality_ids | | `list<int>` | Equality comparison field IDs | ||
| | sort_order_id | | int | Sort order ID | ||
|
|
||
| ### <a id="HistoryTable"></a> 5. `HistoryTable` | ||
|
|
||
| | Column name | Required | Data type | Description | | ||
| |-----------------------|-----------|-----------|-------------| | ||
| | made_current_at | ✔️ | timestamptz | Timestamp (with timezone) when this snapshot was promoted to current, i.e. when the first writer to this snapshot committed. | ||
| | snapshot_id | ✔️ | long | A unique ID | ||
| | parent_id | | long | ID of parent snapshot | ||
| | is_current_ancestor | ✔️ | boolean | True if this snapshot is an ancestor of the current snapshot; false otherwise | ||
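One way to read `is_current_ancestor` is as membership in the chain of `parent_id` links that starts at the current snapshot. The Python sketch below illustrates that reading under two assumptions: the `parents` mapping (snapshot ID to parent ID) is a hypothetical input, and the current snapshot is counted as its own ancestor. It is not how Iceberg itself computes the column.

```python
from typing import Dict, Optional, Set


def current_ancestors(parents: Dict[int, Optional[int]], current_id: int) -> Set[int]:
    """Walk parent pointers from the current snapshot back to the root."""
    seen: Set[int] = set()
    node: Optional[int] = current_id
    # Stop at the root (no parent) or if a snapshot is seen twice.
    while node is not None and node not in seen:
        seen.add(node)
        node = parents.get(node)
    return seen
```

A snapshot that was made current at some point and later rolled back appears in the history but is not in this set, so its row would have `is_current_ancestor` set to false.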
|
|
||
| ### <a id="ManifestEntriesTable"></a> 6. `ManifestEntriesTable` | ||
|
|
||
| | Column name | Required | Data type | Description | | ||
| |-------------------|----------|------------------------|-------------| | ||
| | status | ✔️ | int | Used to track additions and deletions: `0: EXISTING` `1: ADDED` `2: DELETED` | ||
| | snapshot_id | | long | Snapshot id where the file was added, or deleted if status is 2. Inherited when null. | ||
| | sequence_number | | long | Sequence number when the file was added. Inherited when null | ||
| | data_file | ✔️ | `data_file` `struct` | File path, partition tuple, metrics, ... | ||
|
|
||
| ### <a id="ManifestsTable"></a> 7. `ManifestsTable` | ||
|
|
||
| | Column name | Required | Data type | Description | | ||
| |---------------------------|----------|--------------------|-------------| | ||
| | path | ✔️ | string | Location of the manifest file | ||
| | length | ✔️ | long | Length of the manifest file | ||
| | partition_spec_id | ✔️ | int | ID of a partition spec used to write the manifest; must be listed in table metadata `partition-specs` | ||
| | added_snapshot_id | ✔️ | long | ID of the snapshot where the manifest file was added | ||
| | added_data_files_count | ✔️ | int | Number of entries in the manifest that have status `ADDED` (1), when `null` this is assumed to be non-zero | ||
| | existing_data_files_count | ✔️ | int | Number of entries in the manifest that have status `EXISTING` (0), when `null` this is assumed to be non-zero | ||
| | deleted_data_files_count | ✔️ | int | Number of entries in the manifest that have status `DELETED` (2), when `null` this is assumed to be non-zero | ||
| | partition_summaries | ✔️ | `list<struct<...>>`| Partition summary information: contains null/nan, optional lower and upper bounds | ||
|
|
||
| ### <a id="PartitionsTable"></a> 8. `PartitionsTable` | ||
|
|
||
| | Column name | Required | Data type | Description | | ||
| |---------------|----------|----------------|-------------| | ||
| | partition | ✔️ | `struct<...>` | The table partition spec determined by partition type | ||
| | record_count | ✔️ | long | Aggregated number of records in this partition | ||
| | file_count | ✔️ | int | Total number of data files in this partition | ||
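The aggregation described by `record_count` and `file_count` can be sketched in Python as a group-by over data file rows. The input shape (a pair of partition key and per-file record count) and the function name `summarize_partitions` are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_partitions(
    files: Iterable[Tuple[str, int]],
) -> Dict[str, Dict[str, int]]:
    """Aggregate per-file record counts into per-partition totals.

    Each input item is (partition_key, record_count) for one data file.
    """
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"record_count": 0, "file_count": 0}
    )
    for partition, record_count in files:
        totals[partition]["record_count"] += record_count
        totals[partition]["file_count"] += 1
    return dict(totals)
```

Each key in the result corresponds to one row of the partitions table, with the two aggregated columns as values.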
|
|
||
| ### <a id="SnapshotsTable"></a> 9. `SnapshotsTable` | ||
|
|
||
| | Column name | Required | Data type | Description | | ||
| |---------------------------|----------|-----------------------|-------------| | ||
| | committed_at | ✔️ | timestamptz | Commit timestamp with timezone | ||
| | snapshot_id | ✔️ | long | A unique ID | ||
| | parent_id | | long | The snapshot ID of the snapshot's parent. Omitted for any snapshot with no parent | ||
| | operation | | string | Used by some operations, like snapshot expiration, to skip processing certain snapshots. Possible `operation` values are: `append`, `replace`, `overwrite`, `delete` | ||
| | manifest_list | | string | The location of a manifest list for this snapshot that tracks manifest files with additional metadata | ||
| | summary | | `map<string, string>` | A string map that summarizes the snapshot changes | | ||

What's the purpose of this documentation? Much of this is already covered in the Spark queries page, which this links to for docs on how to query the metadata tables.
Who is the intended audience for these docs?

Because this is not a Spark-only feature, the intention is to make it top-level documentation, with proper tables for the metadata table schemas instead of showing Java code.

There is a pretty good argument against adding this, though. Documenting what tables exist without an engine is confusing to anyone looking for how to view metadata tables. Just saying that these tables exist and what their schemas are doesn't help a user coming to the docs.
If this were part of a document on the API that explained how to use the metadata tables and was targeted at engine developers, I think it would be more valuable. But I think placing it under tables will just cause confusion.

Yes, I agree that is a point of confusion. So do you recommend having one section for each engine covering system tables? My current thought is to have this page introduce the Iceberg schema and link to the related sections in each engine page for examples of using system tables.
One of my reasons for adding it is that this is a very important feature that does not exist in other similar products, and it provides huge benefits for users who build data management capabilities around such information. Tables like `manifests` and `files` also support optimizations like predicate pushdown and file pruning, which essentially solves the big metadata issue. So I feel it's a pity to hide such information too deeply.

Yeah, I think for now we should include this for each engine. They do expose the tables differently (like Trino's `$` syntax), so I don't see a lot of value in splitting docs into common and engine-specific. That just makes it harder to find what you're looking for.
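
To make the engine difference mentioned in this thread concrete, here is a small Python sketch that builds a metadata table reference for two engines. It assumes the Spark form `db.table.history` and the Trino form `db."table$history"`. The helper `metadata_table_ref` is a hypothetical name for illustration, not part of either engine.

```python
def metadata_table_ref(engine: str, table: str, metadata_table: str) -> str:
    """Build an engine-specific reference to a metadata table.

    `table` is a dotted identifier such as "db.tbl"; `metadata_table`
    is a name such as "history" or "snapshots".
    """
    if engine == "spark":
        # Assumed Spark form: the metadata table is a trailing name part.
        return f"{table}.{metadata_table}"
    if engine == "trino":
        # Assumed Trino form: "$name" is appended to the table name,
        # and the combined identifier is double-quoted.
        schema, _, name = table.rpartition(".")
        quoted = f'"{name}${metadata_table}"'
        return f"{schema}.{quoted}" if schema else quoted
    raise ValueError(f"unsupported engine: {engine}")
```

The two branches return differently shaped identifiers for the same logical table, which is the kind of divergence that argues for per-engine documentation.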