diff --git a/format/spec.md b/format/spec.md
index 1154cb74484e..62d3a889d336 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -665,9 +665,37 @@ Table metadata consists of the following fields:
 | _optional_ | _required_ | **`sort-orders`**| A list of sort orders, stored as full sort order objects. |
 | _optional_ | _required_ | **`default-sort-order-id`**| Default sort order id of the table. Note that this could be used by writers, but is not used when reading because reads use the specs stored in manifest files. |
 | | _optional_ | **`refs`** | A map of snapshot references. The map keys are the unique snapshot reference names in the table, and the map values are snapshot reference objects. There is always a `main` branch reference pointing to the `current-snapshot-id` even if the `refs` map is null. |
+| _optional_ | _optional_ | **`statistics`** | A list of [table statistics](#table-statistics) files. |
 
 For serialization details, see Appendix C.
 
+#### Table statistics
+
+Table statistics files are valid [Puffin files](../puffin-spec). Statistics are informational: a reader may choose to
+ignore them, and support for statistics is not required to read the table correctly. A table can contain
+many statistics files, associated with different table snapshots.
+
+Each entry in the `statistics` table metadata field describes one statistics file and is a struct with the following fields:
+
+| v1 | v2 | Field name | Type | Description |
+|----|----|------------|------|-------------|
+| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg table's snapshot the statistics were computed from. |
+| _required_ | _required_ | **`statistics-path`** | `string` | Path of the statistics file. See [Puffin file format](../puffin-spec). |
+| _required_ | _required_ | **`file-size-in-bytes`** | `long` | Size of the statistics file, in bytes. |
+| _required_ | _required_ | **`file-footer-size-in-bytes`** | `long` | Total size of the statistics file's footer in bytes (not the footer payload size). See [Puffin file format](../puffin-spec) for the footer definition. |
+| _optional_ | _optional_ | **`key-metadata`** | `string` | Base64-encoded implementation-specific key metadata for encryption. |
+| _required_ | _required_ | **`blob-metadata`** | `list<blob metadata>` (see below) | A list of metadata for the statistics blobs contained in the file, with the structure described below. |
+
+Blob metadata is a struct with the following fields:
+
+| v1 | v2 | Field name | Type | Description |
+|----|----|------------|------|-------------|
+| _required_ | _required_ | **`type`** | `string` | Type of the blob. Matches the Blob type in the Puffin file. |
+| _required_ | _required_ | **`snapshot-id`** | `long` | ID of the Iceberg table's snapshot the blob was computed from. |
+| _required_ | _required_ | **`sequence-number`** | `long` | Sequence number of the Iceberg table's snapshot the blob was computed from. |
+| _required_ | _required_ | **`fields`** | `list<integer>` | Ordered list of fields, given by field ID, on which the statistic was calculated. |
+| _optional_ | _optional_ | **`properties`** | `map<string, string>` | Additional properties associated with the statistic. Subset of Blob properties in the Puffin file. |
+
 #### Commit Conflict Resolution and Retry
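The `statistics` field and its nested blob metadata map onto a small data model. The Python sketch below shows one way a reader might load them. It assumes the fields are serialized as JSON under the names used in the tables (the Appendix C serialization for statistics is not part of this hunk), and every class and function name (`StatisticsFile`, `BlobMetadata`, `parse_statistics`, `statistics_for_snapshot`, `footer_offset`) is hypothetical, not an Iceberg library API. Because a Puffin footer is stored at the end of the file, the footer starts at byte `file-size-in-bytes - file-footer-size-in-bytes`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class BlobMetadata:
    """One `blob-metadata` entry (hypothetical model, not an Iceberg API)."""
    type: str
    snapshot_id: int
    sequence_number: int
    fields: List[int]
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class StatisticsFile:
    """One entry of the table metadata `statistics` list (hypothetical model)."""
    snapshot_id: int
    statistics_path: str
    file_size_in_bytes: int
    file_footer_size_in_bytes: int
    blob_metadata: List[BlobMetadata]
    key_metadata: Optional[str] = None

    def footer_offset(self) -> int:
        """Byte offset where the Puffin footer starts; the footer ends at EOF."""
        if not 0 < self.file_footer_size_in_bytes <= self.file_size_in_bytes:
            raise ValueError("footer size must be positive and fit in the file")
        return self.file_size_in_bytes - self.file_footer_size_in_bytes


def parse_blob(obj: Dict[str, Any]) -> BlobMetadata:
    return BlobMetadata(
        type=obj["type"],
        snapshot_id=int(obj["snapshot-id"]),
        sequence_number=int(obj["sequence-number"]),
        fields=[int(f) for f in obj["fields"]],
        properties=dict(obj.get("properties") or {}),
    )


def parse_statistics(table_metadata: Dict[str, Any]) -> List[StatisticsFile]:
    """Read the optional `statistics` field; absent or null means no statistics."""
    files = []
    for obj in table_metadata.get("statistics") or []:
        files.append(
            StatisticsFile(
                # int() accepts either a JSON number or a numeric string.
                snapshot_id=int(obj["snapshot-id"]),
                statistics_path=obj["statistics-path"],
                file_size_in_bytes=int(obj["file-size-in-bytes"]),
                file_footer_size_in_bytes=int(obj["file-footer-size-in-bytes"]),
                blob_metadata=[parse_blob(b) for b in obj["blob-metadata"]],
                key_metadata=obj.get("key-metadata"),
            )
        )
    return files


def statistics_for_snapshot(
    files: List[StatisticsFile], snapshot_id: int
) -> Optional[StatisticsFile]:
    """Statistics are informational, so a miss returns None instead of failing."""
    for stats in files:
        if stats.snapshot_id == snapshot_id:
            return stats
    return None


if __name__ == "__main__":
    # Illustrative values only; the path is hypothetical.
    metadata = {
        "current-snapshot-id": 42,
        "statistics": [
            {
                "snapshot-id": 42,
                "statistics-path": "s3://bucket/table/metadata/stats.puffin",
                "file-size-in-bytes": 4096,
                "file-footer-size-in-bytes": 512,
                "blob-metadata": [
                    {
                        "type": "apache-datasketches-theta-v1",
                        "snapshot-id": 42,
                        "sequence-number": 1,
                        "fields": [1],
                    }
                ],
            }
        ],
    }
    files = parse_statistics(metadata)
    stats = statistics_for_snapshot(files, metadata["current-snapshot-id"])
    if stats is not None:
        print(stats.statistics_path, stats.footer_offset())  # footer starts at 3584
```

Knowing the footer size up front lets a reader fetch the whole footer with one ranged read starting at `footer_offset()`, rather than first reading the trailing fields that encode the footer payload size. Returning `None` for a missing snapshot, instead of raising, reflects that statistics support is optional for reading the table.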