Commit 365c7ec

address the comments
1 parent 76985e1 commit 365c7ec

1 file changed: +15 -7 lines changed

site/docs/spec.md

Lines changed: 15 additions & 7 deletions
@@ -170,6 +170,7 @@ Partition specs capture the transform from table data to partition values. This
 | **`month`** | Extract a date or timestamp month, as months from 1970-01-01 | `date`, `timestamp(tz)` | `int` |
 | **`day`** | Extract a date or timestamp day, as days from 1970-01-01 | `date`, `timestamp(tz)` | `date` |
 | **`hour`** | Extract a timestamp hour, as hours from 1970-01-01 00:00:00 | `timestamp(tz)` | `int` |
+| **`alwaysNull`** | Always produces `null` (the void transform) | Any | `void` |
 
 All transforms must return `null` for a `null` input value.

@@ -646,18 +647,25 @@ Each partition field in the fields list is stored as an object. See the table fo
 
 In some cases partition specs are stored using only the field list instead of the object format that includes the spec ID, like the deprecated `partition-spec` field in table metadata. The object format should be used unless otherwise noted in this spec.
 
-#### Partition Field ID handling
+#### Partition Field ID Handling
 
-A partition field id is an integer (starting at 1000) used to identify a partition field.
+A partition field ID is an integer used to identify a partition field.
+These IDs should not conflict with the IDs required for the other fields in data files.
+To avoid that, iceberg assigns partition field IDs starting at 1000.
 
-Since iceberg release 0.8.0, partition fields are present in every partition field of partition specs in a table metadata.
+The requirements below are for different versions of tables:
 
-* For backward compatibility, if field ids are missing in a table metadata, iceberg will sequentially generate ids for each field starting at 1000 based on its position in the list of fields.
-* For forward compatibility, if field ids are not supported, iceberg will ignore field ids.
+* For v1 tables, partition field metadata should include a field id for each partition field, but this is not required.
+* For v2 tables, partition field metadata must include a field id for each partition field. Partition field IDs are unique across partition specs to support the partition spec evolution for a given table.
+
+To remove partition fields from the partition spec in an existing v1 table, it is recommended not removing fields but replacing their transforms with `alwaysNull`.
+Otherwise, partition spec evolution will break because a partition field ID might be assigned to multiple different partition fields during partition spec evolution for a given table.
 
-Additionally, in table metadata format v2, partition fields are required to have unique field IDs to support partition spec evolution.
+For compatibility between v1 and v2 tables:
+
+* For backward compatibility, if field ids are missing in a table metadata, iceberg will sequentially generate ids for each field starting at 1000 based on its position in the list of fields.
+* For forward compatibility, if field ids are not supported but present in the metadata, old versions of the reference implementation will ignore those field ids and then regenerate an auto-increment field id starting at 1000 for every partition field.
 
-For tables without partition field IDs, iceberg will generate an auto-increment unique field id starting at 1000 for every partition field.
 
 ### Table Metadata and Snapshots
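
Reviewer note: the position-based ID rule in the hunk above, and the reason the added text recommends `alwaysNull` over removing a field from a v1 spec, can be illustrated with a small sketch. This is not the reference implementation. The helper names (`assign_missing_field_ids`, `drop_field`, `void_field`, `ids_by_name`), the example field names, and the simplified dict layout are assumptions made for illustration; only the starting value of 1000 and the "by position in the list" rule come from the spec text.

```python
PARTITION_FIELD_ID_START = 1000  # first partition field ID, as stated in the spec text


def assign_missing_field_ids(fields):
    """Assign sequential IDs by list position (the backward-compatibility rule)."""
    return [
        {**field, "field-id": PARTITION_FIELD_ID_START + position}
        for position, field in enumerate(fields)
    ]


def drop_field(fields, name):
    """Remove a partition field outright (what the text advises against for v1)."""
    return [field for field in fields if field["name"] != name]


def void_field(fields, name):
    """Keep the field's slot in the list but replace its transform with alwaysNull."""
    return [
        {**field, "transform": "alwaysNull"} if field["name"] == name else field
        for field in fields
    ]


def ids_by_name(fields):
    """Map each field name to the ID it would receive from position-based assignment."""
    return {f["name"]: f["field-id"] for f in assign_missing_field_ids(fields)}


# Hypothetical v1 spec stored without field IDs.
v1_spec = [
    {"name": "region", "transform": "identity"},
    {"name": "id_bucket", "transform": "bucket[16]"},
    {"name": "event_day", "transform": "day"},
]

original = ids_by_name(v1_spec)
after_drop = ids_by_name(drop_field(v1_spec, "id_bucket"))
after_void = ids_by_name(void_field(v1_spec, "id_bucket"))

print(original)
print(after_drop)
print(after_void)
```

In this sketch, dropping `id_bucket` shifts `event_day` into position 1, so it receives ID 1001, which previously identified `id_bucket`. Replacing the transform with `alwaysNull` leaves every position, and therefore every generated ID, unchanged.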
