diff --git a/docs/common/format/spec.md b/docs/common/format/spec.md
index 0aa198df9482..51aaee71554b 100644
--- a/docs/common/format/spec.md
+++ b/docs/common/format/spec.md
@@ -787,7 +787,7 @@ Manifests hold the same statistics for delete files and data files. For delete f
 
 Values should be stored in Avro using the Avro types and logical type annotations in the table below.
 
-Optional fields, array elements, and map values must be wrapped in an Avro `union` with `null`. This is the only union type allowed in Iceberg data files.
+Optional fields, array elements, and map values must be wrapped in an Avro `union` with `null`.
 
 Optional fields must always set the Avro field default value to null.
 
@@ -809,10 +809,12 @@ Maps with non-string keys must use an array representation with the `map` logica
 |**`uuid`**|`{ "type": "fixed",`<br />&nbsp;&nbsp;`"size": 16,`<br />&nbsp;&nbsp;`"logicalType": "uuid" }`||
 |**`fixed(L)`**|`{ "type": "fixed",`<br />&nbsp;&nbsp;`"size": L }`||
 |**`binary`**|`bytes`||
-|**`struct`**|`record`||
+|**`struct`**|`record` or `union`||
 |**`list`**|`array`||
 |**`map`**|`array` of key-value records, or `map` when keys are strings (optional).|Array storage must use logical type name `map` and must store elements that are 2-field records. The first field is a non-null key and the second field is the value.|
 
+Notes:
+1. A complex union type (a `union` with more than one non-null schema) is read as a `struct` in Iceberg. For example, `[type1, type2]` in Avro is read as `required struct<1: tag: required int, 2: field0: optional type1, 3: field1: optional type2>` in Iceberg, where `type1` and `type2` can be any schemas allowed in an Avro `union`. A single union type (a `union` with only one schema) is read as the corresponding Iceberg type. For example, `[type]` in Avro is read as `type` in Iceberg, where `type` can be any schema allowed in an Avro `union`.
 
 **Field IDs**
 
@@ -880,7 +882,7 @@ Lists must use the [3-level representation](https://github.com/apache/parquet-fo
 | **`uuid`** | `binary` | `iceberg.binary-type`=`UUID` | |
 | **`fixed(L)`** | `binary` | `iceberg.binary-type`=`FIXED` & `iceberg.length`=`L` | The length would not be checked by the ORC reader and should be checked by the adapter. |
 | **`binary`** | `binary` | | |
-| **`struct`** | `struct` | | |
+| **`struct`** | `struct` or `union` | | |
 | **`list`** | `array` | | |
 | **`map`** | `map` | | |
 
@@ -899,6 +901,8 @@ Iceberg would build the desired reader schema with their schema evolution rules
 |`struct`|`struct`|`struct`|`struct`|
 |`struct>`|`struct>`|`struct>`|`struct>`|
 
+2. A complex union type (a `union` with more than one schema) is read as a `struct` in Iceberg. For example, `union<type1, type2>` in ORC is read as `required struct<1: tag: required int, 2: field0: optional type1, 3: field1: optional type2>` in Iceberg, where `type1` and `type2` can be any schemas allowed in an ORC `union`. A single union type (a `union` with only one schema) is read as the corresponding Iceberg type. For example, `union<type>` in ORC is read as `type` in Iceberg, where `type` can be any schema allowed in an ORC `union`.
+
 ## Appendix B: 32-bit Hash Requirements
 
 The 32-bit hash implementation is 32-bit Murmur3 hash, x86 variant, seeded with 0.
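
The union-to-struct reading rule that this change adds to the Avro and ORC notes can be illustrated with a small sketch. This is not code from Iceberg: the function name `union_to_iceberg` is hypothetical, branch types are represented as plain strings, and the field IDs 1, 2, 3, ... are simply copied from the example in the notes. Unions that contain `null` are rejected here, as an assumption of the sketch, because the notes do not say how nullability combines with the generated struct.

```python
def union_to_iceberg(branches):
    """Render the Iceberg type string for a list of union branch type names.

    Hypothetical helper for illustration only; follows the example format
    in the notes, not an actual Iceberg API.
    """
    if not branches:
        raise ValueError("a union needs at least one branch")
    if "null" in branches:
        # Assumption of this sketch: nullable unions are out of scope.
        raise ValueError("unions containing null are not handled here")
    if len(branches) == 1:
        # Single union: read as the branch type itself.
        return branches[0]
    # Complex union: a required struct with an int tag plus one optional
    # field per branch, named field0, field1, ... in branch order.
    fields = ["1: tag: required int"]
    for i, branch in enumerate(branches):
        fields.append(f"{i + 2}: field{i}: optional {branch}")
    return "required struct<" + ", ".join(fields) + ">"


print(union_to_iceberg(["type1", "type2"]))
print(union_to_iceberg(["type"]))
```

The first call prints the struct form given in the notes, and the second prints the branch type unchanged.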