diff --git a/site/docs/spark.md b/site/docs/spark.md index 2bbdad932b40..300849043826 100644 --- a/site/docs/spark.md +++ b/site/docs/spark.md @@ -106,6 +106,8 @@ CREATE TABLE prod.db.sample ( USING iceberg ``` +Iceberg will convert the column type in Spark to corresponding Iceberg type. Please check the section of [type compatibility on creating table](#spark-type-to-iceberg-type) for details. + Table create commands, including CTAS and RTAS, support the full range of Spark create clauses, including: * `PARTITION BY (partition-expressions)` to configure partitioning @@ -728,3 +730,62 @@ spark.read.format("iceberg").load("db.table.files").show(truncate = false) // Hadoop path table spark.read.format("iceberg").load("hdfs://nn:8020/path/to/table#files").show(truncate = false) ``` + +## Type compatibility + +Spark and Iceberg support different set of types. Iceberg does the type conversion automatically, but not for all combinations, +so you may want to understand the type conversion in Iceberg in prior to design the types of columns in your tables. + +### Spark type to Iceberg type + +This type conversion table describes how Spark types are converted to the Iceberg types. The conversion applies on both creating Iceberg table and writing to Iceberg table via Spark. + +| Spark | Iceberg | Notes | +|-----------------|-------------------------|-------| +| boolean | boolean | | +| short | integer | | +| byte | integer | | +| integer | integer | | +| long | long | | +| float | float | | +| double | double | | +| date | date | | +| timestamp | timestamp with timezone | | +| char | string | | +| varchar | string | | +| string | string | | +| binary | binary | | +| decimal | decimal | | +| struct | struct | | +| array | list | | +| map | map | | + +!!! Note + The table is based on representing conversion during creating table. In fact, broader supports are applied on write. Here're some points on write: + + * Iceberg numeric types (`integer`, `long`, `float`, `double`, `decimal`) support promotion during writes. e.g. You can write Spark types `short`, `byte`, `integer`, `long` to Iceberg type `long`. + * You can write to Iceberg `fixed` type using Spark `binary` type. Note that assertion on the length will be performed. + +### Iceberg type to Spark type + +This type conversion table describes how Iceberg types are converted to the Spark types. The conversion applies on reading from Iceberg table via Spark. + +| Iceberg | Spark | Note | +|----------------------------|-------------------------|---------------| +| boolean | boolean | | +| integer | integer | | +| long | long | | +| float | float | | +| double | double | | +| date | date | | +| time | | Not supported | +| timestamp with timezone | timestamp | | +| timestamp without timezone | | Not supported | +| string | string | | +| uuid | string | | +| fixed | binary | | +| binary | binary | | +| decimal | decimal | | +| struct | struct | | +| list | array | | +| map | map | |