diff --git a/website/docs/schema_evolution.md b/website/docs/schema_evolution.md
index 8fe04d6523875..6597bb31253b6 100755
--- a/website/docs/schema_evolution.md
+++ b/website/docs/schema_evolution.md
@@ -22,21 +22,39 @@
 the previous schema (e.g., renaming a column). Furthermore, the evolved schema is queryable across high-performance engines
 like Presto and Spark SQL without additional overhead for column ID translations or type reconciliations.
 The following table summarizes the schema changes compatible with different Hudi table types.
 
-| Schema Change | COW | MOR | Remarks |
-|:---------------------------------------------------------------------------------|:---------|:--------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Add a new nullable column at root level at the end | Yes | Yes | `Yes` means that a write with evolved schema succeeds and a read following the write succeeds to read entire dataset. |
-| Add a new nullable column to inner struct (at the end) | Yes | Yes |
-| Add a new complex type field with default (map and array) | Yes | Yes | |
-| Add a new nullable column and change the ordering of fields | No | No | Write succeeds but read fails if the write with evolved schema updated only some of the base files but not all. Currently, Hudi does not maintain a schema registry with history of changes across base files. Nevertheless, if the upsert touched all base files then the read will succeed. |
-| Add a custom nullable Hudi meta column, e.g. `_hoodie_meta_col` | Yes | Yes | |
-| Promote datatype from `int` to `long` for a field at root level | Yes | Yes | For other types, Hudi supports promotion as specified in [Avro schema resolution](http://avro.apache.org/docs/current/spec#Schema+Resolution). |
-| Promote datatype from `int` to `long` for a nested field | Yes | Yes |
-| Promote datatype from `int` to `long` for a complex type (value of map or array) | Yes | Yes | |
-| Add a new non-nullable column at root level at the end | No | No | In case of MOR table with Spark data source, write succeeds but read fails. As a **workaround**, you can make the field nullable. |
-| Add a new non-nullable column to inner struct (at the end) | No | No | |
-| Change datatype from `long` to `int` for a nested field | No | No | |
-| Change datatype from `long` to `int` for a complex type (value of map or array) | No | No | |
-
+Missing columns in the incoming schema are automatically added and filled with null values based on the table schema.
+To enable this behavior, set the following config:
+`hoodie.write.handle.missing.cols.with.lossless.type.promotion`; otherwise the pipeline will fail.
+Note: this config also makes a best effort to handle some backward-incompatible
+type promotions, e.g., `long` to `int`.
+
+| Schema Change                                                    | COW | MOR | Remarks                                                                                                                                                                                                                                                                                    |
+|:-----------------------------------------------------------------|:----|:----|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Add a new nullable column at root level at the end               | Yes | Yes | `Yes` means that a write with the evolved schema succeeds and a read following the write succeeds over the entire dataset.                                                                                                                                                                  |
+| Add a new nullable column to inner struct (at the end)           | Yes | Yes |                                                                                                                                                                                                                                                                                            |
+| Add a new complex type field with default (map and array)        | Yes | Yes |                                                                                                                                                                                                                                                                                            |
+| Add a new nullable column and change the ordering of fields      | No  | No  | Write succeeds but read fails if the write with the evolved schema updated only some of the base files but not all. Currently, Hudi does not maintain a schema registry with the history of changes across base files. However, if the upsert touched all base files, the read will succeed. |
+| Add a custom nullable Hudi meta column, e.g. `_hoodie_meta_col`  | Yes | Yes |                                                                                                                                                                                                                                                                                            |
+| Promote datatype for a field at root level                       | Yes | Yes |                                                                                                                                                                                                                                                                                            |
+| Promote datatype for a nested field                              | Yes | Yes |                                                                                                                                                                                                                                                                                            |
+| Promote datatype for a complex type (value of map or array)      | Yes | Yes |                                                                                                                                                                                                                                                                                            |
+| Add a new non-nullable column at root level at the end           | No  | No  | In case of a MOR table with the Spark data source, the write succeeds but the read fails. As a **workaround**, you can make the field nullable.                                                                                                                                             |
+| Add a new non-nullable column to inner struct (at the end)       | No  | No  |                                                                                                                                                                                                                                                                                            |
+| Demote datatype for a field at root level                        | No  | No  |                                                                                                                                                                                                                                                                                            |
+| Demote datatype for a nested field                               | No  | No  |                                                                                                                                                                                                                                                                                            |
+| Demote datatype for a complex type (value of map or array)       | No  | No  |                                                                                                                                                                                                                                                                                            |
+
+### Type Promotions
+
+The incoming schema will automatically have types promoted to match the table schema.
+
+| Incoming Schema \ Table Schema | int | long | float | double | string | bytes |
+|--------------------------------|-----|------|-------|--------|--------|-------|
+| int                            | Y   | Y    | Y     | Y      | Y      | N     |
+| long                           | N   | Y    | Y     | Y      | Y      | N     |
+| float                          | N   | N    | Y     | Y      | Y      | N     |
+| double                         | N   | N    | N     | Y      | Y      | N     |
+| string                         | N   | N    | N     | N      | Y      | Y     |
+| bytes                          | N   | N    | N     | N      | Y      | Y     |
 
 ## Schema Evolution on read
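As a reviewer's note (not part of the patch): the promotion matrix added above is easy to sanity-check by encoding it as a lookup table. A minimal Python sketch follows; `can_promote` is a hypothetical helper for illustration, not a Hudi API.

```python
# Encodes the "Incoming Schema \ Table Schema" promotion matrix from the doc.
# Keys are incoming types; values are the table types they can be promoted to.
PROMOTIONS = {
    "int":    {"int", "long", "float", "double", "string"},
    "long":   {"long", "float", "double", "string"},
    "float":  {"float", "double", "string"},
    "double": {"double", "string"},
    "string": {"string", "bytes"},
    "bytes":  {"string", "bytes"},
}

def can_promote(incoming: str, table: str) -> bool:
    """Return True if a field of the incoming type can be promoted to the table type."""
    return table in PROMOTIONS.get(incoming, set())
```

For example, `can_promote("int", "long")` is `True`, while `can_promote("long", "int")` is `False`: that direction is a demotion, which the table marks unsupported (the lossless-type-promotion config only makes a best effort for such cases).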