diff --git a/website/docs/quick-start-guide.md b/website/docs/quick-start-guide.md
index a31328dd9194a..c69cd62768ebd 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -1095,6 +1095,178 @@ Currently, the result of `show partitions` is based on the filesystem table pat
 
 :::
 
+## Schema evolution
+Schema evolution allows users to easily change the current schema of a Hudi table to adapt to data that changes over time.
+As of the 0.11.0 release, Spark SQL (Spark 3.1.x and Spark 3.2.1) DDL support for schema evolution has been added and is experimental.
+
+### Schema Evolution Scenarios
+1) Columns (including nested columns) can be added, deleted, modified, and moved.
+2) Partition columns cannot be evolved.
+3) You cannot add, delete, or perform operations on nested columns of the Array type.
+
+## SparkSQL Schema Evolution and Syntax Description
+Before using schema evolution, set `spark.sql.extensions`. For Spark 3.2.1, `spark.sql.catalog.spark_catalog` also needs to be set.
+```shell
+# Spark SQL for spark 3.1.x
+spark-sql --packages org.apache.hudi:hudi-spark3.1.2-bundle_2.12:0.11.1,org.apache.spark:spark-avro_2.12:3.1.2 \
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
+
+# Spark SQL for spark 3.2.1
+spark-sql --packages org.apache.hudi:hudi-spark3-bundle_2.12:0.11.1,org.apache.spark:spark-avro_2.12:3.2.1 \
+--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
+--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
+--conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
+
+```
+After starting the Spark application, run `set schema.on.read.enable=true` to enable schema evolution.
+
+:::note
+Currently, schema evolution cannot be disabled once it has been enabled.
+
+
+:::
+
+### Adding Columns
+**Syntax**
+```sql
+-- add columns
+ALTER TABLE tableName ADD COLUMNS(col_spec[, col_spec ...])
+```
+**Parameter Description**
+
+| Parameter       | Description                  |
+|-----------------|------------------------------|
+| tableName       | Table name.                  |
+| col_spec        | Column specifications, consisting of five fields: *col_name*, *col_type*, *nullable*, *comment*, and *col_position*.|
+
+**col_name** : name of the new column. It is mandatory. To add a sub-column to a nested column, specify the full name of the sub-column in this field.
+
+For example:
+
+1. To add sub-column col1 to a nested struct type column users, set this field to users.col1.
+
+2. To add sub-column col1 to a nested map type column member, set this field to member.value.col1.
+
+**col_type** : type of the new column.
+
+**nullable** : whether the new column can be null. The value can be left empty. Currently, Hudi does not use this field.
+
+**comment** : comment of the new column. The value can be left empty.
+
+**col_position** : position where the new column is added. The value can be *FIRST* or *AFTER* origin_col.
+
+1. If it is set to *FIRST*, the new column is added as the first column of the table.
+
+2. If it is set to *AFTER* origin_col, the new column is added after the original column origin_col.
+
+3. The value can be left empty. *FIRST* can be used only when new sub-columns are added to nested columns. Do not use *FIRST* for top-level columns. There are no restrictions on the usage of *AFTER*.
+
+**Examples**
+
+```sql
+alter table h0 add columns(ext0 string);
+alter table h0 add columns(new_col int not null comment 'add new column' after col1);
+alter table complex_table add columns(col_struct.col_name string comment 'add new column to a struct col' after col_from_col_struct);
+```
+
+### Altering Columns
+**Syntax**
+```sql
+-- alter table ... alter column
+ALTER TABLE tableName ALTER [COLUMN] col_old_name TYPE column_type [COMMENT] col_comment [FIRST|AFTER] column_name
+```
+
+**Parameter Description**
+
+| Parameter       | Description                  |
+|-----------------|------------------------------|
+| tableName       | Table name.                  |
+| col_old_name    | Name of the column to be altered.|
+| column_type     | Type of the target column.|
+| col_comment     | Comment of the target column.|
+| column_name     | New position to place the target column. For example, *AFTER* **column_name** indicates that the target column is placed after **column_name**.|
+
+
+**Examples**
+
+```sql
+-- Changing the column type
+ALTER TABLE table1 ALTER COLUMN a.b.c TYPE bigint
+
+-- Altering other attributes
+ALTER TABLE table1 ALTER COLUMN a.b.c COMMENT 'new comment'
+ALTER TABLE table1 ALTER COLUMN a.b.c FIRST
+ALTER TABLE table1 ALTER COLUMN a.b.c AFTER x
+ALTER TABLE table1 ALTER COLUMN a.b.c DROP NOT NULL
+```
+
+**Column type change**
+
+| old_type        | new_type                        |
+|-----------------|---------------------------------|
+| int             | long/float/double/string/decimal|
+| long            | double/string/decimal           |
+| float           | double/string/decimal           |
+| double          | string/decimal                  |
+| decimal         | decimal/string                  |
+| string          | decimal/date                    |
+| date            | string                          |
+
+### Deleting Columns
+**Syntax**
+```sql
+-- alter table ... drop columns
+ALTER TABLE tableName DROP COLUMN|COLUMNS cols
+```
+
+**Examples**
+
+```sql
+ALTER TABLE table1 DROP COLUMN a.b.c
+ALTER TABLE table1 DROP COLUMNS a.b.c, x, y
+```
+
+### Changing Column Name
+**Syntax**
+```sql
+-- alter table ... rename column
+ALTER TABLE tableName RENAME COLUMN old_columnName TO new_columnName
+```
+
+**Examples**
+
+```sql
+ALTER TABLE table1 RENAME COLUMN a.b.c TO x
+```
+
+### Modifying Table Properties
+**Syntax**
+```sql
+-- alter table ... set|unset
+ALTER TABLE tableName SET|UNSET tblproperties
+```
+
+**Examples**
+
+```sql
+ALTER TABLE table1 SET TBLPROPERTIES ('table_property' = 'property_value')
+ALTER TABLE table1 UNSET TBLPROPERTIES [IF EXISTS] ('comment', 'key')
+```
+
+### Changing a Table Name
+**Syntax**
+```sql
+-- alter table ... rename
+ALTER TABLE tableName RENAME TO newTableName
+```
+
+**Examples**
+
+```sql
+ALTER TABLE table1 RENAME TO table2
+```
+
 ## Where to go from here?
 
 You can also do the quickstart by [building hudi yourself](https://github.com/apache/hudi#building-apache-hudi-from-source),