Merged
172 changes: 172 additions & 0 deletions website/docs/quick-start-guide.md
@@ -1095,6 +1095,178 @@ Currently, the result of `show partitions` is based on the filesystem table pat

:::

## Schema evolution
Schema evolution allows users to change the current schema of a Hudi table so that it adapts to data that changes over time.
As of the 0.11.0 release, Spark SQL DDL support for schema evolution has been added for Spark 3.1.x and Spark 3.2.1; it is experimental.

### Schema Evolution Scenarios
1) Columns (including nested columns) can be added, deleted, modified, and moved.
2) Partition columns cannot be evolved.
3) You cannot add, delete, or perform operations on nested columns of the Array type.

## SparkSQL Schema Evolution and Syntax Description
Before using schema evolution, set `spark.sql.extensions`. For Spark 3.2.1, `spark.sql.catalog.spark_catalog` also needs to be set.
```shell
# Spark SQL for spark 3.1.x
spark-sql --packages org.apache.hudi:hudi-spark3.1.2-bundle_2.12:0.11.1,org.apache.spark:spark-avro_2.12:3.1.2 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'

# Spark SQL for spark 3.2.1
spark-sql --packages org.apache.hudi:hudi-spark3-bundle_2.12:0.11.1,org.apache.spark:spark-avro_2.12:3.2.1 \
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
--conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
--conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'

```
After starting the Spark application, execute `set schema.on.read.enable=true` to enable schema evolution.

:::note
Currently, schema evolution cannot be disabled once it has been enabled.

:::
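
The setup steps above can be sketched as a short session. This is only an illustration: the table `h0_demo` and its columns are hypothetical, and the `primaryKey` and `preCombineField` values are assumptions chosen for the example rather than anything this section prescribes.

```sql
-- Hypothetical table used only for illustration.
create table h0_demo (
  id int,
  name string,
  price double,
  ts bigint
) using hudi
tblproperties (
  primaryKey = 'id',
  preCombineField = 'ts'
);

-- Enable schema evolution for the current session, as described above.
set schema.on.read.enable=true;

-- With schema evolution enabled, DDL changes such as adding a column can be issued.
alter table h0_demo add columns(ext0 string);
```

Because schema evolution cannot currently be disabled once enabled, it may be worth trying this on a scratch table first.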

### Adding Columns
**Syntax**
```sql
-- add columns
ALTER TABLE tableName ADD COLUMNS(col_spec[, col_spec ...])
```
**Parameter Description**

| Parameter | Description |
|-----------------|------------------------------|
| tableName | Table name |
| col_spec | Column specifications, consisting of five fields, *col_name*, *col_type*, *nullable*, *comment*, and *col_position*.|

**col_name** : name of the new column. It is mandatory. To add a sub-column to a nested column, specify the full name of the sub-column in this field.

For example:

1. To add sub-column col1 to a nested struct type column users struct<name: string, age: int>, set this field to users.col1.

2. To add sub-column col1 to a nested map type column member map<string, struct<n: string, a: int>>, set this field to member.value.col1.

**col_type** : type of the new column.

**nullable** : whether the new column can be null. The value can be left empty. Hudi does not currently use this field.

**comment** : comment of the new column. The value can be left empty.

**col_position** : position where the new column is added. The value can be *FIRST* or *AFTER* origin_col.

1. If it is set to *FIRST*, the new column will be added as the first column.

2. If it is set to *AFTER* origin_col, the new column will be added after the original column origin_col.

3. The value can be left empty. *FIRST* can be used only when new sub-columns are added to nested columns; do not use *FIRST* for top-level columns. There are no restrictions on the use of *AFTER*.

**Examples**

```sql
alter table h0 add columns(ext0 string);
alter table h0 add columns(new_col int not null comment 'add new column' after col1);
alter table complex_table add columns(col_struct.col_name string comment 'add new column to a struct col' after col_from_col_struct);
```
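
The nested-column naming rules for **col_name** can be sketched with the struct and map columns used in the parameter description. The table `t1`, its `id` and `ts` columns, and the table properties are hypothetical and exist only to make the example self-contained; the positions (`first`, `after n`) follow the **col_position** rules as described in this section.

```sql
-- Hypothetical table containing the nested columns from the description above.
create table t1 (
  id int,
  users struct<name: string, age: int>,
  member map<string, struct<n: string, a: int>>,
  ts bigint
) using hudi
tblproperties (
  primaryKey = 'id',
  preCombineField = 'ts'
);

set schema.on.read.enable=true;

-- Sub-column of a struct: use the full name users.col1.
-- FIRST is used here because the new column is a nested sub-column.
alter table t1 add columns(users.col1 string comment 'new struct sub-column' first);

-- Sub-column of a map value struct: use the full name member.value.col1.
alter table t1 add columns(member.value.col1 string comment 'new map-value sub-column' after n);
```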

### Altering Columns
**Syntax**
```sql
-- alter table ... alter column
ALTER TABLE tableName ALTER [COLUMN] col_old_name TYPE column_type [COMMENT col_comment] [FIRST|AFTER column_name]
```

**Parameter Description**

| Parameter | Description |
|-----------------|------------------------------|
| tableName | Table name. |
| col_old_name | Name of the column to be altered.|
| column_type | Type of the target column.|
| col_comment     | New comment for the column.|
| column_name | New position to place the target column. For example, *AFTER* **column_name** indicates that the target column is placed after **column_name**.|


**Examples**

```sql
--- Changing the column type
ALTER TABLE table1 ALTER COLUMN a.b.c TYPE bigint

--- Altering other attributes
ALTER TABLE table1 ALTER COLUMN a.b.c COMMENT 'new comment'
ALTER TABLE table1 ALTER COLUMN a.b.c FIRST
ALTER TABLE table1 ALTER COLUMN a.b.c AFTER x
ALTER TABLE table1 ALTER COLUMN a.b.c DROP NOT NULL
```

**Column Type Change**

| old_type | new_type |
|-----------------|---------------------------------|
| int | long/float/double/string/decimal|
| long | double/string/decimal |
| float           | double/string/decimal           |
| double | string/decimal |
| decimal | decimal/string |
| string | decimal/date |
| date | string |
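
A few of the conversions listed in the table can be sketched as follows. The table `t2` and its columns are hypothetical, chosen so that each statement matches one row of the table; Spark SQL's `bigint` is used as the name for the `long` type, as in the example above.

```sql
-- Hypothetical table: t2 (id int, amount float, created string, ...)
set schema.on.read.enable=true;

-- int -> long
alter table t2 alter column id type bigint;

-- float -> double
alter table t2 alter column amount type double;

-- string -> date
alter table t2 alter column created type date;
```

Conversions that do not appear in the table, such as long to int, are not listed as supported and are best avoided.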

Contributor: I think it is possible to add a transformation matrix like https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html.

Contributor Author: Thanks for the reminder, let me deal with it tomorrow.

Member: @xiarixiaoyao @XuQianJin-Stars Can I land this first? I'd like to consolidate this with https://hudi.apache.org/docs/next/schema_evolution/


### Deleting Columns
**Syntax**
```sql
-- alter table ... drop columns
ALTER TABLE tableName DROP COLUMN|COLUMNS cols
```

**Examples**

```sql
ALTER TABLE table1 DROP COLUMN a.b.c
ALTER TABLE table1 DROP COLUMNS a.b.c, x, y
```

### Changing Column Name
**Syntax**
```sql
-- alter table ... rename column
ALTER TABLE tableName RENAME COLUMN old_columnName TO new_columnName
```

**Examples**

```sql
ALTER TABLE table1 RENAME COLUMN a.b.c TO x
```

### Modifying Table Properties
**Syntax**
```sql
-- alter table ... set|unset
ALTER TABLE tableName SET|UNSET tblproperties
```

**Examples**

```sql
ALTER TABLE table1 SET TBLPROPERTIES ('table_property' = 'property_value')
ALTER TABLE table1 UNSET TBLPROPERTIES IF EXISTS ('comment', 'key')
```

### Changing a Table Name
**Syntax**
```sql
-- alter table ... rename
ALTER TABLE tableName RENAME TO newTableName
```

**Examples**

```sql
ALTER TABLE table1 RENAME TO table2
```

## Where to go from here?

You can also do the quickstart by [building hudi yourself](https://github.com/apache/hudi#building-apache-hudi-from-source),