Docs: Add WRITE LOCALLY ORDERED BY and WRITE DISTRIBUTED BY in spark-ddl.md
#3820
In addition, I would like to know whether the partition field can be set by `WRITE DISTRIBUTED BY`, and how this was designed.
site/docs/spark-ddl.md
Outdated
| ``` | ||
### `ALTER TABLE ... WRITE DISTRIBUTED BY PARTITION`

Iceberg tables can be configured with a hash distribution where tuples that share the same values for clustering expressions are
The requirement is to distribute by partition. Hash distribution is an implementation detail. Instead, I think this should state that WRITE DISTRIBUTED BY PARTITION will guarantee that a given partition is handled by one writer.
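For illustration, a minimal sketch of the statement under discussion. The table name `prod.db.sample` is a placeholder, and the statement assumes the Iceberg SQL extensions are enabled in the Spark session:

```sql
-- Hypothetical table name; requires the Iceberg Spark SQL extensions.
-- Requests that all rows for a given partition are handled by one writer.
ALTER TABLE prod.db.sample WRITE DISTRIBUTED BY PARTITION;
```

Worded this way, the docs describe the guarantee (one writer per partition) and leave the hash distribution as an implementation detail.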
Okay.
site/docs/spark-ddl.md
Outdated
!!! Note
    Table write order does not guarantee data order for queries. It only affects how data is written to the table.

Only local sorting can be set at the same time, use `LOCALLY ORDERED BY`
This should first state that WRITE ORDERED BY sets a global ordering where rows are ordered across tasks, like using ORDER BY in an INSERT command. Then introduce LOCALLY ORDERED BY to order within each task but not across tasks.
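A short sketch of how the two forms could be shown side by side in the docs. The table name `prod.db.sample` and the columns `category` and `id` are placeholders, and the statements assume the Iceberg SQL extensions are enabled:

```sql
-- Hypothetical table and column names; requires the Iceberg Spark SQL extensions.

-- Global ordering: rows are ordered across tasks,
-- like using ORDER BY in an INSERT command.
ALTER TABLE prod.db.sample WRITE ORDERED BY category, id;

-- Local ordering: rows are ordered within each task, but not across tasks.
ALTER TABLE prod.db.sample WRITE LOCALLY ORDERED BY category, id;

-- Distribution by partition combined with a local ordering.
ALTER TABLE prod.db.sample WRITE DISTRIBUTED BY PARTITION LOCALLY ORDERED BY category, id;
```

Introducing `WRITE ORDERED BY` first makes `LOCALLY ORDERED BY` read as a weaker variant of it, which matches the order suggested above.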
Updated.
Thanks, @xiaotianzhang01!
* apache/iceberg#3723
* apache/iceberg#3732
* apache/iceberg#3749
* apache/iceberg#3766
* apache/iceberg#3787
* apache/iceberg#3796
* apache/iceberg#3809
* apache/iceberg#3820
* apache/iceberg#3878
* apache/iceberg#3890
* apache/iceberg#3892
* apache/iceberg#3944
* apache/iceberg#3976
* apache/iceberg#3993
* apache/iceberg#3996
* apache/iceberg#4008
* apache/iceberg#3758 and 3856
* apache/iceberg#3761
* apache/iceberg#2062
* apache/iceberg#3422
* remove restriction related to legacy parquet file list
Enrich the description of the syntax for write distribution and ordering based on `ALTER TABLE`.