Docs: update spark doc about incremental scan #3796
cc: @rdblue
site/docs/spark-queries.md
Currently gets only the data from `append` operation. Cannot support `replace`, `overwrite`, `delete` operations yet.
Works with both V1 and V2 format-version.
Incremental read is not yet supported by Spark's SQL syntax.
Let's remove "yet" because it is unclear whether it will be supported.
site/docs/spark-queries.md
!!! Note
Currently gets only the data from `append` operation. Cannot support `replace`, `overwrite`, `delete` operations yet.
If you want this to be in a note box, it needs to be indented with 4 spaces.
site/docs/spark-queries.md
!!! Note
Currently gets only the data from `append` operation. Cannot support `replace`, `overwrite`, `delete` operations yet.
Works with both V1 and V2 format-version.
Is this part of the note or a separate paragraph? Also, could you expand this to be a complete sentence?
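As a rough illustration of the limitation described in the note, here is a toy Scala sketch that separates the snapshots an incremental read would consume (`append`) from those it cannot handle (`replace`, `overwrite`, `delete`). `SnapshotInfo` and `partitionByOperation` are hypothetical names, not Iceberg APIs. Whether Iceberg skips or rejects an unsupported snapshot inside the requested range is not stated here, so the sketch only reports them rather than modelling either behavior.

```scala
// Toy model only: SnapshotInfo is a hypothetical stand-in, not Iceberg's Snapshot API.
final case class SnapshotInfo(id: Long, operation: String)

object AppendOnlySketch {
  // Per the note, only snapshots written by an `append` operation contribute data.
  val readableOperations: Set[String] = Set("append")

  /** Splits snapshots into (consumed by an incremental read, not supported).
    * Input order is preserved in both halves.
    */
  def partitionByOperation(
      snapshots: Seq[SnapshotInfo]
  ): (Seq[SnapshotInfo], Seq[SnapshotInfo]) =
    snapshots.partition(s => readableOperations.contains(s.operation))
}
```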
site/docs/spark-queries.md
* `end-snapshot-id` End snapshot ID used in incremental scans (inclusive)

```scala
// get the data added after start-snapshot-id (10963874102873L) till end-snapshot-id (63874143573109L)
```
Typo: "till" should be "until"
site/docs/spark-queries.md
### Incremental read

To read incremental data between the snapshots, Configure below Spark read options:
How about "To read appended data incrementally, use:"
site/docs/spark-queries.md
To read incremental data between the snapshots, Configure below Spark read options:

* `start-snapshot-id` Start snapshot ID used in incremental scans (exclusive)
* `end-snapshot-id` End snapshot ID used in incremental scans (inclusive)
This is optional. Omitting it will default to the current snapshot.
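To make the range semantics discussed in this thread concrete (`start-snapshot-id` exclusive, `end-snapshot-id` inclusive, and the end defaulting to the current snapshot when omitted), here is a toy Scala model. `SnapshotNode` and `snapshotsBetween` are hypothetical names, not Iceberg classes. The sketch assumes snapshots form a chain linked by parent IDs held in an in-memory map; it reads no data and ignores the append-only restriction.

```scala
// Toy model only: SnapshotNode and snapshotsBetween are hypothetical names,
// not Iceberg classes; a real read is configured through the Spark read options.
final case class SnapshotNode(id: Long, parentId: Option[Long])

object IncrementalRangeSketch {
  /** Returns the snapshot IDs in (startId, endId], oldest first.
    * When endId is None, the range ends at currentId (the table's current snapshot).
    */
  def snapshotsBetween(
      byId: Map[Long, SnapshotNode],
      currentId: Long,
      startId: Long,
      endId: Option[Long] = None
  ): List[Long] = {
    // Walk parent pointers from the end snapshot back to the start snapshot,
    // prepending each ID so the result comes out oldest first.
    @annotation.tailrec
    def walk(id: Option[Long], acc: List[Long]): List[Long] = id match {
      case Some(`startId`) => acc // start is exclusive, so it is not added
      case Some(i) =>
        val node = byId.getOrElse(i, throw new IllegalArgumentException(s"unknown snapshot $i"))
        walk(node.parentId, i :: acc)
      case None =>
        throw new IllegalArgumentException(s"snapshot $startId is not an ancestor of the end snapshot")
    }
    walk(Some(endId.getOrElse(currentId)), Nil)
  }
}
```

Leaving `endId` as `None` mirrors the point above: omitting the end snapshot makes the range run up to the current snapshot.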
@rdblue: Thanks for the review. I have addressed the comments.
Thanks, @ajantha-bhat!
Some users on Slack are exploring incremental reads in Spark, and we don't have documentation for them. Hence this PR.