@ajantha-bhat
Member

Some users on Slack are exploring incremental reads in Spark, and we don't have documentation for them. Hence this PR.

@github-actions github-actions bot added the docs label Dec 23, 2021
@ajantha-bhat
Member Author

cc: @rdblue

Currently gets only the data from `append` operation. Cannot support `replace`, `overwrite`, `delete` operations yet.
Works with both V1 and V2 format-version.

Incremental read is not yet supported by Spark's SQL syntax.
Contributor

Let's remove "yet" because it is unclear whether it will be supported.

!!! Note
Currently gets only the data from `append` operation. Cannot support `replace`, `overwrite`, `delete` operations yet.
Contributor

If you want this to be in a note box, it needs to be indented with 4 spaces.


!!! Note
Currently gets only the data from `append` operation. Cannot support `replace`, `overwrite`, `delete` operations yet.
Works with both V1 and V2 format-version.
Contributor

Is this part of the note or a separate paragraph? Also, could you expand this to be a complete sentence?

* `end-snapshot-id` End snapshot ID used in incremental scans (inclusive)

```scala
// get the data added after start-snapshot-id (10963874102873L) till end-snapshot-id (63874143573109L)
```
Contributor

Typo: "till" should be "until"


### Incremental read

To read incremental data between the snapshots, Configure below Spark read options:
Contributor

How about "To read appended data incrementally, use:"

To read incremental data between the snapshots, Configure below Spark read options:

* `start-snapshot-id` Start snapshot ID used in incremental scans (exclusive)
* `end-snapshot-id` End snapshot ID used in incremental scans (inclusive)
Contributor

This is optional. Omitting it will default to the current snapshot.
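
For illustration, here is a minimal sketch of how the two options discussed in this thread might be assembled, with `end-snapshot-id` left optional. `IncrementalReadSketch` and `incrementalReadOptions` are hypothetical names made up for this example and are not part of Iceberg or Spark; the snapshot IDs are the ones from the quoted doc comment, and `path/to/table` is a placeholder.

```scala
// Sketch only: IncrementalReadSketch and incrementalReadOptions are
// hypothetical helpers for illustration, not Iceberg or Spark APIs.
object IncrementalReadSketch {

  // Builds the Spark read options for an incremental scan.
  def incrementalReadOptions(
      startSnapshotId: Long,
      endSnapshotId: Option[Long] = None): Map[String, String] = {
    // start-snapshot-id is exclusive: data added by this snapshot is not returned.
    val base = Map("start-snapshot-id" -> startSnapshotId.toString)
    // end-snapshot-id is inclusive and optional; when it is left out,
    // the scan runs up to the table's current snapshot.
    endSnapshotId.fold(base)(id => base + ("end-snapshot-id" -> id.toString))
  }

  // Assumed usage, given a SparkSession `spark` with the Iceberg runtime available:
  //   val opts = incrementalReadOptions(10963874102873L, Some(63874143573109L))
  //   val df = spark.read.format("iceberg").options(opts).load("path/to/table")
}
```

Passing only the start snapshot ID yields a map with a single entry, which matches the behavior described above of defaulting to the current snapshot.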

@ajantha-bhat
Member Author

@rdblue: Thanks for the review. I have addressed the comments.

@rdblue rdblue merged commit a4afaab into apache:master Jan 4, 2022
@rdblue
Contributor

rdblue commented Jan 4, 2022

Thanks, @ajantha-bhat!
