Docs: Add information on how to read from branches and tags in Spark docs #6573
b1ac360 to
1fbf385
Compare
jackye1995
left a comment
looks good to me!
singhpk234
left a comment
Looks good to me as well. Thanks @amogh-jahagirdar!
* `snapshot-id` selects a specific table snapshot
* `as-of-timestamp` selects the current snapshot at a timestamp, in milliseconds
* `branch` selects the head snapshot of the specified branch. Note that currently branch cannot be combined with as-of-timestamp.
* `tag` selects the snapshot associated with the specified tag
Do we need to mention that `tag` also cannot be combined with `as-of-timestamp`?
Or we can wait until #6575 gets merged, so that we don't have to mention it for both branch and tag. But we also need to add an example under the SQL section.
Definitely agree on having a SQL example once #6575 gets merged. As for combining `as-of-timestamp` with `tag`, I felt that was apparent: a tag can only map to a single snapshot, which conflicts with passing in a timestamp, whereas branch + time travel is a different case.
Given that it is a syntax change, I am giving others more time to take a look. I think we can merge this one first and add that later.
#### DataFrame

To select a specific table snapshot or the snapshot at some time in the DataFrame API, Iceberg supports two Spark read options:
We need to change "two" to "four".
Thanks everyone for the review. As I said in the thread about the SQL-related changes, I will wait some more time in case there are disagreements. I will merge this one first, and we can add follow-up PRs on this front.
One of my comments was not addressed in apache#6573, hence this follow-up PR. apache#6573 added two more Spark read options to the DataFrame time travel syntax.
https://github.com/apache/iceberg/pull/5150/files introduced the ability to read from branches and tags in Spark, but the docs haven't been updated. This change updates the docs and examples for reading from branches and tags.
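For readers following the thread, here is a minimal sketch of how the four read options discussed above might be assembled before being handed to Spark. The helper `time_travel_options` and the branch name `audit-branch` are hypothetical and not part of Iceberg or of this PR. The thread only states that `branch` cannot currently be combined with `as-of-timestamp`; the sketch is deliberately stricter and assumes at most one selector is passed per read.

```python
from typing import Dict, Optional

# Read option names as they appear in the diff under review.
SELECTORS = ("snapshot-id", "as-of-timestamp", "branch", "tag")


def time_travel_options(
    snapshot_id: Optional[int] = None,
    as_of_timestamp_ms: Optional[int] = None,
    branch: Optional[str] = None,
    tag: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: build a dict of Spark read options.

    Assumption of this sketch (stricter than what the thread states):
    at most one of the four selectors may be set for a single read.
    """
    candidates = dict(zip(SELECTORS, (snapshot_id, as_of_timestamp_ms, branch, tag)))
    # Keep only the selectors the caller actually provided, as strings.
    chosen = {name: str(value) for name, value in candidates.items() if value is not None}
    if len(chosen) > 1:
        raise ValueError(
            "only one of %s may be set, got: %s"
            % (", ".join(SELECTORS), ", ".join(sorted(chosen)))
        )
    return chosen


# Usage sketch (requires a Spark session with Iceberg configured, not shown):
#   opts = time_travel_options(branch="audit-branch")
#   df = spark.read.format("iceberg").options(**opts).load("path/to/table")
```

Returning a plain dict keeps the check independent of Spark, so the combination rule can be validated before any table is touched.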