Skip to content

Conversation

@amogh-jahagirdar
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar commented Jan 12, 2023

https://github.com/apache/iceberg/pull/5150/files introduced the ability to read from branches and tags in Spark, but the docs haven't been updated. This change updates the docs and examples for reading from branches and tags.

@github-actions github-actions bot added the docs label Jan 12, 2023
@amogh-jahagirdar amogh-jahagirdar force-pushed the update-spark-branch-docs branch from b1ac360 to 1fbf385 Compare January 12, 2023 21:57
Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me!

Copy link
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me as well, Thanks @amogh-jahagirdar !

* `snapshot-id` selects a specific table snapshot
* `as-of-timestamp` selects the current snapshot at a timestamp, in milliseconds
* `branch` selects the head snapshot of the specified branch. Note that currently branch cannot be combined with as-of-timestamp.
* `tag` selects the snapshot associated with the specified tag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to mention that tag also cannot be combined with as-of-timestamp.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we can wait till #6575 gets merged. So that we don't have to mention it for both branch and tag. But we need to add an example in ##SQL also.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely agree on having a SQL example once #6575 gets merged. For combining as-of-timestamp with tag I felt that was apparent since a tag can only map to a single snapshot which conflicts with passing in a timestamp, where as branch + time travel is a different case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that is a syntax change, I am waiting for more time for others to take a look. I think we can first merge this one and add that later.

#### DataFrame

To select a specific table snapshot or the snapshot at some time in the DataFrame API, Iceberg supports two Spark read options:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to change two to four.

@jackye1995
Copy link
Contributor

Thanks everyone for the review, as I said in the thread for the SQL related changes, I will wait for some more time in case there are disagreements. I will merge this in first and we can add follow up PRs at this front.

@jackye1995 jackye1995 merged commit 7943e94 into apache:master Jan 14, 2023
ajantha-bhat added a commit to ajantha-bhat/iceberg that referenced this pull request Jan 16, 2023
one of my comments was not addressed in apache#6573. Hence, a follow-up PR.  
apache#6573 adds two more spark read options in the data frame time travel syntax.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants