Merged
18 changes: 18 additions & 0 deletions docs/spark-queries.md
@@ -126,6 +126,8 @@ To select a specific table snapshot or the snapshot at some time in the DataFram

Member

We need to change "two" to "four".

* `snapshot-id` selects a specific table snapshot
* `as-of-timestamp` selects the current snapshot at a timestamp, in milliseconds
* `branch` selects the head snapshot of the specified branch. Note that `branch` currently cannot be combined with `as-of-timestamp`.
* `tag` selects the snapshot associated with the specified tag
Member

Do we need to mention that `tag` also cannot be combined with `as-of-timestamp`?

Member

Or we can wait until #6575 gets merged, so that we don't have to mention it for both branch and tag. But we also need to add an example under the SQL section.

Contributor Author

Definitely agree on having a SQL example once #6575 gets merged. As for combining `as-of-timestamp` with `tag`, I felt that was apparent, since a tag can only map to a single snapshot, which conflicts with passing in a timestamp, whereas branch + time travel is a different case.

Contributor

Given that it is a syntax change, I am allowing more time for others to take a look. I think we can merge this one first and add that later.


```scala
// time travel to October 26, 1986 at 01:21:00
@@ -143,6 +145,22 @@ spark.read
    .load("path/to/table")
```
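
Since `as-of-timestamp` is expressed in milliseconds, it can help to derive the value from a readable date rather than hard-coding it. The following is a minimal sketch using only `java.time`; the object name `TimeTravelMillis`, the helper `toEpochMillis`, and the choice of UTC as the time zone are illustrative assumptions, not part of Iceberg or of this change.

```scala
import java.time.{LocalDateTime, ZoneId}

// Hypothetical helper (not an Iceberg API): converts a local date-time,
// interpreted in the given time zone, to milliseconds since the Unix epoch.
object TimeTravelMillis {
  def toEpochMillis(localDateTime: String, zone: String): Long =
    LocalDateTime
      .parse(localDateTime)      // ISO-8601 local date-time, e.g. 1986-10-26T01:21:00
      .atZone(ZoneId.of(zone))   // attach the assumed time zone
      .toInstant
      .toEpochMilli
}
```

Under these assumptions, `TimeTravelMillis.toEpochMillis("1986-10-26T01:21:00", "UTC").toString` could be passed as the value of the `as-of-timestamp` read option. A different zone yields a different millisecond value, so the zone should be chosen deliberately.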

```scala
// time travel to tag historical-snapshot
spark.read
    .option(SparkReadOptions.TAG, "historical-snapshot")
    .format("iceberg")
    .load("path/to/table")
```

```scala
// time travel to the head snapshot of audit-branch
spark.read
    .option(SparkReadOptions.BRANCH, "audit-branch")
    .format("iceberg")
    .load("path/to/table")
```
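
Because `branch` cannot currently be combined with `as-of-timestamp`, a caller may want to reject conflicting options before building the read. The sketch below is a hypothetical caller-side check, not something Iceberg provides; the object name `TimeTravelOptions` and its `validate` method are invented for illustration. Treating `tag` plus `as-of-timestamp` as a conflict is an assumption drawn from the review discussion (a tag maps to a single snapshot), not from the documented option list.

```scala
// Hypothetical pre-flight check (not part of the Iceberg API).
object TimeTravelOptions {
  // Option keys as named in the option list above.
  private val AsOfTimestamp = "as-of-timestamp"
  private val Branch = "branch"
  private val Tag = "tag" // assumption: treated as conflicting with as-of-timestamp

  // Returns Right(options) when no conflict is found, otherwise Left(message).
  def validate(options: Map[String, String]): Either[String, Map[String, String]] = {
    val conflicts =
      if (options.contains(AsOfTimestamp)) Seq(Branch, Tag).filter(options.contains)
      else Seq.empty[String]

    if (conflicts.isEmpty) Right(options)
    else Left(s"Cannot combine $AsOfTimestamp with: ${conflicts.mkString(", ")}")
  }
}
```

Returning an `Either` keeps the check free of side effects, so the caller decides whether a conflict should fail fast or only be logged.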

{{< hint info >}}
Spark 3.0 and earlier versions do not support using `option` with `table` in DataFrameReader commands. All options will be silently
ignored. Do not use `table` when attempting to time-travel or use other options. See [SPARK-32592](https://issues.apache.org/jira/browse/SPARK-32592).