Skip to content

Add hashOnRef query param to support time travel on a named ref#1589

Merged
nastra merged 2 commits intoprojectnessie:mainfrom
nastra:hashOnRef-for-time-travel-support
Jul 14, 2021
Merged

Add hashOnRef query param to support time travel on a named ref#1589
nastra merged 2 commits intoprojectnessie:mainfrom
nastra:hashOnRef-for-time-travel-support

Conversation

@nastra
Copy link
Contributor

@nastra nastra commented Jul 13, 2021

This change is Reviewable

@nastra nastra marked this pull request as draft July 13, 2021 16:27
Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have one comment (but I was so mean to annotate every occurrence ;) ) - it's about the commit-log: since we effectively add the "start hash", it should be named "start hash" and since we're already there, it wouldn't be much of an effort to add the "end hash" parameter as well (maybe need to implement a Spliterator for the "end hash" though).

@Override
public LogResponse getCommitLog(
String ref, Integer maxRecords, String pageToken, String queryExpression)
String ref, String hashOnRef, Integer maxRecords, String pageToken, String queryExpression)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess this should be startHash instead of hashOnRef. Since we're already changing this, can you add endHash as welll?:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather prefer to have this in a separate commit/PR since it's a slightly different feature

public static Stream<CommitMeta> getCommitLogStream(
@NotNull TreeApi treeApi,
@NotNull String ref,
String hashOnRef,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(similar to above about startHash instead of hashOnRef + endHash)

examples = {@ExampleObject(ref = "ref")})
@PathParam("ref")
String ref,
@Nullable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(similar to above about startHash instead of hashOnRef + endHash)

public LogResponse getCommitLog(
String ref, Integer maxRecords, String pageToken, String queryExpression)
String namedRef,
String hashOnRef,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(similar to above about startHash instead of hashOnRef + endHash)


export interface GetCommitLogRequest {
ref: string;
hashOnRef?: string;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(similar to above about startHash instead of hashOnRef + endHash)

@codecov
Copy link

codecov bot commented Jul 14, 2021

Codecov Report

Merging #1589 (239c8a8) into main (6b2c1c8) will increase coverage by 0.12%.
The diff coverage is 89.41%.

Impacted file tree graph

@@             Coverage Diff              @@
##               main    #1589      +/-   ##
============================================
+ Coverage     77.24%   77.37%   +0.12%     
- Complexity     2199     2216      +17     
============================================
  Files           289      289              
  Lines         13576    13690     +114     
  Branches       1035     1037       +2     
============================================
+ Hits          10487    10592     +105     
- Misses         2605     2616      +11     
+ Partials        484      482       -2     
Flag Coverage Δ
java 76.65% <89.30%> (+0.14%) ⬆️
python 86.61% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ark/sql/execution/datasources/v2/NessieUtils.scala 0.00% <0.00%> (ø)
...ark/sql/execution/datasources/v2/ShowLogExec.scala 0.00% <0.00%> (ø)
...n/java/org/projectnessie/client/StreamingUtil.java 50.00% <50.00%> (ø)
.../org/projectnessie/services/rest/TreeResource.java 70.28% <50.00%> (ø)
...n/java/org/projectnessie/hms/TransactionStore.java 58.62% <66.66%> (ø)
.../projectnessie/services/rest/ContentsResource.java 75.00% <66.66%> (ø)
.../org/projectnessie/services/rest/BaseResource.java 80.48% <86.95%> (+12.30%) ⬆️
...java/org/projectnessie/jaxrs/AbstractTestRest.java 95.55% <93.10%> (-0.25%) ⬇️
...va/org/projectnessie/client/ClientContentsApi.java 100.00% <100.00%> (ø)
...n/java/org/projectnessie/client/ClientTreeApi.java 96.34% <100.00%> (+0.09%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b2c1c8...239c8a8. Read the comment docs.

@nastra nastra marked this pull request as ready for review July 14, 2021 06:56
@nastra nastra requested a review from snazy July 14, 2021 09:12
WithHash<NamedRef> namedRefWithHashOrThrow(String namedRef, @Nullable String hashOnRef)
throws NessieNotFoundException {
List<WithHash<NamedRef>> collect =
store
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how often is this called? This could be expensive and I don't quite understand the purpose

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's called in 4 places (getCommitLog / getEntries / getContents / getMultipleContents) and we will eventually have a faster alternative in in the VersionStoreV2

@nastra nastra merged commit 152a867 into projectnessie:main Jul 14, 2021
@nastra nastra deleted the hashOnRef-for-time-travel-support branch July 14, 2021 10:34
snazy added a commit to snazy/iceberg that referenced this pull request Jul 15, 2021
Apply changes to Iceberg required by API changes in Nessie:
* [Re-introduce wrapper classes for query params of CommitLog/Entries](projectnessie/nessie#1595)
* [Server-side commit range filtering](projectnessie/nessie#1596)
* [Add hashOnRef query param to support time travel on a named ref](projectnessie/nessie#1589)
* [Only accept NamedRefs in REST API](projectnessie/nessie#1583)
snazy added a commit to snazy/iceberg that referenced this pull request Jul 15, 2021
Apply changes to Iceberg required by API changes in Nessie:
* [Re-introduce wrapper classes for query params of CommitLog/Entries](projectnessie/nessie#1595)
* [Server-side commit range filtering](projectnessie/nessie#1596)
* [Add hashOnRef query param to support time travel on a named ref](projectnessie/nessie#1589)
* [Only accept NamedRefs in REST API](projectnessie/nessie#1583)
rymurr pushed a commit to apache/iceberg that referenced this pull request Jul 15, 2021
* Bump Nessie to 0.8.2 + replace Gradle plugin with new JUnit extension

More changes in this PR in following commits.

Replace Gradle plugin with new JUnit extension.
See [Add JAX-RS tests and add JUnit/Jupyter extension](projectnessie/nessie#1566)

* Changes required by Nessie-API changes

Apply changes to Iceberg required by API changes in Nessie:
* [Re-introduce wrapper classes for query params of CommitLog/Entries](projectnessie/nessie#1595)
* [Server-side commit range filtering](projectnessie/nessie#1596)
* [Add hashOnRef query param to support time travel on a named ref](projectnessie/nessie#1589)
* [Only accept NamedRefs in REST API](projectnessie/nessie#1583)

* Bugfix: must send the Contents.id of the existing table

Nessie's `Contents.id` is a random ID generated when the `Contents.Key` is first used (think:
CREATE TABLE) and must not be changed. This change addresses a bug in the Iceberg-Nesie code
that caused a new id for every change.

* Throw `CommitStateUnknownException` for `renameTable` as well

Follow-up of #2515

* Fix race-condition & save one roundtrip to Nessie during "commit"

When commiting a change, the Nessie-API now returns the hash of the commit for the change.
This returned hash should then be used as the "expected hash" for the next commit.

The previous approach was to commit the change to Nessie and then do another request to
retrieve the new hash of HEAD.

This old approach is prone to a race condition, namely when another commit happens after
"this" commit but before retrieving the "new HEAD", so "this" instance would wrongly
ignore the other commit's changes during conflict checks.

See [Let VersionStore.create()+commit() return the current hash](projectnessie/nessie#1089)
minchowang pushed a commit to minchowang/iceberg that referenced this pull request Aug 2, 2021
* Bump Nessie to 0.8.2 + replace Gradle plugin with new JUnit extension

More changes in this PR in following commits.

Replace Gradle plugin with new JUnit extension.
See [Add JAX-RS tests and add JUnit/Jupyter extension](projectnessie/nessie#1566)

* Changes required by Nessie-API changes

Apply changes to Iceberg required by API changes in Nessie:
* [Re-introduce wrapper classes for query params of CommitLog/Entries](projectnessie/nessie#1595)
* [Server-side commit range filtering](projectnessie/nessie#1596)
* [Add hashOnRef query param to support time travel on a named ref](projectnessie/nessie#1589)
* [Only accept NamedRefs in REST API](projectnessie/nessie#1583)

* Bugfix: must send the Contents.id of the existing table

Nessie's `Contents.id` is a random ID generated when the `Contents.Key` is first used (think:
CREATE TABLE) and must not be changed. This change addresses a bug in the Iceberg-Nesie code
that caused a new id for every change.

* Throw `CommitStateUnknownException` for `renameTable` as well

Follow-up of apache#2515

* Fix race-condition & save one roundtrip to Nessie during "commit"

When commiting a change, the Nessie-API now returns the hash of the commit for the change.
This returned hash should then be used as the "expected hash" for the next commit.

The previous approach was to commit the change to Nessie and then do another request to
retrieve the new hash of HEAD.

This old approach is prone to a race condition, namely when another commit happens after
"this" commit but before retrieving the "new HEAD", so "this" instance would wrongly
ignore the other commit's changes during conflict checks.

See [Let VersionStore.create()+commit() return the current hash](projectnessie/nessie#1089)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants