Add hashOnRef query param to support time travel on a named ref#1589
nastra merged 2 commits into projectnessie:main
snazy
left a comment
I just have one comment (but I was so mean to annotate every occurrence ;) ) - it's about the commit-log: since we effectively add the "start hash", it should be named "start hash" and since we're already there, it wouldn't be much of an effort to add the "end hash" parameter as well (maybe need to implement a Spliterator for the "end hash" though).
  @Override
  public LogResponse getCommitLog(
-     String ref, Integer maxRecords, String pageToken, String queryExpression)
+     String ref, String hashOnRef, Integer maxRecords, String pageToken, String queryExpression)
Guess this should be startHash instead of hashOnRef. Since we're already changing this, can you add endHash as well?
I would rather prefer to have this in a separate commit/PR since it's a slightly different feature
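The "end hash" idea from the review could be sketched roughly as follows. This is a minimal illustration, not Nessie code: `CommitRangeSketch`, `takeUntilIncluding`, and `range` are hypothetical names, commits are modelled as plain hash strings ordered newest first, and the custom `Spliterator` is only needed because `Stream.takeWhile` would drop the end hash itself.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical sketch: commits are plain hash strings, newest first.
final class CommitRangeSketch {

  /** Returns a stream that ends right after the first element matching isLast (inclusive). */
  static <T> Stream<T> takeUntilIncluding(Stream<T> source, Predicate<T> isLast) {
    Spliterator<T> src = source.spliterator();
    Spliterator<T> limited =
        new Spliterators.AbstractSpliterator<T>(Long.MAX_VALUE, Spliterator.ORDERED) {
          private boolean done;

          @Override
          public boolean tryAdvance(Consumer<? super T> action) {
            if (done) {
              return false; // the end element was already emitted
            }
            return src.tryAdvance(
                item -> {
                  action.accept(item);
                  if (isLast.test(item)) {
                    done = true; // emit the end element, then stop
                  }
                });
          }
        };
    return StreamSupport.stream(limited, false).onClose(source::close);
  }

  /** Both bounds are optional (null) and inclusive; the log is ordered newest first. */
  static List<String> range(List<String> newestFirst, String startHash, String endHash) {
    Stream<String> commits = newestFirst.stream();
    if (startHash != null) {
      // Skip everything newer than the start hash.
      commits = commits.dropWhile(hash -> !hash.equals(startHash));
    }
    if (endHash != null) {
      commits = takeUntilIncluding(commits, hash -> hash.equals(endHash));
    }
    return commits.collect(Collectors.toList());
  }
}
```

With a log of `e5, d4, c3, b2, a1`, calling `range(log, "d4", "b2")` would keep `d4, c3, b2`. Whether the real API treats both bounds as inclusive is an assumption of this sketch.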
  public static Stream<CommitMeta> getCommitLogStream(
      @NotNull TreeApi treeApi,
      @NotNull String ref,
      String hashOnRef,
(similar to above about startHash instead of hashOnRef + endHash)
          examples = {@ExampleObject(ref = "ref")})
      @PathParam("ref")
      String ref,
      @Nullable
(similar to above about startHash instead of hashOnRef + endHash)
  public LogResponse getCommitLog(
-     String ref, Integer maxRecords, String pageToken, String queryExpression)
+     String namedRef,
+     String hashOnRef,
(similar to above about startHash instead of hashOnRef + endHash)
export interface GetCommitLogRequest {
    ref: string;
    hashOnRef?: string;
(similar to above about startHash instead of hashOnRef + endHash)
Codecov Report
@@             Coverage Diff              @@
##               main    #1589      +/-   ##
============================================
+ Coverage     77.24%   77.37%   +0.12%
- Complexity     2199     2216      +17
============================================
  Files           289      289
  Lines         13576    13690     +114
  Branches       1035     1037       +2
============================================
+ Hits          10487    10592     +105
- Misses         2605     2616      +11
+ Partials        484      482       -2
  WithHash<NamedRef> namedRefWithHashOrThrow(String namedRef, @Nullable String hashOnRef)
      throws NessieNotFoundException {
    List<WithHash<NamedRef>> collect =
        store
How often is this called? This could be expensive, and I don't quite understand the purpose.
It's called in 4 places (getCommitLog / getEntries / getContents / getMultipleContents), and we will eventually have a faster alternative in the VersionStoreV2.
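To make the cost concern concrete, here is a rough sketch of what resolving a `hashOnRef` against a named ref might involve. It is an assumption-laden toy, not the PR's implementation: `HashOnRefSketch` and `resolve` are hypothetical names, and the version store is replaced by an in-memory map from ref name to commit hashes (newest first). The point is that checking whether a hash is reachable from a ref is a linear walk over that ref's log.

```java
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch: the version store is modelled as ref name -> hashes, newest first.
final class HashOnRefSketch {

  /**
   * Returns HEAD of the named ref when hashOnRef is null; otherwise returns hashOnRef,
   * but only if it is reachable from that ref.
   */
  static String resolve(Map<String, List<String>> refs, String namedRef, String hashOnRef) {
    List<String> log = refs.get(namedRef);
    if (log == null || log.isEmpty()) {
      throw new NoSuchElementException("Named ref '" + namedRef + "' not found");
    }
    if (hashOnRef == null) {
      return log.get(0); // no time travel requested: use HEAD
    }
    // Linear scan over the ref's history; this is the potentially expensive part.
    return log.stream()
        .filter(hashOnRef::equals)
        .findFirst()
        .orElseThrow(
            () ->
                new NoSuchElementException(
                    "Hash '" + hashOnRef + "' is not reachable from '" + namedRef + "'"));
  }
}
```

In this toy model, `resolve(refs, "main", null)` returns the newest hash on `main`, while a hash that only exists on another branch is rejected.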
Apply changes to Iceberg required by API changes in Nessie:
* [Re-introduce wrapper classes for query params of CommitLog/Entries](projectnessie/nessie#1595)
* [Server-side commit range filtering](projectnessie/nessie#1596)
* [Add hashOnRef query param to support time travel on a named ref](projectnessie/nessie#1589)
* [Only accept NamedRefs in REST API](projectnessie/nessie#1583)
* Bump Nessie to 0.8.2 + replace Gradle plugin with new JUnit extension

  More changes in this PR in following commits.

  Replace Gradle plugin with new JUnit extension. See [Add JAX-RS tests and add JUnit/Jupyter extension](projectnessie/nessie#1566)

* Changes required by Nessie-API changes

  Apply changes to Iceberg required by API changes in Nessie:
  * [Re-introduce wrapper classes for query params of CommitLog/Entries](projectnessie/nessie#1595)
  * [Server-side commit range filtering](projectnessie/nessie#1596)
  * [Add hashOnRef query param to support time travel on a named ref](projectnessie/nessie#1589)
  * [Only accept NamedRefs in REST API](projectnessie/nessie#1583)

* Bugfix: must send the Contents.id of the existing table

  Nessie's `Contents.id` is a random ID generated when the `Contents.Key` is first used (think: CREATE TABLE) and must not be changed. This change addresses a bug in the Iceberg-Nessie code that caused a new id for every change.

* Throw `CommitStateUnknownException` for `renameTable` as well

  Follow-up of apache#2515

* Fix race-condition & save one roundtrip to Nessie during "commit"

  When committing a change, the Nessie-API now returns the hash of the commit for the change. This returned hash should then be used as the "expected hash" for the next commit.

  The previous approach was to commit the change to Nessie and then do another request to retrieve the new hash of HEAD. This old approach is prone to a race condition, namely when another commit happens after "this" commit but before retrieving the "new HEAD", so "this" instance would wrongly ignore the other commit's changes during conflict checks.

  See [Let VersionStore.create()+commit() return the current hash](projectnessie/nessie#1089)
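The race condition described in the last item of the commit message above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ExpectedHashSketch` and `Branch` are hypothetical names, hashes are simple counters, and the conflict check is a plain equality test against HEAD rather than anything Nessie actually does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a branch with an "expected hash" check on commit.
final class ExpectedHashSketch {

  static final class Branch {
    private final List<String> hashes = new ArrayList<>(List.of("h0"));
    private final List<String> changes = new ArrayList<>();

    synchronized String head() {
      return hashes.get(hashes.size() - 1);
    }

    /** Commits only if expectedHash is still HEAD, and returns the hash of the new commit. */
    synchronized String commit(String expectedHash, String change) {
      if (!head().equals(expectedHash)) {
        throw new IllegalStateException(
            "Conflict: expected " + expectedHash + " but HEAD is " + head());
      }
      String newHash = "h" + hashes.size();
      hashes.add(newHash);
      changes.add(change);
      return newHash;
    }
  }
}
```

A client that keeps the hash returned by its own `commit` call gets a conflict if another writer committed in between. A client that instead re-reads `head()` after committing would pick up the other writer's hash as its "expected hash" and skip that conflict check, which is the behaviour the commit message describes as the old approach.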