Move table-metadata-pointer back to global state, use snapshot,schema,spec,sort-order ids as on-ref state#2626
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2626 +/- ##
============================================
+ Coverage 84.63% 84.66% +0.02%
Complexity 1943 1943
============================================
Files 270 270
Lines 11164 11184 +20
Branches 807 807
============================================
+ Hits 9449 9469 +20
Misses 1399 1399
Partials 316 316
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
c6dd1cb to
02cbf25
Compare
| ObjectTypes.Content.newBuilder() | ||
| .setId(ID) | ||
| .setIcebergGlobal(IcebergGlobal.newBuilder().setIdGenerators("xyz")) | ||
| .setIcebergMetadataPointer( |
There was a problem hiding this comment.
I think if we keep this design. All the branch's snapshot id will be present in current metadata pointer of table@ref.
So, this can cause the below problems.
a) In each branch, each table will lose the snapshot history. So, time travel within the branch is not possible.
b) In each branch, each table will not know the list of all the reachable files as snapshot history is lost. So, if the table is dropped in that branch, we cannot know which files to clean for that table@ref.
c) we lose table evolution history per branch. We cannot know how schema, sort order, partition spec is evolved for that table@ref.
d) Table properties configured for table@ref will be visible and used for the same table in other reference also. which can cause unexpected behaviour in other references for the table.
e) we lose metadata-log history per branch in a table. metadata-log is used for deleting old metadata files (and fixed number of maximum metadata files at given point of time). we have an auto action for cleaning the metadata files.
Now assume branch_1 has 1 commit and branch_2 has 99 commits. If metadata files to retain is configured as 100. For a new commit happens on branch_2, we delete branch_1's metadata file even though it is referenced in Nessie branch_1.
There was a problem hiding this comment.
a) In each branch, each table will lose the snapshot history. So, time travel within the branch is not possible.
Time travel would still be possible via Nessie.
b) In each branch, each table will not know the list of all the reachable files as snapshot history is lost.
Can you elaborate why TableMetadata.snapshotLog() is required for this?
So, if the table is dropped in that branch, we cannot know which files to clean for that table@ref.
You can still figure that out by traversing the commit-log.
The only situation I can currently imagine is that you create a table on a new branch, add data and then just drop the branch. But every approach would run into this situation.
c) we lose table evolution history per branch. We cannot know how schema, sort order, partition spec is evolved for that table@ref.
It's still there via the Nessie commits + the information in the referenced snapshots, right?
d) Table properties configured for table@ref will be visible and used for the same table in other reference also. which can cause unexpected behaviour in other references for the table.
I don't think that there's an issue with exposing other references' information. Proper isolation would require way more than just thinking about it here.
What would those unexpected behaviors be?
e) we lose metadata-log history per table in a branch. metadata-log is used for deleting old metadata files (and fixed number of maximum metadata files at given point of time). we have an auto action for cleaning the metadata files.
Why would we lose that metadata-log (TableMetadata.previousFiles())?
There will be exactly one current and valid metadata (file).
There was a problem hiding this comment.
b) In each branch, each table will not know the list of all the reachable files as snapshot history is lost.
Can you elaborate why TableMetadata.snapshotLog() is required for this?
I meant Table.snapshots(), which will have all branch's snapshot. So, when I do drop table on branch. At Iceberg side, I cannot know what all snapshots to delete for that branch.
You can still figure that out by traversing the commit-log.
I thought Iceberg itself will handle the drop table. Iceberg need to block its current implementation and relay Nessie during drop table ?
There was a problem hiding this comment.
d) Table properties configured for table@ref will be visible and used for the same table in other reference also. which can cause unexpected behaviour in other references for the table.
I don't think that there's an issue with exposing other references' information. Proper isolation would require way more than just thinking about it here.
What would those unexpected behaviors be?
For example, If table in branch_1 is configured to use write.target-file-size-bytes = 10 MB, same table in all the branches will use that for writing file size. which may be unexpected for same table in other branch as they might be created for other experiments.
There was a problem hiding this comment.
e) we lose metadata-log history per table in a branch. metadata-log is used for deleting old metadata files (and fixed number of maximum metadata files at given point of time). we have an auto action for cleaning the metadata files.
Why would we lose that metadata-log (TableMetadata.previousFiles())?
There will be exactly one current and valid metadata (file).
I meant we lose per branch, not for overall table. Table in one ref now contains TableMetadata.previousFiles() which has metadata files from all the branches. So, when we add new commit in one branch, it might clean up other branch's metadata file when this configuration is enabled.
There was a problem hiding this comment.
Now assume branch_1 has 1 commit and branch_2 has 99 commits. If metadata files to retain is configured as 100. For a new commit happens on branch_2, we delete branch_1's metadata file even though it is referenced in Nessie branch_1.
You are correct ajantha. We should be setting this property at table create time and running checks to ensure it isn't changed
d) Table properties configured for table@ref will be visible and used for the same table in other reference also. which can cause unexpected behaviour in other references for the table
My thinking here is that for the time being table proeprties are global and when we come across a use case which requires table properties per branch we can adjust.
There was a problem hiding this comment.
on the topic of time travel and history generally: we are in a weird position currently as there are two timelines (iceberg via snapshots and Nessie via commits) and both are accessible via different APIs. We should strive to have 1 history API for a Nessie table ( which uses Nessie's history). So I am not overly worried about the iceberg timeline and iceberg apis to access it, the iceberg timeline may be goofy but the Nessie timeline is the source of truth. We do have to focus on extending Iceberg to ensure that certain API calls are routed to Nessie, these should be generic extension points in the iceberg metadata handling code and should make sense for all extenders of Iceberg (not just Nessie). Make sense?
dimas-b
left a comment
There was a problem hiding this comment.
Code change LGTM, but I'm not sure I understand all the implications for Iceberg to approve 🤔
As a side note, if we keep Iceberg data in opaque form, would it not be easier to just have two properties in the IcebergTable class: globalState and branchedState and delegate interpretation to the Iceberg-side Nessie code?
There was a problem hiding this comment.
I thnk we should put actual properties for each thing here rather than an opaque String (same comment as @dimas-b I think).
Off the top of my head I think we should save as actual properties:
- partition spec id
- schema id
- sequence number
but I may be missing some from V2 spec.
There was a problem hiding this comment.
@rymurr: sequence number is a global field. So no need to keep in the on_reference_state.
we just have to keep fields that are specific to current branch.
So, I think current chosen parameters are ok (If table property need to maintain for per branch, we can add that as well)
And agree on the part that we can have a class called TableCurrentState, which keeps these parameters instead of opaque string.
There was a problem hiding this comment.
we just have to keep fields that are specific to current branch.
ok cool. agreed
sequence number is a global field.
I would just like to amend this statement a bit for my own clarity. We have to save anything that can be changed w/o creating a snapshot. The real risk we face is there are iceberg transactions that don't create a snapshot. So global counters (column id, sequence number) are safe to ignore. Things like schema and partition id are not.
There was a problem hiding this comment.
Additionally I think we should create a test suite (probably along w/ the iceberg fix) to test all these assumtions
| ObjectTypes.Content.newBuilder() | ||
| .setId(ID) | ||
| .setIcebergGlobal(IcebergGlobal.newBuilder().setIdGenerators("xyz")) | ||
| .setIcebergMetadataPointer( |
There was a problem hiding this comment.
Now assume branch_1 has 1 commit and branch_2 has 99 commits. If metadata files to retain is configured as 100. For a new commit happens on branch_2, we delete branch_1's metadata file even though it is referenced in Nessie branch_1.
You are correct ajantha. We should be setting this property at table create time and running checks to ensure it isn't changed
d) Table properties configured for table@ref will be visible and used for the same table in other reference also. which can cause unexpected behaviour in other references for the table
My thinking here is that for the time being table proeprties are global and when we come across a use case which requires table properties per branch we can adjust.
| ObjectTypes.Content.newBuilder() | ||
| .setId(ID) | ||
| .setIcebergGlobal(IcebergGlobal.newBuilder().setIdGenerators("xyz")) | ||
| .setIcebergMetadataPointer( |
There was a problem hiding this comment.
on the topic of time travel and history generally: we are in a weird position currently as there are two timelines (iceberg via snapshots and Nessie via commits) and both are accessible via different APIs. We should strive to have 1 history API for a Nessie table ( which uses Nessie's history). So I am not overly worried about the iceberg timeline and iceberg apis to access it, the iceberg timeline may be goofy but the Nessie timeline is the source of truth. We do have to focus on extending Iceberg to ensure that certain API calls are routed to Nessie, these should be generic extension points in the iceberg metadata handling code and should make sense for all extenders of Iceberg (not just Nessie). Make sense?
02cbf25 to
707d690
Compare
servers/store/src/main/java/org/projectnessie/server/store/TableCommitMetaStoreWorker.java
Outdated
Show resolved
Hide resolved
|
|
||
| message IcebergGlobal { | ||
| string id_generators = 1; | ||
| message IcebergRefState { |
There was a problem hiding this comment.
In the contentAPI descriptions and some other place of code, It still mentions that on-ref-state is just the snapshot id. can we change that as well?
please look up Iceberg: snapshot-ID in the files.
707d690 to
74b7afb
Compare
ajantha-bhat
left a comment
There was a problem hiding this comment.
one small nit comment.
overall LGTM from my side.
| * object, that contains the most up-to-date part for the globally tracked part (Iceberg: | ||
| * table-metadata) plus the per-Nessie-reference/hash specific part (Iceberg: snapshot-ID). | ||
| * table-metadata) plus the per-Nessie-reference/hash specific part (Iceberg: snapshot-ID, | ||
| * schema-ID, partition-spec-ID, default-sort-order-ID). |
There was a problem hiding this comment.
nit: In the descriptions, If we are naming this variable(default-sort-order-ID) based on iceberg spec, then other three variable has to be default-spec-id, current-schema-id, current-snapshot-id
or may be instead of mentioning each field, we just link it to IcebergRefState
There was a problem hiding this comment.
nit: i thikn ID shjould be id
| * object, that contains the most up-to-date part for the globally tracked part (Iceberg: | ||
| * table-metadata) plus the per-Nessie-reference/hash specific part (Iceberg: snapshot-ID). | ||
| * table-metadata) plus the per-Nessie-reference/hash specific part (Iceberg: snapshot-ID, | ||
| * schema-ID, partition-spec-ID, default-sort-order-ID). |
There was a problem hiding this comment.
nit: i thikn ID shjould be id
|
|
||
| /** Opaque representation of Iceberg's {@code TableIdGenerators}. */ | ||
| public abstract String getIdGenerators(); | ||
| public abstract long getSnapshotId(); |
There was a problem hiding this comment.
do these need default values? particularly the shcema/specid etc? Or do they have to be specified every time
There was a problem hiding this comment.
Don't think those should have default values. It uses what Iceberg's TableMetadata says and not rely on a default value.
fced329 to
c73e21e
Compare
…,spec,sort-order ids as on-ref state This change basically reverts projectnessie#2313. It moves the Iceberg able-metadata-pointer back to Nessie's global state and snapshot id plus other relevant ids (schema-id, partition-spec-id, sort-order-id) to Nessie's on-reference state.
c73e21e to
529860b
Compare
This change basically reverts #2313. It moves the Iceberg table-metadata-pointer back to Nessie's global state and snapshot ID plus other relevant IDs (schema-ID, partition-spec-ID, sort-order-ID) to Nessie's on-reference state.
A working version of the Iceberg changes is in this branch: https://github.com/snazy/iceberg/tree/back-to-single-table-metadata