Skip to content

Move table-metadata-pointer back to global state, use snapshot,schema,spec,sort-order ids as on-ref state#2626

Merged
snazy merged 4 commits intoprojectnessie:mainfrom
snazy:back-to-single-table-metadata
Nov 12, 2021
Merged

Move table-metadata-pointer back to global state, use snapshot,schema,spec,sort-order ids as on-ref state#2626
snazy merged 4 commits intoprojectnessie:mainfrom
snazy:back-to-single-table-metadata

Conversation

@snazy
Copy link
Member

@snazy snazy commented Nov 9, 2021

This change basically reverts #2313. It moves the Iceberg table-metadata-pointer back to Nessie's global state and snapshot ID plus other relevant IDs (schema-ID, partition-spec-ID, sort-order-ID) to Nessie's on-reference state.

A working version of the Iceberg changes is in this branch: https://github.com/snazy/iceberg/tree/back-to-single-table-metadata

@codecov
Copy link

codecov bot commented Nov 9, 2021

Codecov Report

Merging #2626 (529860b) into main (d5a50e0) will increase coverage by 0.02%.
The diff coverage is 95.16%.

Impacted file tree graph

@@             Coverage Diff              @@
##               main    #2626      +/-   ##
============================================
+ Coverage     84.63%   84.66%   +0.02%     
  Complexity     1943     1943              
============================================
  Files           270      270              
  Lines         11164    11184      +20     
  Branches        807      807              
============================================
+ Hits           9449     9469      +20     
  Misses         1399     1399              
  Partials        316      316              
Flag Coverage Δ
java 84.49% <96.49%> (+0.02%) ⬆️
javascript 85.58% <ø> (ø)
python 85.81% <80.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...sie/versioned/persist/adapter/ContentAndState.java 80.00% <ø> (ø)
python/pynessie/model.py 91.90% <80.00%> (+0.14%) ⬆️
...essie/server/store/TableCommitMetaStoreWorker.java 75.00% <90.00%> (+2.17%) ⬆️
...ain/java/org/projectnessie/model/IcebergTable.java 100.00% <100.00%> (ø)
.../org/projectnessie/jaxrs/AbstractResteasyTest.java 100.00% <100.00%> (ø)
...java/org/projectnessie/jaxrs/AbstractTestRest.java 95.29% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d5a50e0...529860b. Read the comment docs.

@snazy snazy force-pushed the back-to-single-table-metadata branch from c6dd1cb to 02cbf25 Compare November 9, 2021 12:33
ObjectTypes.Content.newBuilder()
.setId(ID)
.setIcebergGlobal(IcebergGlobal.newBuilder().setIdGenerators("xyz"))
.setIcebergMetadataPointer(
Copy link
Contributor

@ajantha-bhat ajantha-bhat Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we keep this design. All the branch's snapshot id will be present in current metadata pointer of table@ref.

So, this can cause the below problems.

a) In each branch, each table will lose the snapshot history. So, time travel within the branch is not possible.

b) In each branch, each table will not know the list of all the reachable files as snapshot history is lost. So, if the table is dropped in that branch, we cannot know which files to clean for that table@ref.

c) we lose table evolution history per branch. We cannot know how schema, sort order, partition spec is evolved for that table@ref.

d) Table properties configured for table@ref will be visible and used for the same table in other reference also. which can cause unexpected behaviour in other references for the table.

e) we lose metadata-log history per branch in a table. metadata-log is used for deleting old metadata files (and fixed number of maximum metadata files at given point of time). we have an auto action for cleaning the metadata files.

Now assume branch_1 has 1 commit and branch_2 has 99 commits. If metadata files to retain is configured as 100. For a new commit happens on branch_2, we delete branch_1's metadata file even though it is referenced in Nessie branch_1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a) In each branch, each table will lose the snapshot history. So, time travel within the branch is not possible.

Time travel would still be possible via Nessie.

b) In each branch, each table will not know the list of all the reachable files as snapshot history is lost.

Can you elaborate why TableMetadata.snapshotLog() is required for this?

So, if the table is dropped in that branch, we cannot know which files to clean for that table@ref.

You can still figure that out by traversing the commit-log.

The only situation I can currently imagine is that you create a table on a new branch, add data and then just drop the branch. But every approach would run into this situation.

c) we lose table evolution history per branch. We cannot know how schema, sort order, partition spec is evolved for that table@ref.

It's still there via the Nessie commits + the information in the referenced snapshots, right?

d) Table properties configured for table@ref will be visible and used for the same table in other reference also. which can cause unexpected behaviour in other references for the table.

I don't think that there's an issue with exposing other references' information. Proper isolation would require way more than just thinking about it here.

What would those unexpected behaviors be?

e) we lose metadata-log history per table in a branch. metadata-log is used for deleting old metadata files (and fixed number of maximum metadata files at given point of time). we have an auto action for cleaning the metadata files.

Why would we lose that metadata-log (TableMetadata.previousFiles())?

There will be exactly one current and valid metadata (file).

Copy link
Contributor

@ajantha-bhat ajantha-bhat Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b) In each branch, each table will not know the list of all the reachable files as snapshot history is lost.

Can you elaborate why TableMetadata.snapshotLog() is required for this?

I meant Table.snapshots(), which will have all branch's snapshot. So, when I do drop table on branch. At Iceberg side, I cannot know what all snapshots to delete for that branch.

You can still figure that out by traversing the commit-log.

I thought Iceberg itself will handle the drop table. Iceberg need to block its current implementation and relay Nessie during drop table ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d) Table properties configured for table@ref will be visible and used for the same table in other reference also. which can cause unexpected behaviour in other references for the table.

I don't think that there's an issue with exposing other references' information. Proper isolation would require way more than just thinking about it here.

What would those unexpected behaviors be?

For example, If table in branch_1 is configured to use write.target-file-size-bytes = 10 MB, same table in all the branches will use that for writing file size. which may be unexpected for same table in other branch as they might be created for other experiments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e) we lose metadata-log history per table in a branch. metadata-log is used for deleting old metadata files (and fixed number of maximum metadata files at given point of time). we have an auto action for cleaning the metadata files.

Why would we lose that metadata-log (TableMetadata.previousFiles())?

There will be exactly one current and valid metadata (file).

I meant we lose per branch, not for overall table. Table in one ref now contains TableMetadata.previousFiles() which has metadata files from all the branches. So, when we add new commit in one branch, it might clean up other branch's metadata file when this configuration is enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now assume branch_1 has 1 commit and branch_2 has 99 commits. If metadata files to retain is configured as 100. For a new commit happens on branch_2, we delete branch_1's metadata file even though it is referenced in Nessie branch_1.

You are correct ajantha. We should be setting this property at table create time and running checks to ensure it isn't changed

d) Table properties configured for table@ref will be visible and used for the same table in other reference also. which can cause unexpected behaviour in other references for the table

My thinking here is that for the time being table proeprties are global and when we come across a use case which requires table properties per branch we can adjust.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the topic of time travel and history generally: we are in a weird position currently as there are two timelines (iceberg via snapshots and Nessie via commits) and both are accessible via different APIs. We should strive to have 1 history API for a Nessie table ( which uses Nessie's history). So I am not overly worried about the iceberg timeline and iceberg apis to access it, the iceberg timeline may be goofy but the Nessie timeline is the source of truth. We do have to focus on extending Iceberg to ensure that certain API calls are routed to Nessie, these should be generic extension points in the iceberg metadata handling code and should make sense for all extenders of Iceberg (not just Nessie). Make sense?

Copy link
Member

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code change LGTM, but I'm not sure I understand all the implications for Iceberg to approve 🤔

As a side note, if we keep Iceberg data in opaque form, would it not be easier to just have two properties in the IcebergTable class: globalState and branchedState and delegate interpretation to the Iceberg-side Nessie code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thnk we should put actual properties for each thing here rather than an opaque String (same comment as @dimas-b I think).

Off the top of my head I think we should save as actual properties:

  • partition spec id
  • schema id
  • sequence number
    but I may be missing some from V2 spec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rymurr: sequence number is a global field. So no need to keep in the on_reference_state.
we just have to keep fields that are specific to current branch.

So, I think current chosen parameters are ok (If table property need to maintain for per branch, we can add that as well)

And agree on the part that we can have a class called TableCurrentState, which keeps these parameters instead of opaque string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we just have to keep fields that are specific to current branch.

ok cool. agreed

sequence number is a global field.

I would just like to amend this statement a bit for my own clarity. We have to save anything that can be changed w/o creating a snapshot. The real risk we face is there are iceberg transactions that don't create a snapshot. So global counters (column id, sequence number) are safe to ignore. Things like schema and partition id are not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally I think we should create a test suite (probably along w/ the iceberg fix) to test all these assumtions

ObjectTypes.Content.newBuilder()
.setId(ID)
.setIcebergGlobal(IcebergGlobal.newBuilder().setIdGenerators("xyz"))
.setIcebergMetadataPointer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now assume branch_1 has 1 commit and branch_2 has 99 commits. If metadata files to retain is configured as 100. For a new commit happens on branch_2, we delete branch_1's metadata file even though it is referenced in Nessie branch_1.

You are correct ajantha. We should be setting this property at table create time and running checks to ensure it isn't changed

d) Table properties configured for table@ref will be visible and used for the same table in other reference also. which can cause unexpected behaviour in other references for the table

My thinking here is that for the time being table proeprties are global and when we come across a use case which requires table properties per branch we can adjust.

ObjectTypes.Content.newBuilder()
.setId(ID)
.setIcebergGlobal(IcebergGlobal.newBuilder().setIdGenerators("xyz"))
.setIcebergMetadataPointer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on the topic of time travel and history generally: we are in a weird position currently as there are two timelines (iceberg via snapshots and Nessie via commits) and both are accessible via different APIs. We should strive to have 1 history API for a Nessie table ( which uses Nessie's history). So I am not overly worried about the iceberg timeline and iceberg apis to access it, the iceberg timeline may be goofy but the Nessie timeline is the source of truth. We do have to focus on extending Iceberg to ensure that certain API calls are routed to Nessie, these should be generic extension points in the iceberg metadata handling code and should make sense for all extenders of Iceberg (not just Nessie). Make sense?

@snazy snazy force-pushed the back-to-single-table-metadata branch from 02cbf25 to 707d690 Compare November 10, 2021 15:17

message IcebergGlobal {
string id_generators = 1;
message IcebergRefState {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the contentAPI descriptions and some other place of code, It still mentions that on-ref-state is just the snapshot id. can we change that as well?
please look up Iceberg: snapshot-ID in the files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

@snazy snazy force-pushed the back-to-single-table-metadata branch from 707d690 to 74b7afb Compare November 10, 2021 16:41
dimas-b
dimas-b previously approved these changes Nov 10, 2021
Copy link
Member

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@ajantha-bhat ajantha-bhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small nit comment.

overall LGTM from my side.

* object, that contains the most up-to-date part for the globally tracked part (Iceberg:
* table-metadata) plus the per-Nessie-reference/hash specific part (Iceberg: snapshot-ID).
* table-metadata) plus the per-Nessie-reference/hash specific part (Iceberg: snapshot-ID,
* schema-ID, partition-spec-ID, default-sort-order-ID).
Copy link
Contributor

@ajantha-bhat ajantha-bhat Nov 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: In the descriptions, If we are naming this variable(default-sort-order-ID) based on iceberg spec, then other three variable has to be default-spec-id, current-schema-id, current-snapshot-id

or may be instead of mentioning each field, we just link it to IcebergRefState

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i thikn ID shjould be id

* object, that contains the most up-to-date part for the globally tracked part (Iceberg:
* table-metadata) plus the per-Nessie-reference/hash specific part (Iceberg: snapshot-ID).
* table-metadata) plus the per-Nessie-reference/hash specific part (Iceberg: snapshot-ID,
* schema-ID, partition-spec-ID, default-sort-order-ID).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i thikn ID shjould be id


/** Opaque representation of Iceberg's {@code TableIdGenerators}. */
public abstract String getIdGenerators();
public abstract long getSnapshotId();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do these need default values? particularly the shcema/specid etc? Or do they have to be specified every time

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think those should have default values. It uses what Iceberg's TableMetadata says and not rely on a default value.

@snazy snazy force-pushed the back-to-single-table-metadata branch from fced329 to c73e21e Compare November 11, 2021 09:35
@snazy snazy changed the title Move table-metadata-pointer back to global state, use snapshot+schema IDs as on-ref state Move table-metadata-pointer back to global state, use snapshot,schema,spec,sort-order ids as on-ref state Nov 11, 2021
…,spec,sort-order ids as on-ref state

This change basically reverts projectnessie#2313. It moves the Iceberg
able-metadata-pointer back to Nessie's global state and
snapshot id plus other relevant ids (schema-id,
partition-spec-id, sort-order-id) to Nessie's on-reference state.
@snazy snazy force-pushed the back-to-single-table-metadata branch from c73e21e to 529860b Compare November 11, 2021 14:54
@snazy snazy merged commit 2b6b2c3 into projectnessie:main Nov 12, 2021
@snazy snazy deleted the back-to-single-table-metadata branch November 12, 2021 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments