@ajantha-bhat
Member

@ajantha-bhat ajantha-bhat commented Apr 6, 2022

What is null?
TableMetadata.metadataFileLocation is null.

Scenario:

create branch1;
create table1 on branch1;
commit on table1 on branch1;

create branch2 from branch1;
commit on table1 on branch2;

use branch1;
load table1 in branch1;  -- this table has null `metadataFileLocation`

Reason:
NessieTableOperations.refreshFromMetadataLocation() calls loadTableMetadata(), where TableMetadata.buildFrom() creates a TableMetadata with a null metadataFileLocation (refer here), and Nessie didn't set the metadataFileLocation afterwards.
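To illustrate the reason described above, here is a minimal, self-contained sketch. It does not use Iceberg's real classes: `Metadata`, `Builder`, `read`, `withMetadataLocation`, and both `load...` helpers are hypothetical stand-ins, and the path is made up. It only models the pattern in which a builder created from parsed metadata starts without a file location, so the caller has to pass the location through explicitly.

```java
public class MetadataLocationSketch {

  // Hypothetical stand-in for table metadata; not Iceberg's TableMetadata.
  static final class Metadata {
    private final String metadataFileLocation; // null if the builder never set it
    private final int schemaId;

    private Metadata(String metadataFileLocation, int schemaId) {
      this.metadataFileLocation = metadataFileLocation;
      this.schemaId = schemaId;
    }

    String metadataFileLocation() {
      return metadataFileLocation;
    }

    int schemaId() {
      return schemaId;
    }

    // Copies the content of the base, but not its file location,
    // because the result is treated as new, not-yet-written metadata.
    static Builder buildFrom(Metadata base) {
      return new Builder(base.schemaId);
    }
  }

  // Hypothetical builder; the location stays null until set explicitly.
  static final class Builder {
    private String metadataFileLocation;
    private int schemaId;

    Builder(int schemaId) {
      this.schemaId = schemaId;
    }

    Builder setSchemaId(int newSchemaId) {
      this.schemaId = newSchemaId;
      return this;
    }

    Builder withMetadataLocation(String location) {
      this.metadataFileLocation = location;
      return this;
    }

    Metadata build() {
      return new Metadata(metadataFileLocation, schemaId);
    }
  }

  // Hypothetical parser: pretends to read a metadata file from the given location.
  static Metadata read(String location) {
    return new Metadata(location, 0);
  }

  // Models the reported behavior: the rebuilt metadata loses its location.
  static Metadata loadWithoutLocation(String location, int schemaId) {
    return Metadata.buildFrom(read(location)).setSchemaId(schemaId).build();
  }

  // One possible remedy in this model: pass the location through explicitly.
  static Metadata loadWithLocation(String location, int schemaId) {
    return Metadata.buildFrom(read(location))
        .setSchemaId(schemaId)
        .withMetadataLocation(location)
        .build();
  }

  public static void main(String[] args) {
    String location = "s3://bucket/table1/metadata/00001.metadata.json"; // made-up path
    System.out.println(loadWithoutLocation(location, 1).metadataFileLocation()); // prints "null"
    System.out.println(loadWithLocation(location, 1).metadataFileLocation()); // prints the path
  }
}
```

In this model, any caller that dereferences `metadataFileLocation()` on the first result would hit a NullPointerException, which matches the symptom reported here.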

How do the basic scenarios work?
Nessie itself holds the metadataFileLocation and uses it for all Nessie operations. But when an Iceberg table in the Nessie catalog is used for Iceberg-specific functionality, such as reading the .snapshots table, Iceberg needs metadataFileLocation, which was not set.

@github-actions github-actions bot added the NESSIE label Apr 6, 2022
@ajantha-bhat
Member Author

cc: @nastra , @snazy, @rdblue , @openinx


createBranch(branch2, catalog.currentHash(), branch1);
// commit on tableIdentifier1 on branch2
NessieCatalog catalogBranch1 = initCatalog(branch2);
Contributor

maybe just change to catalog = initCatalog(branch2)

Member Author

I followed the pattern of the existing test cases in this file.
I changed this in the newly added test case. The existing test cases can be updated in a follow-up PR, since changing them here would confuse the reviewers.

NessieCatalog catalog = initCatalog(branch1);
String metadataCommit1 = addRow(catalog, tableIdentifier1, "initial-data",
ImmutableMap.of("id0", 4L));
catalog.refresh();
Contributor

those catalog.refresh() calls can be removed

Member Author

It is just a double check that even a refresh does not fix it.

Contributor

refresh() shouldn't be used outside of the internals of the catalog, imo (I have a PR that cleans that up). Also, it's not apparent that this is the reason why refresh() is being used here, so I would just remove it.

Member Author

ok. Removed.


// commit on tableIdentifier1 on branch1
NessieCatalog catalog = initCatalog(branch1);
String metadataCommit1 = addRow(catalog, tableIdentifier1, "initial-data",
Contributor

Can we rename this (and others) to something like metadataLocationOfCommit1? Initially it wasn't clear that this refers to the metadata location.

Member Author

Renamed in the newly added test case. The existing test cases can be updated in a follow-up PR, since changing them here would confuse the reviewers.

catalog = initCatalog(branch1);
catalog.refresh();
// load tableIdentifier1 on branch1
BaseTable table = (BaseTable)catalog.loadTable(tableIdentifier1);
Contributor

nit: formatting

@ajantha-bhat ajantha-bhat force-pushed the npe branch 2 times, most recently from 75da1a7 to 819e751 Compare April 6, 2022 08:07
catalog = initCatalog(branch2);
String metadataLocationOfCommit2 = addRow(catalog, tableIdentifier1, "some-more-data",
ImmutableMap.of("id0", 42L));
Assertions.assertThat(metadataLocationOfCommit2).isNotEqualTo(metadataLocationOfCommit1);
Contributor

nit: maybe change to .isNotNull().isNotEqualTo(metadataLocationOfCommit1). Same for the other check further below.
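A likely motivation for chaining the null check, sketched under assumptions: a bare "not equal" comparison also succeeds when the value under test is null, so it would not catch a helper that returned nothing. The block below uses plain Java rather than AssertJ so that it runs without dependencies. The class, the two helper methods, and the string values are hypothetical and only mimic the two styles of check.

```java
import java.util.Objects;

public class NotNullBeforeNotEqual {

  // Plain-Java stand-in for a bare "is not equal to" check.
  static boolean passesNotEqual(String actual, String other) {
    return !Objects.equals(actual, other);
  }

  // Stand-in for "is not null" followed by "is not equal to".
  static boolean passesNotNullAndNotEqual(String actual, String other) {
    return actual != null && !Objects.equals(actual, other);
  }

  public static void main(String[] args) {
    String metadataLocationOfCommit1 = "metadata-1.json"; // made-up value
    String missing = null; // simulates a commit helper that returned nothing

    // The bare check passes even though the value is missing.
    System.out.println(passesNotEqual(missing, metadataLocationOfCommit1)); // true

    // The chained check rejects the missing value.
    System.out.println(passesNotNullAndNotEqual(missing, metadataLocationOfCommit1)); // false

    // A real, different location passes the chained check.
    System.out.println(passesNotNullAndNotEqual("metadata-2.json", metadataLocationOfCommit1)); // true
  }
}
```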

@ajantha-bhat
Member Author

@szehon-ho / @openinx: Can you please help with merging this?

Member

@openinx openinx left a comment

Looks good to me.

@openinx openinx merged commit 4b382e2 into apache:master Apr 7, 2022
@ajantha-bhat
Member Author

@openinx : Thanks for merging.

@nastra nastra added this to the Iceberg 0.13.2 Release milestone May 23, 2022
XN137 added a commit to XN137/nessie-demos that referenced this pull request May 23, 2022
- upgrade to iceberg 0.13.1
- upgrade to flink 1.13.6
- update notebooks to new sql grammar
  (see projectnessie/nessie#2273)
- adjust notebooks to new nessie cli
- adjust notebooks to new nessie python api
- adjust notebooks to no longer claim branches have the same
  hash after merging
- avoid silent exception handling
- apply workaround for delta demo
  (for projectnessie/nessie#3552)
- apply workaround for hive demo
  (for apache/iceberg#4509)
rdblue pushed a commit that referenced this pull request May 23, 2022
XN137 added a commit to projectnessie/nessie-demos that referenced this pull request May 24, 2022
- upgrade to iceberg 0.13.1
- upgrade to flink 1.13.6
- update notebooks to new sql grammar
  (see projectnessie/nessie#2273)
- adjust notebooks to new nessie cli
- adjust notebooks to new nessie python api
- adjust notebooks to no longer claim branches have the same
  hash after merging
- avoid silent exception handling
- apply workaround for delta demo
  (for projectnessie/nessie#3552)
- apply workaround for hive demo
  (for apache/iceberg#4509)

closes #260