@Mehul2500
Contributor

This PR introduces migrateTables() in CatalogUtil, which migrates Iceberg tables from any source catalog to any target catalog. It builds on PR #5037 for the registerTable() functionality in BaseMetastoreCatalog.

For the test case, I migrated tables from a Hadoop catalog to a Hive catalog.

"Cannot initialize Target Catalog implementation %s: %s", targetCatalogProperties.get("catalogImpl"),
e.getMessage()), e);
}
List<TableIdentifier> allIdentifiers = tableIdentifiers;
Member

I think this code should probably live in Catalog: a new function like Catalog.registerTableFromCatalog() to "move" a single table to the current catalog. The HadoopCatalog could then do the special handling in its implementation.

sourceCatalog.listTables(ns).stream()).collect(Collectors.toList());
}
List<TableIdentifier> migratedTableIdentifiers = new ArrayList<TableIdentifier>();
allIdentifiers.forEach(tableIdentifier -> {
Member

I suspect this will run for a very long time when there are a lot of tables.
If things fail in the meantime, it's hard to resume after the failed table.
I.e., error handling here is tricky.

Not sure whether it is actually possible to properly handle the case when registerTable worked, but dropTable failed - in such a case you'd have the same table in two catalogs.
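To make the concern concrete, here is a minimal sketch of a migration loop that records a per-table outcome instead of aborting on the first failure. Everything in it is an assumption for illustration: `SimpleCatalog`, `Outcome`, `migrate` and `InMemoryCatalog` are hypothetical names and not part of Iceberg's `Catalog` API. It does not solve the "registered but not dropped" case; it only makes that state visible so a caller can retry or clean up.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: SimpleCatalog is a hypothetical stand-in, not Iceberg's Catalog interface.
final class MigrationSketch {

  /** Hypothetical minimal view of a catalog: just what the loop below needs. */
  interface SimpleCatalog {
    String metadataLocation(String table);

    void registerTable(String table, String metadataLocation);

    void dropTable(String table);
  }

  enum Outcome { MIGRATED, REGISTER_FAILED, DROP_FAILED }

  /** Migrates each table independently and reports what happened to each one. */
  static Map<String, Outcome> migrate(
      List<String> tables, SimpleCatalog source, SimpleCatalog target) {
    Map<String, Outcome> outcomes = new LinkedHashMap<>();
    for (String table : tables) {
      try {
        target.registerTable(table, source.metadataLocation(table));
      } catch (RuntimeException e) {
        // Nothing changed for this table; it is safe to retry later.
        outcomes.put(table, Outcome.REGISTER_FAILED);
        continue;
      }
      try {
        source.dropTable(table);
        outcomes.put(table, Outcome.MIGRATED);
      } catch (RuntimeException e) {
        // The table now exists in both catalogs; the caller has to decide what to do.
        outcomes.put(table, Outcome.DROP_FAILED);
      }
    }
    return outcomes;
  }

  /** In-memory catalog used only to exercise the loop; drops can be made to fail. */
  static final class InMemoryCatalog implements SimpleCatalog {
    final Map<String, String> tables = new HashMap<>();
    final Set<String> failOnDrop = new HashSet<>();

    @Override
    public String metadataLocation(String table) {
      String location = tables.get(table);
      if (location == null) {
        throw new IllegalArgumentException("No such table: " + table);
      }
      return location;
    }

    @Override
    public void registerTable(String table, String metadataLocation) {
      if (tables.putIfAbsent(table, metadataLocation) != null) {
        throw new IllegalStateException("Table already exists: " + table);
      }
    }

    @Override
    public void dropTable(String table) {
      if (failOnDrop.contains(table)) {
        throw new IllegalStateException("Simulated drop failure: " + table);
      }
      tables.remove(table);
    }
  }
}
```

A caller could re-run `migrate` with only the `REGISTER_FAILED` tables, while `DROP_FAILED` entries would still need manual attention or a separate cleanup step.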

if (tableIdentifiers == null || tableIdentifiers.isEmpty()) {
List<Namespace> namespaces = (sourceCatalog instanceof SupportsNamespaces) ?
((SupportsNamespaces) sourceCatalog).listNamespaces() : ImmutableList.of(Namespace.empty());
allIdentifiers = namespaces.stream().flatMap(ns ->
Member

I suspect this will run for a very long time when there are a lot of tables.

public static List<TableIdentifier> migrateTables(List<TableIdentifier> tableIdentifiers,
Map<String, String> sourceCatalogProperties, Map<String, String> targetCatalogProperties,
Object sourceHadoopConfig, Object targetHadoopConfig) {
if (tableIdentifiers != null) {
Member

Let's leave out the catalog instantiation and configuration here completely. I suspect that users have at least one of these catalogs already handy - and setting up "the same" catalog twice is superfluous.
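As a rough illustration of that suggestion, the sketch below takes an already constructed source catalog as an argument and only resolves which identifiers to migrate, listing everything from the source when none are requested. `ListableCatalog` and `resolve` are hypothetical names used for this example, with plain strings standing in for `Namespace` and `TableIdentifier`; this is not the signature proposed in the PR.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: ListableCatalog is a hypothetical stand-in for a catalog the caller already holds.
final class IdentifierResolutionSketch {

  interface ListableCatalog {
    List<String> listNamespaces();

    List<String> listTables(String namespace);
  }

  /** Returns the requested identifiers, or every table in the source when none are given. */
  static List<String> resolve(List<String> requested, ListableCatalog source) {
    if (requested != null && !requested.isEmpty()) {
      return List.copyOf(requested);
    }
    // Full listing can be slow on large catalogs, as noted in the review.
    return source.listNamespaces().stream()
        .flatMap(ns -> source.listTables(ns).stream())
        .collect(Collectors.toList());
  }
}
```

Because the catalog is passed in, no catalog properties or Hadoop configuration objects appear in the method signature.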

}

String newMetadataLocation = writeNewMetadata(metadata, currentVersion() + 1);
String newMetadataLocation = (base == null) && (metadata.metadataFileLocation() != null) ?
Member

Why is this necessary?
It's a new commit, not sure whether it is good that it re-uses an existing metadata location that is (potentially) "owned" by another catalog.

Assertions.assertThat(newTable).isNotNull();
TableOperations ops = ((HasTableOperations) newTable).operations();
String metadataLocation = ((NessieTableOperations) ops).currentMetadataLocation();
Assertions.assertThat("file:" + metadataVersionFiles).isEqualTo(metadataLocation);
Member

Hint: Your assertions are often the "wrong" way around.
It's always assertThat(<current state>)...., followed by the expectations.

@Test
public void testRegisterTableWithGivenBranch() {
List<String> metadataVersionFiles = metadataVersionFiles(TABLE_NAME);
Assertions.assertThat(1).isEqualTo(metadataVersionFiles.size());
Member

Hint: assertThat(metadataVersionFiles).hasSize(1)

List<String> metadataVersionFiles = metadataVersionFiles(TABLE_NAME);
Assertions.assertThat(1).isEqualTo(metadataVersionFiles.size());
ImmutableTableReference tableReference =
ImmutableTableReference.builder().reference("main").name(TABLE_NAME).build();
Member

Please use a different branch here.
Using the default branch is not that great - and the test says ...Branch implying it's not the default branch.

Assert.assertEquals(ns, JdbcUtil.stringToNamespace(nsString));
}

@Test
Member

Would these tests be better placed in CatalogTests?

}

@Test
public void testRegisterTable() {
Member

This pair of tests is repeated (in a very similar way) across multiple catalogs. Can those be centralized somewhere? CatalogTests maybe?

@ajantha-bhat
Member

@Mehul2500 : Please rebase this PR.

@ajantha-bhat
Member

As there has been no activity on this PR, I have opened #5492.

@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 17, 2024
@github-actions

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Aug 26, 2024