-
Notifications
You must be signed in to change notification settings - Fork 3k
Core : Catalog Tables Migration API #5297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| "Cannot initialize Target Catalog implementation %s: %s", targetCatalogProperties.get("catalogImpl"), | ||
| e.getMessage()), e); | ||
| } | ||
| List<TableIdentifier> allIdentifiers = tableIdentifiers; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, this code should probably live in Catalog: A new function like Catalog.registerTableFromCatalog() to "move" a single table to the current catalog. The HadoopCatalog could then do the special-handling in its implementation.
| sourceCatalog.listTables(ns).stream()).collect(Collectors.toList()); | ||
| } | ||
| List<TableIdentifier> migratedTableIdentifiers = new ArrayList<TableIdentifier>(); | ||
| allIdentifiers.forEach(tableIdentifier -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this will run for a very long time, like when there a a lot of tables.
If things fail in the meantime, it's hard to resume after the failed table.
I.e. error handling here is tricky.
Not sure whether it is actually possible to properly handle the case when registerTable worked, but dropTable failed - in such a case you'd have the same table in two catalogs.
| if (tableIdentifiers == null || tableIdentifiers.isEmpty()) { | ||
| List<Namespace> namespaces = (sourceCatalog instanceof SupportsNamespaces) ? | ||
| ((SupportsNamespaces) sourceCatalog).listNamespaces() : ImmutableList.of(Namespace.empty()); | ||
| allIdentifiers = namespaces.stream().flatMap(ns -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this will run for a very long time, like when there a a lot of tables.
| public static List<TableIdentifier> migrateTables(List<TableIdentifier> tableIdentifiers, | ||
| Map<String, String> sourceCatalogProperties, Map<String, String> targetCatalogProperties, | ||
| Object sourceHadoopConfig, Object targetHadoopConfig) { | ||
| if (tableIdentifiers != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave out the catalog instantiation and configuration here completely. I suspect that users have at least one of these catalogs already handy - and setting up "the same" catalog twice is superfluous.
| } | ||
|
|
||
| String newMetadataLocation = writeNewMetadata(metadata, currentVersion() + 1); | ||
| String newMetadataLocation = (base == null) && (metadata.metadataFileLocation() != null) ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary?
It's a new commit, not sure whether it is good that it re-uses an existing metadata location that is (potentially) "owned" by another catalog.
| Assertions.assertThat(newTable).isNotNull(); | ||
| TableOperations ops = ((HasTableOperations) newTable).operations(); | ||
| String metadataLocation = ((NessieTableOperations) ops).currentMetadataLocation(); | ||
| Assertions.assertThat("file:" + metadataVersionFiles).isEqualTo(metadataLocation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hint: Your assertions are often the "wrong" way around.
It's always assertThat(<current state>)...., followed by the expectations.
| @Test | ||
| public void testRegisterTableWithGivenBranch() { | ||
| List<String> metadataVersionFiles = metadataVersionFiles(TABLE_NAME); | ||
| Assertions.assertThat(1).isEqualTo(metadataVersionFiles.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hint: assertThat(metadataVersionFiles).hasSize(1)
| List<String> metadataVersionFiles = metadataVersionFiles(TABLE_NAME); | ||
| Assertions.assertThat(1).isEqualTo(metadataVersionFiles.size()); | ||
| ImmutableTableReference tableReference = | ||
| ImmutableTableReference.builder().reference("main").name(TABLE_NAME).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use a different branch here.
Using the default branch is not that great - and the test says ...Branch implying it's not the default branch.
| Assert.assertEquals(ns, JdbcUtil.stringToNamespace(nsString)); | ||
| } | ||
|
|
||
| @Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these tests better live in CatalogTests?
| } | ||
|
|
||
| @Test | ||
| public void testRegisterTable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pair of tests is repeated (in a very similar way) across multiple catalogs. Can those be centralized somewhere? CatalogTests maybe?
9f6e3f8 to
88a050f
Compare
|
@Mehul2500 : Please rebase this PR. |
|
As there is no activity in this. |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
Introducing
migrateTables()inCatalogUtilwhich could help in the migration of Iceberg tables in any Source Catalog to any Target Catalog. Uses PR #5037 , for theregisterTable()functionality inBaseMetastoreCatalog.I used tables migrating from Hadoop Catalog to Hive Catalog for the test case.