@hantangwangd
Member

@hantangwangd hantangwangd commented Jan 1, 2025

Description

This PR adds support for renaming Iceberg tables on the Hive file catalog.

Motivation and Context

Support renaming tables in the Iceberg connector on as many catalog types as possible.

Impact

An Iceberg connector configured with a Hive file catalog can now rename tables.

Test Plan

  • Enable existing test cases for renaming table on HIVE file catalog

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

Iceberg Connector Changes
* Add support of ``renaming table`` for Iceberg connector when configured with ``HIVE`` file catalog.

@hantangwangd hantangwangd force-pushed the support_hive_catalog_rename_table branch 2 times, most recently from 7cd47c9 to 366408c Compare January 1, 2025 12:02
@hantangwangd hantangwangd marked this pull request as ready for review January 1, 2025 17:21
@hantangwangd hantangwangd requested review from a team and ZacBlanco as code owners January 1, 2025 17:21
@hantangwangd hantangwangd requested review from agrawalreetika, presto-oss and tdcmeehan and removed request for presto-oss January 1, 2025 17:21
ZacBlanco
ZacBlanco previously approved these changes Jan 3, 2025
Contributor

@ZacBlanco ZacBlanco left a comment

I think the changes are fine; however, the code complexity seems high due to all the nested conditionals. I would consider factoring out some methods or subclassing, as mentioned in my comment.

try {
if (!metadataFileSystem.rename(getTableMetadataDirectory(databaseName, tableName), getTableMetadataDirectory(newDatabaseName, newTableName))) {
throw new PrestoException(HIVE_METASTORE_ERROR, "Could not rename table directory");
if (!isIcebergTable(table)) {
Contributor

I don't think we need to do it for this PR, but all these conditionals make me think it is worth subclassing FileHiveMetastore and creating a FileIcebergHiveMetastore or something similar to capture the differences in behavior. It seems a bit messy currently.

Member Author

Agreed. There are indeed some implementation details that differ between them and need special handling. It is worth considering moving these details into Iceberg's own subclass.
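
To make the subclassing idea concrete, here is a minimal sketch of how a shared rename flow could delegate the format-specific step to a subclass. Both class names and the recorded step strings are hypothetical stand-ins for illustration; they are not the real FileHiveMetastore API and do not claim to match what the eventual refactoring looks like.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the shared file-based metastore (NOT the real FileHiveMetastore).
class BaseFileMetastoreSketch
{
    protected final List<String> steps = new ArrayList<>();

    // Template method: the overall rename flow is shared by every table format.
    public final List<String> renameTable(String from, String to)
    {
        steps.clear();
        steps.add("validate " + from + " -> " + to);
        moveTableMetadata(from, to);
        return new ArrayList<>(steps);
    }

    // Default behavior (assumed for illustration): move the metadata directory in one step.
    protected void moveTableMetadata(String from, String to)
    {
        steps.add("rename directory " + from + " -> " + to);
    }
}

// Hypothetical Iceberg-specific subclass: only the differing step is overridden,
// so the base class needs no isIcebergTable-style conditionals.
class IcebergFileMetastoreSketch
        extends BaseFileMetastoreSketch
{
    @Override
    protected void moveTableMetadata(String from, String to)
    {
        // Illustrative steps only; the real Iceberg handling may differ.
        steps.add("move schema file " + from + " -> " + to);
        steps.add("move permissions " + from + " -> " + to);
    }
}
```

With this shape, each format-specific difference lives in one overridden method instead of a branch inside the shared code path.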

@steveburnett
Contributor

Thanks for the release note! Nit: a formatting suggestion:

== RELEASE NOTES ==

Iceberg Connector Changes
* Add support of ``renaming table`` for Iceberg connector when configured with ``HIVE`` file catalog. :pr:`24312`

@hantangwangd
Member Author

Thanks @steveburnett, fixed. Please take a look when available.

Member

@agrawalreetika agrawalreetika left a comment

@hantangwangd Can we add some tests in Iceberg that cover these newly added code lines?

@hantangwangd hantangwangd force-pushed the support_hive_catalog_rename_table branch from 4a0b64d to e3fa5dc Compare January 5, 2025 17:19
@hantangwangd hantangwangd marked this pull request as draft January 5, 2025 17:21
@hantangwangd hantangwangd force-pushed the support_hive_catalog_rename_table branch from e3fa5dc to 7330926 Compare January 6, 2025 03:44
@hantangwangd hantangwangd marked this pull request as ready for review January 6, 2025 07:29
@hantangwangd
Member Author

Can we add some tests in Iceberg that cover these newly added code lines?

I constructed a test environment based on a customized ExtendedFileSystem that can simulate scenarios in which certain file operations fail. Using it, I added test cases that cover the newly added code branches. Please take a look when available. I'm open to any better testing approaches. @agrawalreetika @ZacBlanco

@steveburnett
Contributor

Thanks @steveburnett, fixed. Please take a look when available.

Looks good, thanks!

ZacBlanco
ZacBlanco previously approved these changes Jan 8, 2025
agrawalreetika
agrawalreetika previously approved these changes Jan 8, 2025
@tdcmeehan tdcmeehan self-assigned this Jan 14, 2025
@hantangwangd hantangwangd force-pushed the support_hive_catalog_rename_table branch from 7330926 to 8bfd938 Compare February 11, 2025 03:30
@steveburnett
Contributor

New release note guidelines. Please remove the manual PR link in the following format from the release note entries for this PR.


:pr:`12345`

I have updated the Release Notes Guidelines to remove the examples of manually adding the PR link.

@hantangwangd
Member Author

@steveburnett Thanks for the reminder, fixed!

@hantangwangd
Member Author

This has simply been rebased onto the latest commits. Please take a look when available, thanks a lot! @ZacBlanco @agrawalreetika @tdcmeehan

Contributor

@ZacBlanco ZacBlanco left a comment

just some minor things

    metadataFileSystem.delete(newTablePermissionDir, true);
}
catch (IOException e) {
    // ignore
Contributor

maybe log a warning here with the error?

Member Author

Sure, done!
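
For illustration, a best-effort cleanup that logs the failure together with its cause, rather than swallowing it, might look like the sketch below. The `Deleter` interface and `deleteQuietly` helper are hypothetical names, and `java.util.logging` is used only to keep the example self-contained; it is not the logger used in the PR.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

class BestEffortCleanupSketch
{
    private static final Logger LOG = Logger.getLogger(BestEffortCleanupSketch.class.getName());

    // Hypothetical minimal abstraction over the single file system call this sketch needs.
    interface Deleter
    {
        void delete(String path) throws IOException;
    }

    // Returns true if the delete succeeded. On failure, logs a warning that
    // includes the exception and returns false, so the caller's main
    // operation is not aborted by a cleanup problem.
    static boolean deleteQuietly(Deleter deleter, String path)
    {
        try {
            deleter.delete(path);
            return true;
        }
        catch (IOException e) {
            LOG.log(Level.WARNING, "Could not delete table permission directory: " + path, e);
            return false;
        }
    }
}
```

Passing the exception to the logger keeps the stack trace available for diagnosing leftover directories, while the rename itself still proceeds.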

@Override
public boolean mkdirs(Path f, FsPermission permission) throws IOException
{
    if (this.failSignal.get() == FailSignal.COPY) {
Contributor

Why is the signal for mkdirs COPY? Maybe just rename to MKDIRS?

Member Author

The mkdirs operation is one step of the COPY operation, and we want to simulate a failure during the COPY process. But I think it's fine to use MKDIRS to reduce confusion. Fixed!
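
The fail-signal pattern in the snippet above can be sketched in isolation as follows. This is an in-memory stand-in, not the PR's customized ExtendedFileSystem: `FailSignalSketch`, `FailingFileSystemSketch`, and the `RENAME` constant are hypothetical, and only `MKDIRS` is a name taken from this thread.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical signals; one constant per operation that a test may want to break.
enum FailSignalSketch
{
    NONE, MKDIRS, RENAME
}

// In-memory stand-in for a file system whose operations fail on demand.
class FailingFileSystemSketch
{
    private final AtomicReference<FailSignalSketch> failSignal = new AtomicReference<>(FailSignalSketch.NONE);
    private final Set<String> paths = new HashSet<>();

    void setFailSignal(FailSignalSketch signal)
    {
        failSignal.set(signal);
    }

    boolean exists(String path)
    {
        return paths.contains(path);
    }

    boolean mkdirs(String path) throws IOException
    {
        // Fail before touching any state, so the test can assert nothing was created.
        if (failSignal.get() == FailSignalSketch.MKDIRS) {
            throw new IOException("Simulated mkdirs failure: " + path);
        }
        paths.add(path);
        return true;
    }

    boolean rename(String from, String to) throws IOException
    {
        if (failSignal.get() == FailSignalSketch.RENAME) {
            throw new IOException("Simulated rename failure: " + from);
        }
        if (!paths.remove(from)) {
            return false;
        }
        paths.add(to);
        return true;
    }
}
```

Naming each signal after the operation it breaks, as suggested in the review, lets a reader tell from the test alone which call is expected to fail.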

Comment on lines 584 to 586
if (!metadataFileSystem.rename(originalMetadataDirectory, newMetadataDirectory)) {
    throw new PrestoException(HIVE_METASTORE_ERROR,
            format("Could not rename table, because of fail to rename directory %s to %s", originalMetadataDirectory, newMetadataDirectory));
Member

Should we extract this rename into a utility method and use that here and above?

Comment on lines 273 to 277
createFile(fileSystem, new Path(originSchemaMetadataPath), DATABASE_CODEC.toBytes(databaseMetadata));
createFile(fileSystem, new Path(newSchemaMetadataPath), DATABASE_CODEC.toBytes(databaseMetadata));
createFile(fileSystem, new Path(originTableMetadataPath), TABLE_CODEC.toBytes(tableMetadata));
createFile(fileSystem, new Path(originTablePermissionFilePath), new byte[128]);

Member

Let's extract this common code into a method and use it wherever this file setup is done, to reduce repetition.

Comment on lines 231 to 236
assertFalse(fileSystem.exists(new Path(newTableMetadataPath)));
assertFalse(fileSystem.exists(new Path(newTablePermissionDirPath)));
assertFalse(fileSystem.exists(new Path(newTablePermissionFilePath)));
assertTrue(fileSystem.exists(new Path(originTableMetadataPath)));
assertTrue(fileSystem.exists(new Path(originTablePermissionDirPath)));
assertTrue(fileSystem.exists(new Path(originTablePermissionFilePath)));
Member

Extract this common code in a method?

}

@Test
public void testRenameTableFailCausedByRenameTableSchemaFile()
Member

I think we can reduce the redundancy across the test methods by creating a new method that holds the common code and takes a FailSignal as a parameter for the different test cases.

Member Author

Good suggestion. I have refactored the code and extracted the common logic into a base method, testRenameTableWithFailSignalAndValidation. It accepts the fail signal, the customized rename logic, and the verification logic as parameters, and is invoked by all the test methods.
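
As a rough illustration of that shape, the sketch below shows a shared helper that takes a fail signal, a rename action, and a validation callback. All names here (`RenameTestHelperSketch`, `Signal`, `RenameAction`, `runWithFailSignal`) are hypothetical, and the signature is an assumption; the actual testRenameTableWithFailSignalAndValidation in the PR may differ.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

class RenameTestHelperSketch
{
    // Hypothetical signal values for this sketch only.
    enum Signal
    {
        NONE, MKDIRS, RENAME
    }

    // The rename under test may throw, so a plain Runnable is not enough.
    interface RenameAction
    {
        void run() throws Exception;
    }

    // Shared flow: arm the failure, run the rename, reset the signal, then
    // pass the captured failure (or null on success) to the validation step.
    static void runWithFailSignal(
            AtomicReference<Signal> failSignal,
            Signal signal,
            RenameAction rename,
            Consumer<Exception> validation)
    {
        failSignal.set(signal);
        Exception failure = null;
        try {
            rename.run();
        }
        catch (Exception e) {
            failure = e;
        }
        finally {
            // Reset so later test cases start from a clean state.
            failSignal.set(Signal.NONE);
        }
        validation.accept(failure);
    }
}
```

Each test method then reduces to one call that supplies its own signal and its own assertions about which files should or should not exist afterwards.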

@hantangwangd hantangwangd force-pushed the support_hive_catalog_rename_table branch 2 times, most recently from 3110ae5 to 5c30649 Compare February 13, 2025 01:21
@hantangwangd hantangwangd force-pushed the support_hive_catalog_rename_table branch from 5c30649 to cf83835 Compare February 13, 2025 01:36
@hantangwangd
Member Author

Hi @ZacBlanco @agrawalreetika, the comments have all been addressed, please take a look when convenient, thanks!

Member

@agrawalreetika agrawalreetika left a comment

LGTM

Contributor

@ZacBlanco ZacBlanco left a comment

LGTM!

@hantangwangd
Member Author

Hi @jaystarshot, can you please help take a final look at this PR when available? Thanks a lot!

jaystarshot
jaystarshot previously approved these changes Feb 13, 2025
}

private void renamePath(Path originalPath, Path targetPath, String errorMessage)
        throws IOException
Member

Previously, this code path seems to have thrown a PrestoException with HIVE_METASTORE_ERROR in some places.

Member Author

This is an extracted private method. The IOException it throws is always caught by its callers and converted to a PrestoException with HIVE_METASTORE_ERROR, as you mentioned.
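
The division of labor described here can be sketched as follows: a private helper reports failure as a checked IOException, and the public entry point converts it at a single boundary. `Renamer` and `MetastoreException` are hypothetical stand-ins (the latter for PrestoException with HIVE_METASTORE_ERROR), and the helper's body is an assumption, since only its signature appears in this thread.

```java
import java.io.IOException;

class RenamePathSketch
{
    // Hypothetical minimal abstraction over the file system's rename call.
    interface Renamer
    {
        boolean rename(String from, String to) throws IOException;
    }

    // Stand-in for PrestoException(HIVE_METASTORE_ERROR, ...) in this sketch.
    static class MetastoreException
            extends RuntimeException
    {
        MetastoreException(Throwable cause)
        {
            super(cause);
        }
    }

    // Private helper: turns a "false" return into a checked IOException so
    // every rename failure is reported in one uniform way.
    private static void renamePath(Renamer fs, String from, String to, String errorMessage)
            throws IOException
    {
        if (!fs.rename(from, to)) {
            throw new IOException(errorMessage);
        }
    }

    // Public entry point: the only place where IOException is converted, so
    // callers always see the metastore-level exception type.
    static void renameTable(Renamer fs, String from, String to)
    {
        try {
            renamePath(fs, from, to, String.format("Could not rename %s to %s", from, to));
        }
        catch (IOException e) {
            throw new MetastoreException(e);
        }
    }
}
```

Keeping the conversion in the caller means the helper can be reused for several rename steps without each one choosing its own error code.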

    metadataFileSystem.delete(new Path(originalMetadataDirectory, PRESTO_PERMISSIONS_DIRECTORY_NAME), true);
}
catch (IOException e) {
    // ignore
Member

Exception seems to be ignored here?

Member Author

The exception is ignored intentionally here, as a failure of this deletion does not affect the overall rename behavior.

Member

@jaystarshot jaystarshot left a comment

Sorry, I am not well versed in how the Iceberg connector works, but should there be Iceberg-specific code in the Presto Hive metastore?

    metadataFileSystem.delete(newTablePermissionDir, true);
}
catch (IOException e) {
    LOG.warn("Could not delete table permission directory: %s", newTablePermissionDir);
Member

Same here, should we ignore exception here?

Member Author

The same as above.

@hantangwangd
Member Author

Sorry, I am not well versed in how the Iceberg connector works, but should there be Iceberg-specific code in the Presto Hive metastore?

@jaystarshot Thanks for your review. This is a good question. At present, all lakehouses share the same HMS code, so some Iceberg-related logic already existed in FileHiveMetastore before this change. @ZacBlanco and I have discussed this in the conversation above. We will extract a subclass of FileHiveMetastore for Iceberg in a later PR.

@hantangwangd hantangwangd merged commit db92976 into prestodb:master Feb 15, 2025
54 checks passed
@hantangwangd hantangwangd deleted the support_hive_catalog_rename_table branch February 15, 2025 03:49
@prestodb-ci prestodb-ci mentioned this pull request Mar 28, 2025
30 tasks
6 participants