-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Core: Registering tables to nonexistent target namespace leads to metadata deletion in HiveCatalog #13434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| "Cannot handle an empty argument metadata_file"); | ||
|
|
||
| Catalog icebergCatalog = ((HasIcebergCatalog) tableCatalog()).icebergCatalog(); | ||
| if (tableName.hasNamespace() && icebergCatalog instanceof SupportsNamespaces) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this check should be part of the procedure. Instead, such a check should be added in registerTable() in BaseMetastoreCatalog / RestSessionCatalog. Tests can then be added to CatalogTests for RestSessionCatalog. Tests for registerTable in BaseMetastoreCatalog are spread across different catalog implementations, so tests should be added there as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nastra Thank you for your feedback, I think it makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nastra I've addressed your feedback. Please take another look when you have time, thanks.
| return PropertyUtil.propertyAsBoolean( | ||
| restCatalog.properties(), | ||
| RESTCompatibilityKitSuite.RCK_REQUIRES_NAMESPACE_CREATE, | ||
| super.requiresNamespaceCreate()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super.requiresNamespaceCreate() is false in parent class.
| && !JdbcUtil.namespaceExists(catalogName, connections, namespace)) { | ||
| throw new NoSuchNamespaceException( | ||
| "Cannot create table %s in catalog %s. Namespace %s does not exist", | ||
| "Cannot create table %s in catalog %s. Namespace does not exist: %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the error message b/c CatalogTests.tableCreationWithoutNamespace() expects this format.
assertThatThrownBy(
() ->
catalog().buildTable(TableIdentifier.of("non-existing", "table"), SCHEMA).create())
.isInstanceOf(NoSuchNamespaceException.class)
.hasMessageContaining("Namespace does not exist: non-existing");
I think the other way to look at this problem might be disable the clean up for failed commit on table registration Today the |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
|
Keep alive |
| return new BaseTable(ops, fullTableName(name(), identifier), metricsReporter()); | ||
| } | ||
|
|
||
| protected void targetNamespaceExists(TableIdentifier identifier) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're adding the namespace check at a central place that now affects all catalogs extending this class. I think the main issue is that not every catalog actually requires an explicit namespace creation before e.g. creating a table inside a namespace or registering a table. That's also why we introduced the requiresNamespaceCreate() flag in the tests.
That being said, I don't think we can actually perform this check here as it should be specific to the respective catalog implementation and depending on whether that catalog implementation actually requires a namespace to be created or not.
For example, the default behavior of the JDBC catalog is to not require a namespace to exist when you create a table. That means registering a table should also not require for that namespace to exist beforehand.
However, if you configure strict-mode, then namespace existence is required, meaning that the targetNamespaceExists check should only be performed when strict-mode is on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nastra Thank you for the feedback and I agree w/ you.
That's also why we introduced the requiresNamespaceCreate() flag in the tests.
I like this idea when I am working on the test.
How about we make the requiresNamespaceCreate() a default method (return false by default) in SupportsNamespaces?
Doing so:
- Makes the requirement explicitly per implementation, i.e. promoting this semantics from
src/testtosrc/main. - Skips
targetNamespaceExistscheck if the catalog implementation doesn't require it.
|
|
||
| @Test | ||
| public void testRegisterTable() { | ||
| ecsCatalog.createNamespace(Namespace.of("a")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate why this change is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because
EcsCatalogimplementsSupportsNamespacesand thetargetNamespaceExistsI introduced in this PR requires an existent namespace.TestEcsCatalogdoesn't implementCatalogTests<EcsCatalog>and it doesn't know about therequiresNamespaceCreatemethod. Therefore, unlike thetestRegisterTableinCatalogTests, this test doesn't create a namespace beforehand so I created it here.
// From CatalogTests
@Test
public void testRegisterTable() {
C catalog = catalog();
if (requiresNamespaceCreate()) {
catalog.createNamespace(TABLE.namespace());
}
// omitted
}|
It seems reasonable to throw I am also thinking about @dramaticlly ' suggestion on not cleaning up metadata files for |
|
@stevenzwu Thank you for your review.
I agree w/ you and @dramaticlly, doing so would be helpful during catalog migration. |
|
@hsiang-c thanks for the link to PR #14083! It allowed me to catch up on the discussion here. I encountered a similar issue in a concurrent environment where the In PR #14083, I implemented a solution aligned with the approach mentioned above. That is, the metadata file is only deleted upon failure if it was newly created by the register table operation. Could you please take a look when you have a moment @nastra @dramaticlly @stevenzwu @hsiang-c, thanks a lot! |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
|
keep alive |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
Related to: #1533 and #14083
Context
HiveCatalogand realize that themetadata.jsonfiles used are deleted when the target namespace doesn't exist.ValidationExceptionis thrown from thedoCommit()method ofHiveTableOperations, as Error when creating a tableInvalidObjectException#1533 indicated.finallyblock of thedoCommitmethod deletes themetadata.jsonfrom existing Iceberg tables and leads to data corruption.Proposed changes
BaseMetastoreCatalogandRESTSessionCatalog,registerTablemethod now throwsNoSuchNamespaceExceptionif namespace doesn't exist.JdbcCatalog. It can be configured withjdbc.strict-mode=falseand skip namespace existence checks forcreateTableandcreateView. Therefore, I follow this behavior inregisterTable.Tests
BigQueryMetastoreCatalogCatalogTestsInMemoryCatalogCatalogTestsRESTCatalogCatalogTestsHiveCatalogCatalogTestsNessieCatalogCatalogTestsJdbcCatalogTestJdbcCatalogDynamoDbCatalogTestDynamoDbCatalogEcsCatalogTestEcsCatalogGlueCatalogTestGlueCatalogHadoopCatalogTestHadoopCatalogSnowflakeCatalogSnowflakeCatalogTestRESTCompatibilityKitCatalogTestsis usingRESTCatalog(client) andJdbcCatalog(server) combination. Therefore, only iniceberg-open-apimodule, I turned on strict mode onJdbcCatalogso that client and server have consistent behavior.RegisterTableProcedure