-
Notifications
You must be signed in to change notification settings - Fork 3k
Hive: Add table UUID to HMS table properties. #3914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
As per our previous discussions, in the long term we would like to keep Iceberg table properties and HMS table properties in sync. With this change and with the Hive 4.0.0 changes in place the UUID will be synchronised back to the Iceberg table properties too. Would this be a problem? |
I don't think that we'd want to do it. Maybe this is okay for just the Iceberg integration since it likely won't be used with Hive 4? |
(cherry picked from commit e8b9c435722b079035ba3365aae8ba4326c8ad65)
UUID will be considered as immutable, read-only in HMS. It isn't needed to be synced back. It'd be nice if there is a way to label it read-only in HMS. Not sure if that is possible. Hi @aokolnychyi, @RussellSpitzer, @szehon-ho, @anuragmantri can you take a look at this PR? |
|
|
||
| /** | ||
| * Reserved Iceberg table properties list. | ||
| * <p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might need to add the new constant UUID to this set of reserved properties just below this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. We should NOT allow user to add a uuid as a table property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add a test for that.
| final Map<String, String> expectedProperties = ImmutableMap.of("key", "value"); | ||
| table("tl").updateProperties() | ||
| .set("uuid", uuid) | ||
| .set("key", "value") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make the same change for the same test in v1.12 and v1.13?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CI said yes :-( . Made the change.
|
I'm not aware of a way to mark a table property as immutable in HMS. This is not a blocking comment. Just curious. Could you elaborate more on how we would use this UUID from HMS directly? |
An application can identify an Iceberg table by just checking the properties in HMS without reading the metadata.json. It is pretty useful in case that the application doesn't have the permission to access the file system. |
| .map(Snapshot::summary) | ||
| .orElseGet(ImmutableMap::of); | ||
| setHmsTableParameters(newMetadataLocation, tbl, metadata.properties(), removedProps, hiveEngineEnabled, summary); | ||
| setHmsTableParameters(newMetadataLocation, tbl, metadata.properties(), removedProps, hiveEngineEnabled, summary, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should be passing through the metadata here instead of "tbl, metadata.properties and uuid"
Forgot which table this was. I still think we should probably pass through metadata instead of metadata.properties and uuid now that we use both things. Not a big deal though if you want to leave it as separate parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the change
|
|
||
| if (Catalogs.hiveCatalog(shell.getHiveConf(), tableProperties)) { | ||
| Assert.assertEquals(13, hmsParams.size()); // 2 newly-added properties + previous_metadata_location prop | ||
| Assert.assertEquals(14, hmsParams.size()); // 2 newly-added properties + previous_metadata_location prop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Could we convert 14 -> an expression that calculates the expected number of properties rather than having to manually change this whenever we add new properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to avoid magic number. Here, I'd just leave it as is. Adding a reserved table property doesn't happen frequently. I'd avoid a smart solution to calculate the property number, which add complexity to the unit test. But I'm also open to any nice solution.
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor suggestion, but this all looks good to me
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok this all looks good to me. Going to merge unless I hear any other comments in the next hour or so.
|
This looks good to me. Thanks. |
|
My main requirement is that this is a one-way transfer only. UUID should not show up in Iceberg table properties. As long as that's the case, I'm fine with this. |
Yes. One-way. And it won't show up as an Iceberg table property. |
This would require any future implementations that automatically try sync Hive properties back into Iceberg properties to ignore reserved properties, which is the behavior I think we are looking for here. Basically, any reserved properties can not be modified through normal property operations. |
|
Thanks @flyrain and @karuppayya ! Note that this will introduce a slight backwards incompatibility as a user who has previously set UUID as a table property may run into some issues |
#2123 pushes Iceberg table property values to HMS table properties. It'd be nice to add table uuid to HMS table properties as well. Uuid is critical to identify a table. This change is to support the application that wants to identify a table through its uuid by only access the HMS.