Skip to content

Conversation

@flyrain
Copy link
Contributor

@flyrain flyrain commented Jan 18, 2022

#2123 pushes Iceberg table property values to HMS table properties. It'd be nice to add table uuid to HMS table properties as well. Uuid is critical to identify a table. This change is to support the application that wants to identify a table through its uuid by only access the HMS.

Karuppayya Rajendran and others added 2 commits January 17, 2022 22:03
(cherry picked from commit f88366a)
(cherry picked from commit 7d66707)
@pvary
Copy link
Contributor

pvary commented Jan 18, 2022

As per our previous discussions, in the long term we would like to keep Iceberg table properties and HMS table properties in sync. With this change and with the Hive 4.0.0 changes in place the UUID will be synchronised back to the Iceberg table properties too. Would this be a problem?

@rdblue
Copy link
Contributor

rdblue commented Jan 18, 2022

With this change and with the Hive 4.0.0 changes in place the UUID will be synchronised back to the Iceberg table properties too. Would this be a problem?

I don't think that we'd want to do it. Maybe this is okay for just the Iceberg integration since it likely won't be used with Hive 4?

(cherry picked from commit e8b9c435722b079035ba3365aae8ba4326c8ad65)
@github-actions github-actions bot added the MR label Jan 18, 2022
@flyrain
Copy link
Contributor Author

flyrain commented Jan 18, 2022

As per our previous discussions, in the long term we would like to keep Iceberg table properties and HMS table properties in sync. With this change and with the Hive 4.0.0 changes in place the UUID will be synchronised back to the Iceberg table properties too. Would this be a problem?

UUID will be considered as immutable, read-only in HMS. It isn't needed to be synced back. It'd be nice if there is a way to label it read-only in HMS. Not sure if that is possible.

Hi @aokolnychyi, @RussellSpitzer, @szehon-ho, @anuragmantri can you take a look at this PR?


/**
* Reserved Iceberg table properties list.
* <p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might need to add the new constant UUID to this set of reserved properties just below this line.

Copy link
Contributor Author

@flyrain flyrain Jan 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. We should NOT allow user to add a uuid as a table property.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add a test for that.

@github-actions github-actions bot added the flink label Jan 19, 2022
final Map<String, String> expectedProperties = ImmutableMap.of("key", "value");
table("tl").updateProperties()
.set("uuid", uuid)
.set("key", "value")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make the same change for the same test in v1.12 and v1.13?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI said yes :-( . Made the change.

@anuragmantri
Copy link
Contributor

anuragmantri commented Jan 23, 2022

I'm not aware of a way to mark a table property as immutable in HMS. This is not a blocking comment. Just curious. Could you elaborate more on how we would use this UUID from HMS directly?

@flyrain
Copy link
Contributor Author

flyrain commented Jan 23, 2022

I'm not aware of a way to mark a table property as immutable in HMS. This is not a blocking comment. Just curious. Could you elaborate more on how we would use this UUID from HMS directly?

An application can identify an Iceberg table by just checking the properties in HMS without reading the metadata.json. It is pretty useful in case that the application doesn't have the permission to access the file system.

.map(Snapshot::summary)
.orElseGet(ImmutableMap::of);
setHmsTableParameters(newMetadataLocation, tbl, metadata.properties(), removedProps, hiveEngineEnabled, summary);
setHmsTableParameters(newMetadataLocation, tbl, metadata.properties(), removedProps, hiveEngineEnabled, summary,
Copy link
Member

@RussellSpitzer RussellSpitzer Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should be passing through the metadata here instead of "tbl, metadata.properties and uuid"

Forgot which table this was. I still think we should probably pass through metadata instead of metadata.properties and uuid now that we use both things. Not a big deal though if you want to leave it as separate parameters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the change


if (Catalogs.hiveCatalog(shell.getHiveConf(), tableProperties)) {
Assert.assertEquals(13, hmsParams.size()); // 2 newly-added properties + previous_metadata_location prop
Assert.assertEquals(14, hmsParams.size()); // 2 newly-added properties + previous_metadata_location prop
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Could we convert 14 -> an expression that calculates the expected number of properties rather than having to manually change this whenever we add new properties?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to avoid magic number. Here, I'd just leave it as is. Adding a reserved table property doesn't happen frequently. I'd avoid a smart solution to calculate the property number, which add complexity to the unit test. But I'm also open to any nice solution.

Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor suggestion, but this all looks good to me

Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this all looks good to me. Going to merge unless I hear any other comments in the next hour or so.

@RussellSpitzer
Copy link
Member

Ok this all looks good to me. Going to merge unless I hear any other comments in the next hour or so.

@pvary or @rdblue ? You commented on the issue earlier?

@anuragmantri
Copy link
Contributor

This looks good to me. Thanks.

@rdblue
Copy link
Contributor

rdblue commented Jan 24, 2022

My main requirement is that this is a one-way transfer only. UUID should not show up in Iceberg table properties. As long as that's the case, I'm fine with this.

@flyrain
Copy link
Contributor Author

flyrain commented Jan 24, 2022

My main requirement is that this is a one-way transfer only. UUID should not show up in Iceberg table properties. As long as that's the case, I'm fine with this.

Yes. One-way. And it won't show up as an Iceberg table property.

@RussellSpitzer
Copy link
Member

My main requirement is that this is a one-way transfer only. UUID should not show up in Iceberg table properties. As long as that's the case, I'm fine with this.

Yes. One-way. And it won't show up as an Iceberg table property.

This would require any future implementations that automatically try sync Hive properties back into Iceberg properties to ignore reserved properties, which is the behavior I think we are looking for here. Basically, any reserved properties can not be modified through normal property operations.

@RussellSpitzer RussellSpitzer merged commit 2ea62c4 into apache:master Jan 24, 2022
@RussellSpitzer
Copy link
Member

Thanks @flyrain and @karuppayya ! Note that this will introduce a slight backwards incompatibility as a user who has previously set UUID as a table property may run into some issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants