Skip to content

Hive Meta Store impersonation access#15736

Merged
arhimondr merged 1 commit intoprestodb:masterfrom
BlueStalker:hms_impersonation
Apr 29, 2021
Merged

Hive Meta Store impersonation access#15736
arhimondr merged 1 commit intoprestodb:masterfrom
BlueStalker:hms_impersonation

Conversation

@BlueStalker
Copy link

@BlueStalker BlueStalker commented Feb 23, 2021

This PR enabled the option of impersonation access of the Hive Metastore impersonation.
It also includes the changes to enable multiple HMS instances load-balancing and reporting the stats for each individual
HMS instance.

Test plan - (Please fill in how you tested your changes)
Hands-on testing when enabling the impersonation
Plan to add more integration testing cases during turning on the config

Fill in the release notes towards the bottom of the PR description.
See Release Notes Guidelines for details.

== RELEASE NOTES ==

General Changes
N/A

Hive Changes
* Hive Metastore Impersonation Access, add configuration property ``hive.metastore-impersonation-enabled`` to turn it on
* Load balancing of multiple Hive Metastore instances, add configuration property ``hive.metastore.load-balancing-enabled`` to turn it on

@BlueStalker BlueStalker force-pushed the hms_impersonation branch 2 times, most recently from 8dbea7f to a085454 Compare February 24, 2021 17:11
@BlueStalker
Copy link
Author

ping @arhimondr Sorry that it has been a while since last time I worked on #14464, now that PR is staled since it has been a long time, I reopened it here. Please take another look, Thanks

@BlueStalker BlueStalker force-pushed the hms_impersonation branch 3 times, most recently from 0746e03 to bfc66b0 Compare February 25, 2021 07:59
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still too high level API to wrap in the doAs statement. As explained in the comment here different metastore implementations (Hive / Glue / Inmemory / Thrift / File / etc.) have completely different authentication methods. The doAs mechanism is HiveMetastore specific (even Hadoop client specific). Using it at this level breaks the encapsulation and abstraction.

Additionally, historically in Presto we are trying to minimize the usage of the thread local contexts, as they tend to be very fragile. Think if there's a metastore implementation that does method invokation in a separate thread pool for efficiency (e.g.: multiple calls in parallel). The doAs based implementation will be broken for such a use case.

Thus I strongly recommend you passing the identity all the way down to the lower level interface and wrapping the invokations with doAs only at the ThriftHiveMetastore.class level

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping again, the doAs has been removed, and the new implementation is based on passing the MetastoreContext object explicitly in the interface.

@BlueStalker BlueStalker force-pushed the hms_impersonation branch 3 times, most recently from 4cdaf43 to 4b4af73 Compare March 15, 2021 20:03
@BlueStalker BlueStalker reopened this Mar 15, 2021
@BlueStalker BlueStalker requested a review from arhimondr March 15, 2021 21:15
@BlueStalker BlueStalker reopened this Mar 22, 2021
@aweisberg aweisberg self-requested a review March 25, 2021 14:26
@aweisberg
Copy link
Contributor

Sorry for the delay, the docker images were all deleted yesterday and I was trying to resolve that. I'll give reviewing this a shot. I don't have as much context as Andrii and people are pretty busy right now.

Is any of this a cherrypick from Trino?

@aweisberg
Copy link
Contributor

Since the product tests aren't working we can't properly test this right now. Some of Andrii's feedback from the last PR hasn't been incorporated like baking stuff into the docker images and default value in MetastoreClientConfig.

@BlueStalker
Copy link
Author

BlueStalker commented Mar 25, 2021

Since the product tests aren't working we can't properly test this right now. Some of Andrii's feedback from the last PR hasn't been incorporated like baking stuff into the docker images and default value in MetastoreClientConfig.

Hey @aweisberg Thanks for reviewing, basically, this is the PR that originally from #14464, and this is not exactly the trino PR of https://github.com/trinodb/trino/pull/1441/files,

In this PR, it does the HMS impersonation access like above trino PR did, as well as adding metrics for each individual HMS, while the major concern from @arhimondr is the Thread Local Storage, I think I already addressed in the latest diff.

For the Docker Image problem, I am wondering what's the plan here, are we going to fix the docker image repo and upgrade the memory setting (basically bump up Xmx) in the image?

@aweisberg
Copy link
Contributor

Right now we can't build the docker images because Cloudera put everything behind a paywall. We are trying to address that, but it might take a bit.

I am asking about Trino because if there is code from Trino then we need co-author tags to reflect that.

@aweisberg
Copy link
Contributor

I'm sorry for the delay reviewing this. Both @arhimondr and myself have been pretty swamped with things that have deadlines and this is a non-trivial review. Especially for me as I don't have much context in this area so I would have to build it.

@BlueStalker
Copy link
Author

I'm sorry for the delay reviewing this. Both @arhimondr and myself have been pretty swamped with things that have deadlines and this is a non-trivial review. Especially for me as I don't have much context in this area so I would have to build it.

That's OK. this diff is NOT ONLY containing the changes for HMS impersonation but also including the metrics break-down by multiple-HMS since it also enables the RR multiple HMS access here. It is a bit confusing for me how memory problems have been triggered by this diff. And when I tried to test it locally, as we discussed in the slack, even in loading the data, the yarn container has exited because of memeory issue, and I think the CI here has already been passed that step, it is not clear what's difference between local env and CI env which I feel like it is just a docker compose env..)

@BlueStalker
Copy link
Author

Ping @aweisberg @arhimondr again, this seemed to take quite a long time, what's our plan here.

@aweisberg aweisberg removed their request for review April 9, 2021 17:09
@BlueStalker
Copy link
Author

@arhimondr Addressed most of the comments and please help another run, Also I don't find there is any new commits on docker-image repo, since the read timeout is still happening, I still believe we need to update the memory limit in the docker container for service like hms. @aweisberg do you have some updates on how to updating the docker-image?

Copy link
Member

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor comments + missed comments from the previous review. Otherwise looks good to me.

Copy link
Member

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % nits.

Please take a look at the test failures, they seem to be related

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github must've done something weird. I was leaving this comment in the memoizeMetastore method

Copy link
Member

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

This PR enabled the option of impersonation access of the Hive
Metastore impersonation.
It also includes the changes to enable multiple HMS instances
load-balancing, and reporting the stats for each individual
HMS instance.
@arhimondr arhimondr merged commit b29aaa3 into prestodb:master Apr 29, 2021
@bhhari bhhari mentioned this pull request May 11, 2021
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants