HMS Impersonation Access and breakdown metrics by hosts #13699
zhenxiao merged 6 commits into prestodb:master
ping @zhenxiao and @nezihyigitbasi
9a5e5c8 to 35cba33
Interestingly, the Hive tests always report SocketTimeoutException (Read timeout) here, while they pass in my local environment. I am not sure whether this is related to memory pressure in the integration test, since it brings up all the Hadoop-related services. Does anyone have thoughts on this?
8d6c8fa to 172864e
I created another diff in the Docker images to try to fix these timeout issues. @nezihyigitbasi, do you know someone who can also take a look at this code (the changes are related to Hive/HMS)?
Before anyone starts the code review, could you please reorganize the changes into logical commits? For example, please squash the "fix test" changes into the commit that broke the test. Thanks!
@rongrong updated, thanks.
zhenxiao
left a comment
Nice work, @BlueStalker.
I made a first pass, reviewing commit by commit. Some of my comments on the first commit are resolved by your later commits.
Mostly formatting:
- do not abbreviate variable names
- try not to use hms; use Metastore or Hive Metastore
This feature is very useful. Looking forward to merging it!
zhenxiao
left a comment
Hi @BlueStalker, mostly good. Just a few variable naming and formatting issues.
@zhenxiao I just got some time to look at your comments. Updated; please check it again.
zhenxiao
left a comment
Nice work, @BlueStalker.
Looks good to me, just one indentation issue.
[test-facebook]
arhimondr
left a comment
Could you please elaborate on why Hive metastore impersonation is needed?
The Hive metastore is just metadata storage. Enforcing security at the metadata level is generally a security anti-pattern. The metastore communication is session-agnostic by design.
Presto has an extensive system of security check interfaces (see ConnectorAccessControl, AccessControl) where all the required security checks can be implemented without directly relying on the Metastore.
Additionally, the points of metastore communication are ill-defined. The metadata can be cached in Presto. The metadata can be fetched lazily. And all of this is subject to change. This is another reason for using the existing access control mechanisms instead of trying to piggyback security on the metastore itself.
P.S.:
Please format commit messages according to our standard, which can be found in this PR description.
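For readers unfamiliar with the engine-side checks mentioned in this comment, the general pattern is a `checkCan...` method that throws when access is denied. The following is a minimal sketch of that pattern only. `TableAccessCheck` and `AllowListAccessCheck` are hypothetical, simplified stand-ins and not the real `ConnectorAccessControl` interface, which takes more arguments and throws Presto's own exception type.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for an engine-side access check.
interface TableAccessCheck
{
    // Returns normally when access is allowed, throws otherwise.
    void checkCanSelectFromTable(String user, String table);
}

// Hypothetical rule set: each user maps to the tables that user may read.
class AllowListAccessCheck
        implements TableAccessCheck
{
    private final Map<String, Set<String>> readableTablesByUser;

    AllowListAccessCheck(Map<String, Set<String>> readableTablesByUser)
    {
        this.readableTablesByUser = new HashMap<>(readableTablesByUser);
    }

    @Override
    public void checkCanSelectFromTable(String user, String table)
    {
        Set<String> tables = readableTablesByUser.getOrDefault(user, Collections.emptySet());
        if (!tables.contains(table)) {
            // SecurityException is used here only to keep the sketch dependency-free.
            throw new SecurityException("Access Denied: " + user + " cannot select from " + table);
        }
    }

    public static void main(String[] args)
    {
        Map<String, Set<String>> grants = new HashMap<>();
        grants.put("alice", new HashSet<>(Arrays.asList("db.orders")));
        TableAccessCheck check = new AllowListAccessCheck(grants);
        check.checkCanSelectFromTable("alice", "db.orders");
        System.out.println("alice may read db.orders");
    }
}
```

The check runs inside the engine before any metastore call is made, which is why it does not depend on how or when metadata is fetched.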
    <version>1.3.5-4</version>
    </dependency>

    <dependency>

We don't use apache-commons in Presto. What is this library needed for?
    ``hive.metastore.client.keytab`` Hive metastore client keytab location.
    ``hive.metastore.impersonation.enabled`` Enable metastore end-user impersonation.
    ``hive.metastore.impersonation.user`` Default impersonation user when communicating with Hive Metastore

This is needed because getPartitionsStatistics is called from the initializer and needs a "default" impersonation user to make the call. We can make this the user authenticated in the Presto process, which should be "presto".
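The fallback described in this reply could look roughly like the sketch below. It is not the PR's code: `ImpersonationUserResolver` and its methods are hypothetical names, and only the idea comes from the discussion (use the session user when there is one, otherwise fall back to the configured default such as "presto").

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical helper; not part of the Presto code base.
class ImpersonationUserResolver
{
    private final boolean impersonationEnabled;
    private final String defaultUser;

    ImpersonationUserResolver(boolean impersonationEnabled, String defaultUser)
    {
        this.impersonationEnabled = impersonationEnabled;
        this.defaultUser = Objects.requireNonNull(defaultUser, "defaultUser is null");
    }

    // Returns the user to impersonate for a metastore call,
    // or empty when impersonation is switched off.
    Optional<String> resolve(Optional<String> sessionUser)
    {
        if (!impersonationEnabled) {
            return Optional.empty();
        }
        // Calls made outside a user session (for example from an initializer)
        // have no session user, so they use the configured default.
        return Optional.of(sessionUser.orElse(defaultUser));
    }

    public static void main(String[] args)
    {
        ImpersonationUserResolver resolver = new ImpersonationUserResolver(true, "presto");
        System.out.println(resolver.resolve(Optional.of("alice")));
        System.out.println(resolver.resolve(Optional.empty()));
    }
}
```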
    start_docker_containers

    # restart HMS to pickup memory settings
    exec_in_hadoop_master_container cp /etc/hadoop/conf/hive-env.sh /etc/hive/conf/hive-env.sh

This has to be embedded into the container instead of being copied every time.
    - ../../presto-hive/src/test/sql:/files/sql:ro
    - ./files/words:/usr/share/dict/words:ro
    - ./files/core-site.xml.s3-template:/etc/hadoop/conf/core-site.xml.s3-template:ro
    - ./files/hive-env.sh:/etc/hadoop/conf/hive-env.sh:ro

    @@ -0,0 +1 @@
    export HADOOP_OPTS="$HADOOP_OPTS -Xmx1024m"
    \ No newline at end of file

Please add this file to the Docker image itself.
    @@ -0,0 +1 @@
    export HADOOP_OPTS="$HADOOP_OPTS -Xmx1024m"
    \ No newline at end of file

Also, add a trailing newline (required by Git).
    private String recordingPath;
    private boolean replay;
    private Duration recordingDuration = new Duration(0, MINUTES);
    private String metastoreDefaultImpersonationUser = "";

We don't use empty strings as default values, as the semantics are not clear.
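One common way to address this kind of comment is to keep the field unset and expose it as an `Optional`, so "not configured" and "configured" are distinct states. The sketch below assumes that approach. `ImpersonationConfigSketch` is a hypothetical class loosely modeled on the field in the diff above, and it is not what the PR ended up doing as far as this thread shows.

```java
import java.util.Optional;

// Hypothetical config holder; only the field name is taken from the diff above.
class ImpersonationConfigSketch
{
    // null means "not configured", so no magic empty string is needed
    private String metastoreDefaultImpersonationUser;

    Optional<String> getMetastoreDefaultImpersonationUser()
    {
        return Optional.ofNullable(metastoreDefaultImpersonationUser);
    }

    ImpersonationConfigSketch setMetastoreDefaultImpersonationUser(String user)
    {
        this.metastoreDefaultImpersonationUser = user;
        return this;
    }

    public static void main(String[] args)
    {
        ImpersonationConfigSketch config = new ImpersonationConfigSketch();
        System.out.println(config.getMetastoreDefaultImpersonationUser().isPresent());
        config.setMetastoreDefaultImpersonationUser("presto");
        System.out.println(config.getMetastoreDefaultImpersonationUser().get());
    }
}
```

Callers then have to decide explicitly what an absent value means, instead of comparing against `""`.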
Thanks for commenting on this. We totally agree that the Hive metastore is a metadata system, not a storage system. But in a secure production system impersonation is required, because a common user (usually "presto") should not be able to do anything on the metastore. Without this patch, I think people who turn on metastore security must either turn off AuthZ or arrange for "presto" to be allowed to perform any metastore operation, at which point AuthN becomes meaningless to the Hive metastore. I appreciate that Presto has its own AuthZ mechanism, but I think people don't want to fully replicate the rules that already exist in the metastore in Presto, and defining all the interfaces/rules for the metastore is probably a lot of work.
As for caching, I am not sure exactly what you are asking. If Presto is doing a read operation, it should be fine (because HDFS enforces AuthZ at the storage layer). When writing, which is probably the major reason for this patch, we cannot cache anything. Let me know if you still have concerns. Thanks.
Is this a feature in the Hive Metastore? Could you please share some information about this feature?
That's something that we are doing at Facebook. We support column-based and even row-based security models. Additionally, we support a fine-grained security model for most of the DDL and DML operations. There's actually nothing wrong with calling the metastore from out of the
Currently Presto supports caching of the metadata, either globally or at the per-transaction level (https://github.com/prestodb/presto/blob/master/presto-hive-metastore/src/main/java/com/facebook/presto/hive/MetastoreClientConfig.java#L35, https://github.com/prestodb/presto/blob/master/presto-hive-metastore/src/main/java/com/facebook/presto/hive/MetastoreClientConfig.java#L38). Making the metastore communication session-sensitive makes the caching either impossible or greatly inefficient.
And one more thing. Presto currently has a strongly defined exception system. Authorization check failures triggered by one of the metastore calls should be classified as authorization check failures and shouldn't cause Presto queries to fail with the more generic, external https://github.com/prestodb/presto/blob/master/presto-hive-common/src/main/java/com/facebook/presto/hive/HiveErrorCode.java#L27. Thus the logic for handling authorization failures has to be implemented somewhere anyway, and it is better to have it in some explicit integration points than spread across the code base.
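To make the caching concern concrete, here is a small sketch of why a session-sensitive cache is less efficient. It is an illustration under assumptions, not Presto's caching metastore: `MetadataCacheSketch` is hypothetical, the "metastore" is a counter, and the per-user key format is made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; this is not Presto's caching metastore.
class MetadataCacheSketch
{
    private final Map<String, String> cache = new HashMap<>();
    private int metastoreCalls;

    // Returns the cached value for the key, calling the (fake) metastore only on a miss.
    String get(String key)
    {
        return cache.computeIfAbsent(key, k -> {
            metastoreCalls++;
            return "metadata for " + k;
        });
    }

    // One cache entry per table, shared by every user.
    static int callsWithSharedKey(List<String> users, String table)
    {
        MetadataCacheSketch cache = new MetadataCacheSketch();
        for (int i = 0; i < users.size(); i++) {
            cache.get(table);
        }
        return cache.metastoreCalls;
    }

    // One cache entry per (user, table) pair, as a session-sensitive cache would need.
    static int callsWithPerUserKey(List<String> users, String table)
    {
        MetadataCacheSketch cache = new MetadataCacheSketch();
        for (String user : users) {
            cache.get(user + "@" + table);
        }
        return cache.metastoreCalls;
    }

    public static void main(String[] args)
    {
        List<String> users = Arrays.asList("alice", "bob", "carol");
        System.out.println(callsWithSharedKey(users, "db.orders"));
        System.out.println(callsWithPerUserKey(users, "db.orders"));
    }
}
```

With three users reading the same table, the shared key results in one metastore call and the per-user key in three, so the hit rate drops as the number of distinct users grows.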
Things guaranteed in Presto itself are not the same as in an external system. Even if you had the full set of rules defined in Presto exactly as in the Metastore (apparently that is not the case), to make it work you would basically need to disable AuthZ in the metastore, because when these API calls reach the metastore, it only knows that an authenticated user (pretty much always presto) is making the call. Actually, I think there is a similar thing in prestosql; refer to trinodb/trino#1441.
From a quick look at the code, this cache just saves some time for reads (listtable, gettable, getpartitions, etc.), right? Then how does this change affect performance? Is it just some overhead during cache invalidation? Also, this implementation follows what Hive does, and people can disable it by config.
|
Let's take
I see. Impersonating metastore access still seems a little weird to me. But since it is something that is widely done, I think it is fine. I think we should support it.
What are the key differences between this PR and the PR in prestosql? Did you consider cherry-picking it?
@BlueStalker Do you think it is possible to cherry-pick the patch from prestosql and apply additional changes on top of that? It would save us time on review round trips, as the contribution standards for both projects are similar, and it would also help us keep the two Presto streams closer together.
Actually, I put my comments in trinodb/trino#1441. I doubt whether the solution they chose actually works. The solution we chose has been running in our production for quite a long time, so I don't think it makes much sense to cherry-pick their uncertain changes and do additional development on top of them, which would make this even riskier.
That makes sense. Please reopen the pull request and have it prepared for another round of review. Please make sure your submission complies with our Development, Formatting, and Commit Message guidelines. The links can be found in this very PR description.
I don't think there's any specific plan for keeping the code in sync. But generally it feels like a good idea to keep the codebases close to each other when possible, as it simplifies cherry-picking between the projects.
This commits squash the original commits from PR prestodb#13699 which includes the follow commits:
- HMS impersonation access
- refactoring to use HMS Authentication Module
- add Config for multiple hms instances
- Update HMS memory settings
- address review comments
Summary: This commits squash the original commits from PR prestodb#13699 which includes the follow commits:
- HMS impersonation access
- refactoring to use HMS Authentication Module
- add Config for multiple hms instances
- Update HMS memory settings
- address review comments
- Make Hive metastore caching and impersonation mutually exclusive
- Update to not use TLS to save the identity info
- Refactor the implementation to explicitly carry over the authentication information, create a MetastoreContext object which includes the auth info coming from the ConnectorSession.
- Refactoring with code review comments
Reviewers: #ldap_presto-core, chliang
Reviewed By: #ldap_presto-core, chliang
Subscribers: O4263 subscribe to presto changes
Differential Revision: https://code.uberinternal.com/D5935797
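Two of the items in this commit summary can be sketched in a few lines: carrying the identity explicitly in a context object, and rejecting configurations that enable both caching and impersonation. This is a rough illustration only. The name `MetastoreContext` comes from the summary, but its field, `MetastoreSettingsCheck`, and `FakeMetastoreClient` are hypothetical and do not reflect the merged code.

```java
import java.util.Objects;

// Only the class name is taken from the commit summary; the contents are assumed.
class MetastoreContext
{
    private final String username;

    MetastoreContext(String username)
    {
        this.username = Objects.requireNonNull(username, "username is null");
    }

    String getUsername()
    {
        return username;
    }
}

// Hypothetical startup check for the "mutually exclusive" rule.
class MetastoreSettingsCheck
{
    private MetastoreSettingsCheck() {}

    static void validate(boolean cachingEnabled, boolean impersonationEnabled)
    {
        if (cachingEnabled && impersonationEnabled) {
            throw new IllegalStateException("Hive metastore caching and impersonation cannot both be enabled");
        }
    }
}

// Hypothetical client: the identity is passed as an argument on every call
// rather than being read from ambient per-thread state.
class FakeMetastoreClient
{
    String getTable(MetastoreContext context, String table)
    {
        return table + " as " + context.getUsername();
    }

    public static void main(String[] args)
    {
        MetastoreSettingsCheck.validate(false, true);
        System.out.println(new FakeMetastoreClient().getTable(new MetastoreContext("alice"), "db.orders"));
    }
}
```

Passing the context as a parameter keeps the identity visible at each call site, which fits the summary's stated goal of carrying the authentication information explicitly.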

Please make sure your submission complies with our Development, Formatting, and Commit Message guidelines.
Fill in the release notes towards the bottom of the PR description.
See Release Notes Guidelines for details.