[SPARK-11821] Propagate Kerberos keytab for all environments #9859
…onments, bugfix of renewing credentials in the Classloader context
LGTM. Are there any other configs required? I remember Hadoop security had a bunch of configs. /cc @tgravescs

This is the only parameter needed to renew Kerberos credentials from a keytab. The other parameters can be loaded from environment variables or config files.

So, Spark has never officially supported this outside of YARN (local mode, standalone, etc.), so this isn't a bug but would be an improvement. The --keytab and --principal options are under the YARN-only section, the documentation is for YARN only, etc. I'm fine if you want to take this on, but if you extend it beyond YARN you need to make sure it works in all the environments, or at least the ones we intend to support, and that it's all documented, tested, etc. There have been attempts at adding this to standalone in the past, but they have always had various issues. cc @pwendell as he might remember more about that mode. I also don't like forcing the HADOOP_SECURITY_AUTHENTICATION setting to kerberos for all environments. For YARN this should be picked up from the Hadoop configuration. If other modes require it, that's fine, but if other modes are using Hadoop and Kerberos, perhaps they should pick up the proper Hadoop configuration instead of hardcoding it.

Hi! There is no HADOOP_SECURITY_AUTHENTICATION parameter in SparkConf, nor in HiveContext. What do you suggest? I think that if the keytab is defined, the authentication method should be Kerberos, shouldn't it?

So, is there any code in any of the other backends to actually use those config options? Looking at the bug, isn't the fix actually achieved by just doing the login (

No, it's not enough. Please look at the code of UserGroupInformation.loginUserFromKeytab(args.principal, args.keytab). It checks the authentication type: if it is simple (the default in the classloader context), the principal and keytab are not used. So I think the recent patch needs this fix. I ran into exactly the situation I described. :)
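A minimal Scala sketch of the check described above, assuming Hadoop 2.x `hadoop-common` is on the classpath. `loginIfSecure` is a hypothetical helper, and the principal and keytab path are placeholders. When the `Configuration` given to `UserGroupInformation` has `hadoop.security.authentication` set to `simple` (also Hadoop's default when no `core-site.xml` is found), UGI treats security as disabled and a keytab login has no effect; the helper makes that branch explicit.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

object KeytabLoginSketch {
  // Hypothetical helper: attempts a keytab login only when the given Hadoop
  // configuration enables Kerberos. Returns true if a login was attempted.
  def loginIfSecure(conf: Configuration, principal: String, keytab: String): Boolean = {
    // UGI derives its authentication method from hadoop.security.authentication.
    UserGroupInformation.setConfiguration(conf)
    if (UserGroupInformation.isSecurityEnabled) {
      // Only reached with "kerberos"; needs a reachable KDC and a valid keytab.
      UserGroupInformation.loginUserFromKeytab(principal, keytab)
      true
    } else {
      // With "simple", the principal and keytab are never used.
      false
    }
  }

  def main(args: Array[String]): Unit = {
    // loadDefaults = false: ignore any core-site.xml on the classpath.
    val conf = new Configuration(false)
    conf.set("hadoop.security.authentication", "simple")
    // Placeholder principal and keytab path, for illustration only.
    val attempted = loginIfSecure(conf, "user@EXAMPLE.COM", "/path/to/user.keytab")
    println(s"keytab login attempted: $attempted")
  }
}
```

In this sketch the authentication method comes from whatever `Configuration` is passed in, which matches the reviewers' preference for reading it from the Hadoop configuration rather than hardcoding it.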

I know that, but that's not what I'm asking about. I'm asking about all the other code; why do you need to set … I also agree with Tom that the authentication config value should come from the Hadoop configuration and not be set explicitly like this. If you need to set it, you're using the wrong configuration. Also, I'm a little skeptical that this would work; anybody who has Kerberos auth for the Hive metastore probably has the same for HDFS, and while this patch would make it work for local mode, it would not work for Standalone or Mesos.

Consider the situation where I wrap the SparkContext in my UGI, and even wrap the HiveContext in my UGI: most Spark functions then work with the credentials I provided. However, the ClientWrapper object is created in another context (the classloader context), so the UGI provided in my application no longer applies there. That's why the property has to be set. Is that clear, or should I describe it in more detail?
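The scoping issue can be illustrated with a small Scala sketch, assuming Hadoop 2.x `hadoop-common` on the classpath. It uses `UserGroupInformation.createRemoteUser` in place of a real keytab login so that it runs without a KDC; the user name `alice` and the helper `currentUserInside` are hypothetical. Inside `doAs`, `getCurrentUser` resolves to the wrapping UGI, while calls made outside that block fall back to the process-wide login user.

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

object DoAsScopeSketch {
  // Hypothetical helper: returns the short name of the UGI that Hadoop
  // considers "current" while running inside ugi.doAs(...).
  def currentUserInside(ugi: UserGroupInformation): String =
    ugi.doAs(new PrivilegedExceptionAction[String] {
      override def run(): String = UserGroupInformation.getCurrentUser.getShortUserName
    })

  def main(args: Array[String]): Unit = {
    // createRemoteUser needs no Kerberos; it stands in for a keytab-based UGI.
    val alice = UserGroupInformation.createRemoteUser("alice")
    println(s"inside doAs:  ${currentUserInside(alice)}")
    // Outside doAs, Hadoop falls back to the process login user.
    println(s"outside doAs: ${UserGroupInformation.getCurrentUser.getShortUserName}")
  }
}
```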

Ah, I see. Also, in non-local, non-YARN modes this will just lead to different issues down the line, so those other modes should probably be excluded from this.

Ok, I can prepare a commit with:
@vanzin, @tgravescs what do you think?
It's still unclear to me why you need to do that at all. If you set |

No. I need to load the property, because it's not loaded automatically by

That

I'll check it tomorrow.

Hi!

That would be more acceptable, although you already have to provide the Hive configuration if you're accessing the metastore, so for correctness you should be providing everything (otherwise, how do you plan to read the data stored in the tables?).
… keytab properties in local and yarn mode

I see my problem. Defining the environment variable HADOOP_CONF_DIR is not enough and doesn't work (the config files there are not read). Copying the configs to the classpath works. Thanks @vanzin. You are right about hive-site.xml. All in all, I don't need to set any additional config; I just need the keytab properties in LOCAL mode as well as YARN mode. I've made a commit. I plan to read data from tables by
Better than this would be to update docs/sql-programming-guide.md. It currently only mentions hive-site.xml, when really you need core-site.xml and hdfs-site.xml too (otherwise Kerberos, in your case, or things like HDFS HA won't work).
The other changes in this file can be reverted, too.
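A minimal Scala sketch of why the site files matter, assuming Hadoop 2.x `hadoop-common` on the classpath. `new Configuration()` loads `core-site.xml` from the classpath; the `Configuration` class does not consult the `HADOOP_CONF_DIR` environment variable itself (launcher scripts normally put that directory on the classpath). The hypothetical helper `loadSiteFiles` instead adds the files from an explicit directory; `/etc/hadoop/conf` is only an assumed fallback path.

```scala
import java.io.File
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object SiteFilesSketch {
  // Hypothetical helper: builds a Configuration from the site files found in
  // `dir`, without relying on the classpath. Missing files are skipped.
  def loadSiteFiles(dir: String): Configuration = {
    val conf = new Configuration(false) // start empty: no *-default.xml or *-site.xml
    for (name <- Seq("core-site.xml", "hdfs-site.xml")) {
      val file = new File(dir, name)
      if (file.isFile) conf.addResource(new Path(file.getAbsolutePath))
    }
    conf
  }

  def main(args: Array[String]): Unit = {
    // Assumed fallback path when HADOOP_CONF_DIR is unset; adjust as needed.
    val dir = sys.env.getOrElse("HADOOP_CONF_DIR", "/etc/hadoop/conf")
    val conf = loadSiteFiles(dir)
    // "simple" is Hadoop's built-in default when the key is absent.
    println(conf.get("hadoop.security.authentication", "simple"))
  }
}
```

Putting the same files on the classpath, as done in this thread, has the effect of making `core-site.xml` visible to every `new Configuration()` without code changes.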

ok to test

Test build #46625 has finished for PR 9859 at commit

@vanzin, thanks for your support. I've made a commit with the documentation update and cleaned up the code as you asked.

Test build #46672 has finished for PR 9859 at commit

docs/running-on-yarn.md
nit: Works also with the "local" master.
docs/sql-programming-guide.md
nit: "for HDFS configuration"

Just minor nits, otherwise LGTM.

Also, just for posterity, if you're running local mode, you should be able to

Thank you @vanzin for your help. I committed the nits in the documentation.

Test build #46766 has finished for PR 9859 at commit

retest this please

Test build #46923 has finished for PR 9859 at commit

The change doesn't affect pyspark, so merging (master and 1.6).
Author: woj-i
Closes #9859 from woj-i/master.
(cherry picked from commit 6a8cf80)
Signed-off-by: Marcelo Vanzin
@andrewor14 the same PR as in branch 1.5
@harishreedharan