-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-18061][SQL][Security] Spark Thriftserver needs to create SPNego principal #15594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@steveloughran @yhuai Will you be able to help review this PR? |
|
I'm not spark committer so can't review it well enough to get in; I was just watching it out of concern for the word "kerberos". How about you ask on the spark developer list for any volunteers to review. Looking at the code, I'd drop using reflection for logging. You can just create an SLF4J log with the name "org.apache.hive.hiveserver2" (or whatever the full path is) and end up with the same log name as the parent class. As for the setting of the hive option, I do think it's an ugly pain which should really be addressed by making hive extensible —until then, what choice to do you have. |
|
Thanks @steveloughran. I will check on the developer list. |
| } else { | ||
| try { | ||
| httpUGI = HiveAuthFactory.loginFromSpnegoKeytabAndReturnUGI(hiveConf) | ||
| setSuperField(this, "httpUGI", httpUGI) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have similar code in CLIService.java which is not setting the httpUGI field, do we need to make the behavior the same in both files ?
|
@vanzin I believe this might be your realm :) Could you please help review this. |
|
Thanks Luciano. I am looking at the changes and will add them soon. |
|
ok to test |
vanzin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really familiar with the thrift server (Spark's or Hive's) so can't really comment much. Maybe someone more familiar with this code can take a look (use git blame to find out?).
| } | ||
| } | ||
|
|
||
| // Also try creating a UGI object for the SPNego principal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation here is wrong. Also, following blocks have code indented with 4 spaces instead of 2, and wrong indentation.
| val principal = hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_PRINCIPAL) | ||
| val keyTabFile = hiveConf.getVar(ConfVars.HIVE_SERVER2_SPNEGO_KEYTAB) | ||
| if (principal.isEmpty() || keyTabFile.isEmpty()) { | ||
| getAncestorField[Log](this, 3, "LOG").info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log message seems unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vanzin. The log message was added to replicate same behavior as the Hive Thriftserver code block. @steveloughran added the kerberos code and he added his comments above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about the specific log policy here; but I do think using reflection to get a private stuff is dangerous. I know it happens a lot in this code but at some point it'd be nice to move it. you don't need to use reflection to get the same log as a parent, just go
LoggerFactory.getLog("org.apache.hive.thrift...whatever").info(s"something $key")
This is cleaner and not going to break if hive ever change logging frameworks.
|
Test build #69034 has finished for PR 15594 at commit
|
|
Hi @cmirash, is it still active? |
## What changes were proposed in this pull request? This PR proposes to close PRs ... - inactive to the review comments more than a month - WIP and inactive more than a month - with Jenkins build failure but inactive more than a month - suggested to be closed and no comment against that - obviously looking inappropriate (e.g., Branch 0.5) To make sure, I left a comment for each PR about a week ago and I could not have a response back from the author in these PRs below: Closes apache#11129 Closes apache#12085 Closes apache#12162 Closes apache#12419 Closes apache#12420 Closes apache#12491 Closes apache#13762 Closes apache#13837 Closes apache#13851 Closes apache#13881 Closes apache#13891 Closes apache#13959 Closes apache#14091 Closes apache#14481 Closes apache#14547 Closes apache#14557 Closes apache#14686 Closes apache#15594 Closes apache#15652 Closes apache#15850 Closes apache#15914 Closes apache#15918 Closes apache#16285 Closes apache#16389 Closes apache#16652 Closes apache#16743 Closes apache#16893 Closes apache#16975 Closes apache#17001 Closes apache#17088 Closes apache#17119 Closes apache#17272 Closes apache#17971 Added: Closes apache#17778 Closes apache#17303 Closes apache#17872 ## How was this patch tested? N/A Author: hyukjinkwon <[email protected]> Closes apache#18017 from HyukjinKwon/close-inactive-prs.
What changes were proposed in this pull request?
Spark Thriftserver when running in HTTP mode with Kerberos enabled gives a 401 authentication error when receiving beeline HTTP request (with end user as kerberos principal). The similar command works with Hive Thriftserver.
What we find is Hive thriftserver CLI service creates both hive service and SPNego principal when kerberos is enabled whereas Spark Thriftserver only creates hive service principal.
CLIService.java
SparkSQLCLIService.scala
The patch will add missing SPNego principal to Spark Thriftserver.
How was this patch tested?
Ran manual testing with beeline command through spark against kerberized cluster.
Ran Spark unit tests for hive, sql and catalyst.