Conversation

@redsanket

I observed this while running an Oozie job that tries to connect to HBase via Spark.
It looks like the creds are not being passed in https://github.com/apache/spark/blob/branch-2.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/security/HadoopFSCredentialProvider.scala#L53 for the 2.2 release.
More info as to why it fails on a secure grid:
The Oozie client gets the necessary tokens the application needs before launching. It passes those tokens along to the Oozie launcher job (an MR job), which then actually calls the Spark client to launch the Spark app and passes the tokens along.
The Oozie launcher job cannot get any more tokens because all it has is tokens (you can't get tokens with tokens; you need a TGT or a keytab).
The error here arises because the launcher job runs the Spark client to submit the Spark job, but the Spark client doesn't see that it already has the HDFS tokens, so it tries to get more, which ends with the exception.
SPARK-19021 generalized the HDFS credential provider and changed it so we no longer pass the existing credentials into the call that gets tokens, so it doesn't realize it already has the necessary tokens.

https://issues.apache.org/jira/browse/SPARK-21890
Modified to pass the existing creds into the call that gets delegation tokens.
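For context, the mechanism at play is Hadoop's FileSystem.addDelegationTokens, which only fetches a new token when the Credentials object passed in does not already hold one for that filesystem. A minimal sketch of why threading the caller's credentials through fixes the Oozie case (the method name here is illustrative, not the actual patch):

import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.Credentials

// Pass the caller's existing Credentials through so Hadoop can see tokens
// that are already present (e.g. the ones Oozie obtained) and skip fetching
// new ones, which would fail without a TGT or keytab.
def obtainTokensSketch(
    renewer: String,
    fileSystems: Set[FileSystem],
    creds: Credentials): Unit = {
  fileSystems.foreach { fs =>
    // No-op for any filesystem whose token is already in `creds`.
    fs.addDelegationTokens(renewer, creds)
  }
}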

@vanzin
Contributor

vanzin commented Sep 5, 2017

ok to test

  // those will fail with an access control issue. So create new tokens with the logged in
  // user as renewer.
- val creds = fetchDelegationTokens(
+ val fetchCreds = fetchDelegationTokens(
Author


Also, here is the diff between Spark 2.2 and master:
=> master is missing the PRINCIPAL (aka spark.yarn.principal) config check. Not sure if we need to do this now. Let me know your opinion @vanzin @tgravescs

sparkConf.get(PRINCIPAL).flatMap { renewer =>
  val creds = new Credentials()
  hadoopFSsToAccess(hadoopConf, sparkConf).foreach { dst =>
    val dstFs = dst.getFileSystem(hadoopConf)
    dstFs.addDelegationTokens(renewer, creds)
  }
  // ... (rest of getTokenRenewalInterval elided)
}

Contributor


That code was in getTokenRenewalInterval; that call is only needed when principal and keytab are provided, so adding the code back should be ok. It shouldn't cause any issues if it's not there, though, aside from a wasted round trip to the NNs.
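A sketch of the guard in question, assuming roughly the 2.2-era shape of that code (method and parameter names here are illustrative):

import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.Credentials

// Only fetch renewer-specific tokens when a principal is configured,
// i.e. when Spark itself will renew tokens from a keytab; otherwise
// skip the (wasted) round trip to the NNs entirely.
def renewalProbeSketch(
    principal: Option[String], // stand-in for sparkConf.get(PRINCIPAL)
    fileSystems: Set[FileSystem]): Option[Credentials] = {
  principal.map { renewer =>
    val creds = new Credentials()
    fileSystems.foreach(_.addDelegationTokens(renewer, creds))
    creds // the caller would read the renewal interval off these tokens
  }
}

With no principal configured, the whole probe returns None and no NN is contacted.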

Contributor


I'd prefer not to call it if we don't need to, so as long as adding the config back doesn't mess with the Mesos side of things (since this is now common code), I think that would be good. The PRINCIPAL config is a YARN-specific config, but looking at SparkSubmit it appears to be used for Mesos as well.

@vanzin do you happen to know if Mesos is using that as well? I haven't kept up with Mesos Kerberos support, so I'm not sure if more is going to happen there.

Contributor


I'm pretty sure Mesos is not currently hooked up to the principal / keytab stuff. It just picks up the initial delegation token set, and when those expire, things stop working.

Adding the check back here is the right thing; it shouldn't affect Mesos when it adds support for principal / keytab (or if it does, it can be fixed at that time).

@redsanket
Author

Previous discussion on this PR is here: #19103

@SparkQA

SparkQA commented Sep 5, 2017

Test build #81425 has finished for PR 19140 at commit 0cfca50.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 5, 2017

Test build #81426 has finished for PR 19140 at commit 5424972.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 6, 2017

Test build #81431 has finished for PR 19140 at commit d72c08f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor

@redsanket can you please test this with a secure Hadoop environment using spark-submit (not Oozie)? I don't want to bring in any regression here.
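For reference, a secure-cluster smoke test could look something like the following (the principal, keytab path, and example jar are placeholders, not from this PR):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal user@EXAMPLE.COM \
  --keytab /path/to/user.keytab \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples.jar 100

Running it once with --principal/--keytab and once with just a kinit'd TGT would exercise both the renewal-interval path and the plain token-fetch path.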

@redsanket
Author

@jerryshao yes, will do, no issues. Thanks!

@redsanket
Author

Added the principal check back and tested in a secure Hadoop env. Let me know if this looks fine to you @jerryshao @vanzin @tgravescs

@SparkQA

SparkQA commented Sep 6, 2017

Test build #81471 has finished for PR 19140 at commit 075de5d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 6, 2017

Test build #81474 has finished for PR 19140 at commit 1184a73.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Sep 6, 2017

LGTM pending tests.

@SparkQA

SparkQA commented Sep 6, 2017

Test build #81477 has finished for PR 19140 at commit 98f0ff2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor

+1

@tgravescs
Contributor

Didn't merge to branch-2.2; will handle it under #19103.

asfgit closed this in b9ab791 on Sep 7, 2017