Skip to content

Conversation

@gaborgsomogyi
Copy link
Contributor

@gaborgsomogyi gaborgsomogyi commented Jan 29, 2019

What changes were proposed in this pull request?

HadoopDelegationTokenProvider has basically the same functionality just like ServiceCredentialProvider so the interfaces can be merged.

YARNHadoopDelegationTokenManager now loads ServiceCredentialProviders in one step. The drawback of this if one provider fails all others are not loaded. HadoopDelegationTokenManager loads HadoopDelegationTokenProviders independently so it provides more robust behaviour.

In this PR I've I've made the following changes:

  • Deleted YARNHadoopDelegationTokenManager and ServiceCredentialProvider
  • Made HadoopDelegationTokenProvider a @DeveloperApi

How was this patch tested?

Existing unit tests.

@SparkQA
Copy link

SparkQA commented Jan 29, 2019

Test build #101824 has finished for PR 23686 at commit 94fbdce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Copy link
Contributor Author

cc @vanzin

@vanzin
Copy link
Contributor

vanzin commented Jan 30, 2019

I'm starting to question whether we should keep this interface. It's in the yarn module, which means it's not considered public by Spark, which leads to issues (such as we don't check for compatibility, and it's been broken in the past). Pinging @mridulm for comments.

It might be a better idea to perhaps make the core interface a @DeveloperApi instead of keeping this one around and never really checking that it works properly across releases.

@mridulm
Copy link
Contributor

mridulm commented Jan 30, 2019

@vanzin We do not have an alternative for custom credential providers without this, do we ?

We could expose HadoopDelegationTokenManager as DeveloperApi, add support for loading custom providers via ServiceLoader there (roll this fix into that class) and remove YARNHadoopDelegationTokenManager for 3.0 (all existing providers extend from HadoopDelegationTokenManager instead of YARNHadoopDelegationTokenManager).

If that is the proposal, I am fine with that.

@gaborgsomogyi
Copy link
Contributor Author

Presume @vanzin meant make HadoopDelegationTokenProvider a @DeveloperApi. In general I support this idea, only one question. Here you've mentioned ServiceLoader had problems with non-private to Spark API's, won't be a problem here(don't know the context)?

@mridulm All in all I think it's good idea to move the custom provider load support into HadoopDelegationTokenManager. In order to do this SPARK-26766 has to be resolved first(another reason why move YarnSparkHadoopUtil.hadoopFSsToAccess). What I not yet caught why HadoopDelegationTokenManager should be @DeveloperApi? Presume only HadoopDelegationTokenProvider has to be exposed, hasn't it?

@vanzin
Copy link
Contributor

vanzin commented Jan 30, 2019

I don't think the manager needs to be exposed, just the provider interface. That's the same thing on the YARN side today.

BTW Gabor has already done all the rest of the work to make this possible, the only thing missing is actually making the interface public.

@mridulm
Copy link
Contributor

mridulm commented Jan 30, 2019

Sorry, I meant HadoopDelegationTokenProvider and not HadoopDelegationTokenManager - copy paste error !

@vanzin
Copy link
Contributor

vanzin commented Feb 11, 2019

So have we reached a consensus to drop this and expose the interface in core as a developer API?

@gaborgsomogyi
Copy link
Contributor Author

From my perspective yes but until now not yet reached to continue this PR.

@SparkQA
Copy link

SparkQA commented Feb 12, 2019

Test build #102254 has finished for PR 23686 at commit 4d89133.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • sealed trait MLEvent extends SparkListenerEvent
  • case class TransformStart() extends MLEvent
  • case class TransformEnd() extends MLEvent
  • case class FitStart[M <: Model[M]]() extends MLEvent
  • case class FitEnd[M <: Model[M]]() extends MLEvent
  • case class LoadInstanceStart[T](path: String) extends MLEvent
  • case class LoadInstanceEnd[T]() extends MLEvent
  • case class SaveInstanceStart(path: String) extends MLEvent
  • case class SaveInstanceEnd(path: String) extends MLEvent
  • class _DynamicModuleFuncGlobals(dict):
  • class LogisticRegressionModel(JavaModel, JavaClassificationModel, JavaMLWritable, JavaMLReadable,
  • class GaussianMixtureModel(JavaModel, JavaMLWritable, JavaMLReadable, HasTrainingSummary):
  • class KMeansModel(JavaModel, GeneralJavaMLWritable, JavaMLReadable, HasTrainingSummary):
  • class BisectingKMeansModel(JavaModel, JavaMLWritable, JavaMLReadable, HasTrainingSummary):
  • class MulticlassClassificationEvaluator(JavaEvaluator, HasLabelCol, HasPredictionCol, HasWeightCol,
  • class LinearRegressionModel(JavaModel, JavaPredictionModel, GeneralJavaMLWritable, JavaMLReadable,
  • class HasTrainingSummary(object):
  • class FileBatchWrite(
  • abstract class FileWriteBuilder(options: DataSourceOptions)
  • case class FileWriterFactory (
  • class OrcWriteBuilder(options: DataSourceOptions) extends FileWriteBuilder(options)

* Removed ServiceCredentialProvider
* Removed YARNHadoopDelegationTokenManager
* Made HadoopDelegationTokenProvider DeveloperAPI
@vanzin
Copy link
Contributor

vanzin commented Feb 12, 2019

I see that you implemented the "make the core interface public" approach and deleted the YARN version; but now the PR description and the bug don't really reflect the code anymore.

@SparkQA
Copy link

SparkQA commented Feb 12, 2019

Test build #102258 has finished for PR 23686 at commit 469f13d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi gaborgsomogyi changed the title [SPARK-26772][YARN] YARNHadoopDelegationTokenManager loads ServiceCredentialProviders independently [SPARK-26772][YARN] Delete ServiceCredentialProvider and make HadoopDelegationTokenProvider a developer API Feb 13, 2019
@gaborgsomogyi
Copy link
Contributor Author

gaborgsomogyi commented Feb 13, 2019

the PR description and the bug don't really reflect the code anymore.

Yeah, updated.

@SparkQA
Copy link

SparkQA commented Feb 13, 2019

Test build #102293 has finished for PR 23686 at commit 332a632.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Copy link
Contributor Author

Retest this please

@SparkQA
Copy link

SparkQA commented Feb 13, 2019

Test build #102303 has finished for PR 23686 at commit 332a632.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Feb 13, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Feb 14, 2019

Test build #102309 has finished for PR 23686 at commit 332a632.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Feb 14, 2019

Test build #102331 has finished for PR 23686 at commit 332a632.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Feb 14, 2019

Test build #102338 has finished for PR 23686 at commit 332a632.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 14, 2019

Test build #102359 has finished for PR 23686 at commit 73f1dd9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Feb 15, 2019

Merging to master.

@vanzin vanzin closed this in 28ced38 Feb 15, 2019
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…elegationTokenProvider a developer API

## What changes were proposed in this pull request?

`HadoopDelegationTokenProvider` has basically the same functionality just like `ServiceCredentialProvider` so the interfaces can be merged.

`YARNHadoopDelegationTokenManager` now loads `ServiceCredentialProvider`s in one step. The drawback of this if one provider fails all others are not loaded. `HadoopDelegationTokenManager` loads `HadoopDelegationTokenProvider`s independently so it provides more robust behaviour.

In this PR I've I've made the following changes:
* Deleted `YARNHadoopDelegationTokenManager` and `ServiceCredentialProvider`
* Made `HadoopDelegationTokenProvider` a `DeveloperApi`

## How was this patch tested?

Existing unit tests.

Closes apache#23686 from gaborgsomogyi/SPARK-26772.

Authored-by: Gabor Somogyi <[email protected]>
Signed-off-by: Marcelo Vanzin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants