@@ -421,6 +421,13 @@ object SparkHadoopUtil {

val SPARK_YARN_CREDS_COUNTER_DELIM = "-"

// Just load HdfsConfiguration into the class loader so that
// hdfs-site.xml is added as a default configuration file. Otherwise,
// some HDFS-related configurations don't get shipped to executors,
// which can cause an UnknownHostException when NameNode HA is enabled.
// See SPARK-11227 for more details.
Utils.classForName("org.apache.hadoop.hdfs.HdfsConfiguration")
Member:
Does this need to be loaded by reflection -- is it not in Hadoop 2.2?

Member Author:
As you mentioned, Hadoop 2.2 has HdfsConfiguration, but is there any way to load HdfsConfiguration explicitly without reflection?

Member:
Just reference it in any way. But I guess we should ask: what does classloading do that we need, and is there any way to do that directly? This is fairly indirect. Is it that Configuration.addDefaultResource("hdfs-site.xml") must be called?

Member Author (@sarutak, Jun 21, 2016):
Yeah, calling Configuration.addDefaultResource("hdfs-site.xml") directly may be better.

Contributor:
Actually, I'd prefer not to reference hdfs-site.xml directly. HdfsConfiguration should know what it needs to load; for instance, it also loads the defaults. HdfsConfiguration is also marked @InterfaceAudience.Private, so ideally we shouldn't be using it directly. Based on my other comments, I would like to understand better why this isn't already loaded on the driver.
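The mechanism the thread is debating can be sketched without Hadoop at all: loading or referencing a class runs its static initializer, and in real Hadoop it is HdfsConfiguration's static initializer that calls Configuration.addDefaultResource for the hdfs-*.xml files. The names below (Registry, FakeHdfsConfiguration) are hypothetical stand-ins, not Spark or Hadoop code; the point is only that a bare reference triggers the same side effect that Utils.classForName (Class.forName with initialize = true) does.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for Hadoop Configuration's default-resource registry.
object Registry {
  val resources = ListBuffer.empty[String]
}

// Hypothetical stand-in for org.apache.hadoop.hdfs.HdfsConfiguration: the
// body of this object plays the role of the Java class's static initializer,
// which in real Hadoop registers hdfs-default.xml and hdfs-site.xml.
object FakeHdfsConfiguration {
  Registry.resources += "hdfs-default.xml"
  Registry.resources += "hdfs-site.xml"
}

// Before the class is touched, nothing is registered.
println(s"before: ${Registry.resources.size}")

// Merely referencing the object forces its initializer to run, mirroring
// what Utils.classForName("org.apache.hadoop.hdfs.HdfsConfiguration")
// achieves via reflection in the patch under review.
FakeHdfsConfiguration
println(s"after: ${Registry.resources.mkString(", ")}")
```

This also illustrates the reviewer's point that reflection is not strictly required: any reference that forces initialization has the same effect, though reflection avoids a compile-time dependency on the HDFS module.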


/**
* Number of records to update input metrics when reading from HadoopRDDs.
*