@@ -373,6 +373,15 @@ class JavaSparkContext(val sc: SparkContext)
    * other necessary info (e.g. file name for a filesystem-based dataset, table name for HyperTable,
    * etc).
    *
+   * @param conf JobConf for setting up the dataset. Note: This will be put into a Broadcast.
+   *             Therefore if you plan to reuse this conf to create multiple RDDs, you need to make
+   *             sure you won't modify the conf. A safe approach is to always create a new conf for
+   *             a new RDD.
+   * @param inputFormatClass Class of the InputFormat
+   * @param keyClass Class of the keys
+   * @param valueClass Class of the values
+   * @param minPartitions Minimum number of Hadoop splits to generate.
+   *
    * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
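To make the conf caveat above concrete, here is a minimal sketch; the helper name, input path, and the minPartitions value of 2 are illustrative, not part of the patch. Each call builds its own JobConf, so the conf that ends up in the Broadcast is never mutated by the setup of another RDD.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}
import org.apache.spark.api.java.{JavaPairRDD, JavaSparkContext}

// Hypothetical helper: build a new JobConf per RDD instead of mutating a shared one.
def textRecords(jsc: JavaSparkContext, path: String): JavaPairRDD[LongWritable, Text] = {
  val conf = new JobConf()                    // fresh conf, never reused or modified later
  FileInputFormat.setInputPaths(conf, path)   // dataset-specific setup
  jsc.hadoopRDD(conf, classOf[TextInputFormat],
    classOf[LongWritable], classOf[Text], 2)  // minPartitions = 2 (illustrative)
}
```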
@@ -395,6 +404,14 @@ class JavaSparkContext(val sc: SparkContext)
    * Get an RDD for a Hadoop-readable dataset from a Hadoop JobConf giving its InputFormat and any
    * other necessary info (e.g. file name for a filesystem-based dataset, table name for HyperTable,
    *
+   * @param conf JobConf for setting up the dataset. Note: This will be put into a Broadcast.
+   *             Therefore if you plan to reuse this conf to create multiple RDDs, you need to make
+   *             sure you won't modify the conf. A safe approach is to always create a new conf for
+   *             a new RDD.
+   * @param inputFormatClass Class of the InputFormat
+   * @param keyClass Class of the keys
+   * @param valueClass Class of the values
+   *
    * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
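A sketch of the copy-before-cache idiom that the note recommends, using the overload without minPartitions; the function name is hypothetical. The values are copied into plain Strings before cache(), so the cached RDD does not alias the single Writable instance the RecordReader keeps reusing.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{JobConf, TextInputFormat}
import org.apache.spark.api.java.{JavaRDD, JavaSparkContext}
import org.apache.spark.api.java.function.{Function => JFunction}

// Hypothetical sketch: copy the re-used Writable values before caching them.
def cachedLines(jsc: JavaSparkContext, conf: JobConf): JavaRDD[String] = {
  val records = jsc.hadoopRDD(conf, classOf[TextInputFormat],
    classOf[LongWritable], classOf[Text])     // overload without minPartitions
  records.values().map(new JFunction[Text, String] {
    override def call(v: Text): String = v.toString  // defensive copy into an immutable String
  }).cache()
}
```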
@@ -476,6 +493,14 @@ class JavaSparkContext(val sc: SparkContext)
    * Get an RDD for a given Hadoop file with an arbitrary new API InputFormat
    * and extra configuration options to pass to the input format.
    *
+   * @param conf Configuration for setting up the dataset. Note: This will be put into a Broadcast.
+   *             Therefore if you plan to reuse this conf to create multiple RDDs, you need to make
+   *             sure you won't modify the conf. A safe approach is to always create a new conf for
+   *             a new RDD.
+   * @param fClass Class of the InputFormat
+   * @param kClass Class of the keys
+   * @param vClass Class of the values
+   *
    * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable object for each
    * record, directly caching the returned RDD will create many references to the same object.
    * If you plan to directly cache Hadoop writable objects, you should first copy them using
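The same per-RDD-conf advice applies to the new-API variants. A hypothetical sketch, with the path and the extra split-size option purely illustrative, building a fresh Configuration before calling newAPIHadoopFile:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.api.java.{JavaPairRDD, JavaSparkContext}

// Hypothetical sketch for the new-API variant: again, one fresh Configuration per RDD.
def newApiText(jsc: JavaSparkContext, path: String): JavaPairRDD[LongWritable, Text] = {
  val conf = new Configuration()              // not shared with, or modified for, other RDDs
  conf.set("mapreduce.input.fileinputformat.split.maxsize", "134217728")  // extra option, illustrative
  jsc.newAPIHadoopFile(path, classOf[TextInputFormat],
    classOf[LongWritable], classOf[Text], conf)
}
```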
@@ -675,6 +700,9 @@ class JavaSparkContext(val sc: SparkContext)
 
   /**
    * Returns the Hadoop configuration used for the Hadoop code (e.g. file systems) we reuse.
+   *
+   * '''Note:''' As it will be reused in all Hadoop RDDs, it's better not to modify it unless you
+   * plan to set some global configurations for all Hadoop RDDs.
    */
   def hadoopConfiguration (): Configuration = {
     sc.hadoopConfiguration
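A short sketch of the intended use, assuming a local master and a placeholder namenode address: settings placed on hadoopConfiguration() apply to every Hadoop RDD created from the context, which is why per-dataset options belong in a per-RDD conf instead.

```scala
import org.apache.spark.api.java.JavaSparkContext

// Assumed master, app name, and namenode address, for illustration only.
val jsc = new JavaSparkContext("local[2]", "hadoop-conf-example")
// Global setting, shared by all Hadoop RDDs created from this context.
jsc.hadoopConfiguration().set("fs.defaultFS", "hdfs://namenode:8020")
```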