Skip to content
Closed
Changes from 12 commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
cdef539
Merge pull request #1 from apache/master
YanTangZhai Aug 6, 2014
cbcba66
Merge pull request #3 from apache/master
YanTangZhai Aug 20, 2014
8a00106
Merge pull request #6 from apache/master
YanTangZhai Sep 12, 2014
03b62b0
Merge pull request #7 from apache/master
YanTangZhai Sep 16, 2014
76d4027
Merge pull request #8 from apache/master
YanTangZhai Oct 20, 2014
d26d982
Merge pull request #9 from apache/master
YanTangZhai Nov 4, 2014
e249846
Merge pull request #10 from apache/master
YanTangZhai Nov 11, 2014
6e643f8
Merge pull request #11 from apache/master
YanTangZhai Dec 1, 2014
92242c7
Update HiveQl.scala
YanTangZhai Dec 2, 2014
74175b4
Update HiveQuerySuite.scala
YanTangZhai Dec 3, 2014
950b21e
Update HiveQuerySuite.scala
YanTangZhai Dec 3, 2014
718afeb
Merge pull request #12 from apache/master
YanTangZhai Dec 5, 2014
59e4de9
make hive test
marmbrus Dec 17, 2014
1893956
Merge pull request #14 from marmbrus/pr/3555
YanTangZhai Dec 18, 2014
bd2c444
Update HiveQuerySuite.scala
YanTangZhai Dec 22, 2014
efc4210
Update HiveQuerySuite.scala
YanTangZhai Dec 22, 2014
1e1ebb4
Update HiveQuerySuite.scala
YanTangZhai Dec 22, 2014
e4c2c0a
Merge pull request #15 from apache/master
YanTangZhai Dec 24, 2014
5601a8b
[SPARK-4961] [CORE] Put HadoopRDD.getPartitions forward to reduce DAG…
YanTangZhai Dec 25, 2014
af5abda
[SPARK-4961] [CORE] Put HadoopRDD.getPartitions forward to reduce DAG…
YanTangZhai Dec 25, 2014
6e95955
[SPARK-4961] [CORE] Put HadoopRDD.getPartitions forward to reduce DAG…
YanTangZhai Dec 30, 2014
d4bca32
Merge pull request #19 from apache/master
YanTangZhai Dec 31, 2014
5e3ef70
Merge branch 'SPARK-4692'
YanTangZhai Dec 31, 2014
fd87518
[SPARK-4961] [CORE] Put HadoopRDD.getPartitions forward to reduce DAG…
YanTangZhai Dec 31, 2014
74c1dec
update HadoopRDD.scala
YanTangZhai Dec 31, 2014
5041b35
Merge pull request #24 from apache/master
YanTangZhai Jan 12, 2015
09afdff
[SPARK-4961] [CORE] Put HadoopRDD.getPartitions forward to reduce DAG…
YanTangZhai Jan 14, 2015
b535a53
[SPARK-4961] [CORE] Put HadoopRDD.getPartitions forward to reduce DAG…
YanTangZhai Jan 19, 2015
e2880f9
Merge pull request #27 from apache/master
YanTangZhai Jan 19, 2015
aed530b
[SPARK-4961] [CORE] Put HadoopRDD.getPartitions forward to reduce DAG…
YanTangZhai Jan 20, 2015
5b27571
Merge pull request #28 from apache/master
YanTangZhai Jan 24, 2015
267e375
Merge branch 'master' into SPARK-4961
YanTangZhai Jan 24, 2015
e2c2494
Merge branch 'master' of https://github.com/YanTangZhai/spark
YanTangZhai Jan 24, 2015
572079b
update
YanTangZhai Jan 24, 2015
ae7c139
update
YanTangZhai Jan 24, 2015
d5c0e84
update
YanTangZhai Jan 24, 2015
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 20 additions & 1 deletion core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala
Original file line number Diff line number Diff line change
Expand Up @@ -190,7 +190,8 @@ class HadoopRDD[K, V](
newInputFormat
}

override def getPartitions: Array[Partition] = {
protected def getThesePartitions() : Array[Partition] = {
val start = System.nanoTime
val jobConf = getJobConf()
// add the credentials here as this can be called before SparkContext initialized
SparkHadoopUtil.get.addCredentials(jobConf)
Expand All @@ -203,9 +204,27 @@ class HadoopRDD[K, V](
for (i <- 0 until inputSplits.size) {
array(i) = new HadoopPartition(id, i, inputSplits(i))
}
logDebug("Get these partitions took %f s".format((System.nanoTime - start) / 1e9))
array
}

@transient private var thesePartitions_ : Array[Partition] = {
try {
getThesePartitions()
} catch {
case e: Exception =>
logDebug("Error initializing HadoopRDD's partitions", e)
null

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This comment is kind of moot since I proposed a more general fix in a top-level comment, but I'll still post it anyways:)

I don't think that logging an exception at debug level then returning null is a good error-handling strategy; this is likely to cause a confusing NPE somewhere else with no obvious cause since most users won't have debug-level logging enabled.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the fix in this patch is to force partitions to be eagerly-computed in the driver thread that defines the RDD. This seems like a good idea

How would this interact with the idea of @erikerlandson to defer partition computation?
#3079

}
}

override def getPartitions: Array[Partition] = {
if (thesePartitions_ == null) {
thesePartitions_ = getThesePartitions()
}
thesePartitions_
}

override def compute(theSplit: Partition, context: TaskContext): InterruptibleIterator[(K, V)] = {
val iter = new NextIterator[(K, V)] {

Expand Down