[SPARK-5374][CORE] abstract RDD's DAG graph iteration in DAGScheduler by cloud-fan · Pull Request #4134 · apache/spark

cloud-fan · 2015-01-21T08:39:00Z

Many methods in DAGScheduler, such as getParentStages and getMissingParentStages, iterate over an RDD's DAG. We should abstract this traversal to reduce code size.
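To illustrate the kind of abstraction being proposed, here is a minimal sketch of a shared traversal helper that takes the per-dependency logic as a closure. It is not the code from this patch: `Node`, `Dep`, `NarrowDep`, `ShuffleDep`, `GraphWalk`, `traverse` and `shuffleParents` are hypothetical stand-ins for Spark's RDD and dependency types, chosen so the example runs without Spark.

```scala
// Hypothetical stand-ins for RDDs and their dependencies (not Spark's API).
sealed trait Dep { def parent: Node }
final case class NarrowDep(parent: Node) extends Dep
final case class ShuffleDep(parent: Node) extends Dep
final case class Node(id: Int, deps: List[Dep])

object GraphWalk {
  /**
   * Visits every node reachable from `root` at most once, using an explicit
   * work list instead of recursion. `onDep` is called for each dependency and
   * returns true when the walk should continue into that dependency's parent.
   */
  def traverse(root: Node)(onDep: Dep => Boolean): Unit = {
    var waiting: List[Node] = List(root)
    var visited = Set.empty[Int]
    while (waiting.nonEmpty) {
      val node = waiting.head
      waiting = waiting.tail
      if (!visited(node.id)) {
        visited += node.id
        node.deps.foreach { dep =>
          if (onDep(dep)) waiting = dep.parent :: waiting
        }
      }
    }
  }

  /** One caller of the helper: ids of nodes directly behind a shuffle boundary. */
  def shuffleParents(root: Node): Set[Int] = {
    var found = Set.empty[Int]
    traverse(root) {
      case ShuffleDep(p) =>
        found += p.id
        false // do not walk past the shuffle boundary
      case NarrowDep(_) =>
        true // same stage, keep walking
    }
    found
  }
}
```

Each method that previously carried its own visited set and work list would, under this assumption, shrink to a closure like the one in `shuffleParents`. The trade-off raised later in the thread is that the control flow then lives partly in the helper and partly in the closure.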

lianhuiwang · 2015-01-21T12:18:11Z

core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala

The rdd in getMissingParentStages is not always the stage's rdd. A stage contains a sequence of RDDs, and the stage's rdd is the last child in that sequence. If an RDD upstream of that child RDD is cached, this line cannot filter out the stage, but getMissingParentStages can.
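The point about cached RDDs inside a stage can be sketched as follows. This is a simplified model, not DAGScheduler code: `Rdd`, `Link`, `Narrow`, `Shuffle`, `MissingParents` and `missing` are hypothetical names, `cached` stands for "all partitions are cached", and every shuffle parent is treated as missing (the real scheduler also checks whether the parent stage's output is already available).

```scala
// Hypothetical model of an RDD chain inside a stage (not Spark's API).
sealed trait Link { def parent: Rdd }
final case class Narrow(parent: Rdd) extends Link
final case class Shuffle(parent: Rdd, stageId: Int) extends Link
final case class Rdd(id: Int, cached: Boolean, links: List[Link])

object MissingParents {
  /**
   * Parent stage ids still needed to compute `stageRdd`.
   * The walk is pruned at any cached RDD, so a cached RDD in the middle of
   * the chain hides the shuffle dependencies behind it.
   */
  def missing(stageRdd: Rdd): Set[Int] = {
    var waiting: List[Rdd] = List(stageRdd)
    var visited = Set.empty[Int]
    var result = Set.empty[Int]
    while (waiting.nonEmpty) {
      val rdd = waiting.head
      waiting = waiting.tail
      if (!visited(rdd.id)) {
        visited += rdd.id
        if (!rdd.cached) {
          rdd.links.foreach {
            case Shuffle(_, stageId) => result += stageId // parent stage needed
            case Narrow(p)           => waiting = p :: waiting // same stage
          }
        }
      }
    }
    result
  }
}
```

In this model, checking only whether the stage's final RDD is cached would miss the case where that RDD is uncached but an RDD earlier in the chain is cached; walking the whole chain catches it.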

cloud-fan · 2015-01-23T03:54:50Z

ping @JoshRosen

JoshRosen · 2015-01-23T06:13:25Z

I'm pretty busy with other work at the moment, so it'll be a little while before I can actually review this, but I'd be glad to let Jenkins test it to see whether it uncovers any problems (like I hit in my original patch).

Jenkins, this is ok to test.

rxin · 2015-01-23T08:07:52Z

Thanks for doing it. I took a quick look at this. While it does reduce the LOC, I feel the change is not necessary and actually makes the code harder to understand with the closures. Do we really want something like this?

markhamstra · 2015-01-23T15:56:50Z

I'll take a deeper look over the weekend, but on a first pass I had a similar reaction to @rxin -- I'm not seeing a lot of benefit in terms of code clarity or maintainability, and we tend to avoid making changes to the DAGScheduler that don't offer significant benefits.

AmplabJenkins · 2015-04-27T18:22:16Z

Can one of the admins verify this patch?

cloud-fan · 2015-04-28T01:00:10Z

Closing this one, will do a more meaningful DAGScheduler refactor later.

Clean up DAGScheduler getMissingParentStages / stageDependsOn methods

93fa332

lianhuiwang reviewed Jan 21, 2015

cloud-fan added 2 commits January 23, 2015 10:19

roll back getMissingParentStages and abstract RDD graph iteration

ebc35e1

roll back stageDependsOn

53d44f4

cloud-fan force-pushed the 4654 branch from 9aca376 to 53d44f4 on January 23, 2015 03:12

cloud-fan changed the title ~~[SPARK-4654][CORE] Clean up DAGScheduler getMissingParentStages / stageDependsOn methods~~ [SPARK-5374][CORE] abstract RDD's DAG graph iteration in DAGScheduler Jan 23, 2015

cloud-fan closed this Apr 28, 2015

JoshRosen mentioned this pull request Jun 13, 2016

[SPARK-15927] Eliminate redundant DAGScheduler code. #13646

Closed

cloud-fan wants to merge 3 commits into apache:master from cloud-fan:4654

6 participants
