Skip to content
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

WIP: execution replay #299

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

WIP: execution replay #299

wants to merge 8 commits into from

Conversation

nh13
Copy link
Member

@nh13 nh13 commented Jun 9, 2017

No description provided.

@nh13 nh13 force-pushed the nh_exec2_replay branch 5 times, most recently from d439a16 to 7c85a28 Compare June 9, 2017 07:50
@nh13
Copy link
Member Author

nh13 commented Jun 9, 2017

How to test:

  1. Run a pipeline
java -Xmx8G -jar dagr-0.2.0-SNAPSHOT.jar --report report.txt --experimental-execution SleepyPipeline -j -n 3 -s 42
  1. Ensure that the only dependency is Task 2 depends on task 0 by looking at the logging statements:
[2017/00/00 00:00:00 | SleepyPipeline | Info] Task 2 will depend on task 0
  1. Modify the execution_log.txt file to remove the last to succeeded tasks and save it as "replay.txt". It should look something like this:
#DEFINITION,SIMPLE_NAME,NAME,CODE,PARENT_CODE,CHILD_NUMBER
#RELATIONSHIP,PARENT_CODE,CHILD_CODE
#STATUS,TASK_CODE,STATUS_NAME,STATUS_ORDINAL
DEFINITION,SleepyPipeline,SleepyPipeline,2120598994,-1,0
STATUS,2120598994,Pending,0
STATUS,2120598994,Queued,1
DEFINITION,SleepInJvmTask,task-0,823563107,2120598994,0
RELATIONSHIP,2120598994,823563107
DEFINITION,SleepInJvmTask,task-1,600184819,2120598994,1
RELATIONSHIP,2120598994,600184819
DEFINITION,SleepInJvmTask,task-2,642647395,2120598994,2
RELATIONSHIP,2120598994,642647395
STATUS,2120598994,Running,3
STATUS,823563107,Pending,0
STATUS,600184819,Pending,0
STATUS,642647395,Pending,0
STATUS,823563107,Queued,1
STATUS,600184819,Queued,1
STATUS,823563107,Submitted,2
STATUS,600184819,Submitted,2
STATUS,600184819,Running,3
STATUS,823563107,Running,3
STATUS,600184819,SucceededExecution,10
STATUS,823563107,SucceededExecution,10
STATUS,642647395,Queued,1
STATUS,642647395,Submitted,2
STATUS,642647395,Running,3
  1. Now run it with the replay file:
java -Xmx8G -jar dagr-0.2.0-SNAPSHOT.jar --report report.txt --experimental-execution --replay-log replay.txt SleepyPipeline -j -n 3 -s 42
  1. Validate via the logging to stdout that the task-0 and task-1 are manually succeeded, while the SleepyPipeline and task-2 execute. The new execution_log.txt will look something like this:
#DEFINITION,SIMPLE_NAME,NAME,CODE,PARENT_CODE,CHILD_NUMBER
#RELATIONSHIP,PARENT_CODE,CHILD_CODE
#STATUS,TASK_CODE,STATUS_NAME,STATUS_ORDINAL
DEFINITION,SleepyPipeline,SleepyPipeline,1255049730,-1,0
STATUS,1255049730,Pending,0
STATUS,1255049730,Queued,1
DEFINITION,SleepInJvmTask,task-0,458297460,1255049730,0
RELATIONSHIP,1255049730,458297460
DEFINITION,SleepInJvmTask,task-1,1770706846,1255049730,1
RELATIONSHIP,1255049730,1770706846
DEFINITION,SleepInJvmTask,task-2,2060167275,1255049730,2
RELATIONSHIP,1255049730,2060167275
STATUS,1255049730,Running,3
STATUS,458297460,Pending,0
STATUS,1770706846,Pending,0
STATUS,2060167275,Pending,0
STATUS,458297460,Queued,1
STATUS,1770706846,Queued,1
STATUS,458297460,ManuallySucceeded,11
STATUS,1770706846,ManuallySucceeded,11
STATUS,2060167275,Queued,1
STATUS,2060167275,Submitted,2
STATUS,2060167275,Running,3
STATUS,2060167275,SucceededExecution,10
STATUS,1255049730,SucceededExecution,10

@codecov-io
Copy link

codecov-io commented Jun 9, 2017

Codecov Report

Merging #299 into master will increase coverage by 2.13%.
The diff coverage is 94.94%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #299      +/-   ##
==========================================
+ Coverage   91.38%   93.51%   +2.13%     
==========================================
  Files          31       46      +15     
  Lines        1137     1742     +605     
  Branches      106      127      +21     
==========================================
+ Hits         1039     1629     +590     
- Misses         98      113      +15
Impacted Files Coverage Δ
...e/src/main/scala/dagr/core/tasksystem/Linker.scala 100% <ø> (ø) ⬆️
...c/main/scala/dagr/core/tasksystem/Dependable.scala 72.72% <ø> (ø) ⬆️
...n/scala/dagr/core/tasksystem/SimpleInJvmTask.scala 100% <ø> (ø) ⬆️
...src/main/scala/dagr/core/tasksystem/UnitTask.scala 100% <ø> (ø) ⬆️
...src/main/scala/dagr/core/tasksystem/Pipeline.scala 94.73% <ø> (+5.26%) ⬆️
...rc/main/scala/dagr/core/config/Configuration.scala 98.38% <ø> (+1.56%) ⬆️
.../main/scala/dagr/core/tasksystem/Schedulable.scala 100% <ø> (ø) ⬆️
...main/scala/dagr/core/tasksystem/ShellCommand.scala 75% <ø> (ø) ⬆️
core/src/main/scala/dagr/core/exec/Scheduler.scala 100% <ø> (ø)
...ain/scala/dagr/core/execsystem2/TaskExecutor.scala 0% <0%> (ø)
... and 50 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 19dbac9...d9d7caa. Read the comment docs.

@nh13 nh13 force-pushed the nh_exec2_replay branch from 0d5d71d to db43946 Compare June 10, 2017 19:10
@nh13
Copy link
Member Author

nh13 commented Jun 10, 2017

@tfenne I am very close to having replay supported in the "old" execsystem, but I am running out of time for today!

@nh13 nh13 force-pushed the nh_exec2_replay branch from db43946 to dddfbb9 Compare June 11, 2017 05:33
@nh13 nh13 mentioned this pull request Jun 11, 2017
@nh13 nh13 changed the base branch from nh_exec2 to master June 11, 2017 05:34
@nh13
Copy link
Member Author

nh13 commented Jun 11, 2017

No tests for replay but it works in my hands for both execution systems!

@nh13 nh13 force-pushed the nh_exec2_replay branch 4 times, most recently from 561451c to de947d5 Compare June 11, 2017 06:14
@nh13 nh13 mentioned this pull request Jun 17, 2017
58 tasks
@nh13
Copy link
Member Author

nh13 commented Jun 18, 2017

@tfenne I now have test coverage across all the code I want.

I was thinking we could do a multi-stage review if you are up for it? I see three main sections:

  1. The abstraction of various execution classes (dagr.core.exec), including a generic executor, task status, and classes for callbacks (ex. when a task status changes).
  2. The changes to the existing executor. A lot of these changes were made to be able to support both the existing and new executor.
  3. The new execution system.

Probably the order should be 3, 2, 1. It may be hard to do each entirely separately, but I think it should be possible with only a minimal look at the others while reviewing one. Thoughts on how to proceed?

@@ -95,7 +95,7 @@ lazy val commonSettings = Seq(
testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest, "-h", Option(System.getenv("TEST_HTML_REPORTS")).getOrElse(htmlReportsDirectory)),
testOptions in Test += Tests.Argument("-l", "LongRunningTest"), // ignores long running tests
// uncomment for full stack traces
//testOptions in Test += Tests.Argument("-oD"),
testOptions in Test += Tests.Argument("-oDF"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: comment me

build.sbt Outdated
@@ -123,6 +123,7 @@ lazy val core = Project(id="dagr-core", base=file("core"))
"com.fulcrumgenomics" %% "commons" % "0.2.0-SNAPSHOT",
"com.fulcrumgenomics" %% "sopt" % "0.2.0-SNAPSHOT",
"com.github.dblock" % "oshi-core" % "3.3",
"com.beachape" %% "enumeratum" % "1.5.12",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: tabs -> spaces

case class Definition(simpleName: String, name: String, code: Int, parentCode: Int, childNumber: Int) extends ToStringIsSimpleNameUpperCase {
/** Returns true if the the other definition represents the same task as the this definition. Assumes that
* the parents have the same number of child tasks. */
def equivalent(that: Definition): Boolean = {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tfenne this is the where we determine two tasks are "equivalent" across executions. You'll likely find that it is too conservative, so any ideas would be helpful.

val equivalentChildren = currentChildDefinitions.exists { left =>
currentChildDefinitions.exists { right => left != right && left.equivalent(right) }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tfenne this is the where we determine if a pipeline-like task (a task that returns other tasks) is equivalent across executions. We look for:

  1. The task must produce the same # of children.
  2. None of the children produced are "equivalent" to each other (i.e. one cannot be misinterpreted for the other across executions).
  3. Each child has one and only one equivalent "definition" from the previous execution, meaning for each child task we found one and only one potential matching task from the previous execution.

If the above are true, we evaluate whether or not to run the children separately (by their previous status), otherwise, we force all the children to run.

}
Logger.out = tmpOut
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: add a test of a pipeline within a pipeline within a pipeline, such that the middle-level pipeline produces different pipelines on the second execution, and so the middle and bottom level pipelines are forced to execute.

nh13 added 3 commits July 3, 2017 10:56
Refactor task and exec system prior to new execution system.
This system can be used with the --experimental-execution option.
@nh13 nh13 force-pushed the nh_exec2_replay branch from 286b795 to d9d7caa Compare July 3, 2017 17:56
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants