Skip to content

Conversation

@jhu-chang
Copy link
Contributor

Add a transient flag DStream.restoredFromCheckpointData to control the restore processing in DStream to avoid duplicate works: check this flag first in DStream.restoreCheckpointData, only when false, the restore process will be executed.

@zsxwing
Copy link
Member

zsxwing commented Dec 11, 2015

I don't get this one. If DStream.restoreCheckpointData is not called, how to recovery the checkpoint data?

@jhu-chang
Copy link
Contributor Author

@zsxwing
The DStream.restoreCheckpointData will be called only once in this PR no matter how many output paths it is in, and controlled by the new added transient variable restoredFromCheckpointData

@zsxwing
Copy link
Member

zsxwing commented Dec 11, 2015

I see. Could you resolve the conflicts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be private

@jhu-chang
Copy link
Contributor Author

@zsxwing
Fix the conflicts and move the restoredFromCheckpointData to private

@zsxwing
Copy link
Member

zsxwing commented Dec 14, 2015

Jenkins, test this please

@SparkQA
Copy link

SparkQA commented Dec 14, 2015

Test build #47660 has started for PR 9765 at commit ac87951.

@andrewor14
Copy link
Contributor

retest this please

@shaneknapp
Copy link
Contributor

jenkins test this plese

@shaneknapp
Copy link
Contributor

jenkins test this please

@SparkQA
Copy link

SparkQA commented Dec 14, 2015

Test build #47672 has finished for PR 9765 at commit ac87951.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest writing a simpler test that only tests this issue, such as:

    withStreamingContext(new StreamingContext(conf, batchDuration)) { ssc =>
      ssc.checkpoint(checkpointDir)
      val inputDStream = new CheckpointInputDStream(ssc)
      val checkpointData = inputDStream.checkpointData
      val mappedDStream = inputDStream.map(_ + 100)
      val outputStream = new TestOutputStreamWithPartitions(mappedDStream)
      outputStream.register()
      /// do more two times output
      mappedDStream.foreachRDD(rdd => rdd.count())
      mappedDStream.foreachRDD(rdd => rdd.count())
      assert(checkpointData.restoredTimes === 0)
      val batchDurationMillis = ssc.progressListener.batchDuration
      generateOutput(ssc, Time(batchDurationMillis * 3), checkpointDir, stopSparkContext = false)
      assert(checkpointData.restoredTimes === 0)
    }
    withStreamingContext(new StreamingContext(checkpointDir)) { ssc =>
      val checkpointData =
        ssc.graph.getInputStreams().head.asInstanceOf[CheckpointInputDStream].checkpointData
      ssc.start()
      ssc.stop()
      assert(checkpointData.restoredTimes === 1)
    }

@jhu-chang
Copy link
Contributor Author

retest this please

1 similar comment
@andrewor14
Copy link
Contributor

retest this please

@andrewor14
Copy link
Contributor

Looks good, merging into master once tests pass.

@andrewor14
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Dec 15, 2015

Test build #47752 has finished for PR 9765 at commit dc38d0f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be TestOutputStreamWithPartitions if using generateOutput.

@jhu-chang
Copy link
Contributor Author

@zsxwing
Thanks for the comments. I should run the test on local, but due the bad connection issues, I still can not get the full dependency jars from maven after merged master. Hope this time, the unit test will be ok...

@jhu-chang
Copy link
Contributor Author

retest this please

1 similar comment
@andrewor14
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Dec 16, 2015

Test build #47832 has finished for PR 9765 at commit 625e20a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove this unused line

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove outputBuffer and change val outputStream = new TestOutputStreamWithPartitions(mappedDStream, outputBuffer) to val outputStream = new TestOutputStreamWithPartitions(mappedDStream) since the output buffer is not necessary now.

@zsxwing
Copy link
Member

zsxwing commented Dec 16, 2015

Just two nits. Otherwise LGTM

@jhu-chang
Copy link
Contributor Author

@zsxwing
Thanks for your comments, could you help to check this again?

@zsxwing
Copy link
Member

zsxwing commented Dec 17, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Dec 17, 2015

Test build #47936 has finished for PR 9765 at commit 8131033.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

retest this please?

@SparkQA
Copy link

SparkQA commented Dec 17, 2015

Test build #47942 has finished for PR 9765 at commit 8131033.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jhu-chang
Copy link
Contributor Author

OOM happens in the PySpark, Env issue?

@zsxwing
Copy link
Member

zsxwing commented Dec 18, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Dec 18, 2015

Test build #47966 has finished for PR 9765 at commit 8131033.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member

zsxwing commented Dec 18, 2015

@jhu-chang thanks, merging to master and 1.6

asfgit pushed a commit that referenced this pull request Dec 18, 2015
…en recovering from checkpoint data

Add a transient flag `DStream.restoredFromCheckpointData` to control the restore processing in DStream to avoid duplicate works:  check this flag first in `DStream.restoreCheckpointData`, only when `false`, the restore process will be executed.

Author: jhu-chang <[email protected]>

Closes #9765 from jhu-chang/SPARK-11749.

(cherry picked from commit f4346f6)
Signed-off-by: Shixiong Zhu <[email protected]>
@asfgit asfgit closed this in f4346f6 Dec 18, 2015
@jhu-chang jhu-chang deleted the SPARK-11749 branch January 5, 2016 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants