Conversation

@beliefer (Contributor) commented Jul 25, 2020

What changes were proposed in this pull request?

DAGSchedulerSuite has some issues:
When the SparkConf of the default SparkContext lacks a configuration that a test case must set, afterEach and init have to be called again. The SparkContext initialized in beforeEach is then discarded without ever being used, which is wasteful. On the other hand, the flexibility to add configurations to SparkConf should be provided by the test framework.

Test suites that inherit LocalSparkContext can be simplified.
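The wasted initialization described above can be modeled without Spark. This is a minimal sketch (all names hypothetical, with a plain `Map` standing in for `SparkContext`): an eager `beforeEach` always builds a context, so a test that needs a custom SparkConf pays for two initializations instead of one.

```scala
object EagerInitModel {
  var inits = 0

  // Stand-in for "new SparkContext(conf)": just counts constructions.
  def buildCtx(conf: Map[String, String]): Map[String, String] = {
    inits += 1
    conf
  }

  // What an eager beforeEach plus a conf-specific test effectively does:
  // the default context is built, discarded unused, and rebuilt.
  def runConfSpecificTest(): Map[String, String] = {
    buildCtx(Map.empty)                    // built eagerly in beforeEach
    buildCtx(Map("spark.key" -> "value"))  // rebuilt with the test's conf
  }
}
```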

Why are the changes needed?

Reduce the overhead of initializing SparkContext.
Rewrite the test framework to support applying specific Spark configurations.

Does this PR introduce any user-facing change?

'No'.

How was this patch tested?

Jenkins test.

@beliefer (Contributor, Author)

cc @jiangxb1987 @Ngone51

@SparkQA commented Jul 25, 2020

Test build #126523 has finished for PR 29228 at commit 14de6c2.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait SparkConfHelper

@SparkQA commented Jul 25, 2020

Test build #126524 has finished for PR 29228 at commit acb4e80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

trait SparkConfHelper {

/** Sets all configurations specified in `pairs`, calls `init`, and then calls `testFun` */
protected def withSparkConf(pairs: (String, String)*)(testFun: SparkConf => Any): Unit = {
Member:

SparkConfHelper.withSparkConf seems to be used only in DAGSchedulerSuite. Let's not make a trait for now; instead, add it into DAGSchedulerSuite.

Contributor Author:

SparkConfHelper.withSparkConf will be used in other suites, like TaskSchedulerImplSuite and TaskSetManagerSuite.

@jiangxb1987 (Contributor)

It would be really great if you can list the test cases/suites that could get simplified by this change, thanks!

@beliefer (Contributor, Author)

> It would be really great if you can list the test cases/suites that could get simplified by this change, thanks!

I added them to the description of this PR.

testWithSparkConf(testName, testTags: _*)()(testFun)(pos)
}

private def testWithSparkConf(testName: String, testTags: Tag*)
Contributor:

If I understand correctly, you are doing the hack here because you need to modify the SparkConf within beforeEach (between super.beforeEach() and init()). In other words, you don't need to call testWithSparkConf instead of test if you don't do extra initialization at the beforeEach() stage. Thus, this change is only useful for those test suites you listed, right?

A more straightforward idea would likely be to have a withConfig(pairs: (String, String)*) method that creates a new SparkConf with the specified configuration values. The idea doesn't simplify DAGSchedulerSuite that much, because you still need to first stop the SparkContext and then call init() with your own SparkConf, but at least it's not worse than the current approach, and it's easier to understand and to reuse.
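The suggested helper could be sketched as follows. This is only a model of the reviewer's idea, not merged code, with an immutable `Map` seeded with a default master standing in for `new SparkConf()` plus `conf.set(k, v)` calls:

```scala
object WithConfigSketch {
  // Build a fresh "conf" with the given overrides applied on top of defaults.
  def withConfig(pairs: (String, String)*): Map[String, String] =
    pairs.foldLeft(Map("spark.master" -> "local[2]")) {
      case (conf, (k, v)) => conf + (k -> v)
    }
}
```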

Contributor Author:

For your first question: yes, it is.
For your second question: withConfig(pairs: (String, String)*) means stopping the SparkContext and then calling init(). So I created the function testWithSparkConf, which avoids stopping the SparkContext and then calling init().

@Ngone51 (Member) commented Aug 17, 2020

I tend to agree with @jiangxb1987. There are only 7 places using testWithSparkConf(), compared to the other 81 tests within DAGSchedulerSuite. We could just call init inside each test to ease other tests that need to call afterEach inside the test. (Sorry, just realized that it's actually the same as the current implementation.)

And I have another idea for the whole thing. That is, we could probably initialize the SparkContext like this:

trait LocalSparkContext ... {
  @transient private var _sc: SparkContext = _
  private val conf = new SparkConf()

  def sc: SparkContext = {
    if (_sc == null) {
      _sc = new SparkContext(conf)
    }
    _sc
  }

  def withConf(pairs: (String, String)*): Unit = {
    if (_sc != null) {
      // probably log a warning when the SparkContext is already
      // initialized, since the configurations won't take effect then
    }
    pairs.foreach { case (k, v) => conf.set(k, v) }
  }

  override def afterEach(): Unit = {
    try {
      resetSparkContext()
    } finally {
      super.afterEach()
    }
  }

  def resetSparkContext(): Unit = {
    LocalSparkContext.stop(_sc)
    ResourceProfile.clearDefaultProfile()
    _sc = null
  }
}

class DAGSchedulerSuite extends LocalSparkContext ... {
  private var firstInit: Boolean = _

  override def beforeEach(): Unit = {
    super.beforeEach()
    firstInit = true
  }

  override def sc: SparkContext = {
    val sc = super.sc
    if (firstInit) {
      init(sc)
      firstInit = false
    }
    sc
  }
}

@beliefer @jiangxb1987 WDYT?
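The lazy-initialization idea above can be modeled without Spark. In this sketch (all names hypothetical, a plain `Map` standing in for `SparkContext`), the context is built from `conf` only on first access, so pairs passed to `withConf` before that first access take effect, and pairs set afterwards do not:

```scala
class LazyCtxModel {
  private var _ctx: Map[String, String] = null
  private val conf = scala.collection.mutable.Map[String, String]()

  // Stand-in for "def sc" building "new SparkContext(conf)" on first use.
  def ctx: Map[String, String] = {
    if (_ctx == null) _ctx = conf.toMap
    _ctx
  }

  def withConf(pairs: (String, String)*): Unit = {
    if (_ctx != null) {
      // The real proposal would log a warning here instead.
      println(s"configs ${pairs.mkString(", ")} won't take effect")
    }
    pairs.foreach { case (k, v) => conf(k) = v }
  }
}
```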

Contributor:

+1 to the above proposal

Contributor Author:

OK. I will implement this proposal.

@SparkQA commented Aug 19, 2020

Test build #127645 has finished for PR 29228 at commit a1fb471.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLimits

@SparkQA commented Aug 19, 2020

Test build #127646 has finished for PR 29228 at commit c95f004.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 20, 2020

Test build #127669 has finished for PR 29228 at commit b258649.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Aug 20, 2020

Test build #127672 has finished for PR 29228 at commit 69790ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


/** Manages a local `sc` `SparkContext` variable, correctly stopping it after each test. */
trait LocalSparkContext extends BeforeAndAfterEach with BeforeAndAfterAll { self: Suite =>
trait LocalSparkContext extends Logging
Member:

I think we usually put Logging at the end.

Contributor Author:

OK

Comment on lines 51 to 52
logWarning("Because SparkContext already initialized, " +
"since configurations won't take effect in this case.")
Member:

nit: These configurations ${pairs.mkString(", ")} won't take effect since the SparkContext has already been initialized.

Contributor Author:

OK

super.beforeEach()
init(new SparkConf())
firstInit = true
setConf("spark.master" -> "local[2]", "spark.app.name" -> "DAGSchedulerSuite")
Member:

We'd better expose the conf via a function to set the conf. setConf is designed to be used by the tests only.

Contributor Author:

OK

trait LocalSparkContext extends Logging
with BeforeAndAfterEach with BeforeAndAfterAll { self: Suite =>

private var _conf: SparkConf = new SparkConf()
Member:

I think SparkConf should have default values for master and appName, so test suites that extend it could use the SparkContext directly without any specific configuration when the test doesn't really care.

Contributor Author:

OK

@SparkQA commented Aug 20, 2020

Test build #127686 has finished for PR 29228 at commit bd9b85f.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer (Contributor, Author)

retest this please

@SparkQA commented Aug 21, 2020

Test build #127732 has finished for PR 29228 at commit c7e5e7b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

_sc
}

def withConf(pairs: (String, String)*): Unit = {
Member:

Maybe like this?

def withConf(pairs: (String, String)*)(f: => Unit) = {
  try f finally {
     // reset conf here
  }
}

If so, we don't need to create the new SparkConf for each test.
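The suggested try/finally shape could look like the sketch below. This is a Spark-free model (names hypothetical, a mutable `Map` standing in for the shared SparkConf): apply the overrides, run the body, then restore the previous values so the shared conf does not need to be recreated per test.

```scala
object ScopedConfModel {
  // Shared "conf" with a default, reused across tests.
  val conf = scala.collection.mutable.Map("spark.master" -> "local[2]")

  def withConf(pairs: (String, String)*)(f: => Unit): Unit = {
    // Remember the previous values (None = key was absent).
    val saved = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try f finally {
      // Reset the conf here, as the comment above suggests.
      saved.foreach {
        case (k, Some(v)) => conf(k) = v
        case (k, None)    => conf.remove(k)
      }
    }
  }
}
```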

@SparkQA commented Aug 24, 2020

Test build #127827 has finished for PR 29228 at commit 86dc8f8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer (Contributor, Author)

retest this please

@SparkQA commented Aug 24, 2020

Test build #127831 has finished for PR 29228 at commit 86dc8f8.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer (Contributor, Author)

retest this please

@SparkQA commented Aug 24, 2020

Test build #127837 has finished for PR 29228 at commit 86dc8f8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Ngone51 (Member) commented Aug 25, 2020

LGTM. BTW, could you update the PR description? With the latest change you may not need to list all the test suites; just say that test suites inheriting LocalSparkContext can be simplified.

@jiangxb1987 Could you also take a look?

@beliefer (Contributor, Author)

@Ngone51 I updated the description of this PR.

* After migrating all test suites that use [[LocalSparkContext]] to use [[LocalSC]], we will
* delete the original [[LocalSparkContext]] and rename [[LocalSC]] to [[LocalSparkContext]].
*/
trait LocalSC extends BeforeAndAfterEach
Contributor:

Since this class is only used for a temporary purpose, can we name it TempLocalSparkContext? TBH I don't like the SC name, which is very vague to me.

Contributor Author:

OK

@jiangxb1987 (Contributor)

LGTM otherwise

@SparkQA commented Aug 29, 2020

Test build #128018 has finished for PR 29228 at commit 1029d26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with TimeLimits

@jiangxb1987 (Contributor) left a comment:

LGTM

@jiangxb1987 (Contributor)

retest this please

@SparkQA commented Sep 2, 2020

Test build #128190 has finished for PR 29228 at commit 1029d26.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with TimeLimits

@beliefer (Contributor, Author) commented Sep 3, 2020

retest this please

@SparkQA commented Sep 3, 2020

Test build #128231 has finished for PR 29228 at commit 1029d26.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with TimeLimits

@beliefer (Contributor, Author) commented Sep 3, 2020

retest this please

@SparkQA commented Sep 3, 2020

Test build #128234 has finished for PR 29228 at commit 1029d26.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with TimeLimits

@beliefer (Contributor, Author) commented Sep 3, 2020

retest this please

@SparkQA commented Sep 3, 2020

Test build #128244 has finished for PR 29228 at commit 1029d26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with TimeLimits

@beliefer (Contributor, Author) commented Sep 9, 2020

cc @jiangxb1987

@jiangxb1987 (Contributor)

retest this please

@SparkQA commented Sep 10, 2020

Test build #128470 has finished for PR 29228 at commit 1029d26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with TimeLimits

@HyukjinKwon (Member)

Merged to master.
