[SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations. #29228

Conversation
Test build #126523 has finished for PR 29228 at commit

Test build #126524 has finished for PR 29228 at commit
trait SparkConfHelper {

  /** Sets all configurations specified in `pairs`, calls `init`, and then calls `testFun` */
  protected def withSparkConf(pairs: (String, String)*)(testFun: SparkConf => Any): Unit = {
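For context, a hedged usage sketch of the helper quoted above; the config key and assertion are illustrative, not taken from the PR:

withSparkConf("spark.scheduler.mode" -> "FAIR") { conf =>
  // Per the doc comment, init(conf) has already been called with the
  // override applied by the time this body executes.
  assert(conf.get("spark.scheduler.mode") == "FAIR")
}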
SparkConfHelper.withSparkConf seems to be used only in DAGSchedulerSuite. Let's not make a trait for now but add it into DAGSchedulerSuite.
SparkConfHelper.withSparkConf will be used in other suites like TaskSchedulerImplSuite and TaskSetManagerSuite.
It would be really great if you could list the test cases/suites that could get simplified by this change, thanks!

I added them to the description of this PR.
    testWithSparkConf(testName, testTags: _*)()(testFun)(pos)
  }

  private def testWithSparkConf(testName: String, testTags: Tag*)
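A hedged sketch of how such a wrapper might look; only the signature line appears in the diff above, so the body below is a guess at the registration pattern implied by the call site, not the PR's actual code:

private def testWithSparkConf(testName: String, testTags: Tag*)
    (pairs: (String, String)*)(testFun: => Any)(implicit pos: Position): Unit = {
  // Register a regular ScalaTest test whose body applies the config
  // overrides (and the suite's init) before running the test function.
  super.test(testName, testTags: _*) {
    withSparkConf(pairs: _*) { _ => testFun }
  }(pos)
}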
If I understand correctly, you are doing the hack here because you need to modify the SparkConf within beforeEach (between super.beforeEach() and init()). In other words, you don't need to call testWithSparkConf instead of test if you don't do extra initialization at the beforeEach() stage. Thus, this change is necessarily useful for those test suites you listed, right?
A more straightforward idea would be to have a withConfig(pairs: (String, String)*) method that creates a new SparkConf with the specified configuration values. The idea doesn't simplify DAGSchedulerSuite that much, because you still need to first stop the SparkContext and then call init() with your own SparkConf, but at least it's not worse than the current approach, and it is easier to understand and to reuse.
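A minimal sketch of the withConfig helper proposed here, assuming it simply layers the overrides onto a fresh SparkConf (the exact shape was never written out in this thread):

def withConfig(pairs: (String, String)*): SparkConf = {
  val conf = new SparkConf()
  pairs.foreach { case (k, v) => conf.set(k, v) }
  conf
}

A test would then stop the running SparkContext and call init(withConfig(...)) itself, which is exactly the cost discussed in the reply below.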
For your first question, yes, it is.
For your second question, withConfig(pairs: (String, String)*) means we have to stop the SparkContext and then call init(). So I created the function testWithSparkConf, which avoids stopping the SparkContext and then calling init().
I tend to agree with @jiangxb1987. There are only 7 places using testWithSparkConf() compared to the other 81 tests within the DAGSchedulerSuite. We could just call init inside each test to ease other tests that need to call afterEach inside the test. (Sorry, just realized that it's actually the same as the current implementation.)
And I have another idea for the whole thing. That is, we could probably initialize the SparkContext like this:
trait LocalSparkContext ... {
  @transient private var _sc: SparkContext = _
  private val conf = new SparkConf()

  def sc: SparkContext = {
    if (_sc == null) {
      // Lazily create the SparkContext on first access, using whatever
      // configurations have been set via withConf() so far.
      _sc = new SparkContext(conf)
    }
    _sc
  }

  def withConf(pairs: (String, String)*): Unit = {
    if (_sc != null) {
      // probably, log a warning when the SparkContext is already initialized,
      // since the configurations won't take effect in this case
    }
    pairs.foreach { case (k, v) => conf.set(k, v) }
  }

  override def afterEach(): Unit = {
    try {
      resetSparkContext()
    } finally {
      super.afterEach()
    }
  }

  def resetSparkContext(): Unit = {
    LocalSparkContext.stop(_sc)
    ResourceProfile.clearDefaultProfile()
    _sc = null
  }
}
class DAGSchedulerSuite extends LocalSparkContext ... {
  private var firstInit: Boolean = _

  override def beforeEach(): Unit = {
    super.beforeEach()
    firstInit = true
  }

  override def sc: SparkContext = {
    val sc = super.sc
    if (firstInit) {
      // Clear the flag before init() in case init() accesses sc again.
      firstInit = false
      init(sc)
    }
    sc
  }
}

@beliefer @jiangxb1987 WDYT?
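Under this proposal, a test that needs custom configurations would look roughly like the sketch below, assuming the test only touches sc after calling withConf (the config key and test body are illustrative):

test("example: custom scheduling mode") {
  withConf("spark.scheduler.mode" -> "FAIR")
  // The SparkContext is created lazily on first access, so the override
  // above is picked up here.
  sc.parallelize(1 to 10).count()
}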
+1 to the above proposal
OK. I will implement this proposal.
Test build #127645 has finished for PR 29228 at commit

Test build #127646 has finished for PR 29228 at commit

Test build #127669 has finished for PR 29228 at commit

Test build #127672 has finished for PR 29228 at commit
/** Manages a local `sc` `SparkContext` variable, correctly stopping it after each test. */
trait LocalSparkContext extends BeforeAndAfterEach with BeforeAndAfterAll { self: Suite =>
trait LocalSparkContext extends Logging
I think we usually put Logging at the end.
OK
logWarning("Because SparkContext already initialized, " +
  "since configurations won't take effect in this case.")
nit: These configurations ${pairs.mkString(", ")} won't take effect since the SparkContext has already been initialized.
OK
  super.beforeEach()
  init(new SparkConf())
  firstInit = true
  setConf("spark.master" -> "local[2]", "spark.app.name" -> "DAGSchedulerSuite")
We'd better expose the conf via a function to set it; setConf is designed to be used by the tests only.
OK
trait LocalSparkContext extends Logging
  with BeforeAndAfterEach with BeforeAndAfterAll { self: Suite =>

  private var _conf: SparkConf = new SparkConf()
I think the SparkConf should have default values for master and appName. Then test suites that extend it could use the SparkContext directly, without any specific configurations, when the test doesn't really care.
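A hedged sketch of what such a default could look like; the exact values are an assumption, though local[2] matches the setConf call quoted earlier:

private var _conf: SparkConf = new SparkConf()
  .setMaster("local[2]")  // sensible local default for unit tests
  .setAppName("test")     // placeholder app name; suites can still override it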
OK
Test build #127686 has finished for PR 29228 at commit

retest this please

Test build #127732 has finished for PR 29228 at commit
    _sc
  }

  def withConf(pairs: (String, String)*): Unit = {
Maybe like this?

def withConf(pairs: (String, String)*)(f: => Unit) = {
  try f finally {
    // reset conf here
  }
}

If so, we don't need to create a new SparkConf for each test.
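A more complete sketch of that idea, assuming the trait keeps a shared mutable conf field; the snapshot/restore logic here is an assumption, not code from the PR:

def withConf(pairs: (String, String)*)(f: => Unit): Unit = {
  // Remember the previous values so the shared conf can be restored afterwards.
  val previous = pairs.map { case (k, _) => k -> conf.getOption(k) }
  pairs.foreach { case (k, v) => conf.set(k, v) }
  try f finally {
    previous.foreach {
      case (k, Some(v)) => conf.set(k, v)
      case (k, None) => conf.remove(k)
    }
  }
}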
Test build #127827 has finished for PR 29228 at commit

retest this please

Test build #127831 has finished for PR 29228 at commit

retest this please

Test build #127837 has finished for PR 29228 at commit
LGTM. BTW, could you update the PR description? With the latest change you may not need to list all the test suites, but can just say that test suites inheriting LocalSparkContext can be simplified. @jiangxb1987 Could you also take a look?
@Ngone51 I updated the description of this PR.
 * After migrating all test suites that use [[LocalSparkContext]] to use [[LocalSC]], we will
 * delete the original [[LocalSparkContext]] and rename [[LocalSC]] to [[LocalSparkContext]].
 */
trait LocalSC extends BeforeAndAfterEach
Since this class is only used for a temporary purpose, can we name it TempLocalSparkContext? TBH I don't like the SC name, which is very vague to me.
OK
LGTM otherwise

Test build #128018 has finished for PR 29228 at commit
jiangxb1987 left a comment:
LGTM
retest this please

Test build #128190 has finished for PR 29228 at commit

retest this please

Test build #128231 has finished for PR 29228 at commit

retest this please

Test build #128234 has finished for PR 29228 at commit

retest this please

Test build #128244 has finished for PR 29228 at commit

cc @jiangxb1987

retest this please

Test build #128470 has finished for PR 29228 at commit
Merged to master. |
What changes were proposed in this pull request?
DAGSchedulerSuite has an issue: afterEach and init are called again whenever the SparkConf of the default SparkContext lacks a configuration that the test case must set. This causes the SparkContext initialized in beforeEach to be discarded without being used, which is wasteful. On the other hand, the flexibility to add configurations to SparkConf should be provided by the test framework.
Test suites that inherit LocalSparkContext can be simplified.

Why are the changes needed?
Reduce the overhead of initializing SparkContext. Rewrite the test framework to support applying specified Spark configurations.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Jenkins test.