[SPARK-23820][CORE] Enable use of long form of callsite in logs#21433
michaelmior wants to merge 1 commit into apache:master
|
Could you please add the description for this PR? |
|
Done! |
|
Although maybe not widely used, I could see allowing control of this via an undocumented param |
|
Test build #4200 has finished for PR 21433 at commit
|
9800d2e to 245181a
|
Rebased on top of master. The failing test is unrelated. |
|
Test build #4203 has finished for PR 21433 at commit
|
|
@srowen Yes, I don't expect it will be widely used but I've personally found it helpful in some performance debugging and it's a fairly low impact change. I was just hoping to avoid having to keep applying this patch and doing my own build of Spark in the future :) |
|
Test build #4205 has finished for PR 21433 at commit
|
|
Test build #4206 has finished for PR 21433 at commit
|
|
Merged to master |
|
Thanks @srowen! |
    val parentIds = rdd.dependencies.map(_.rdd.id)
    val callSite = callsiteForm match {
      case "short" => rdd.creationSite.shortForm
      case "long" => rdd.creationSite.longForm
If the user's input is neither "short" nor "long", we will get an exception, right?
Yeah, usually we define an enum and verify that the given config value is valid.
For this particular case, I think we can just define a boolean flag: spark.eventLog.callsite.longForm.enabled.
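The exception concern raised above can be reproduced outside Spark. The following is a minimal sketch, not the PR's code: `CallSite` here is a hypothetical stand-in reduced to two string fields, and `CallSiteForms`, `fromString`, and `fromFlag` are illustrative names only.

```scala
// Hypothetical stand-in for Spark's CallSite, reduced to the two fields used here.
final case class CallSite(shortForm: String, longForm: String)

object CallSiteForms {
  // Same shape as the match in the diff: there is no default case, so any value
  // other than "short" or "long" throws scala.MatchError at runtime.
  def fromString(form: String, site: CallSite): String = form match {
    case "short" => site.shortForm
    case "long"  => site.longForm
  }

  // Boolean alternative: every possible input is valid, so no validation is needed.
  def fromFlag(longFormEnabled: Boolean, site: CallSite): String =
    if (longFormEnabled) site.longForm else site.shortForm
}
```

With this sketch, `CallSiteForms.fromString("medium", site)` fails with a `scala.MatchError`, whereas `fromFlag` has no invalid inputs, which is the simplification the boolean flag buys.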
      ConfigBuilder("spark.eventLog.overwrite").booleanConf.createWithDefault(false)

    private[spark] val EVENT_LOG_CALLSITE_FORM =
      ConfigBuilder("spark.eventLog.callsite").stringConf.createWithDefault("short")
Is "short" defined? Where is the test case? Why is this not documented?
Not sure whether we should introduce a conf here. cc @rxin
Ah yeah this should have been documented, that is a good point. I should have looked more carefully at the configs. The other eventLog configs aren't internal either.
I agree this could also be a boolean, or simply handle anything but "short" or "long" as "short"
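The two ways of handling an unexpected string value mentioned in this thread (validate against a fixed set, or treat anything unrecognized as "short") could look roughly like this. It is a sketch under assumptions: `CreationSite` is a hypothetical stand-in for the real call site type, and `CallSiteFallback`, `validate`, and `resolve` are names invented for illustration.

```scala
// Hypothetical stand-in for the call site type; only the two forms matter here.
final case class CreationSite(shortForm: String, longForm: String)

object CallSiteFallback {
  private val validForms = Set("short", "long")

  // Strict option: reject unknown values up front with a descriptive message.
  def validate(form: String): Either[String, String] =
    if (validForms.contains(form)) Right(form)
    else Left(s"Invalid call site form '$form', expected one of: ${validForms.mkString(", ")}")

  // Lenient option: only "long" selects the long form; anything else,
  // including typos, falls back to the short form.
  def resolve(form: String, site: CreationSite): String = form match {
    case "long" => site.longForm
    case _      => site.shortForm
  }
}
```

The lenient variant never throws but silently hides typos in the setting; the strict variant surfaces them early.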
      }
      new RDDInfo(rdd.id, rddName, rdd.partitions.length,
    -   rdd.getStorageLevel, parentIds, rdd.creationSite.shortForm, rdd.scope)
    +   rdd.getStorageLevel, parentIds, callSite, rdd.scope)
This sounds like a general issue. Why do we only apply it in RDDInfo?
@gatorsmile As far as I am aware, RDDInfo is the only place the call site is included in the event log.
|
@gatorsmile @cloud-fan I'll just go with a boolean config as there really is no need for more than two options and this simplifies things quite a bit. |
|
@michaelmior Since the Spark 2.4 branch has been cut and this PR still needs more review, I would revert this PR from branch 2.4 and master first. We can discuss the conf and implementation in the master branch. The preferred conf name is |
srowen
left a comment
I think it's easy enough to update this to a boolean and add a doc and pick it into branch-2.4; it's not imminent as there are still about 30-40 open issues. Still, it's not vital enough that it must be included.
|
Given the lack of certainty, the fact that this is small and easy to add back in a different form, and the fact that 2.4 is quickly teeing up, let me revert this for now. We can proceed with a different approach in a new PR. |
|
Yea we can add this back easily.
|
This is a rework of apache#21433 to address some concerns there.
Closes apache#22398 from michaelmior/long-callsite2.
Authored-by: Michael Mior <mmior@uwaterloo.ca>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
This adds an option to event logging to include the long form of the callsite instead of the short form.