-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-8981][CORE][FOLLOW-UP] Clean up MDC properties after running a task #28756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #123639 has finished for PR 28756 at commit
|
|
ping @cloud-fan @igreenfield Please take a look! |
| .filter(_._1.startsWith(Executor.MDC_KEY)).map { item => | ||
| val key = item._1.substring(4) | ||
| if (key == Executor.TASK_MDC_KEY && item._2 != taskName) { | ||
| logWarning(s"Override mdc.taskName is not allowed, ignore ${item._2}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we simply set the task mdc key at the end? then it will not be overwritten.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we've already add the task mdc key at the new line 326. We can remove the warning if it's ok to override silently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the original setMDCForTask method looks OK as long as we set task mdc key at the end. It also helps avoid duplicated code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we do not let override the task name in MDC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then our document is wrong. We must make sure taskName always represent the value as we documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should let the user override it, and write to the log that is overridden.
(windows way is not let you do things, Linux way: with great power comes greater responsibility)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the benefit to let users override the task name? It's just confusing to me. Let's not support a non-existing use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the UI you can set the description I think many users can benefit from setting the taskName to be the same as that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then, they can use custom MDC properties instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| } | ||
|
|
||
| override def run(): Unit = { | ||
| val oldMdcProperties = mdcProperties.keys.map(k => (k, MDC.get(k))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simply clear all the mdc properties here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can normally, and I'd prefer to clear it inside finally.
| val taskDeserializationProps: ThreadLocal[Properties] = new ThreadLocal[Properties] | ||
|
|
||
| val MDC_KEY = "mdc." | ||
| val TASK_MDC_KEY = s"${MDC_KEY}taskName" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you change this key you need also to update the docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, the change doesn't introduce any difference to the end-user, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I remember before it was just taskName without mdc.taskName
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used internally in order to handle taskName consistently with custom MDC properties. It has no effect on users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| .filter(_._1.startsWith(Executor.MDC_KEY)).map { item => | ||
| val key = item._1.substring(4) | ||
| if (key == Executor.TASK_MDC_KEY && item._2 != taskName) { | ||
| logWarning(s"Override mdc.taskName is not allowed, ignore ${item._2}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we do not let override the task name in MDC?
|
@cloud-fan @igreenfield Have updated according to your comments. |
|
Test build #123842 has finished for PR 28756 at commit
|
|
Test build #123843 has finished for PR 28756 at commit
|
|
thanks, merging to master! |
|
thanks all!! |
|
Currently, spark only adds taskName to mdc, can add executorId to MDC as well? https://aws.github.io/aws-emr-containers-best-practices/troubleshooting/docs/where-to-look-for-spark-logs/ |
What changes were proposed in this pull request?
This PR is a followup of #26624. This PR cleans up MDC properties if the original value is empty.
Besides, this PR adds a warning and ignore the value when the user tries to override the value of
taskName.Why are the changes needed?
Before this PR, running the following jobs:
there's still MDC value "ABC" in the log of the second count job even if we've unset the value.
Does this PR introduce any user-facing change?
Yes, user will 1) no longer see the MDC values after unsetting the value; 2) see a warning if he/she tries to override the value of
taskName.How was this patch tested?
Tested Manaually.