@Ngone51
Member

@Ngone51 Ngone51 commented Jun 8, 2020

What changes were proposed in this pull request?

This PR is a follow-up to #26624. It cleans up MDC properties if the original value is empty.
In addition, it logs a warning and ignores the value when the user tries to override the value of taskName.

Why are the changes needed?

Before this PR, running the following jobs:

sc.setLocalProperty("mdc.my", "ABC")
sc.parallelize(1 to 100).count()
sc.setLocalProperty("mdc.my", null)
sc.parallelize(1 to 100).count()

the MDC value "ABC" still appears in the log of the second count job even though the value has been unset.
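
The behaviour described above is what one would expect when MDC state is thread-local and the threads running tasks are pooled and reused. A minimal sketch of that mechanism, assuming a hypothetical in-memory stand-in (`FakeMDC`) for the real logging MDC and a single-threaded pool in place of Spark's executor threads; none of these names come from Spark:

```scala
import java.util.concurrent.{Callable, Executors}
import scala.collection.mutable

// Hypothetical stand-in for a logging MDC: one mutable map per thread.
object FakeMDC {
  private val ctx = new ThreadLocal[mutable.Map[String, String]] {
    override def initialValue(): mutable.Map[String, String] = mutable.Map.empty
  }
  def put(k: String, v: String): Unit = ctx.get().update(k, v)
  def get(k: String): Option[String] = ctx.get().get(k)
}

object MdcLeakSketch {
  // Runs two "tasks" on the same pooled thread and returns what each one
  // sees for the key "my".
  def run(): (Option[String], Option[String]) = {
    val pool = Executors.newSingleThreadExecutor()
    try {
      // Copies the job's properties into the context WITHOUT clearing it
      // first, then reads "my" back.
      def task(props: Map[String, String]): Callable[Option[String]] =
        new Callable[Option[String]] {
          override def call(): Option[String] = {
            props.foreach { case (k, v) => FakeMDC.put(k, v) }
            FakeMDC.get("my")
          }
        }
      val first = pool.submit(task(Map("my" -> "ABC"))).get()
      val second = pool.submit(task(Map.empty)).get() // property was "unset"
      (first, second)
    } finally {
      pool.shutdown()
    }
  }
}
```

In this sketch the second task still sees the value set by the first one, because nothing removed it from the reused thread's context.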

Does this PR introduce any user-facing change?

Yes. Users will 1) no longer see MDC values after unsetting them; 2) see a warning if they try to override the value of taskName.

How was this patch tested?

Tested manually.

@Ngone51 Ngone51 changed the title [SPARK-8981] [SPARK-8981][CORE][FOLLOW-UP] Clean up MDC properties after running a task Jun 8, 2020
@SparkQA

SparkQA commented Jun 8, 2020

Test build #123639 has finished for PR 28756 at commit be3d752.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Ngone51
Member Author

Ngone51 commented Jun 10, 2020

ping @cloud-fan @igreenfield Please take a look!

.filter(_._1.startsWith(Executor.MDC_KEY)).map { item =>
val key = item._1.substring(4)
if (key == Executor.TASK_MDC_KEY && item._2 != taskName) {
logWarning(s"Override mdc.taskName is not allowed, ignore ${item._2}")
Contributor

shall we simply set the task mdc key at the end? then it will not be overwritten.

Member Author

Actually, we've already added the task mdc key at the new line 326. We can remove the warning if it's ok to override silently.

Contributor

I think the original setMDCForTask method looks OK as long as we set task mdc key at the end. It also helps avoid duplicated code.

Why don't we let users override the task name in MDC?

Contributor

Then our documentation would be wrong. We must make sure taskName always represents the value as we documented.

I think we should let the user override it, and log that it was overridden.
(The Windows way is to not let you do things; the Linux way: with great power comes greater responsibility.)

Contributor

What's the benefit of letting users override the task name? It's just confusing to me. Let's not support a non-existent use case.

In the UI you can set the description; I think many users could benefit from setting taskName to the same value.

Member Author

Then they can use custom MDC properties instead?

ok
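
The outcome of this thread appears to be: copy the user's `mdc.`-prefixed properties first and set the task name last, so that a user-supplied `mdc.taskName` cannot replace it. A rough sketch of that ordering, assuming a plain `Map` in place of Spark's task properties; `MdcOrderingSketch` and `buildMdc` are illustrative names, not Spark's API:

```scala
object MdcOrderingSketch {
  val MdcPrefix = "mdc."
  val TaskMdcKey = s"${MdcPrefix}taskName"

  // Builds the MDC entries for one task: user-supplied "mdc.*" properties
  // first, then the task name last so that it always wins over a
  // user-supplied "mdc.taskName".
  def buildMdc(localProps: Map[String, String], taskName: String): Map[String, String] = {
    val userProps = localProps.filter { case (k, _) => k.startsWith(MdcPrefix) }
    userProps + (TaskMdcKey -> taskName)
  }
}
```

Users who want their own label in the log can still supply it under a different key, for example `mdc.my`, which passes through unchanged in this sketch.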

}

override def run(): Unit = {
val oldMdcProperties = mdcProperties.keys.map(k => (k, MDC.get(k)))
Contributor

Can we simply clear all the mdc properties here?

Member Author

I think we normally can, and I'd prefer to clear it inside finally.
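
Clearing inside `finally` means the cleanup runs even when the task body throws. A minimal sketch of that shape, assuming a hypothetical per-thread map (`ThreadContext`) in place of the real MDC; `runWithMdc` is an illustrative helper, not a Spark method:

```scala
import scala.collection.mutable

// Hypothetical stand-in for a logging MDC: one mutable map per thread.
object ThreadContext {
  private val ctx = new ThreadLocal[mutable.Map[String, String]] {
    override def initialValue(): mutable.Map[String, String] = mutable.Map.empty
  }
  def put(k: String, v: String): Unit = ctx.get().update(k, v)
  def snapshot(): Map[String, String] = ctx.get().toMap
  def clear(): Unit = ctx.get().clear()
}

object MdcCleanupSketch {
  // Sets the given properties for the duration of `body`, then clears the
  // context in `finally` so a reused thread starts its next task empty,
  // whether `body` returned normally or threw.
  def runWithMdc[T](props: Map[String, String])(body: => T): T = {
    props.foreach { case (k, v) => ThreadContext.put(k, v) }
    try body
    finally ThreadContext.clear()
  }
}
```

Clearing everything is simpler than restoring previous values one by one, at the cost of also dropping any entries that something else placed on the same thread.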

val taskDeserializationProps: ThreadLocal[Properties] = new ThreadLocal[Properties]

val MDC_KEY = "mdc."
val TASK_MDC_KEY = s"${MDC_KEY}taskName"

If you change this key, you also need to update the docs.

Member Author

IIUC, the change doesn't introduce any difference to the end-user, no?

As I remember, it was previously just taskName, not mdc.taskName.

Member Author

@Ngone51 Ngone51 Jun 11, 2020

This is used internally in order to handle taskName consistently with custom MDC properties. It has no effect on users.

ok

@Ngone51
Member Author

Ngone51 commented Jun 11, 2020

@cloud-fan @igreenfield Have updated according to your comments.

@SparkQA

SparkQA commented Jun 11, 2020

Test build #123842 has finished for PR 28756 at commit 0cefc61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 11, 2020

Test build #123843 has finished for PR 28756 at commit 2f1b86b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 91cd06b Jun 11, 2020
@Ngone51
Member Author

Ngone51 commented Jun 11, 2020

thanks all!!

@melin

melin commented Jan 8, 2025

Currently, Spark only adds taskName to the MDC. Could executorId be added to the MDC as well?
We plan to write logs to Kafka via a Kafka appender, and then periodically write the Kafka data to S3 for consumption.

https://aws.github.io/aws-emr-containers-best-practices/troubleshooting/docs/where-to-look-for-spark-logs/
Executor Logs - s3://my_s3_log_location/${virtual-cluster-id}/jobs/${job-id}/containers/${spark-application-id}/${spark-job-id-driver-executor-id}/(stderr.gz/stdout.gz)

@Ngone51
