[SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile #39410
Conversation
cc @Ngone51 could you please take a look at this PR? Thanks

Can one of the admins verify this patch?

cc @tgravescs @mridulm too
```scala
override def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer): Unit = {
  val resources = taskResources.getOrElse(taskId, Map.empty[String, ResourceInformation])
  val msg = StatusUpdate(executorId, taskId, state, data, resources)
  val cpus = taskCpus.getOrElse(taskId, 0)
```
Could we get the task cpus via `executor.runningTask(taskId).taskDescription.cpus` to get rid of `taskCpus`?
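A minimal sketch of what this suggestion amounts to, using toy stand-in classes (`ToyExecutor`, `RunningTask`, and the simplified `TaskDescription` here are illustrative, not Spark's actual internals): look the cpus up on the already-tracked running task instead of maintaining a parallel `taskId -> cpus` map that can drift out of sync.

```scala
// Toy stand-ins for the Spark classes discussed above; illustrative only.
case class TaskDescription(taskId: Long, cpus: Int)
case class RunningTask(taskDescription: TaskDescription)

class ToyExecutor {
  private val runningTasks = scala.collection.mutable.Map[Long, RunningTask]()

  def launch(td: TaskDescription): Unit =
    runningTasks(td.taskId) = RunningTask(td)

  // The suggested lookup: no separate taskId -> cpus map to keep in sync.
  def cpusFor(taskId: Long): Int =
    runningTasks.get(taskId).map(_.taskDescription.cpus).getOrElse(0)
}

object ToyExecutorDemo extends App {
  val ex = new ToyExecutor
  ex.launch(TaskDescription(taskId = 1L, cpus = 2))
  println(ex.cpusFor(1L))  // prints 2
  println(ex.cpusFor(99L)) // prints 0 for tasks we don't know about
}
```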
Thanks, that looks better. Making the change.
done
```scala
  }

  // this function is for testing only
  def getExecutorAvailableCpus(
```
Could we make it `private[spark]`?
done.
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
```scala
val conf = new SparkConf()
  .set(EXECUTOR_CORES, execCores)
  .set(SCHEDULER_REVIVE_INTERVAL.key, "1m") // don't let it auto revive during test
  .set(EXECUTOR_INSTANCES, 0) // avoid errors about duplicate executor registrations
```
Why there could be "duplicate executor registrations"?
I kept the comment from other UTs, but after checking the code I didn't see any chance of duplicate registrations. Removing the comment here.
```scala
when(mockEndpointRef.send(LaunchTask)).thenAnswer((_: InvocationOnMock) => {})

var executorAddedCount: Int = 0
val infos = scala.collection.mutable.ArrayBuffer[ExecutorInfo]()
```
Could you move `scala.collection.mutable` to the import list?
done
I will need to test this and revisit the code to understand the issue better.
```scala
  }

  // To avoid allocating any resources immediately after releasing the resource from the task to
  // make sure that `availableAddrs` below won't change
```
available cpus?
Thanks, fixed.
Ah! Thanks for the context @Ngone51 - just checked git blame to see it is part of 3.4 :-)

Overall approach looks good, I think @Ngone51 covered comments I had.
…rainedSchedulerBackend.scala Co-authored-by: wuyi <[email protected]>
dongjoon-hyun
left a comment
cc @xinrong-meng (Apache Spark 3.4 release manager) since this is filed as a blocker-level JIRA issue.
```scala
    "Our unexpected executor does not have a request time.")
}

test("SPARK-41848: executor core decrease should base on taskCpus") {
```
nit. decrease should base on -> should be decreased based on?
Thanks, updated.
WeichenXu123
left a comment
LGTM
Merged to master. Thank you, @ivoson , @Ngone51 , @mridulm , @tgravescs , @WeichenXu123 .

@Ngone51 Had a query here ... the number of cores for a task is determined in resourceOffers based on the TaskSet - given this, can't we leverage it at the driver given a task id, instead of passing it from the executor?
Sorry for missing that, @mridulm .

Oh no, you are good @dongjoon-hyun - I was not actively reviewing this PR!
@mridulm That might be an alternative. I was thinking about it too. But checking the code, I found that a
in that case, we'd have trouble knowing the exact task cores used. We might need to maintain an extra structure at the driver to track the resources used by the task if we want to do it this alternative way.
The current way is actually consistent with how we handle custom resources, which are also assigned at the driver and returned with
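A toy model of the flow described here, under simplified assumptions (the `StatusUpdate` and executor-data shapes below are illustrative stand-ins, not Spark's actual classes): the executor reports the cpus the task held back with its status update, and the driver credits back exactly that amount, the same way custom resources are returned.

```scala
// Toy model only; names are simplified from the real messages/classes.
case class StatusUpdate(taskId: Long, cpus: Int, resources: Map[String, Long])

class ToyExecutorData(var freeCores: Int)

object StatusUpdateDemo extends App {
  // A 4-core executor currently running one 2-cpu TaskResourceProfile task.
  val data = new ToyExecutorData(freeCores = 2)

  // Task finished: the executor reported cpus = 2 in the status update,
  // so the driver credits back exactly what the task held...
  val update = StatusUpdate(taskId = 1L, cpus = 2, resources = Map.empty)
  data.freeCores += update.cpus
  println(data.freeCores) // prints 4

  // ...instead of a value looked up from the executor's default resource
  // profile, which may not match the task's actual cpus.
}
```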
Thanks for the review. @Ngone51 @tgravescs @dongjoon-hyun @WeichenXu123 @mridulm

Thanks @Ngone51, that is an excellent case where the approach I asked about won't work!
What changes were proposed in this pull request?
As described in SPARK-41848, we update the executor's free cores based on the executor's resource profile. For tasks with a TaskResourceProfile in a standalone cluster, executors with the default resource profile can be reused across different TaskResourceProfiles, so the task cpus can differ from the value we get from the executor's resource profile.
The changes to fix the issue:
- Add the task cpus to `TaskDescription`;
- Report the task cpus in `StatusUpdate` as well as the other resources, so that `taskCpus` can be reported when a task finishes and we can increase the executor's free cores by the reported `taskCpus`.

Why are the changes needed?
Fixing the bug as described in SPARK-41848
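As a toy illustration of the accounting drift behind the bug (all numbers and names below are made up for the sketch, not Spark's actual bookkeeping): if the free-core accounting uses a cpu count derived from the executor's default resource profile instead of the task's own TaskResourceProfile, the reported free cores can exceed what is really available and the executor gets over-committed.

```scala
// Illustrative numbers: a TaskResourceProfile task needs 2 cpus, but a
// count derived from the executor's default resource profile says 1.
object OverScheduleDemo {
  val totalCores = 4
  val actualTaskCpus = 2     // from the task's TaskResourceProfile
  val profileDerivedCpus = 1 // wrongly derived from the default profile

  // Buggy accounting: deduct the profile-derived cpus per launched task.
  def buggyFreeCores(tasksLaunched: Int): Int =
    totalCores - tasksLaunched * profileDerivedCpus

  // Fixed accounting: deduct what each task actually holds.
  def fixedFreeCores(tasksLaunched: Int): Int =
    totalCores - tasksLaunched * actualTaskCpus

  def main(args: Array[String]): Unit = {
    // After two 2-cpu tasks launch, the executor is actually full...
    println(fixedFreeCores(2)) // prints 0
    // ...but the buggy count still reports free cores, so the scheduler
    // keeps offering this executor and over-commits it.
    println(buggyFreeCores(2)) // prints 2
  }
}
```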
Does this PR introduce any user-facing change?
No
How was this patch tested?
New UTs added.