[SPARK-41848][CORE] Fixing task over-scheduled with TaskResourceProfile #39410
Conversation
cc @Ngone51 could you please take a look at this PR? Thanks

Can one of the admins verify this patch?

cc @tgravescs @mridulm too
```scala
override def statusUpdate(taskId: Long, state: TaskState, data: ByteBuffer): Unit = {
  val resources = taskResources.getOrElse(taskId, Map.empty[String, ResourceInformation])
  val msg = StatusUpdate(executorId, taskId, state, data, resources)
  val cpus = taskCpus.getOrElse(taskId, 0)
```
Could we get the task cpus via `executor.runningTask(taskId).taskDescription.cpus` to get rid of `taskCpus`?
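A minimal sketch of what this suggestion amounts to, using toy stand-in classes (`ToyExecutor`, `RunningTask`, and the simplified `TaskDescription` here are illustrative, not Spark's actual internals): look the cpus up on the already-tracked running task instead of maintaining a parallel `taskId -> cpus` map that can drift out of sync.

```scala
// Toy stand-ins for the Spark classes discussed above; illustrative only.
case class TaskDescription(taskId: Long, cpus: Int)
case class RunningTask(taskDescription: TaskDescription)

class ToyExecutor {
  private val runningTasks = scala.collection.mutable.Map[Long, RunningTask]()

  def launch(td: TaskDescription): Unit =
    runningTasks(td.taskId) = RunningTask(td)

  // The suggested lookup: no separate taskId -> cpus map to keep in sync.
  def cpusFor(taskId: Long): Int =
    runningTasks.get(taskId).map(_.taskDescription.cpus).getOrElse(0)
}

object ToyExecutorDemo extends App {
  val ex = new ToyExecutor
  ex.launch(TaskDescription(taskId = 1L, cpus = 2))
  println(ex.cpusFor(1L))  // prints 2
  println(ex.cpusFor(99L)) // prints 0 for tasks we don't know about
}
```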
Thanks, that looks better. Making the change.
done
```scala
  }

  // this function is for testing only
  def getExecutorAvailableCpus(
```
Could we make it `private[spark]`?
done.
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
```scala
val conf = new SparkConf()
  .set(EXECUTOR_CORES, execCores)
  .set(SCHEDULER_REVIVE_INTERVAL.key, "1m") // don't let it auto revive during test
  .set(EXECUTOR_INSTANCES, 0) // avoid errors about duplicate executor registrations
```
Why there could be "duplicate executor registrations"?
I kept the comment from other UTs, but after checking the code I didn't see any chance of duplicate registrations. Removing the comment here.
```scala
when(mockEndpointRef.send(LaunchTask)).thenAnswer((_: InvocationOnMock) => {})

var executorAddedCount: Int = 0
val infos = scala.collection.mutable.ArrayBuffer[ExecutorInfo]()
```
Could you move `scala.collection.mutable` to the import list?
done
I will need to test this and revisit the code to understand the issue better.
```scala
  }

  // To avoid allocating any resources immediately after releasing the resource from the task to
  // make sure that `availableAddrs` below won't change
```
available cpus?
Thanks, fixed.
Ah! Thanks for the context @Ngone51 - just checked git blame to see it is part of 3.4 :-)

Overall approach looks good, I think @Ngone51 covered comments I had.
…rainedSchedulerBackend.scala Co-authored-by: wuyi <[email protected]>
dongjoon-hyun
left a comment
cc @xinrong-meng (Apache Spark 3.4 release manager) since this is filed as a blocker-level JIRA issue.
```scala
    "Our unexpected executor does not have a request time.")
}

test("SPARK-41848: executor core decrease should base on taskCpus") {
```
nit. decrease should base on -> should be decreased based on?
Thanks, updated.
WeichenXu123
left a comment
LGTM
Merged to master. Thank you, @ivoson , @Ngone51 , @mridulm , @tgravescs , @WeichenXu123 .

@Ngone51 Had a query here ... the number of cores for a task is determined in resourceOffers based on the TaskSet - given this, can't we leverage it at the driver given a task id, instead of passing it from the executor?
Sorry for missing that, @mridulm .

Oh no, you are good @dongjoon-hyun - I was not actively reviewing this PR!
@mridulm That might be an alternative. I was thinking about it too. But checking the code, I found that a
in that case, we'd have trouble knowing the exact task cores used. We might need to maintain an extra structure at the driver to track the resources used by the task if we want to do it this alternative way.
The current way is actually consistent with how we handle custom resources, which are also assigned at the driver and returned with
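A toy model of the flow described here, under simplified assumptions (the `StatusUpdate` and executor-data shapes below are illustrative stand-ins, not Spark's actual classes): the executor reports the cpus the task held back with its status update, and the driver credits back exactly that amount, the same way custom resources are returned.

```scala
// Toy model only; names are simplified from the real messages/classes.
case class StatusUpdate(taskId: Long, cpus: Int, resources: Map[String, Long])

class ToyExecutorData(var freeCores: Int)

object StatusUpdateDemo extends App {
  // A 4-core executor currently running one 2-cpu TaskResourceProfile task.
  val data = new ToyExecutorData(freeCores = 2)

  // Task finished: the executor reported cpus = 2 in the status update,
  // so the driver credits back exactly what the task held...
  val update = StatusUpdate(taskId = 1L, cpus = 2, resources = Map.empty)
  data.freeCores += update.cpus
  println(data.freeCores) // prints 4

  // ...instead of a value looked up from the executor's default resource
  // profile, which may not match the task's actual cpus.
}
```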
Thanks for the review. @Ngone51 @tgravescs @dongjoon-hyun @WeichenXu123 @mridulm

Thanks @Ngone51, that is an excellent case where the approach I asked about won't work!
What changes were proposed in this pull request?
As described in SPARK-41848, we update the executor's free cores based on the executor's resource profile. For tasks with a TaskResourceProfile in a standalone cluster, executors with the default resource profile can be reused across different TaskResourceProfiles, so the task cpus can differ from the value we get from the executor's resource profile.
The changes to fix the issue:
- Add the task cpus to `TaskDescription`;
- Report the task cpus in `StatusUpdate` as well as the other resources, so that `taskCpus` can be reported when a task finishes and we can increase the executor's free cores by the reported `taskCpus`.

Why are the changes needed?
Fixing the bug as described in SPARK-41848
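As a toy illustration of the accounting drift behind the bug (all numbers and names below are made up for the sketch, not Spark's actual bookkeeping): if the free-core accounting uses a cpu count derived from the executor's default resource profile instead of the task's own TaskResourceProfile, the reported free cores can exceed what is really available and the executor gets over-committed.

```scala
// Illustrative numbers: a TaskResourceProfile task needs 2 cpus, but a
// count derived from the executor's default resource profile says 1.
object OverScheduleDemo {
  val totalCores = 4
  val actualTaskCpus = 2     // from the task's TaskResourceProfile
  val profileDerivedCpus = 1 // wrongly derived from the default profile

  // Buggy accounting: deduct the profile-derived cpus per launched task.
  def buggyFreeCores(tasksLaunched: Int): Int =
    totalCores - tasksLaunched * profileDerivedCpus

  // Fixed accounting: deduct what each task actually holds.
  def fixedFreeCores(tasksLaunched: Int): Int =
    totalCores - tasksLaunched * actualTaskCpus

  def main(args: Array[String]): Unit = {
    // After two 2-cpu tasks launch, the executor is actually full...
    println(fixedFreeCores(2)) // prints 0
    // ...but the buggy count still reports free cores, so the scheduler
    // keeps offering this executor and over-commits it.
    println(buggyFreeCores(2)) // prints 2
  }
}
```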
Does this PR introduce any user-facing change?
No
How was this patch tested?
New UTs added.