Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -715,7 +715,9 @@ private[deploy] class Master(
val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
.filter(canLaunchExecutor(_, app.desc))
.sortBy(_.coresFree).reverse
if (waitingApps.length == 1 && usableWorkers.isEmpty) {
val appMayHang = waitingApps.length == 1 &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to verify waitingApps.length == 1 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm..I was also thinking about this question when I was fixing. I think we can extend it to multiple apps. Let me try it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how useful is this warning message, this just reflects the current status within one round of scheduling, we can tolerant the behavior to not launch a new executor on currently available workers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's from #25047 (comment) originally. The warning is mainly used to detect the possible hang of an application.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this check, but that seems to be the most simple solution for now. Let's keep it then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea of ==1 is that there is only 1 active application, if more then 1, other applications could be using resources so unless you did a lot more checking. We could definitely make it a more comprehensive check in the future. I believe this was supposed to be a simple sanity check for people playing around that may have started the cluster with not enough resources. The standalone mode does not tell you there aren't enough and will just hang forever.

waitingApps.head.executors.isEmpty && usableWorkers.isEmpty
if (appMayHang) {
logWarning(s"App ${app.id} requires more resource than any of Workers could have.")
}
val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)
Expand Down