[SPARK-21383][Core] Fix the YarnAllocator allocates more Resource #18651
Conversation
When NodeManagers launched the Executors, the missing will excel the real value, this can lead to YARN allocate more resource.
ok to test

Not sure what that means. Maybe "exceed"?

Test build #79716 has finished for PR 18651 at commit
    // occupation.
    amClient.releaseAssignedContainer(containerId)
    try {
      numExecutorToBeLaunched += 1
This is not thread safe. += is two operations.
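For context, a minimal stand-alone Scala sketch of that point (hypothetical names, not Spark code; assumes Scala 2.12+ for the Runnable lambda): a `var += 1` is a read, an add, and a write, so concurrent increments can be lost, while `AtomicInteger` makes the increment a single atomic operation.

```scala
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.{Executors, TimeUnit}

object CounterSketch {
  @volatile private var unsafeCounter = 0          // read-modify-write, not atomic
  private val safeCounter = new AtomicInteger(0)   // atomic increment

  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(8)
    (1 to 100000).foreach { _ =>
      pool.execute(() => {
        unsafeCounter += 1            // increments can interleave and be lost
        safeCounter.incrementAndGet() // never loses an increment
      })
    }
    pool.shutdown()
    pool.awaitTermination(30, TimeUnit.SECONDS)
    // unsafeCounter is typically below 100000 under contention; safeCounter is exactly 100000
    println(s"unsafe: $unsafeCounter, safe: ${safeCounter.get}")
  }
}
```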
      })
    })
  } finally {
    numExecutorToBeLaunched -= 1
Same here.
Also, shouldn't this be done in the actual thread that launches the executor? Otherwise you're decrementing the counter as soon as the task is submitted for execution, but you're not really waiting for the task to execute.
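A minimal sketch of the suggestion, with assumed names rather than the actual YarnAllocator code: the decrement belongs inside the submitted task (ideally in a finally), so the counter tracks launches that have actually completed instead of dropping as soon as the task is queued.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger

class LaunchTracker {
  private val launcherPool = Executors.newCachedThreadPool()
  private val numExecutorsStarting = new AtomicInteger(0) // name follows the later review suggestion

  def launchExecutor(launch: () => Unit): Unit = {
    numExecutorsStarting.incrementAndGet()
    launcherPool.execute(() => {
      try {
        launch()                               // stands in for ExecutorRunnable.run()
      } finally {
        numExecutorsStarting.decrementAndGet() // runs only once the launch attempt finishes
      }
    })
    // Decrementing here instead would happen as soon as the task is submitted,
    // long before the executor has actually started.
  }
}
```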
Yes, you're right. When I tested the code by experiment, I decremented numExecutorToBeLaunched in the updateInternalState function, but I later found this may impact the test.
I will fix this soon.
    @volatile private var numExecutorsRunning = 0

    @volatile private var numExecutorToBeLaunched = 0
A better name would be numExecutorsStarting.
OK, I will change the name.
When NodeManagers launch the Executors, the `missing` value will exceed the real value; this can lead YARN to allocate more resources.
I just updated the code and tested it by experiment. Can you take a look, @vanzin?

Test build #79743 has finished for PR 18651 at commit

Test build #79745 has finished for PR 18651 at commit
      .format(
        allocatedContainers.size,
        numExecutorsRunning,
        numExecutorsRunning.get,
It would be nice to print numExecutorsStarting here as well.
    val missing = targetNumExecutors - numPendingAllocate - numExecutorsRunning
    val missing = targetNumExecutors - numPendingAllocate -
      numExecutorsStarting.get - numExecutorsRunning.get
Can you also add in a debug message here, something like below? I found this very useful when debugging this issue and think it would be useful for debugging other allocation issues in the future.

    logDebug(s"Updating resource requests, target: $targetNumExecutors, pending: " +
      s"$numPendingAllocate, running: $numExecutorsRunning, executorsStarting $numExecutorsPendingStart")
Thanks for your advice! I just added the debug info.
Test build #79764 has finished for PR 18651 at commit
    if (executorIdToContainer.contains(executorId)) {
      val container = executorIdToContainer.get(executorId).get
      internalReleaseContainer(container)
      numExecutorsRunning -= 1
This doesn't need to be an atomic integer because this method is synchronized already - and so is the other method where it's modified.
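As a small illustration of that point (a hypothetical class, not the real YarnAllocator): when every read and write of the counter happens inside methods synchronized on the same object, a plain `var` is already safe and an `AtomicInteger` adds nothing.

```scala
class SynchronizedCounters {
  private var numExecutorsRunning = 0   // a plain var is enough here

  def executorStopped(): Unit = synchronized {
    numExecutorsRunning -= 1            // guarded by this object's monitor
  }

  def executorStarted(): Unit = synchronized {
    numExecutorsRunning += 1            // same lock, so updates cannot be lost
  }

  def running: Int = synchronized { numExecutorsRunning }
}
```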
Yes, I was just trying to keep consistency with numExecutorsStarting.
vanzin left a comment
The changes to numExecutorsRunning are not necessary, but at the same time they don't hurt. I'll let Tom take a look since he left feedback earlier.
    // occupation.
    // Assigned container should be released immediately
    // to avoid unnecessary resource occupation.
    numExecutorsStarting.decrementAndGet()
Safer to put this in a finally, no?
Yes, it is more robust. I have updated the code.
Test build #79789 has finished for PR 18651 at commit
    // to avoid unnecessary resource occupation.
    amClient.releaseAssignedContainer(containerId)
  } finally {
    numExecutorsStarting.decrementAndGet()
Why did you move this out of updateInternalState? I would rather see those stay together, because updateInternalState is synchronized, and if they aren't changed at the same time you could end up with the wrong numbers. numExecutorsStarting could still be, say, 1 when you have already added it to the number running, and then updateResourceRequests gets an incorrect number.
If you move this to updateInternalState, then if there's a problem in ExecutorRunnable.run this counter will not be decremented.
updateInternalState is called right after the run if it succeeds. You do still need to handle the failure as well, but in the failure case numExecutorsRunning is never incremented. I would rather see those handled separately and make sure we don't over-count.
this is "starting" not "running".
Note: I thought the last version handled this properly, where he had the decrement in the catch block.
Yes, but it's a bug right now, as the numbers can be wrong. Are you looking at the synchronization?
Right now everything is called synchronized up to the point of handing off to the launcher pool to run the ExecutorRunnable. At this point running is not incremented, pending is decremented, and we now increment starting. That is fine.
But when the ExecutorRunnable finishes, the only place it's called synchronized is in updateInternalState. This right now increments running but does not decrement starting. If updateResourceRequests gets called (which is synchronized) right after updateInternalState (which leaves the synchronized block) but before the finally block executes and decrements starting, the total number can be more than it really is. That executor is counted as both running and starting.
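A simplified sketch of that window (assumed names and shapes, not the actual patch): the synchronized reader can run after `updateInternalState` has incremented `running` but before the `finally` block decrements `starting`, so the same executor is briefly counted twice.

```scala
import java.util.concurrent.atomic.AtomicInteger

class AllocatorRaceSketch {
  private val numExecutorsRunning = new AtomicInteger(0)
  private val numExecutorsStarting = new AtomicInteger(0)

  // Stands in for updateResourceRequests: synchronized, but it can run in the
  // gap between updateInternalState() returning and the finally block below.
  def countedExecutors(): Int = synchronized {
    numExecutorsRunning.get + numExecutorsStarting.get
  }

  private def updateInternalState(): Unit = synchronized {
    numExecutorsRunning.incrementAndGet()
    // Decrementing starting here as well would make both counters change
    // under the same lock, closing the double-count window.
  }

  def launch(run: () => Unit): Unit = {
    numExecutorsStarting.incrementAndGet()
    try {
      run()
      updateInternalState()                    // running goes up...
    } finally {
      numExecutorsStarting.decrementAndGet()   // ...but starting drops only here,
                                               // so countedExecutors() can briefly over-count
    }
  }
}
```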
OK, I see. The current code can double-count the same executor as starting and running, while the previous code could count it as starting even though it failed to start (for a really small window), but that is a self-healing situation while the previous can have some adverse effects.
Yep, the fact that it's still marked as starting even though it failed will fix itself the next loop through. It's no different than if we didn't know it failed and it was still in the ExecutorRunnable.run code.
I agree that putting numExecutorsStarting.decrementAndGet() together with numExecutorsRunning.incrementAndGet() in updateInternalState is better if we can.
The reason I put numExecutorsStarting.decrementAndGet() in the finally block is that if there are exceptions that are not NonFatal and are caught by the following code, we may not be able to allocate resources as specified; this is the same thing @vanzin worried about.
We may double-count in the current code, but this only slows down the allocation rate for a short time.
If it's a fatal error I don't really expect us to continue, so dynamic allocation doesn't matter. If you know a case where we would recover from a fatal exception, I'm fine with adding another catch there to decrement in the fatal case as well.
    if (numExecutorsRunning < targetNumExecutors) {
    if (numExecutorsRunning.get < targetNumExecutors) {
      if (launchContainers) {
        numExecutorsStarting.incrementAndGet()
Note: if we change the below back, this needs to move outside if (launchContainers).
I updated the code, please take a look @vanzin @tgravescs

Test build #79928 has finished for PR 18651 at commit
LGTM. @vanzin anything further?

Merging to master / 2.2.
When NodeManagers launch Executors, the `missing` value will exceed the real value when the launch is slow; this can lead YARN to allocate more resources. We add `numExecutorsRunning` when calculating `missing` to avoid this. Tested by experiment.

Author: DjvuLee <[email protected]>

Closes #18651 from djvulee/YarnAllocate.

(cherry picked from commit 8de080d)
Signed-off-by: Marcelo Vanzin <[email protected]>
What changes were proposed in this pull request?

When NodeManagers launch Executors, the `missing` value will exceed the real value when the launch is slow; this can lead YARN to allocate more resources. We add `numExecutorsRunning` when calculating `missing` to avoid this.

How was this patch tested?

Tested by experiment.
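To restate the fix in code form, a hedged sketch (a standalone function with assumed parameter names, not the actual `updateResourceRequests` signature): executors that are still starting are now subtracted too, so a slow launch no longer inflates `missing` and triggers extra container requests.

```scala
// Corrected calculation: count executors that are still being launched as well,
// so the allocator does not request containers for them a second time.
def missingExecutors(
    targetNumExecutors: Int,
    numPendingAllocate: Int,
    numExecutorsStarting: Int,
    numExecutorsRunning: Int): Int = {
  targetNumExecutors - numPendingAllocate - numExecutorsStarting - numExecutorsRunning
}
```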