[SPARK-21834] Incorrect executor request in case of dynamic allocation #19048
Conversation
120f383 to e30bbac (force-push)
Test build #81110 has finished for PR 19048 at commit
cc @markhamstra, @sameeragarwal, @rxin, @vanzin
Test build #81116 has finished for PR 19048 at commit
Jenkins retest this please.
I'm not sure I understand why this is a problem. What is the undesired behavior that happens because of this? That's not explained in either the PR or the bug. The way I understand the code, yes, there are potentially redundant calls to update the target number of executors; but then,
Looking at the scheduler and the dynamic executor allocator code, this is my understanding; correct me if I am wrong. Let's say the dynamic executor allocator is ramping down the number of executors. There are 10 executors running and it needs only 4. Then
Have you actually observed that behavior? The way I understand the code, both
Is that not what you're seeing?
Test build #81132 has finished for PR 19048 at commit
That's not really true.
I think I'm starting to understand what you're getting at, but I still don't see why this has anything to do with the CGSB. What I understand from your comment is that the EAM may reduce its target and at the same time try to kill idle executors, basically doubling down and killing too many executors in the process. Isn't this what this piece of code is trying to prevent? If killing an idle executor would bring the number below the current target, then it won't be killed. That's a pretty recent fix, so maybe you haven't seen it (SPARK-21656).
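The guard referenced here (SPARK-21656) can be sketched as follows. This is an illustrative model, not the actual Spark code; all names are made up for the example:

```scala
// Illustrative model of the SPARK-21656-style guard: idle executors are
// only eligible for killing while the running count stays at or above the
// current target. All names here are hypothetical.
object IdleKillGuard {
  def selectKillable(runningCount: Int,
                     idleExecutors: Seq[String],
                     targetNumExecutors: Int): Seq[String] = {
    // Headroom is how many executors we can lose without going below target.
    val headroom = runningCount - targetNumExecutors
    if (headroom <= 0) Seq.empty else idleExecutors.take(headroom)
  }
}
```

With 5 running and a target of 5, nothing is killable; with a target of 3, both idle executors can go.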
To be clear, there is no issue on the EAM side. Consider the following situation -
Why? Because of the idle timeout? If that's your point, then the change I referenced above should avoid that.
How? The scheduler (a.k.a. CGSB) does not kill executors on its own. It has to be told to do so in some way.
If you can actually provide logs that show what you're trying to say, that would probably be easier.
Yes, because of idle timeout. Note that the
Because the EAM asks it to kill 2 of them. But please note that while killing 2 executors the EAM did not reduce its target to 3; it is still 5. But since the scheduler keeps its own internal target, it reduces its target from 5 to 3, and the EAM and the scheduler get out of sync.
Actually, I added a lot of debug logging to find this issue, so the log is probably not going to be of any help to you.
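The drift described in this comment can be reproduced with a toy model of the two targets. These classes are simplifications invented for the example, not the real EAM/CGSB:

```scala
// Toy model of the desync: the EAM keeps its own target, while the CGSB
// decrements a separate internal target on every kill request, even when
// the EAM intentionally left its target unchanged.
class ToyEam(var target: Int)

class ToyCgsb(var target: Int) {
  def killExecutors(ids: Seq[String]): Unit = {
    target -= ids.size // scheduler assumes the caller wants a lower total
  }
}

object DesyncDemo {
  def run(): (Int, Int) = {
    val eam = new ToyEam(5)
    val cgsb = new ToyCgsb(5)
    // EAM kills 2 idle executors but keeps its own target at 5.
    cgsb.killExecutors(Seq("exec-1", "exec-2"))
    (eam.target, cgsb.target) // the two components now disagree: (5, 3)
  }
}
```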
I think I see what you're saying. But I still think it's the fault of the EAM.
And I think the problem here is that the EAM should not be telling the CGSB that the target is 5 when 5 is actually the "minimum" the EAM wants, but there may be more executors running that haven't timed out yet. Basically, this code in the EAM:

Should be changed to account for the current number of executors, so that the EAM doesn't tell the CGSB that it wants fewer executors than currently exist. Because even if the EAM may not currently "need" the extra executors, it hasn't timed them out, so they need to be counted towards the "number of executors that I expect to be active". Your solution (the new

Now if the EAM tells the CGSB the correct amount of executors it expects to be active (which means something like
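One way to express the suggestion in this comment: when reporting downstream, the EAM should never claim it wants fewer executors than are currently alive and not yet timed out. A sketch, using a hypothetical helper rather than the actual EAM code:

```scala
// Hypothetical helper: the number the EAM communicates to the CGSB is the
// max of its computed target and the executors still considered active
// (i.e. not yet removed by the idle timeout).
object EamReport {
  def executorsToReport(target: Int, activeNotTimedOut: Int): Int =
    math.max(target, activeNotTimedOut)
}
```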
Actually if you look at the api,
I agree with you on this. Maybe it would be cleaner if we provided a new API like this -
I think the main thing that bothers me is that adding anything to the API makes all this code even more complicated and confusing than it already is. Having two (3 if you count the YARN allocator) places track all this state is bound to lead to these issues. Optimally only the EAM would keep track of these things; the CGSB shouldn't really be dealing with executor allocation and de-allocation, just with managing the existing executors that connect to it. But fixing things like that is probably a much larger change (the words "hornets' nest" come to mind).

Barring that, I think that we should make the change that leads to the correct behavior without making the internal interface more complicated than it needs to be. If changing the semantics of

Or maybe you can reach the same thing through other means. For example, maybe if you get rid of the

If none of those work, then we can talk about adding new things.
|
At a high level I agree that keeping the state in 3 places is creating a mess, but changing that would require a big refactoring, which is probably outside the scope of this change.
That might work. But there is a race condition in doing that. In order to do that, we need to have a
It is possible that the total executors value is changed by another thread between

Instead of doing that, how about we add a new API to
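The race being described is a read-then-act gap: if the current total is read and the kill issued as two separate, unsynchronized steps, another thread can change the total in between. A minimal sketch of the hazard and a lock-based alternative, illustrative only:

```scala
// Illustrative bookkeeping with both a racy and a synchronized update path.
class TargetBookkeeping(initial: Int) {
  private var total = initial
  private val lock = new Object

  // Racy: another thread may call requestTotal() between the read and write.
  def killRacy(n: Int): Unit = {
    val snapshot = total // read
    total = snapshot - n // write based on a possibly stale read
  }

  // Safe: the read-modify-write happens atomically under one lock.
  def killAtomic(n: Int): Unit = lock.synchronized { total -= n }

  def requestTotal(n: Int): Unit = lock.synchronized { total = n }

  def get: Int = lock.synchronized { total }
}
```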
Well, that's adding an API that does the same thing as existing APIs but a little bit differently. In my view that adds to the problem instead of fixing it. Now every caller into the

For example: the
(Or it can call |
Okay, that seems like a reasonable hack. The only downside, as you mentioned, is the extra trip to the CM and adding more confusion to the usage of
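The hack agreed on here can be sketched as: kill, then immediately re-send the EAM's unchanged target so the scheduler's internal count snaps back, at the cost of one extra round trip to the cluster manager. A toy model with invented names:

```scala
// Toy scheduler whose internal target drops on every kill request.
class ToyScheduler(var target: Int) {
  def killExecutors(ids: Seq[String]): Unit = target -= ids.size
  def requestTotalExecutors(n: Int): Unit = target = n // extra CM round trip
}

object ResyncHack {
  // Kill idle executors, then restore the scheduler's target to the EAM's.
  def killAndResync(sched: ToyScheduler, ids: Seq[String], eamTarget: Int): Int = {
    sched.killExecutors(ids)             // scheduler drops its target
    sched.requestTotalExecutors(eamTarget) // re-sync with the EAM's target
    sched.target
  }
}
```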
One thing I don't understand clearly is why we should update the
@jiangxb1987 - I agree with you. I do not have the context or history to comment on that. Unfortunately, the API has been designed that way, and bookkeeping of the target number of executors is done by the CGSB. Changing the existing scheduler behavior would require a bigger change, possibly breaking some existing API behavior, which I think is out of the scope of this PR.
297059f to 22c3596 (force-push)
Test build #81193 has finished for PR 19048 at commit
Test build #81194 has finished for PR 19048 at commit
22c3596 to 6cc5fab (force-push)
Test build #81195 has finished for PR 19048 at commit
Yea, I agree the change made in this PR looks good for your issue; I'm just suggesting maybe we could refactor the code further, perhaps as a follow-up.
Not sure why the test failed. Maybe the build is unstable? cc @vanzin
SparkR tests have been super flaky lately. retest this please
retest this please
Not sure why the PRB is not picking up my requests. @sitalkedia, can you close and re-open the PR to see if that does it? (The change looks fine; it just would be nice to get a clean test run.)
jenkins retest this please.
Created #19081. |
What changes were proposed in this pull request?
The killExecutor API currently does not allow killing an executor without updating the total number of executors needed. When dynamic allocation is turned on and the allocator tries to kill an executor, the scheduler reduces the total number of executors needed (see https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L635), which is incorrect because the allocator already takes care of setting the required number of executors itself.
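The behavior change described above can be sketched as a flag on the kill path, so that the scheduler only lowers its requested total when the caller asks it to. This is a simplified model; the parameter name is illustrative, not necessarily the exact signature in the patch:

```scala
// Simplified model: killExecutors only adjusts the scheduler's requested
// total when the caller (e.g. the dynamic allocator) opts in.
class ToyBackend(var requestedTotalExecutors: Int) {
  def killExecutors(ids: Seq[String], adjustTargetNumExecutors: Boolean): Unit = {
    if (adjustTargetNumExecutors) {
      requestedTotalExecutors -= ids.size
    }
    // ... the actual kill of `ids` would happen here ...
  }
}
```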
How was this patch tested?
Ran a job on the cluster and made sure the executor request is correct.