depsolver concurrency limitations #36
Comments
You've summarized the issue correctly. This bottleneck was introduced when we switched back from the pure Erlang implementation of the dependency solver to the Ruby/Gecode-based solver that shipped with Chef 10. The Chef Server keeps a pool of depsolver processes running and waiting for requests. When this pool is exhausted, the API returns a 503 (Service Unavailable). It is not recommended that this pool size exceed the number of CPUs on your Chef Server, as this is a compute-constrained resource. Chef 12 ships with a new version of the pooler library (the library that handles said pooling) that allows requests to queue up when no workers are available. This hasn't yet been exposed in Chef 12 for the dependency solver, but it wouldn't take a lot of work to add that ability.
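The two pool behaviors described above (fail fast with a 503 when every worker is busy, versus queue the request until a worker frees up) can be sketched as a toy model. The real pool lives in the Erlang `pooler` library inside erchef; the `DepsolverPool` class, its method names, and the worker strings below are invented purely for illustration:

```ruby
require "timeout"

# Toy model of a fixed-size depsolver worker pool. Workers are tokens
# held in a SizedQueue; checking one out is popping from the queue.
class DepsolverPool
  def initialize(size)
    @workers = SizedQueue.new(size)
    size.times { |i| @workers << "worker-#{i}" }
  end

  # Pre-queueing behavior: if no worker is free, fail immediately,
  # which the API surfaces as 503 Service Unavailable.
  def checkout_or_503
    @workers.pop(true) # non-blocking pop raises ThreadError when empty
  rescue ThreadError
    :service_unavailable_503
  end

  # Queued behavior (what the newer pooler enables): wait up to
  # `seconds` for a worker to be checked back in before giving up.
  def checkout_with_queueing(seconds)
    Timeout.timeout(seconds) { @workers.pop } # blocking pop
  rescue Timeout::Error
    :service_unavailable_503
  end

  def checkin(worker)
    @workers << worker
  end
end

pool = DepsolverPool.new(2)
a = pool.checkout_or_503             # gets "worker-0"
b = pool.checkout_or_503             # gets "worker-1"
c = pool.checkout_or_503             # :service_unavailable_503 -- pool exhausted
pool.checkin(a)
d = pool.checkout_with_queueing(1)   # succeeds once a worker is returned
```

With queueing, the third request would have waited for the check-in instead of erroring, which is exactly the difference the thread is asking for.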
Coming back to this issue again, the conclusion still stands that we should add queuing to the depsolver workers. I'll add the We should also include upgrading to the newest
Hi, why is this considered a minor issue? We have 25 nodes and they all constantly fail to run `chef-client`.
You can work around it by raising the number of dependency solvers.
Yeah, thank you, looking into that now.
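The workaround mentioned above amounts to raising `depsolver_worker_count` in the Chef Server configuration and reconfiguring. A minimal sketch, assuming the Open Source Chef 11 config path (`/etc/chef-server/chef-server.rb`; the path and attribute names may differ between releases):

```ruby
# /etc/chef-server/chef-server.rb -- sketch, path may vary by release.
# Keep this at or below the number of CPUs on the Chef Server, since
# each depsolve is compute-bound:
erchef['depsolver_worker_count'] = 20
```

Then apply it with `chef-server-ctl reconfigure` (or `chef-server-ctl restart erchef` on older installs) and watch whether the 503s disappear at your usual fanout.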
(cross-filing as requested in chef-boneyard/chef-provisioning#112)
Using Open Source Chef Server 11.1.5 (but really ever since Chef 11), there seems to be a hard limit on the number of concurrent cookbook dependency resolutions.

We are running `chef-client` at regular intervals on all our hosts, but occasionally we want to kick them off immediately on one or more hosts, so we pdsh over them. This means that N (where N is the pdsh fanout) chef-client runs start at almost exactly the same time, and since they look up their cookbooks very early, they hit the depsolvers nearly simultaneously.

Correlating with general chef-server load and the number of cookbooks / cookbook versions, it used to be that we could do this to no more than 5-10 nodes at the same time, in line with the default number of depsolvers (5). I bumped `erchef['depsolver_worker_count']` to 20 and can now do this to ~30 nodes at once without hitting errors, but I see errors at 40. Since the runs are probably not perfectly in sync, it looks to me like this is just hitting the new, higher limit again.

To sum up, it looks like each depsolver worker can only do one dependency resolution at a time, and it is not possible to have more than `depsolver_worker_count` simultaneous chef-client runs. Is that so, and is that by design, a known issue, or (quoting @jkeiser) "bad mojo"?