Investigate scheduler queue saturation #3765
Possible sources:
Local investigation:
Long locks (for example, how the plugin cache refresh was implemented before PR #3752) would also probably stall the threads that create jobs. Another long-running task might be the endpoint that dispenses the full list of queues to the runners. Do we really need that list just to pop a job (any job) from the queue?
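To illustrate the stall: a minimal Python sketch of the difference between holding a lock for the whole refresh and swapping in a prebuilt cache under a short lock. The names `plugin_cache`, `cache_lock`, and `katalogus` are hypothetical, not the actual scheduler code.

```python
import threading

# Hypothetical names for illustration; not the actual scheduler code.
plugin_cache: dict = {}
cache_lock = threading.Lock()

def refresh_cache_blocking(katalogus):
    # Anti-pattern: the lock is held for the whole (slow) fetch,
    # so any job-creating thread that needs the cache stalls here.
    with cache_lock:
        for org in katalogus.organisations():
            plugin_cache[org] = katalogus.plugins(org)

def refresh_cache_swap(katalogus):
    # Build the new cache outside the lock, then swap it in;
    # readers only ever block for the duration of the assignment.
    global plugin_cache
    fresh = {org: katalogus.plugins(org) for org in katalogus.organisations()}
    with cache_lock:
        plugin_cache = fresh
```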
It looks like the boefje runner would try to ingest the list of queues for all 300 organisations and would time out while receiving this list (as the scheduler itself was probably still very busy fetching all katalogi). The result was that no queues were present in the runner, so it stopped fetching jobs.
Exactly. That could indicate the queue was already full, a restart happened, and bootstrapping all the caches then took substantial time.
We can optimize the endpoint that relays the queues a runner can pop jobs from; #3358 tracks this particular issue. Filtering parameters could also be added to give the task runner a narrower view of what's available.
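As a rough illustration of such an endpoint, here is a minimal FastAPI-style sketch with a filtered queue listing and a pop-any route. The paths, parameter names (`org_id`, `limit`), and the in-memory `queues` store are assumptions for the example, not the scheduler's actual API.

```python
from typing import Optional

from fastapi import FastAPI, HTTPException

app = FastAPI()

# Hypothetical in-memory store: queue name -> list of pending tasks.
queues: dict[str, list[dict]] = {}

@app.get("/queues")
def list_queues(org_id: Optional[str] = None, limit: int = 50):
    # Return only queue names (not their contents), optionally filtered
    # by organisation, so the payload stays small even with hundreds
    # of organisations.
    names = [q for q in queues if org_id is None or q.startswith(org_id)]
    return {"queues": names[:limit]}

@app.post("/pop")
def pop_any():
    # Pop a job from any non-empty queue, so a runner does not need
    # the full queue list just to get work.
    for name, tasks in queues.items():
        if tasks:
            return {"queue": name, "task": tasks.pop(0)}
    raise HTTPException(status_code=404, detail="no tasks available")
```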
It could, but if the second problem existed it would mean job-popping would stop and the queues would fill up regardless of the katalogus locks being slow or broad. Continuously refreshing the katalogus caches because they take longer to fill just adds a lot of load to the system, but it does not stop functionality as far as I can see (since the code stops the cache timers while updating): there would at least be a small (default 30s) window of valid plugin caches.
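A minimal sketch of the timer behaviour described above, assuming a TTL cache whose expiry clock is paused while a refresh runs; all names here are hypothetical and this is not the scheduler's actual cache implementation.

```python
import threading
import time

class ExpiringCache:
    """Illustrative cache: entries stay valid for `ttl` seconds, and the
    expiry clock is paused while a refresh is running, so readers keep a
    valid view during a slow update (hypothetical; not the scheduler's
    actual cache implementation)."""

    def __init__(self, ttl: float = 30.0):
        self.ttl = ttl
        self._data: dict = {}
        self._expires_at = 0.0
        self._refreshing = threading.Event()

    def get(self, key):
        # While refreshing, the old entries are served as-is; otherwise
        # entries past their TTL read as missing.
        if not self._refreshing.is_set() and time.monotonic() > self._expires_at:
            return None
        return self._data.get(key)

    def refresh(self, loader):
        # Pause expiry, load the new data, then restart the TTL clock.
        self._refreshing.set()
        try:
            self._data = loader()
            self._expires_at = time.monotonic() + self.ttl
        finally:
            self._refreshing.clear()
```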