Strict priority polling / bug fix #263
Conversation
…ning jobs. Bug causes dispatch function to overwhelm all other functionality, as it is continually re-queued.
```ruby
# Only schedule a new dispatch timer if one isn't already pending;
# clear the guard when it fires so the next dispatch can be scheduled.
@dispatch_timer ||= after(1) do
  @dispatch_timer = nil
  dispatch
end
```
nice! 🍻
@mariokostelac I'm happy with this PR, WDYT? If you like it, could you fix the merge conflicts in #236, then we can merge it? @atyndall great work and investigation 🍻
@atyndall It's a great writeup! I think that it would be even more correct to cancel the old timer and start it again (since we know that there are no jobs in the queue). Am I wrong?
@mariokostelac I did investigate this approach, and although my memory is a bit hazy at this stage, IIRC it seemed like the act of cancelling the existing timer was quite expensive and slowed things down, compared to letting it run.
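For context, a minimal sketch of the cancel-and-restart alternative discussed above might look like the following. This is an illustration only, assuming Celluloid's `after` returns a timer object that responds to `cancel`, and reusing the `@dispatch_timer` name from the diff; it is not the change made in this PR.

```ruby
# Hypothetical alternative, not the merged fix: cancel any pending dispatch
# timer and schedule a fresh one, so at most one is ever outstanding.
@dispatch_timer.cancel if @dispatch_timer
@dispatch_timer = after(1) do
  @dispatch_timer = nil
  dispatch
end
```

As noted in the comment above, cancelling the existing timer on every call was observed to be more expensive than simply letting it run, which is why the PR keeps the `||=` guard instead.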
Closing in favor of #284
hi @atyndall Would you mind reviewing this https://github.com/phstc/shoryuken/wiki/Polling-strategies#strictpriority? Feel free to update it 🍻
@phstc Added some additional content, looks great!
@atyndall awesome! Thanks 🍻
TL;DR: @mariokostelac's PR #236 restructured Shoryuken in a way that significantly improved performance for our jobs and enabled our use case, so we forked off his PR and implemented Strict Priority Polling similar to that proposed by @gshutler in #106. We additionally found a bug that was causing a lot of problems with our very short running jobs (< 50 ms per job).
Not entirely sure how to handle the merging of this, as @mariokostelac's work in #236 is yet to be integrated, but we thought it would be good to make people aware of it.
To describe our Strict Priority polling implementation: we adapted @gshutler's implementation from #106, mostly preserving its strict priority properties, but also introducing the ability for queues to pause themselves via the `delay:` option when they find they have no messages available to process.

To describe the bug: we noticed that after 15-30 minutes of running the queueing system, job processing speed would slow to a crawl, going from 10 jobs a second to less than 1 job a second. This was with a config like this:
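(The actual configuration block was not captured in this thread. Purely as a hypothetical illustration of the idea, using Shoryuken's documented `concurrency`, `delay`, and weighted `queues` options, a config along these lines gives higher-weighted queues precedence under a strict-priority strategy, while `delay` lets an empty queue pause itself; the queue names and numbers below are made up.)

```yaml
# Hypothetical illustration only — not the configuration used by the PR authors.
concurrency: 10
delay: 25                  # an empty queue pauses itself before being polled again
queues:
  - [high_priority, 3]     # polled first while it has messages
  - [default, 2]
  - [low_priority, 1]      # only polled when the queues above are empty or paused
```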
Reviewing the verbose log output, we saw that the following log lines would slowly increase in frequency until they were being logged many times a second, generating many megabytes of log files every minute, filled 90% with these two lines and only occasionally with a notification of a new job starting or completing. This effect would quickly render the Shoryuken process ineffective, and it would eventually stop processing work completely.
Our working theory is that when `Shoryuken::Manager#processor_done` calls `Shoryuken::Manager#dispatch` in quick succession, as happens with our very short running jobs, the `after(1)` calls within it schedule `#dispatch` to run again and again. As the terminating case for these recursive calls is for the processor to become available, and as more and more of them become queued, it becomes less and less likely that they will be cleaned up, until eventually they grow to a number that completely chokes the Shoryuken process.

Our fix is to change `after(1) { dispatch }` to:
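```ruby
@dispatch_timer ||= after(1) do
  @dispatch_timer = nil
  dispatch
end
```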
What this does is effectively ensure that only one `#dispatch` will be queued into the future at any one time. Our testing has observed that, after this change, the problem does not recur.