
Strict priority polling / bug fix #263

Conversation

@atyndall (Contributor) commented Oct 27, 2016

TL;DR: @mariokostelac's PR #236 restructured Shoryuken in a way that significantly improved performance for our jobs and enabled our use case, so we forked off his PR and implemented Strict Priority Polling similar to that proposed by @gshutler in #106. We also found a bug that was causing serious problems with our very short-running jobs (< 50 ms per job).

We're not entirely sure how to handle merging this, as @mariokostelac's work in #236 has yet to be integrated, but we thought it would be good to make people aware of it.


To describe our Strict Priority Polling implementation: we adapted @gshutler's implementation from #106, mostly preserving its strict-priority properties, while also introducing the ability for queues to pause themselves (via the delay: option) when they find no messages available to process.
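To make the pausing behaviour concrete, here is a minimal, self-contained sketch in plain Ruby. This is not Shoryuken's actual code; the `StrictPriorityChooser` name and its methods are illustrative. It picks the highest-weight queue that isn't currently paused, and pauses a queue for `delay:` seconds when a fetch comes back empty:

```ruby
# Hypothetical sketch (not Shoryuken's real classes): strict-priority
# selection over weighted queues, where an empty queue pauses itself.
class StrictPriorityChooser
  # queues: array of [name, weight] pairs; highest weight = highest priority
  def initialize(queues, delay:)
    @queues = queues.sort_by { |_, weight| -weight }
    @delay = delay
    @paused_until = Hash.new(Time.at(0)) # never paused by default
  end

  # Returns the highest-priority queue that is not currently paused,
  # or nil if every queue is paused.
  def next_queue(now = Time.now)
    pair = @queues.find { |name, _| @paused_until[name] <= now }
    pair && pair.first
  end

  # Called when a fetch from `name` returns no messages.
  def pause(name, now = Time.now)
    @paused_until[name] = now + @delay
  end
end
```

So with queues `[["low", 1], ["high", 2]]`, `next_queue` keeps returning `"high"` until it is paused, then falls back to `"low"`, and returns to `"high"` once the delay elapses.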


To describe the bug: we noticed that after 15-30 minutes of running the queueing system, job processing speed would slow to a crawl, going from 10 jobs a second to less than 1 job a second. This was with a config like this:

timeout: 30     # Seconds grace to shutdown
concurrency: 1  # The number of allocated threads to process messages. Default 25
delay: 30       # The delay in seconds to pause a queue when it's empty. Default 0
queues:
  - [queue_1, 1]

Reviewing the verbose log output, we saw that the following log lines would slowly increase in frequency until they were being logged many times a second, generating many megabytes of log files every minute, filled 90% with these two lines and only occasionally with notification of a new job starting or completing. This effect would quickly render the Shoryuken process ineffective, and it would eventually stop processing work entirely.

DEBUG: Ready: 0, Busy: 1, Active Queues: [["queue_1", 1]]
DEBUG: Pausing fetcher, because all processors are busy

Our working theory is that when Shoryuken::Manager#processor_done calls Shoryuken::Manager#dispatch in quick succession, as happens with our very short-running jobs, the after(1) calls within it schedule #dispatch to run again and again. As the terminating case for these recursive calls is for a processor to become available, and more and more of them become queued, it becomes less and less likely that they will be cleaned up, until eventually they grow to a number that completely chokes the Shoryuken process.

Our fix is to change after(1) { dispatch } to:

@dispatch_timer ||= after(1) do
  @dispatch_timer = nil
  dispatch
end

What this does is ensure that only one #dispatch is ever queued into the future at any one time. In our testing, the problem has not recurred since this change.
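The effect of the guard can be seen in a toy model in plain Ruby. Here `after` is a stub that records pending timer blocks, standing in for Celluloid's real timer API; the `Scheduler` class and method names are illustrative, not Shoryuken's. The unguarded version accumulates one pending timer per call, while the `||=` guard keeps at most one pending at any time:

```ruby
# Toy model (assumption, not Shoryuken code) contrasting unguarded
# timer scheduling with the memoized single-timer guard.
class Scheduler
  def initialize
    @pending = []        # blocks waiting to fire
    @dispatch_timer = nil
  end

  # Stub for Celluloid's after: records the block and returns it.
  def after(_seconds, &block)
    @pending << block
    block
  end

  # Original behaviour: every call queues another future dispatch.
  def schedule_unguarded
    after(1) { dispatch }
  end

  # Fixed behaviour: at most one dispatch timer pending at a time.
  def schedule_guarded
    @dispatch_timer ||= after(1) do
      @dispatch_timer = nil
      dispatch
    end
  end

  def dispatch; end

  def pending_count
    @pending.size
  end
end
```

Calling `schedule_unguarded` 100 times leaves 100 pending timers; calling `schedule_guarded` 100 times leaves exactly 1, which is the property the fix relies on.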

@phstc (Collaborator) commented on the diff, Nov 2, 2016

nice! 🍻

@phstc (Collaborator) commented Nov 2, 2016

@mariokostelac I'm happy with this PR, WDYT? If so, could you fix the merge conflicts in #236 so we can merge it?

@atyndall great work and investigation 🍻

phstc pushed a commit that referenced this pull request Dec 3, 2016
@phstc phstc mentioned this pull request Dec 3, 2016
phstc added a commit that referenced this pull request Dec 3, 2016
@mariokostelac (Contributor) commented
@atyndall It's a great writeup! I think that it would be even more correct to cancel the old timer and start it again (since we know that there are no jobs in the queue). Am I wrong?

@atyndall (Contributor, Author) commented Dec 13, 2016

@mariokostelac I did investigate this approach, and although my memory is a bit hazy at this stage, IIRC it seemed like the act of cancelling the existing timer was quite expensive and slowed things down, compared to letting it run.
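For reference, the cancel-and-reschedule alternative being discussed could be modeled like this, again in plain Ruby with a stub timer class (the `FakeTimer`/`Rescheduler` names and the `cancel` stub are illustrative, not Celluloid's actual API). It also keeps only one timer live, but pays an explicit cancel call on every reschedule, which matches the overhead concern above:

```ruby
# Hypothetical sketch of cancel-and-reschedule: each call cancels the
# previously scheduled timer and starts a fresh one.
class FakeTimer
  attr_reader :cancelled
  def initialize; @cancelled = false; end
  def cancel; @cancelled = true; end
end

class Rescheduler
  attr_reader :timers

  def initialize
    @timers = []
    @dispatch_timer = nil
  end

  # Stub for a timer API: creates and records a new timer.
  def after(_seconds)
    t = FakeTimer.new
    @timers << t
    t
  end

  # Cancel-and-reschedule: only the newest timer stays live, but every
  # call after the first incurs a cancel on the previous timer.
  def schedule_dispatch
    @dispatch_timer.cancel if @dispatch_timer
    @dispatch_timer = after(1)
  end

  def live_count
    @timers.count { |t| !t.cancelled }
  end
end
```

After 10 calls, 10 timers were created but only the last is still live; the other 9 each required a cancel, whereas the `||=` guard in the PR creates only one timer and cancels nothing.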

@phstc (Collaborator) commented Dec 13, 2016

Closing in favor of #284

@phstc (Collaborator) commented Apr 3, 2017

Hi @atyndall

Would you mind reviewing https://github.com/phstc/shoryuken/wiki/Polling-strategies#strictpriority? Feel free to update it 🍻

@atyndall (Contributor, Author) commented Apr 4, 2017

@phstc Added some additional content, looks great!

@phstc (Collaborator) commented Apr 4, 2017

@atyndall awesome! Thanks 🍻
