
Stop queuing up heartbeat threads #345

Merged · 3 commits · Mar 24, 2017
Conversation

@phstc phstc commented Mar 24, 2017

Fix #338

@phstc force-pushed the fix-338 branch 2 times, most recently from ddc901a to 7baae10 (March 24, 2017 02:27)
@@ -19,8 +19,11 @@ def initialize(fetcher, polling_strategy)
   @polling_strategy = polling_strategy

   @heartbeat = Concurrent::TimerTask.new(run_now: true,
-                                         execution_interval: HEARTBEAT_INTERVAL,
-                                         timeout_interval: 60) { dispatch }
+                                         execution_interval: HEARTBEAT_INTERVAL) { @pool.post { dispatch } if @dispatching.false? }
@phstc (Collaborator, Author):

@waynerobinson I know it isn't the most beautiful solution in the world, but it fixes the thread leaking. I still need to play with it more, but I think the "final" fix will be something like this.

@waynerobinson (Contributor):

That will probably do it. It reuses the same worker pool for dispatching (which I don't really think is problematic), and the @pool.post { ... } will return instantly, meaning the timeout never occurs (because of the implementation, you can't actually turn off the timeout code in TimerTask).

And this still lets you use TimerTask as a supervisor-like solution for dispatch, ensuring it stays executing, even if there is an error.
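The "returns instantly" point can be seen in a plain-Ruby sketch (the names here are illustrative stand-ins, not Shoryuken's or concurrent-ruby's actual internals): the timer's block only hands the job to a worker thread, so it finishes in microseconds regardless of how long the dispatch itself takes.

```ruby
# Plain-Ruby analogue (hypothetical names, not Shoryuken's internals) of the
# pattern above: the timer block only enqueues work for a worker thread and
# returns immediately, so a TimerTask-style timeout never sees a slow block.
jobs   = Queue.new
worker = Thread.new do
  while (job = jobs.pop) # a nil job shuts the worker down
    job.call
  end
end

# What the heartbeat block would do: enqueue the dispatch, nothing more.
timer_block = -> { jobs << -> { sleep 0.05 } } # the "dispatch" takes 50ms

started = Time.now
timer_block.call             # returns without waiting for the dispatch
elapsed = Time.now - started # far below the 50ms the dispatch itself needs

jobs << nil # stop the worker
worker.join
```

The same shape is what the diff above achieves with `@pool.post { dispatch }`.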

@phstc (Collaborator, Author):

@waynerobinson thanks for reviewing it 🍻

I've just changed it to use a separate and single thread pool.

I'm now running some performance tests, let's see how it goes 🙏

@waynerobinson (Contributor):

Looks good.

As long as the block finishes faster than the execution_timeout, this leak won't occur, even with the currently broken state of concurrent-ruby.

Now if only the team there could be convinced that there is a bug in the first place. 😜


Concurrent::TimerTask.new(execution_interval: 1) do
Shoryuken.logger.info "Threads: #{Thread.list.size}"
end.execute
@phstc (Collaborator, Author):

@waynerobinson With concurrency 10 (bundle exec bin/shoryuken -c 10 -q test -r ./test.rb), the thread count stayed consistent at 16.

@phstc force-pushed the fix-338 branch 4 times, most recently from 71a56e3 to 1cd46bf (March 24, 2017 03:00)

   @pool = Concurrent::FixedThreadPool.new(@count, max_queue: @count)
+  @dispatcher_pool = Concurrent::SingleThreadExecutor.new
@phstc (Collaborator, Author) · Mar 24, 2017:

> As long as the block finishes faster than the execution_timeout, this leak won't occur, even with the currently broken state of concurrent-ruby.

@waynerobinson now that I'm using this SingleThreadExecutor, it's no longer an issue: it supports only one thread at a time and the fallback policy is discard. If we try to post while there's a thread running, it will just discard the post.

> Now if only the team there could be convinced that there is a bug in the first place. 😜

I will try to reply on that issue as well. I think they should allow configuring a max_queue and a fallback_policy for the TimerTask; too bad they don't, and it ends up with max_queue: infinity.
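The discard behaviour described above can be mimicked in plain Ruby (a hypothetical sketch, not concurrent-ruby's actual SingleThreadExecutor implementation): one job runs at a time, and a post made while the worker is busy is dropped rather than queued.

```ruby
# Hypothetical plain-Ruby analogue of a single-thread executor with a
# :discard fallback policy: at most one job runs at a time, and posts made
# while it is busy are dropped, so work never queues up behind a slow dispatch.
class DiscardingSingleWorker
  def initialize
    @lock = Mutex.new
    @busy = false
  end

  # Returns true if the job was accepted, false if it was discarded.
  def post(&job)
    accepted = @lock.synchronize { @busy ? false : (@busy = true) }
    return false unless accepted

    Thread.new do
      begin
        job.call
      ensure
        @lock.synchronize { @busy = false }
      end
    end
    true
  end
end
```

This is the property that stops heartbeat dispatches from piling up: a second dispatch attempted mid-dispatch simply evaporates.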

@waynerobinson (Contributor) · Mar 24, 2017:

Except that TimerTask is the thing that's leaking threads because of the ones it creates to do the timeout. But I'm sure the scheduling of a dispatch task occurs faster than the heartbeat period.

Don't really need a max_queue in TimerTask because the way the class is designed it should be locking around the actual task and only letting one operate at a time.

But the lack of a correct implementation for the timeout causes thread leaks in the timeout monitor if the task takes longer than the execution interval to complete (it should be the timeout interval at the very least, but the implementation never refers to it… hence the bug).

The current design intention of TimerTask seems to have it limited to only ever using 2 threads.

@@ -3,7 +3,7 @@ class Manager
   include Util

   BATCH_LIMIT = 10
-  HEARTBEAT_INTERVAL = 0.1
+  HEARTBEAT_INTERVAL = 1
@phstc (Collaborator, Author):

@waynerobinson I'm testing HEARTBEAT_INTERVAL = 1, as I'm also calling dispatch in the processor_done callback.

@waynerobinson (Contributor):

That's probably going to cause issues with empty queues, as it will take at least HEARTBEAT_INTERVAL before it requests new messages.

Instead of this you could just have dispatch run in a loop and rely on the TimerTask to restart it if it ever dies.

@waynerobinson (Contributor):

Something like this:

def dispatch_now
  loop do
    if ready.zero?
      sleep MINIMUM_WAIT # 0.05 or something to prevent CPU pegging
    else
      return unless (queue = @polling_strategy.next_queue)

      logger.debug { "Ready: #{ready}, Busy: #{busy}, Active Queues: #{@polling_strategy.active_queues}" }

      batched_queue?(queue) ? dispatch_batch(queue) : dispatch_single_messages(queue)
    end
  end
end

@waynerobinson (Contributor):

Although instead of ready.zero?, is there just some type of latch on the pool to await a free thread so you don't have to do the sleep?

@phstc (Collaborator, Author):

> Although instead of ready.zero?, is there just some type of latch on the pool to await a free thread so you don't have to do the sleep?

@waynerobinson not that I'm aware of 🐼
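For completeness, the latch asked about could be approximated in plain Ruby with a condition variable (a hypothetical sketch: ReadySignal and its methods are invented for illustration, not a Shoryuken or concurrent-ruby API). The dispatcher waits instead of polling with sleep, and each finishing processor signals it.

```ruby
# Hypothetical sketch: processor_done signals a waiting dispatcher instead of
# making it poll with sleep. Not an actual Shoryuken/concurrent-ruby API.
class ReadySignal
  def initialize
    @mutex = Mutex.new
    @cond  = ConditionVariable.new
  end

  # Dispatcher side: block (up to timeout seconds) until a worker frees up.
  def wait_for_ready(timeout = nil)
    @mutex.synchronize { @cond.wait(@mutex, timeout) }
  end

  # Processor side: call from processor_done when a thread becomes free.
  def processor_done
    @mutex.synchronize { @cond.signal }
  end
end
```

A timeout on the wait is still advisable so a missed signal degrades into a short poll rather than a hang.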

    batched_queue?(queue) ? dispatch_batch(queue) : dispatch_single_messages(queue)
  ensure
    dispatch_async
@phstc (Collaborator, Author):

> sleep MINIMUM_WAIT # 0.05 or something to prevent CPU pegging

@waynerobinson I need to keep it running longer, but it seems to be working without a sleep. As the @dispatcher_executor discards posts while there's a thread running, the request for fetching kind of acts as a sleep; it should only loop without any sleep if all processors are busy or no queue is available, as you pointed out. I will keep monitoring the CPU; if that doesn't work, we can definitely add sleep 0.05.

@waynerobinson (Contributor):

We always use a Receive Wait Time > 0, so it will always block if the network is up anyway. The sleep is really just for worst-case situations.


@fetcher = fetcher
@polling_strategy = polling_strategy

@heartbeat = Concurrent::TimerTask.new(run_now: true,
@phstc (Collaborator, Author):

@waynerobinson I'm doing the loop in an ensure block, and also calling dispatch_async when a processor is done, just in case. So I don't think we need the heartbeat anymore.
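The ensure-based restart can be illustrated with a self-contained sketch (the class and the synchronous dispatch_async here are simplified stand-ins for the real executor-posting code): even when one dispatch raises, ensure still schedules the next one, so the loop survives errors without a heartbeat supervisor.

```ruby
# Simplified, hypothetical sketch of the ensure-driven dispatch loop: the
# real dispatch_async posts to a SingleThreadExecutor; here it just recurses
# synchronously until a run budget is used up.
class LoopingDispatcher
  attr_reader :runs

  def initialize(max_runs)
    @runs = 0
    @max_runs = max_runs
  end

  def dispatch
    @runs += 1
    raise 'simulated dispatch failure' if @runs == 2
  ensure
    dispatch_async # always re-schedule, even after an error
  end

  def dispatch_async
    dispatch if @runs < @max_runs
  end
end
```

Even though run 2 raises, the ensure fires first and keeps re-posting, so the dispatcher reaches its full run budget before the error surfaces.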

@waynerobinson (Contributor):

Not sure if the dispatch_async in processor_done does anything, given the normal dispatcher re-runs at the end anyway.

But the ensure and re-run should keep the dispatch loop running, I think. 👍

@phstc (Collaborator, Author):

@waynerobinson you are right, maybe I was being overcautious. I added it just in case, but I'm considering removing it; ensure should work.

@waynerobinson (Contributor):

Great work! 🍾
