Allow access to Shoryuken utilization metrics #672

Closed
cjlarose opened this issue Jul 10, 2021 · 6 comments · Fixed by #673

Comments

@cjlarose
Collaborator

Related: #671

The idea here is to create some sort of public API such that users can query for Shoryuken's current runtime state in terms of utilization. Ideally, such an API should consider the possibility that Shoryuken is using multiple processing groups and allow users to discern which utilization metrics are associated with which group.

The most obvious time that users might want access to this data is, of course, when that data changes. One option would be to provide the information to middleware directly. This would allow users to build something akin to sidekiq-statsd.

class StatsMiddleware
  def call(worker_instance, _queue, _sqs_msg, _body)
    # `manager` here is the proposed accessor for the processing group's
    # manager, which would expose the group's utilization metrics.
    manager = worker_instance.manager
    puts manager.group_name
    puts manager.running?
    puts manager.busy_processors
    puts manager.max_processors
    yield
  end
end

Shoryuken.configure_server do |config|
  config.server_middleware do |chain|
    chain.add StatsMiddleware
  end
end

This is a little bit awkward, though: users would be notified whenever a new job is picked up (busy_processors is incremented), but they wouldn't be notified whenever a processor becomes available (busy_processors is decremented), because while middleware is executing, a processor is necessarily still in use.

Another option would be to expose some callbacks that are guaranteed to be executed any time that the utilization metrics change. I think this ultimately gives users the greatest flexibility on how they want to use the data. For example:

Shoryuken.configure_server do |config|
  config.on(:manager_startup) do |event|
    puts event.group_name
    puts event.processor_metrics.running?
    puts event.processor_metrics.busy_processors
    puts event.processor_metrics.max_processors
  end

  config.on(:worker_assignment) do |event|
    puts event.group_name
    puts event.processor_metrics.running?
    puts event.processor_metrics.busy_processors
    puts event.processor_metrics.max_processors
  end

  config.on(:worker_complete) do |event|
    puts event.group_name
    puts event.processor_metrics.running?
    puts event.processor_metrics.busy_processors
    puts event.processor_metrics.max_processors
  end
end
@rbroemeling

@cjlarose Do you have a preferred direction for this functionality?

@cjlarose
Collaborator Author

cjlarose commented Jul 13, 2021

I'm experimenting with adding this functionality by developing a shoryuken-statsd gem alongside whatever we'll need in Shoryuken itself. That work is here: https://github.com/cjlarose/shoryuken-statsd

That way I can be confident that we'll have the right hooks in Shoryuken so that folks can build their own metrics integration if they need to. Out of curiosity @rbroemeling, would you be interested in statsd integration specifically, or do you expect to use a different platform/protocol?

@cjlarose
Collaborator Author

cjlarose commented Jul 13, 2021

Opened #673 as a draft. I ended up adding a new event called :utilization_update instead of forcing folks to subscribe to a bunch of different events for all the times that the utilization metrics would change. Let me know if that new event would work for your use case.
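
For anyone following along, here's a rough sketch of what subscribing to that event could look like. I'm reusing the field names from the sketches earlier in this issue, so treat the exact payload shape as an assumption until #673 settles:

Shoryuken.configure_server do |config|
  config.on(:utilization_update) do |event|
    # Payload accessors below mirror the earlier sketches in this issue;
    # the final shape is whatever lands in #673.
    busy = event.processor_metrics.busy_processors
    max  = event.processor_metrics.max_processors
    puts "[#{event.group_name}] #{busy}/#{max} processors busy"
  end
end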

@rbroemeling

@cjlarose We're specifically looking for Datadog metrics (i.e., to send metrics to dogstatsd). So, they're statsd-compatible, but we might want to add our own implementation as well so that we can use some of Datadog's specific implementation details and enhancements on the metrics that we'll be reporting from Shoryuken.

I like the idea of writing an event system into Shoryuken that ensures that people can easily implement their own statistics gathering if/when necessary; that's a great plan.

@rbroemeling

Your draft PR looks good, @cjlarose. One concern that I have is that in extreme cases this could cause storms of stats updates (i.e., assume each loop retrieves 10 messages, then each loop will "storm" 20 statsd packets into the statsd listener). In high-load cases, doing two stats reports per job (i.e., one on assignment, one on completion) might be an unnecessary amount of statsd load.

Brainstorming some other metrics that might be interesting (though I'm not positive that these fit with the utilization_update event); a strawman sketch of what these could look like follows the list:

  • number of SQS messages retrieved from AWS on the last call
  • duration of main dispatch loop (one iteration took XXms)
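
Purely as a strawman, events for those could hypothetically look something like this (none of these event names or fields exist today; they're only for illustration):

Shoryuken.configure_server do |config|
  # Hypothetical event and field names -- illustration only, nothing here is implemented.
  config.on(:dispatch_loop_complete) do |event|
    puts event.group_name
    puts event.messages_retrieved # SQS messages fetched by the last receive call
    puts event.duration_ms        # how long this dispatch iteration took
  end
end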

@cjlarose
Collaborator Author

Awesome to hear that you're interested in Datadog specifically because that's what I was targeting when I was experimenting with the shoryuken-statsd project.

One concern that I have is that in extreme cases this could cause storms of stats updates (i.e., assume each loop retrieves 10 messages, then each loop will "storm" 20 statsd packets into the statsd listener).

This is something I thought of, but one thing to consider is that clients like dogstatsd don't send 1 UDP packet for every metric update: instead, a bunch of updates are buffered internally and then the whole buffer is flushed in one big packet (depending on the network MTU). I think I might just try to get an MVP working first, and then we can adjust accordingly. Either way, I think it's possible to defer the responsibility of throttling/debouncing/batching from Shoryuken to the client. Plus, there might be some clients that actually do want to be notified on every update, so we should at least give them that option in case they need it.
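
To make the batching point concrete, here's a rough sketch of a subscriber built on the dogstatsd-ruby client. The metric names, tag, and event payload here are assumptions for illustration only, and the exact buffering/flush behavior depends on the client version (recent versions buffer updates and send them in larger packets rather than one UDP packet per metric):

require 'datadog/statsd' # dogstatsd-ruby

statsd = Datadog::Statsd.new('localhost', 8125)

Shoryuken.configure_server do |config|
  config.on(:utilization_update) do |event|
    # Field names follow the sketches above; the real payload is defined by #673.
    tags = ["shoryuken_group:#{event.group_name}"]
    statsd.gauge('shoryuken.processors.busy', event.processor_metrics.busy_processors, tags: tags)
    statsd.gauge('shoryuken.processors.max', event.processor_metrics.max_processors, tags: tags)
  end
end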

Brainstorming some other metrics that might be interesting (though I'm not positive that these fit with the utilization_update event):

  • number of SQS messages retrieved from AWS on the last call
  • duration of main dispatch loop (one iteration took XXms)

I've been thinking about some of the same ideas, too! I think what I'll do is try to wrap up shoryuken-statsd's MVP, which would just be the utilization metrics, and then we can go from there.
