Manager occasionally dying #428
Before #369, Shoryuken used to retry fetch failures continuously, but that caused problems of its own when a retry could never succeed, for example if the instance permanently lost its connection.

Maybe we could try some exponential backoff:

    def fetch(queue, limit)
      retry_count ||= 0 # survives `retry`, which re-runs the method body
      # fetch code
    rescue AWSERROR # need to check the base error class name
      retry_count += 1
      if retry_count <= 10
        sleep(retry_count) # delay grows linearly here; 2**retry_count would be exponential
        retry
      else
        raise
      end
    end

Would a retry after a few seconds work in your case?
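For reference, the backoff idea above can be written as a small standalone helper. This is only a sketch under assumptions: `with_backoff`, its keyword arguments, and their defaults are hypothetical names chosen for illustration and are not part of Shoryuken or the AWS SDK. The delay doubles on each attempt and is capped, and the last error is re-raised once the retries run out.

```ruby
# Hypothetical helper (not part of Shoryuken): runs the block and retries it
# with capped exponential backoff when one of the listed errors is raised.
# All names and default values are assumptions for illustration.
def with_backoff(max_retries: 3, base: 0.5, cap: 30.0,
                 errors: [StandardError], sleeper: Kernel.method(:sleep))
  attempts = 0
  begin
    yield
  rescue *errors
    attempts += 1
    # Give up and re-raise the original error once retries are exhausted.
    raise if attempts > max_retries

    # Delays of base, 2*base, 4*base, ... never exceeding cap.
    delay = [base * (2**(attempts - 1)), cap].min
    sleeper.call(delay)
    retry
  end
end
```

A fetch could then be wrapped as `with_backoff(errors: [AWSERROR]) { fetch(queue, limit) }`, where `AWSERROR` is still the placeholder from the snippet above. Passing the sleep function in as `sleeper` is a design choice that keeps the helper testable without real waiting.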
Thanks for clarifying. Yep, a retry like that would work fine for us. We're running Shoryuken on ~12 servers total right now, and we've only seen the issue on one server at a time.
@ethangunderson do you know which exception was thrown? So I can create a more specific
That's all that I have in the logs.
@ethangunderson I've just released 3.1.11, could you try it out? This version automatically retries fetch errors (up to 3 times).
We're running into a problem where ~10 times a month, a shoryuken manager will die. The logs around the shutdown look like this:
The error message seems to indicate that SQS returned a 500 when the manager attempted to fetch messages. That exception is caught here, which results in a shutdown here.
Does that seem right? Is that the right behavior, or am I missing something?