
Throttle courier queues when the channel has rate limit redis key not expired #382

Merged
rowanseymour merged 3 commits from fix-WA-rate-limit into main on Dec 8, 2021

Conversation

Contributor

@norkans7 norkans7 commented Oct 8, 2021

… expired


codecov bot commented Oct 8, 2021

Codecov Report

Merging #382 (9560dc9) into main (ca40349) will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main     #382      +/-   ##
==========================================
+ Coverage   71.56%   71.59%   +0.02%     
==========================================
  Files          94       94              
  Lines        8269     8277       +8     
==========================================
+ Hits         5918     5926       +8     
  Misses       1754     1754              
  Partials      597      597              
Impacted Files                  Coverage Δ
queue/queue.go                  72.00% <ø> (ø)
handlers/whatsapp/whatsapp.go   78.05% <100.00%> (+0.40%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

redis.call("zrem", KEYS[2] .. ":active", queue)
return {"retry", ""}
end
end
Member

@nicpottier if you are willing, would love your review on this

@nicpottier
Collaborator

This feels slightly odd in that it seems like we are basically using a redis key as a communication channel to move a channel into being throttled when we next try to pop a message off. Why are we doing it that way as opposed to moving the channel to the throttled list explicitly when we decide we are throttled? I guess the advantage of this method is that we can set how long we throttle the channel for by tweaking the expiration, but the downside is that every call to pop a message off a queue now has an extra lookup. I think I'd rather see our throttled logic expanded to just allow throttles of more than a second, if that's what is needed:
https://github.com/nyaruka/courier/blob/main/queue/queue.go#L212

That could work by keeping both the queue and, say, a counter in the throttled list; the dethrottler would decrement that counter and move anything whose count reaches <= 0 back to the active queue set.
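For illustration only, here is a minimal Go sketch of that counter-based idea, assuming the redigo client and hypothetical key and function names (dethrottleByCount, throttledKey, activeKey); this is not the code in this PR, and in courier the real sets also carry worker counts in their scores:

import (
	"strconv"

	"github.com/gomodule/redigo/redis"
)

// dethrottleByCount sketches the suggestion above: throttled queues live in a
// sorted set whose score is a countdown, each dethrottler pass decrements the
// countdown, and queues that reach zero move back into the active set.
func dethrottleByCount(rc redis.Conn, throttledKey, activeKey string) error {
	// read every throttled queue together with its remaining countdown
	members, err := redis.StringMap(rc.Do("ZRANGE", throttledKey, 0, -1, "WITHSCORES"))
	if err != nil {
		return err
	}
	for queue, score := range members {
		remaining, _ := strconv.ParseFloat(score, 64)
		if remaining <= 1 {
			// countdown is used up: put the queue back in the active set
			rc.Send("ZREM", throttledKey, queue)
			rc.Send("ZADD", activeKey, 0, queue)
		} else {
			// otherwise just tick the countdown down by one
			rc.Send("ZINCRBY", throttledKey, -1, queue)
		}
	}
	return rc.Flush()
}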

@nicpottier
Collaborator

Maybe an easier way to do this would be to move throttling to a key-based system instead of using a set. I.e., we insert a throttled:[channel-uuid] key with an expiration in redis when a channel is throttled, and that's how we check whether a channel is throttled or not. I'm not entirely sure off the top of my head why this wasn't the method we used; it seems fairly obvious, so I wonder if we are missing something there.

@nicpottier
Collaborator

Oh, looking at this more closely I see why it is the way it is. We still use throttled queues to track the number of workers so we can do fair queuing, and we don't want throttled channels to just expire entirely, because we need to actively re-add them back to the active set, so it can't be a purely passive expiration. I guess the proposed solution makes sense then. The extra lookup per pop probably isn't a big deal.
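To make that trade-off concrete: in this PR the extra lookup happens inside the Lua pop script (the snippet shown earlier), so it stays atomic with the pop. Expressed as a standalone Go helper purely for illustration, assuming the redigo client and a hypothetical helper name, the lookup amounts to:

import (
	"fmt"

	"github.com/gomodule/redigo/redis"
)

// channelRateLimited reports whether the handler has flagged this channel as
// rate limited by setting a rate_limit:{uuid} key that has not yet expired.
func channelRateLimited(rc redis.Conn, channelUUID string) (bool, error) {
	return redis.Bool(rc.Do("EXISTS", fmt.Sprintf("rate_limit:%s", channelUUID)))
}

When the key exists, the pop script takes the channel's queue out of the active set and returns "retry" rather than handing back a message.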

if rr.StatusCode == 429 || rr.StatusCode == 503 {
rateLimitKey := fmt.Sprintf("rate_limit:%s", msg.Channel().UUID().String())
rc.Do("set", rateLimitKey, "engaged")
rc.Do("expire", rateLimitKey, 2)
Member

Doesn't the 429 response tell us when we can retry?

Contributor Author

No, we get:

{"meta":{"api_status":"stable","version":"2.35.4"},"errors":[{"code":1015,"title":"Too many requests","details":"Rate limiting engaged - Too Many Requests."}]}

Member

So we're backing off for 2 seconds, but do we have any information on whether that's a reasonable backoff?

Contributor Author

We pause the queue for 2 seconds and then try sending again; if we get another 429 status, we pause for another 2 seconds.

Contributor Author

https://developers.facebook.com/docs/whatsapp/api/rate-limits#capacity

The rate limit we are hitting comes from sending more than 50 requests per second, so pausing for 2 seconds should be enough to reset the limit.
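For illustration, that rationale could be captured as a comment next to the handler lines shown earlier; the wording below is a sketch and an assumption, not the comment actually added to the PR:

// We hit WhatsApp's rate limiting after exceeding roughly 50 requests per
// second (https://developers.facebook.com/docs/whatsapp/api/rate-limits#capacity),
// and the 429 response carries no retry hint, so on a 429 or 503 we flag the
// channel for a 2 second pause, which is enough for the per-second window to
// reset. In future this should use retry header values if the API provides them.
rateLimitKey := fmt.Sprintf("rate_limit:%s", msg.Channel().UUID().String())
rc.Do("set", rateLimitKey, "engaged")
rc.Do("expire", rateLimitKey, 2)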

Member

OK, can you add a little comment to the code here explaining the rationale for 2, and maybe mention that in the future it should use header values?

Member

and then let's get this out there!

@norkans7 norkans7 requested a review from rowanseymour December 8, 2021 14:16
@rowanseymour rowanseymour merged commit 46475f1 into main Dec 8, 2021
@rowanseymour rowanseymour deleted the fix-WA-rate-limit branch December 8, 2021 14:17