Bug with distributed rate limit syncing #335

@GUI

Description

We have a process that syncs rate limit information between our servers. This lets each server store the rate limit information locally in memory (for performance reasons) while still ensuring the rate limit information is correct across a cluster of separate machines (since a single user's traffic might be distributed across the individual servers).

I recently noticed quite a few errors like this being thrown from this sync process:

2016-04-28T16:11:15.46838 2016/04/28 16:11:15 [error] 3550#0: [lua] interval_lock.lua:41: timeout_exec(): timeout exec pcall failed: ...pi-umbrella/proxy/jobs/distributed_rate_limit_puller.lua:52: bad "exptime" argument, context: ngx.timer

After digging around, I found that this case can crop up for longer duration rate limits (for example, on APIs that have per day limits). The culprit was that our distributed information didn't have the correct TTL settings, so when it came time to populate the local in-memory version of the rate limit information, the TTL calculation came out negative and was rejected as a bad "exptime" argument.

This wasn't a fatal error, but it could have led to some odd rate limit counts for these longer duration rate limits depending on which server you hit.
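
To make the failure mode concrete, here is a minimal sketch of the TTL arithmetic involved. It is not the actual puller code (which is Lua); the function name `local_exptime` and its parameters are hypothetical, and it assumes the distributed record carries an absolute expiry timestamp in seconds. The idea is simply to guard against a non-positive result before handing it to the local store as an expiration time.

```python
import time


def local_exptime(distributed_expires_at, now=None):
    """Return the TTL (in seconds) to use for the local in-memory copy.

    Hypothetical helper: `distributed_expires_at` is assumed to be the
    absolute expiry time (epoch seconds) stored with the distributed record.
    Returns None when the record has already expired, so the caller can
    skip it instead of passing a negative expiration time along.
    """
    if now is None:
        now = time.time()
    remaining = distributed_expires_at - now
    if remaining <= 0:
        # The stored expiry is in the past (e.g. it was written with the
        # wrong TTL), so the subtraction goes negative.
        return None
    return remaining


now = 1_000_000.0
print(local_exptime(now - 30, now))      # None: skip this record
print(local_exptime(now + 86_400, now))  # 86400.0: a full day remains
```

Skipping an already-expired record is only one option; clamping to a small positive TTL would also avoid the error, at the cost of briefly keeping a stale count in local memory.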
