The distributor ingestion rate limit increases the number of "consumed tokens" in the rate limiter once the request is received and before writing to ingesters:
`cortex/pkg/distributor/distributor.go`, line 581 (commit 527f9b5):

```go
if !d.ingestionRateLimiter.AllowN(now, userID, totalN) {
```
In the event of an ingester outage (e.g. 2+ ingesters are unavailable), this means that each tenant's remote write requests will consume tokens from its rate limiter even if the samples have not been successfully ingested. The client (e.g. Prometheus) will retry the writes, which will consume further tokens from the rate limiter until the tenant eventually hits the rate limit, regardless of whether any samples have actually been ingested.
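A minimal sketch of the failure mode, using a toy token bucket (a hypothetical simplification standing in for the distributor's per-tenant limiter, not Cortex code): tokens are consumed on receipt, so failed writes plus client retries drain the budget even though nothing was ingested.

```go
package main

import "fmt"

// bucket is a minimal token bucket; refill is omitted for brevity,
// so its tokens behave like the tenant's remaining burst.
type bucket struct {
	tokens float64
}

// allowN consumes n tokens if available, mirroring the consume-on-receipt
// behaviour of AllowN in the distributor.
func (b *bucket) allowN(n float64) bool {
	if b.tokens < n {
		return false
	}
	b.tokens -= n
	return true
}

func main() {
	b := &bucket{tokens: 100} // burst of 100 tokens

	// Simulate Prometheus retrying a 40-sample write that always fails
	// because 2+ ingesters are down: each retry consumes tokens even
	// though no samples are ingested.
	for attempt := 1; attempt <= 3; attempt++ {
		allowed := b.allowN(40)
		fmt.Printf("attempt %d: allowed=%v, tokens left=%.0f\n",
			attempt, allowed, b.tokens)
	}
	// attempt 3 is rejected: the burst was drained by failed retries.
}
```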
The burst should protect against this, but in the event of a relatively long outage we would end up consuming the burst too (e.g. we set the burst to 10x the rate limit).
I'm wondering if a better approach would be to check whether enough tokens are available in the rate limiter when the request is received, but to actually consume them only after the samples have been successfully written to ingesters. Due to concurrency, the actual accepted rate could be higher than the limit, but we would err in favour of the customer instead of rate limiting writes we haven't actually ingested.
Related discussions: