Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(utils): Add stub for rate-limit-based CircuitBreaker class #74557

Merged
merged 6 commits into from
Jul 23, 2024

Conversation

lobsterkatie
Copy link
Member

@lobsterkatie lobsterkatie commented Jul 19, 2024

This adds a new CircuitBreaker class, which will eventually contain the methods necessary for a rate-limit-based circuit breaker implementation. Our current circuit breaker implementation has four drawbacks that this is aiming to solve:

  • The standalone circuit_breaker_activated function checks how many errors have been tallied, but doesn't have an accompanying way to actually do the tallying, forcing a reimplementation of the error tracking piece in each scenario in which it's used.
  • The with_circuit_breaker wrapper fixes that, but only works if the errors are allowed to bubble from the (potentially fairly low-level) spot where we're actually making the request all the way up to the (potentially much higher-level) spot where we're deciding whether or not to make the request in the first place. Handle the errors gracefully anywhere below the with_circuit_breaker call and the breaker has no idea they happened, negating the entire purpose of keeping track.
  • By catching errors with a simple try-except, with_circuit_breaker a) counts every kind of exception as an errored request, when that may or may not be accurate, and b) can only track instances of actual Exceptions, not any other problem (like a 500 response) which may occur.
  • Both methods reset the breaker from broken to closed the moment a single request goes through. That works well if the issue is a service having a complete outage, but not if the service is instead overloaded such that it's timing out on, say, 95% of requests. In that case, a single request succeeding isn't in fact a good indicator that the service is working and/or has been fixed.

To solve these problems, the new implementation:

  • Will include a way to track errors, so that particular wheel doesn't have to be reinvented over and over (solving problem 1) and so that it can be called selectively and in non-Exception spots (solving both halves of problem 3).
  • Will do the tracking in a separate method from the one checking if requests should be made, so those two functions can happen in different levels of the call stack (solving problem 2).
  • Will be based on a rate limiter rather than an immediately-resetting tally, so errors don't have to be consecutive to trip the breaker (solving be initial-outage-detection part of problem 4) and so that we don't conclude that an overloaded service has caught back up when it hasn't (solving the service-restoration-detection part of problem 4).

This PR adds the class (along with a CircuitBreakerConfig class), documentation of how to use it, a constructor which sets up the appropriate rate limits, and tests of said constructor. Future PRs will add other helper methods and add and test the core error-tracking and request-gating methods.

The initial use of the circuit breaker will be for Seer-based grouping during ingest. Once we're happy with how it's working, we can go back and refactor the places which use the current circuit breaker to use this new one.

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jul 19, 2024
@lobsterkatie lobsterkatie marked this pull request as ready for review July 19, 2024 15:53
Copy link
Contributor

@vartec vartec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some nitpicks, but LGTM!

src/sentry/utils/circuit_breaker2.py Outdated Show resolved Hide resolved
src/sentry/utils/circuit_breaker2.py Outdated Show resolved Hide resolved
Copy link

codecov bot commented Jul 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.14%. Comparing base (30e501a) to head (7fc3a3d).
Report is 8 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #74557   +/-   ##
=======================================
  Coverage   78.14%   78.14%           
=======================================
  Files        6730     6731    +1     
  Lines      300022   300065   +43     
  Branches    51618    51620    +2     
=======================================
+ Hits       234455   234495   +40     
- Misses      59242    59245    +3     
  Partials     6325     6325           
Files Coverage Δ
src/sentry/utils/circuit_breaker2.py 100.00% <100.00%> (ø)

... and 6 files with indirect coverage changes

@lobsterkatie lobsterkatie force-pushed the kmclb-add-CircuitBreaker-class-stub branch from 2ba434a to 7fc3a3d Compare July 23, 2024 06:16
@lobsterkatie lobsterkatie merged commit 4ae109a into master Jul 23, 2024
51 checks passed
@lobsterkatie lobsterkatie deleted the kmclb-add-CircuitBreaker-class-stub branch July 23, 2024 17:17
lobsterkatie added a commit that referenced this pull request Jul 23, 2024
#74559)

This is a follow-up to #74557, which added the beginnings of a rate-limit-based circuit breaker, in the form of a new `CircuitBreaker` class. In this PR, various helpers, for checking the state of the breaker and the underlying rate limiters and for communicating with redis, have been added to the class. It also adds a `MockCircuitBreaker` subclass for use in tests, which contains a number of methods for mocking both circuit breaker and rate limiter state.

Note that though these helpers don't have accompanying tests, they are tested (indirectly) in the final PR in the series[1], as part of testing the methods which use them.

[1] #74560
lobsterkatie added a commit that referenced this pull request Jul 23, 2024
This completes the work, started in #74557 and #74559, of adding a new, class-and-rate-limit-based circuit breaker implementation to the codebase. In this PR, the core `record_error` and `should_allow_request` methods are added to the `CircuitBreaker` class, along with accompaying tests.
@github-actions github-actions bot locked and limited conversation to collaborators Aug 8, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants