Add fault-tolerance for cache system errors #577

Open: wants to merge 10 commits into `main`
134 changes: 131 additions & 3 deletions README.md
@@ -37,6 +37,12 @@ See the [Backing & Hacking blog post](https://www.kickstarter.com/backing-and-ha
- [Customizing responses](#customizing-responses)
- [RateLimit headers for well-behaved clients](#ratelimit-headers-for-well-behaved-clients)
- [Logging & Instrumentation](#logging--instrumentation)
- [Fault Tolerance & Error Handling](#fault-tolerance--error-handling)
- [Built-in error handling](#built-in-error-handling)
- [Expose Rails cache errors to Rack::Attack](#expose-rails-cache-errors-to-rackattack)
- [Configure cache timeout](#configure-cache-timeout)
- [Failure cooldown](#failure-cooldown)
- [Custom error handling](#custom-error-handling)
- [Testing](#testing)
- [How it works](#how-it-works)
- [About Tracks](#about-tracks)
@@ -400,11 +406,133 @@ ActiveSupport::Notifications.subscribe(/rack_attack/) do |name, start, finish, r
end
```

## Fault Tolerance & Error Handling

Rack::Attack has a mission-critical dependency on your [cache store](#cache-store-configuration).
If the cache system experiences an outage, it may cause severe latency within Rack::Attack
and lead to an overall application outage.

Although Rack::Attack is designed to be "fault-tolerant by default", depending on your application
setup, additional configuration may be required. Please **read this section carefully** to understand
how to best protect your application.

### Built-in error handling

As a Rack middleware component, Rack::Attack wraps your application's request handling endpoint.
When an error is raised while Rack::Attack evaluates a request (in its safelist, blocklist, throttle,
and track checks, including the cache calls they make), by default:

- If the error matches `Rack::Attack.allowed_errors` (by default `Dalli::DalliError` and `Redis::BaseError`,
  including their subclasses), Rack::Attack allows the request.
- Otherwise, Rack::Attack re-raises the error and the request fails.

All errors will trigger a failure cooldown (see below), regardless of whether they are allowed or raised.
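
In plain Ruby, the default behaviour amounts to roughly the following. This is a standalone sketch, not the gem's code: `FakeCacheError` and `evaluate_request` are hypothetical names, and the `on_failure` callback stands in for starting the failure cooldown.

```ruby
# Standalone sketch of the default flow; FakeCacheError and evaluate_request
# are hypothetical names, not part of Rack::Attack's API.
class FakeCacheError < StandardError; end

# Runs the Rack::Attack checks (the block). On error it first records the
# failure (standing in for starting the cooldown), then allows the request
# for allow-listed errors and re-raises everything else.
def evaluate_request(allowed_errors, on_failure)
  yield
rescue StandardError => error
  on_failure.call(error)
  raise unless allowed_errors.any? { |klass| error.is_a?(klass) }

  :allow
end

failures = []
record_failure = ->(error) { failures << error }

evaluate_request([FakeCacheError], record_failure) { :throttle }            # no error: the check's result is used
evaluate_request([FakeCacheError], record_failure) { raise FakeCacheError } # allowed error: returns :allow

begin
  evaluate_request([FakeCacheError], record_failure) { raise ArgumentError, "bug in a throttle block" }
rescue ArgumentError => e
  puts "re-raised: #{e.message}" # the request fails
end

puts failures.size # 2: both errors were recorded
```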

### Expose Rails cache errors to Rack::Attack

If Rack::Attack uses the Rails cache, be aware that Rails cache stores **suppress** such errors by default,
so Rack::Attack never sees them and cannot handle them as described above.
This can be dangerous: if your cache is timing out due to high request volume, for example, Rack::Attack
never enters its failure cooldown and keeps sending requests to your cache, worsening the problem.

To mitigate this:

* When using the Rails cache with `:redis_cache_store`, you'll need to expose errors to Rack::Attack
with a custom `error_handler` that re-raises them while Rack::Attack is calling the cache:

```ruby
# in your Rails config
config.cache_store = :redis_cache_store, {
  # ...
  error_handler: -> (method:, returning:, exception:) do
    # Re-raise only for Rack::Attack; other callers keep Rails' default suppression
    raise exception if Rack::Attack.calling?
  end
}
```

* Rails' `:mem_cache_store` and Dalli's `:dalli_store` suppress all Dalli errors. The recommended
workaround is to set a [Rack::Attack-specific cache configuration](#cache-store-configuration).
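
Why does the `error_handler` above work? The standalone sketch below illustrates the idea under simplifying assumptions: Rack::Attack sets a thread-local flag around its own cache calls (this PR's `with_calling`/`calling?`, which use `RequestStore` when available and `Thread.current` otherwise), and the store's handler re-raises only while that flag is set. `CallingFlag` and `ToyStore` are hypothetical; `ToyStore` only imitates a Rails cache store that calls its `error_handler` and then swallows the error.

```ruby
# Hypothetical stand-in for Rack::Attack.calling? / Rack::Attack.with_calling
module CallingFlag
  KEY = :sketch_rack_attack_calling

  def self.calling?
    !!Thread.current[KEY]
  end

  # Sets the flag only for the duration of the block, even if the block raises.
  def self.with_calling
    Thread.current[KEY] = true
    yield
  ensure
    Thread.current[KEY] = nil
  end
end

# Toy cache store: on failure it calls its error_handler, then returns nil.
class ToyStore
  def initialize(error_handler)
    @error_handler = error_handler
  end

  def read(_key)
    raise IOError, "connection timed out" # simulate a cache outage
  rescue IOError => e
    @error_handler.call(method: :read, returning: nil, exception: e)
    nil # suppressed: the caller just sees a cache miss
  end
end

store = ToyStore.new(->(method:, returning:, exception:) { raise exception if CallingFlag.calling? })

store.read("key") # any other caller: error suppressed, returns nil

begin
  CallingFlag.with_calling { store.read("key") } # "Rack::Attack" as the caller: error is raised
rescue IOError => e
  puts "Rack::Attack sees: #{e.message}"
end
```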

### Configure cache timeout

It is recommended to set your cache timeouts to 0.1 seconds (100 ms) or lower in your application config.
Please refer to the [Rails Guide](https://guides.rubyonrails.org/caching_with_rails.html).

```ruby
# Set 100 millisecond timeouts on Redis
config.cache_store = :redis_cache_store, {
  # ...
  connect_timeout: 0.1,
  read_timeout: 0.1,
  write_timeout: 0.1
}
```

To use different timeout values specific to Rack::Attack, you may set a
[Rack::Attack-specific cache configuration](#cache-store-configuration).
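
For example, a Redis store used only by Rack::Attack might combine tighter timeouts with an `error_handler` that always re-raises, since no other code shares that store. This is a configuration sketch, not a recommendation from this PR: the `REDIS_URL` environment variable and the 50 ms values are assumptions, and `url:` plus the timeout options are passed through to the Redis client as in the Rails config above.

```ruby
# config/initializers/rack_attack.rb
# A Redis store used only by Rack::Attack (REDIS_URL and 0.05 s timeouts are assumptions).
Rack::Attack.cache.store = ActiveSupport::Cache::RedisCacheStore.new(
  url: ENV.fetch("REDIS_URL"),
  connect_timeout: 0.05,
  read_timeout: 0.05,
  write_timeout: 0.05,
  # Nothing else uses this store, so always re-raise and let Rack::Attack handle the error
  error_handler: ->(method:, returning:, exception:) { raise exception }
)
```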

### Failure cooldown

When an error occurs within Rack::Attack, it disables itself for a 60-second "cooldown" period, during which
requests skip all Rack::Attack checks and go straight to your application. The cooldown is tracked per Ruby process.
This prevents a cache outage from adding timeout latency to every request.
All errors trigger the failure cooldown, regardless of whether they are allowed or raised.
You can configure the cooldown period (in seconds) as follows:

```ruby
# in initializers/rack_attack.rb

# Disable Rack::Attack for 5 minutes after any error
Rack::Attack.failure_cooldown = 300

# Do not use failure cooldown
Rack::Attack.failure_cooldown = nil
```
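
The cooldown check itself is simple time arithmetic. The sketch below mirrors the logic of this PR's `Rack::Attack.failed!` and `Rack::Attack.failure_cooldown?` in a standalone class; `CooldownGate` and its injectable `clock:` are hypothetical additions so the behaviour can be exercised without waiting.

```ruby
# Standalone sketch of the cooldown check (hypothetical class name).
class CooldownGate
  # cooldown: seconds to stay disabled after a failure, or nil to turn the feature off.
  # clock: injectable time source, so the behaviour can be tested.
  def initialize(cooldown, clock: -> { Time.now })
    @cooldown = cooldown
    @clock = clock
    @last_failure_at = nil
  end

  def failed!
    @last_failure_at = @clock.call
  end

  # True while Rack::Attack's checks should be skipped.
  def cooling_down?
    return false unless @last_failure_at && @cooldown

    @clock.call < @last_failure_at + @cooldown
  end
end

now = Time.at(1_000)
gate = CooldownGate.new(60, clock: -> { now })
gate.failed!             # failure recorded at t = 1000
now = Time.at(1_059)
puts gate.cooling_down?  # true: checks are still skipped
now = Time.at(1_060)
puts gate.cooling_down?  # false: checks resume after 60 seconds
```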

### Custom error handling

For most use cases, it is not necessary to re-configure Rack::Attack's default error handling.
However, there are several ways you may do so.

First, you may specify the list of errors to allow as an array of Class and/or String values.

```ruby
# in initializers/rack_attack.rb
Rack::Attack.allowed_errors += [MyErrorClass, 'MyOtherErrorClass']
```
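
Entries are matched as in this PR's `allow_error?`: a Class is checked with `is_a?`, while a String is compared with the names of the error's class and all of its ancestors. The standalone sketch below (the `Vendor` error classes and `allowed_error?` are hypothetical) shows the practical upshot: String entries match subclasses too, and naming a class that is not loaded, such as `'Missing::GemError'`, is harmless rather than a `NameError`.

```ruby
# Hypothetical stand-ins for a cache client's error hierarchy
module Vendor
  class BaseError < StandardError; end
  class ReadTimeout < BaseError; end
end

def allowed_error?(error, allowed_errors)
  allowed_errors.any? do |entry|
    case entry
    when String
      # Match by name against the error's class and all of its ancestors
      error.class.ancestors.any? { |ancestor| ancestor.name == entry }
    else
      # Class (or Module): a plain is_a? check
      error.is_a?(entry)
    end
  end
end

allowed = ["Vendor::BaseError", "Missing::GemError"]
puts allowed_error?(Vendor::ReadTimeout.new, allowed) # true: a subclass of a listed name
puts allowed_error?(RuntimeError.new, allowed)        # false: the unloaded name simply never matches
```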

Alternatively, you may define a custom error handler as a Proc taking `(error)` or `(error, request)`. The error handler
receives all errors, regardless of whether they are on the allow list. Your handler should return `:allow`, `:block`,
or `:throttle`, or else re-raise the error; any other return value allows the request.

```ruby
# Set a custom error handler which blocks allowed errors
# and raises all others
Rack::Attack.error_handler = -> (error, request) do
  if Rack::Attack.allow_error?(error)
    Rails.logger.warn("Blocking error: #{error.class.name} from IP #{request.ip}")
    :block
  else
    raise(error)
  end
end
```

Lastly, you can set the error handler to a Symbol shortcut, which applies to every error regardless of the allow list:

```ruby
# Handle all errors with block response
Rack::Attack.error_handler = :block

# Handle all errors with throttle response
Rack::Attack.error_handler = :throttle

# Handle all errors by allowing the request
Rack::Attack.error_handler = :allow
```
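
How a handler's result is interpreted can be sketched as follows. It mirrors this PR's `error_handler_result`, with one assumed tweak: `first(handler.arity)` alone would raise `ArgumentError` for a Proc with a negative arity (optional or splat parameters), so this version passes both arguments in that case. `handler_result` is a hypothetical name.

```ruby
# Hypothetical standalone version of the PR's error_handler_result.
def handler_result(handler, error, request)
  result = handler # a Symbol shortcut such as :block is used as-is
  if handler.is_a?(Proc)
    args = [error, request]
    # Pass (error) or (error, request) to match the Proc's parameters;
    # a negative arity (optional/splat params) gets both (assumed tweak).
    args = args.first(handler.arity) unless handler.arity.negative?
    result = handler.call(*args) # a re-raise here propagates and fails the request
  end
  # Only :block and :throttle are honoured; any other value allows the request.
  %i[block throttle].include?(result) ? result : :allow
end

error = StandardError.new("cache down")
request = Object.new # stand-in for a Rack::Attack::Request

p handler_result(:throttle, error, request)                           # :throttle
p handler_result(->(err) { :block }, error, request)                  # :block
p handler_result(->(err, req) { "log and carry on" }, error, request) # :allow
```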

## Testing

When developing and testing apps using Rack::Attack, if you are using throttling in particular,
you must enable the cache in your development environment. See
[Caching with Rails](http://guides.rubyonrails.org/caching_with_rails.html) for how to do this.
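
A minimal configuration sketch of two common options, assuming a standard Rails app: give the development environment a real cache store (recent Rails versions can also toggle caching with `bin/rails dev:cache`), or point Rack::Attack at its own in-memory store, for example in a test helper.

```ruby
# config/environments/development.rb
config.cache_store = :memory_store

# Alternatively, e.g. in a test helper: a Rack::Attack-only in-memory store
Rack::Attack.cache.store = ActiveSupport::Cache::MemoryStore.new
```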

### Disabling

139 changes: 122 additions & 17 deletions lib/rack/attack.rb
@@ -32,8 +32,18 @@ class IncompatibleStoreError < Error; end
autoload :Fail2Ban, 'rack/attack/fail2ban'
autoload :Allow2Ban, 'rack/attack/allow2ban'

THREAD_CALLING_KEY = 'rack.attack.calling'
DEFAULT_FAILURE_COOLDOWN = 60
DEFAULT_ALLOWED_ERRORS = %w[Dalli::DalliError Redis::BaseError].freeze

class << self
attr_accessor :enabled, :notifier, :throttle_discriminator_normalizer
attr_accessor :enabled,
:notifier,
:throttle_discriminator_normalizer,
:error_handler,
:allowed_errors,
:failure_cooldown

attr_reader :configuration

def instrument(request)
@@ -59,6 +69,40 @@ def reset!
cache.reset!
end

def failed!
@last_failure_at = Time.now
end

def failure_cooldown?
return false unless @last_failure_at && failure_cooldown

Time.now < @last_failure_at + failure_cooldown
end

def allow_error?(error)
allowed_errors&.any? do |ignored_error|
case ignored_error
when String then error.class.ancestors.any? {|a| a.name == ignored_error }
else error.is_a?(ignored_error)
end
end
end

def calling?
!!thread_store[THREAD_CALLING_KEY]
end

def with_calling
thread_store[THREAD_CALLING_KEY] = true
yield
ensure
thread_store[THREAD_CALLING_KEY] = nil
end

def thread_store
defined?(RequestStore) ? RequestStore.store : Thread.current
end

extend Forwardable
def_delegators(
:@configuration,
@@ -86,7 +130,11 @@ def reset!
)
end

# Set defaults
# Set class defaults
self.failure_cooldown = DEFAULT_FAILURE_COOLDOWN
self.allowed_errors = DEFAULT_ALLOWED_ERRORS.dup

# Set instance defaults
@enabled = true
@notifier = ActiveSupport::Notifications if defined?(ActiveSupport::Notifications)
@throttle_discriminator_normalizer = lambda do |discriminator|
@@ -102,32 +150,89 @@ def initialize(app)
end

def call(env)
return @app.call(env) if !self.class.enabled || env["rack.attack.called"]
return @app.call(env) if !self.class.enabled || env["rack.attack.called"] || self.class.failure_cooldown?

env["rack.attack.called"] = true
env['PATH_INFO'] = PathNormalizer.normalize_path(env['PATH_INFO'])
request = Rack::Attack::Request.new(env)
result = :allow

self.class.with_calling do
begin
result = get_result(request)
rescue StandardError => error
return do_error_response(error, request)
end
end

do_response(result, request)
end

private

def get_result(request)
if configuration.safelisted?(request)
@app.call(env)
:allow
elsif configuration.blocklisted?(request)
# Deprecated: Keeping blocklisted_response for backwards compatibility
if configuration.blocklisted_response
configuration.blocklisted_response.call(env)
else
configuration.blocklisted_responder.call(request)
end
:block
elsif configuration.throttled?(request)
# Deprecated: Keeping throttled_response for backwards compatibility
if configuration.throttled_response
configuration.throttled_response.call(env)
else
configuration.throttled_responder.call(request)
end
:throttle
else
configuration.tracked?(request)
@app.call(env)
:allow
end
end

def do_response(result, request)
case result
when :block then do_block_response(request)
when :throttle then do_throttle_response(request)
else @app.call(request.env)
end
end

def do_block_response(request)
# Deprecated: Keeping blocklisted_response for backwards compatibility
if configuration.blocklisted_response
configuration.blocklisted_response.call(request.env)
else
configuration.blocklisted_responder.call(request)
end
end

def do_throttle_response(request)
# Deprecated: Keeping throttled_response for backwards compatibility
if configuration.throttled_response
configuration.throttled_response.call(request.env)
else
configuration.throttled_responder.call(request)
end
end

def do_error_response(error, request)
self.class.failed!
result = error_result(error, request)
result ? do_response(result, request) : raise(error)
end

def error_result(error, request)
handler = self.class.error_handler
if handler
error_handler_result(handler, error, request)
elsif self.class.allow_error?(error)
:allow
end
end

def error_handler_result(handler, error, request)
result = handler

if handler.is_a?(Proc)
args = [error, request].first(handler.arity)
result = handler.call(*args) # may raise error
end

%i[block throttle].include?(result) ? result : :allow
end
end
end
30 changes: 8 additions & 22 deletions lib/rack/attack/store_proxy/dalli_proxy.rb
@@ -24,34 +24,26 @@ def initialize(client)
end

def read(key)
rescuing do
with do |client|
client.get(key)
end
with do |client|
client.get(key)
end
end

def write(key, value, options = {})
rescuing do
with do |client|
client.set(key, value, options.fetch(:expires_in, 0), raw: true)
end
with do |client|
client.set(key, value, options.fetch(:expires_in, 0), raw: true)
end
end

def increment(key, amount, options = {})
rescuing do
with do |client|
client.incr(key, amount, options.fetch(:expires_in, 0), amount)
end
with do |client|
client.incr(key, amount, options.fetch(:expires_in, 0), amount)
end
end

def delete(key)
rescuing do
with do |client|
client.delete(key)
end
with do |client|
client.delete(key)
end
end

@@ -66,12 +58,6 @@ def with
end
end
end

def rescuing
yield
rescue Dalli::DalliError
nil
end
end
end
end