Fault injection filter #219

Merged
merged 15 commits into from
Nov 21, 2016
Changes from 14 commits
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
/build
/docs/landing_source/.bundle
/generated
cscope.*
BROWSE
150 changes: 150 additions & 0 deletions docs/configuration/http_filters/fault_filter.rst
@@ -0,0 +1,150 @@
.. _config_http_filters_fault_injection:

Fault Injection
===============

The fault injection filter can be used to test the resiliency of
microservices to different forms of failures. The filter can be used to
inject delays and abort requests with user-specified error codes, thereby
providing the ability to stage different failure scenarios such as service
failures, service overloads, high network latency, network partitions,
etc. Fault injection can be limited to a specific set of requests based on
a set of pre-defined request headers.

The scope of failures is restricted to those that are observable by an
application communicating over the network. CPU and disk failures on the
local host cannot be emulated.

Currently, the fault injection filter has the following limitations:

* Faults will be injected on all configured routes in the Envoy instance.
* Abort codes are restricted to HTTP status codes only.
* Delays are restricted to a fixed duration.

Future versions will include support for restricting faults to specific
routes, injecting *gRPC*- and *HTTP/2*-specific error codes, and delay
durations based on distributions.

Configuration
-------------

*Note: The fault injection filter must be inserted before any other filter,
including the router filter.*

.. code-block:: json

  {
    "type" : "decoder",
    "name" : "fault",
    "config" : {
      "abort" : {
        "abort_percent" : "...",
        "http_status" : "..."
      },
      "delay" : {
        "type" : "...",
        "fixed_delay_percent" : "...",
        "fixed_duration_ms" : "..."
      },
      "headers" : []
    }
  }

abort.abort_percent
  *(required, integer)* The percentage of requests that
  should be aborted with the specified *http_status* code. Valid values
  range from 0 to 100.

abort.http_status
  *(required, integer)* The HTTP status code that will be used as the
  response code for the request being aborted.

delay.type
  *(required, string)* Specifies the type of delay being
  injected. Currently only the *fixed* delay type (step function) is supported.

delay.fixed_delay_percent
  *(required, integer)* The percentage of requests that will
  be delayed for the duration specified by *fixed_duration_ms*. Valid
  values range from 0 to 100.

delay.fixed_duration_ms
  *(required, integer)* The delay duration in
  milliseconds. Must be greater than 0.

:ref:`headers <config_http_filters_fault_injection_headers>`
  *(optional, array)* Specifies a set of headers that the filter should match on.

The abort and delay blocks can be omitted. If they are not specified in the
configuration file, their respective values will be obtained from the
runtime.
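
The range rules listed for the *abort* and *delay* blocks can be sketched as standalone checks. This is only an illustration under stated assumptions: ``AbortConfig``, ``DelayConfig``, ``validateAbort`` and ``validateDelay`` are hypothetical names, and the sketch returns an error string where a real filter would more likely reject the configuration at load time.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical plain structs mirroring the documented fields.
struct AbortConfig {
  uint64_t abort_percent;
  uint64_t http_status;
};

struct DelayConfig {
  std::string type;
  uint64_t fixed_delay_percent;
  uint64_t fixed_duration_ms;
};

// Returns an empty string when the block is valid, otherwise a description
// of the first violated rule.
std::string validateAbort(const AbortConfig& abort) {
  if (abort.abort_percent > 100) {
    return "abort percentage cannot be greater than 100";
  }
  return "";
}

std::string validateDelay(const DelayConfig& delay) {
  if (delay.type != "fixed") {
    return "delay type is either empty or invalid";
  }
  if (delay.fixed_delay_percent > 100) {
    return "delay percentage cannot be greater than 100";
  }
  if (delay.fixed_duration_ms == 0) {
    return "delay duration must be greater than 0";
  }
  return "";
}
```

The sketch does not check *http_status* because the documentation above does not name a valid range for it.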

Runtime
-------

The HTTP fault injection filter supports the following runtime settings:

fault.http.abort.abort_percent
  % of requests that will be aborted if the headers match. Defaults to the
  *abort_percent* specified in config. If the config does not contain an
  *abort* block, then *abort_percent* defaults to 0.

fault.http.abort.http_status
  HTTP status code that will be used as the response status code of requests
  that will be aborted if the headers match. Defaults to the HTTP status code
  specified in the config. If the config does not contain an *abort* block,
  then *http_status* defaults to 0.

fault.http.delay.fixed_delay_percent
  % of requests that will be delayed if the headers match. Defaults to the
  *fixed_delay_percent* specified in the config or 0 otherwise.

fault.http.delay.fixed_duration_ms
  The delay duration in milliseconds. If not specified, the
  *fixed_duration_ms* specified in the config will be used. If this field
  is missing from both the runtime and the config, no delays will be
  injected.
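
The fallback order described for these settings (the runtime value if present, otherwise the config value, otherwise 0) can be sketched as follows. ``RuntimeSnapshot`` and ``effectiveDelayMs`` are hypothetical stand-ins backed by a ``std::map``, not Envoy's runtime interface; the key spelling follows the string literals used in ``fault_filter.cc`` in this change.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a runtime snapshot: a flat key/value store.
class RuntimeSnapshot {
public:
  void set(const std::string& key, uint64_t value) { values_[key] = value; }

  // Returns the runtime value for key, or default_value when the key is unset.
  uint64_t getInteger(const std::string& key, uint64_t default_value) const {
    const auto it = values_.find(key);
    return it == values_.end() ? default_value : it->second;
  }

private:
  std::map<std::string, uint64_t> values_;
};

// Effective delay duration: the runtime value overrides the config value.
// A result of 0 means no delay is injected.
uint64_t effectiveDelayMs(const RuntimeSnapshot& runtime, uint64_t config_duration_ms) {
  return runtime.getInteger("fault.http.delay.fixed_duration_ms", config_duration_ms);
}
```

With nothing set in the runtime, the config value is returned unchanged; once the key is set, the runtime value wins regardless of the config.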

.. _config_http_filters_fault_injection_headers:

Headers
-------

The fault injection filter can be applied selectively to requests that
match a set of headers specified in the fault filter config. Whether a fault
is actually injected further depends on the values of the *abort_percent* and
*fixed_delay_percent* parameters. Each element of the array in the
*headers* field should be in the following format:

.. code-block:: json

  [
    {"name": "...", "value": "..."}
  ]

name
  *(required, string)* Specifies the name of the header in the request.

value
  *(optional, string)* Specifies the value of the header. If the value is
  absent, a request that has the *name* header will match, regardless of the
  header's value.

The filter will check the request's headers against all the specified
headers in the filter config. A match will happen if all the headers in the
config are present in the request with the same values (or based on
presence if the ``value`` field is not in the config).
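
The matching rule above can be sketched in isolation. This is a minimal sketch, not the routine the filter calls: ``HeaderRule``, ``RequestHeaders`` and ``matchHeaders`` are hypothetical names, header names are assumed to be lower case already, and each request header is assumed to carry a single value.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical representation of one entry of the "headers" array.
struct HeaderRule {
  std::string name;  // assumed to be lower case already
  std::string value; // empty means "match on presence only"
};

// Request headers as a name -> value map (assumption: one value per name).
using RequestHeaders = std::map<std::string, std::string>;

// True when every rule is satisfied. In this sketch an empty rule list
// matches every request.
bool matchHeaders(const RequestHeaders& request, const std::vector<HeaderRule>& rules) {
  for (const HeaderRule& rule : rules) {
    const auto it = request.find(rule.name);
    if (it == request.end()) {
      return false; // header missing from the request
    }
    if (!rule.value.empty() && it->second != rule.value) {
      return false; // header present but with a different value
    }
  }
  return true;
}
```

A request carrying ``x-fault: on`` would match a rule with that name and value, or with that name and no value, but not a rule list that also requires a header the request lacks.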

Statistics
----------

The fault filter outputs statistics in the *http.<stat_prefix>.fault.* namespace. The :ref:`stat
prefix <config_http_conn_man_stat_prefix>` comes from the owning HTTP connection manager.

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  delays_injected, Counter, Total requests that were delayed
  aborts_injected, Counter, Total requests that were aborted

1 change: 1 addition & 0 deletions docs/configuration/http_filters/http_filters.rst
@@ -7,6 +7,7 @@ HTTP filters
:maxdepth: 2

buffer_filter
fault_filter
dynamodb_filter
grpc_http1_bridge_filter
health_check_filter
1 change: 1 addition & 0 deletions source/common/CMakeLists.txt
@@ -63,6 +63,7 @@ add_library(
http/http2/codec_impl.cc
http/http2/conn_pool.cc
http/filter/buffer_filter.cc
http/filter/fault_filter.cc
http/filter/ratelimit.cc
http/user_agent.cc
http/utility.cc
155 changes: 155 additions & 0 deletions source/common/http/filter/fault_filter.cc
@@ -0,0 +1,155 @@
#include "fault_filter.h"

#include "envoy/event/timer.h"
#include "envoy/http/codes.h"
#include "envoy/http/header_map.h"
#include "envoy/stats/stats.h"

#include "common/common/assert.h"
#include "common/common/empty_string.h"
#include "common/http/codes.h"
#include "common/http/header_map_impl.h"
#include "common/http/headers.h"

namespace Http {

FaultFilterConfig::FaultFilterConfig(const Json::Object& json_config, Runtime::Loader& runtime,
                                     const std::string& stat_prefix, Stats::Store& stats)
    : runtime_(runtime), stats_{ALL_FAULT_FILTER_STATS(POOL_COUNTER_PREFIX(stats, stat_prefix))} {

  if (json_config.hasObject("abort")) {
    const Json::Object& abort = json_config.getObject("abort");
    abort_percent_ = static_cast<uint64_t>(abort.getInteger("abort_percent", 0));

    if (abort_percent_ > 0) {
      if (abort_percent_ > 100) {
        throw EnvoyException("abort percentage cannot be greater than 100");
      }
    }

    // TODO: Throw error if invalid return code is provided
    if (abort.hasObject("http_status")) {
      http_status_ = static_cast<uint64_t>(abort.getInteger("http_status"));
    } else {
      throw EnvoyException("missing http_status in abort config");
    }
  }

  if (json_config.hasObject("delay")) {
    const Json::Object& delay = json_config.getObject("delay");
    const std::string type = delay.getString("type", "empty");
    if (type == "fixed") {
      fixed_delay_percent_ = static_cast<uint64_t>(delay.getInteger("fixed_delay_percent", 0));
      fixed_duration_ms_ = static_cast<uint64_t>(delay.getInteger("fixed_duration_ms", 0));

      if (fixed_delay_percent_ > 0) {
        if (fixed_delay_percent_ > 100) {
          throw EnvoyException("delay percentage cannot be greater than 100");
        }
      }
      if (0 == fixed_duration_ms_) {
        throw EnvoyException("delay duration must be greater than 0");
      }
    } else {
      throw EnvoyException("delay type is either empty or invalid");
    }
  }

  if (json_config.hasObject("headers")) {
    std::vector<Json::Object> config_headers = json_config.getObjectArray("headers");
    for (const Json::Object& header_map : config_headers) {
      // allow header value to be empty, allows matching to be only based on header presence.
      fault_filter_headers_.emplace_back(Http::LowerCaseString(header_map.getString("name")),
                                         header_map.getString("value", EMPTY_STRING));
    }
  }
}

FaultFilter::FaultFilter(FaultFilterConfigPtr config) : config_(config) {}

FaultFilter::~FaultFilter() { ASSERT(!delay_timer_); }

// Delays and aborts are independent events. One can inject a delay
// followed by an abort or inject just a delay or abort. In this callback,
// if we inject a delay, then we will inject the abort in the delay timer
// callback.
FilterHeadersStatus FaultFilter::decodeHeaders(HeaderMap& headers, bool) {
  // Check for header matches first
  if (!Router::ConfigUtility::matchHeaders(headers, config_->filterHeaders())) {
    return FilterHeadersStatus::Continue;
  }

  if (config_->runtime().snapshot().featureEnabled("fault.http.delay.fixed_delay_percent",
                                                   config_->delayPercent())) {
    uint64_t duration_ms = config_->runtime().snapshot().getInteger(
        "fault.http.delay.fixed_duration_ms", config_->delayDuration());

    // Delay only if the duration is >0ms
    if (0 != duration_ms) {
      delay_timer_ =
          callbacks_->dispatcher().createTimer([this]() -> void { postDelayInjection(); });
      delay_timer_->enableTimer(std::chrono::milliseconds(duration_ms));
      config_->stats().delays_injected_.inc();
      return FilterHeadersStatus::StopIteration;
    }
  }

  if (config_->runtime().snapshot().featureEnabled("fault.http.abort.abort_percent",
                                                   config_->abortPercent())) {
    abortWithHTTPStatus();
    config_->stats().aborts_injected_.inc();
Member

should go in abortWithHTTPStatus()

    return FilterHeadersStatus::StopIteration;
  }

  return FilterHeadersStatus::Continue;
}
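
The ordering in decodeHeaders() (header match first, then the delay check, then the abort check) can be summarized as a pure function. This is a sketch under assumptions: `FaultDecision`, `sampled` and `decide` are hypothetical names, the random draws are passed in explicitly, and the sampling rule "draw modulo 100 is below the percentage" is an assumption about how a percentage check like featureEnabled() behaves, not something this diff shows.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of what happens to one request.
struct FaultDecision {
  bool delay;
  bool abort;
};

// Assumed sampling rule: the fault is enabled when the draw, reduced to
// [0, 100), is below the configured percentage.
bool sampled(uint64_t draw, uint64_t percent) { return draw % 100 < percent; }

FaultDecision decide(bool headers_match, uint64_t delay_percent, uint64_t duration_ms,
                     uint64_t abort_percent, uint64_t delay_draw, uint64_t abort_draw) {
  FaultDecision decision{false, false};
  if (!headers_match) {
    return decision; // request passes through untouched
  }
  // A zero duration disables the delay even when the percentage check passes.
  decision.delay = sampled(delay_draw, delay_percent) && duration_ms != 0;
  // The abort check is independent of the delay. When a delay was injected,
  // the filter in this change evaluates the abort later, in the timer callback.
  decision.abort = sampled(abort_draw, abort_percent);
  return decision;
}
```

Passing the draws as parameters keeps the sketch deterministic; the filter itself makes a separate percentage check for the delay and for the abort.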

FilterDataStatus FaultFilter::decodeData(Buffer::Instance&, bool) {
  return FilterDataStatus::Continue;
}

FilterTrailersStatus FaultFilter::decodeTrailers(HeaderMap&) {
  return FilterTrailersStatus::Continue;
}

FaultFilterStats FaultFilter::generateStats(const std::string& prefix, Stats::Store& store) {
  std::string final_prefix = prefix + "fault.";
  return {ALL_FAULT_FILTER_STATS(POOL_COUNTER_PREFIX(store, final_prefix))};
}

void FaultFilter::onResetStream() { resetTimerState(); }

void FaultFilter::postDelayInjection() {
  resetTimerState();
  // Delays can be followed by aborts
  if (config_->runtime().snapshot().featureEnabled("fault.http.abort.abort_percent",
                                                   config_->abortPercent())) {
    abortWithHTTPStatus();
    config_->stats().aborts_injected_.inc();
Member

should go in abortWithHTTPStatus()

Member Author

The reason I left it out of that function is that in the future we might add abortWithGRPCStatus and abortWithHTTP2Error. Rather than duplicating the stats counter increment in all three functions, I thought it's better to keep it here. What do you think?

Member

I would just move it for now. We can deal with the future when it happens. :)

  } else {
    // Continue request processing
    callbacks_->continueDecoding();
  }
}
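
The suggestion in the review thread above, moving the counter increment into the abort helper so both call sites stay in sync, can be sketched in isolation. `Counter`, `FakeFaultFilter` and their members are hypothetical stand-ins rather than Envoy types, and sending the response is reduced to bumping a second counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a stats counter.
struct Counter {
  uint64_t value = 0;
  void inc() { ++value; }
};

// Hypothetical reduced filter: only the pieces needed to show where the
// increment lives.
class FakeFaultFilter {
public:
  // Immediate abort path (no delay was injected).
  void onHeaders(bool abort_enabled) {
    if (abort_enabled) {
      abortWithHTTPStatus();
    }
  }

  // Abort path taken after a delay timer fires.
  void onDelayElapsed(bool abort_enabled) {
    if (abort_enabled) {
      abortWithHTTPStatus();
    }
  }

  uint64_t abortsInjected() const { return aborts_injected_.value; }
  uint64_t responsesSent() const { return responses_sent_; }

private:
  void abortWithHTTPStatus() {
    ++responses_sent_;      // stands in for encoding the error response
    aborts_injected_.inc(); // counted once here, so no call site can forget it
  }

  Counter aborts_injected_;
  uint64_t responses_sent_ = 0;
};
```

The trade-off raised by the author still applies: if more abort helpers are added later, each would need its own increment, or the increment could move to a shared wrapper.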

void FaultFilter::abortWithHTTPStatus() {
  // TODO: check http status codes obtained from runtime
  Http::HeaderMapPtr response_headers{new HeaderMapImpl{
      {Headers::get().Status, std::to_string(config_->runtime().snapshot().getInteger(
                                  "fault.http.abort.http_status", config_->abortCode()))}}};
  callbacks_->encodeHeaders(std::move(response_headers), true);
}

void FaultFilter::resetTimerState() {
  if (delay_timer_) {
    delay_timer_->disableTimer();
    delay_timer_.reset();
  }
}

void FaultFilter::setDecoderFilterCallbacks(StreamDecoderFilterCallbacks& callbacks) {
  callbacks_ = &callbacks;
  callbacks_->addResetStreamCallback([this]() -> void { onResetStream(); });
}

} // Http