Fault injection filter #219
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,5 @@ | ||
/build | ||
/docs/landing_source/.bundle | ||
/generated | ||
cscope.* | ||
BROWSE |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,150 @@ | ||
.. _config_http_filters_fault_injection: | ||
|
||
Fault Injection | ||
=============== | ||
|
||
The fault injection filter can be used to test the resiliency of | ||
microservices to different forms of failures. The filter can be used to | ||
inject delays and abort requests with user-specified error codes, thereby | ||
providing the ability to stage different failure scenarios such as service | ||
failures, service overloads, high network latency, network partitions, | ||
etc. Fault injection can be limited to a specific set of requests based on | ||
a set of pre-defined request headers. | ||
|
||
The scope of failures is restricted to those that are observable by an | ||
application communicating over the network. CPU and disk failures on the | ||
local host cannot be emulated. | ||
|
||
Currently, the fault injection filter has the following limitations: | ||
|
||
* Faults will be injected on all configured routes in the Envoy instance. | ||
* Abort codes are restricted to HTTP status codes only. | ||
* Delays are restricted to a fixed duration. | ||
|
||
Future versions will include support for restricting faults to specific | ||
routes, injecting *gRPC* and *HTTP/2* specific error codes, and delay | ||
durations based on distributions. | ||
|
||
Configuration | ||
------------- | ||
|
||
*Note: The fault injection filter must be inserted before any other filter, | ||
including the router filter.* | ||
|
||
.. code-block:: json | ||
|
||
{ | ||
"type" : "decoder", | ||
"name" : "fault", | ||
"config" : { | ||
"abort" : { | ||
"abort_percent" : "...", | ||
"http_status" : "..." | ||
}, | ||
"delay" : { | ||
"type" : "...", | ||
"fixed_delay_percent" : "...", | ||
"fixed_duration_ms" : "..." | ||
}, | ||
"headers" : [] | ||
} | ||
} | ||
|
||
abort.abort_percent | ||
*(required, integer)* The percentage of requests that | ||
should be aborted with the specified *http_status* code. Valid values | ||
range from 0 to 100. | ||
|
||
abort.http_status | ||
*(required, integer)* The HTTP status code that will be used as the | ||
response code for the request being aborted. | ||
|
||
delay.type | ||
*(required, string)* Specifies the type of delay being | ||
injected. Currently only *fixed* delay type (step function) is supported. | ||
|
||
delay.fixed_delay_percent | ||
*(required, integer)* The percentage of requests that will | ||
be delayed for the duration specified by *fixed_duration_ms*. Valid | ||
values range from 0 to 100. | ||
|
||
delay.fixed_duration_ms | ||
*(required, integer)* The delay duration in | ||
milliseconds. Must be greater than 0. | ||
|
||
:ref:`headers <config_http_filters_fault_injection_headers>` | ||
*(optional, array)* Specifies a set of headers that the filter should match on. | ||
|
||
The abort and delay blocks can be omitted. If they are not specified in the | ||
configuration file, their respective values will be obtained from the | ||
runtime. | ||
|
||
Runtime | ||
------- | ||
|
||
The HTTP fault injection filter supports the following runtime settings: | ||
|
||
fault.http.abort.abort_percent | ||
% of requests that will be aborted if the headers match. Defaults to the | ||
*abort_percent* specified in config. If the config does not contain an | ||
*abort* block, then *abort_percent* defaults to 0. | ||
|
||
fault.http.abort.http_status | ||
HTTP status code that will be used as the response code of requests that | ||
will be aborted if the headers match. Defaults to the HTTP status code | ||
specified in the config. If the config does not contain an *abort* block, | ||
then *http_status* defaults to 0. | ||
|
||
fault.http.delay.fixed_delay_percent | ||
% of requests that will be delayed if the headers match. Defaults to the | ||
*fixed_delay_percent* specified in the config or 0 otherwise. | ||
|
||
fault.http.delay.fixed_duration_ms | ||
The delay duration in milliseconds. If not specified, the | ||
*fixed_duration_ms* specified in the config will be used. If this field | ||
is missing from both the runtime and the config, no delays will be | ||
injected. | ||
|
||
.. _config_http_filters_fault_injection_headers: | ||
|
||
Headers | ||
------- | ||
|
||
The fault injection filter can be applied selectively to requests that | ||
match a set of headers specified in the fault filter config. The chances of | ||
actual fault injection further depend on the values of *abort_percent* and | ||
*fixed_delay_percent* parameters. Each element of the array in the | ||
*headers* field should be in the following format: | ||
|
||
.. code-block:: json | ||
|
||
[ | ||
{"name": "...", "value": "..."} | ||
] | ||
|
||
name | ||
*(required, string)* Specifies the name of the header in the request. | ||
|
||
value | ||
*(optional, string)* Specifies the value of the header. If the value is | ||
absent, a request that has the *name* header will match, regardless of the | ||
header's value. | ||
|
||
The filter will check the request's headers against all the specified | ||
headers in the filter config. A match will happen if all the headers in the | ||
config are present in the request with the same values (or based on | ||
presence if the ``value`` field is not in the config). | ||
|
||
Statistics | ||
---------- | ||
|
||
The fault filter outputs statistics in the *http.<stat_prefix>.fault.* namespace. The :ref:`stat | ||
prefix <config_http_conn_man_stat_prefix>` comes from the owning HTTP connection manager. | ||
|
||
.. csv-table:: | ||
:header: Name, Type, Description | ||
:widths: 1, 1, 2 | ||
|
||
delays_injected, Counter, Total requests that were delayed | ||
aborts_injected, Counter, Total requests that were aborted | ||
|
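The matching rule described in the Headers section of the documentation above (every configured header must be present in the request, and an entry without a ``value`` matches on presence alone) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the filter's code: `HeaderRule` and `matchesAllRules` are hypothetical names, and a plain `std::map` stands in for Envoy's `HeaderMap`. The PR itself delegates matching to `Router::ConfigUtility::matchHeaders`.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one element of the "headers" array.
struct HeaderRule {
  std::string name;  // required; assumed to be lower-cased already
  std::string value; // empty means "match on presence only"
};

// Returns true only if every rule is satisfied by the request headers.
// An empty rule list therefore matches every request.
bool matchesAllRules(const std::map<std::string, std::string>& request_headers,
                     const std::vector<HeaderRule>& rules) {
  return std::all_of(rules.begin(), rules.end(), [&](const HeaderRule& rule) {
    assert(!rule.name.empty()); // "name" is a required field
    auto it = request_headers.find(rule.name);
    if (it == request_headers.end()) {
      return false; // header missing from the request
    }
    return rule.value.empty() || it->second == rule.value;
  });
}
```

A request that passes this check is only a candidate for fault injection; whether a fault is actually injected still depends on *abort_percent* and *fixed_delay_percent*.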
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,155 @@ | ||
#include "fault_filter.h" | ||
|
||
#include "envoy/event/timer.h" | ||
#include "envoy/http/codes.h" | ||
#include "envoy/http/header_map.h" | ||
#include "envoy/stats/stats.h" | ||
|
||
#include "common/common/assert.h" | ||
#include "common/common/empty_string.h" | ||
#include "common/http/codes.h" | ||
#include "common/http/header_map_impl.h" | ||
#include "common/http/headers.h" | ||
|
||
namespace Http { | ||
|
||
FaultFilterConfig::FaultFilterConfig(const Json::Object& json_config, Runtime::Loader& runtime, | ||
const std::string& stat_prefix, Stats::Store& stats) | ||
: runtime_(runtime), stats_{ALL_FAULT_FILTER_STATS(POOL_COUNTER_PREFIX(stats, stat_prefix))} { | ||
|
||
if (json_config.hasObject("abort")) { | ||
const Json::Object& abort = json_config.getObject("abort"); | ||
abort_percent_ = static_cast<uint64_t>(abort.getInteger("abort_percent", 0)); | ||
|
||
if (abort_percent_ > 0) { | ||
if (abort_percent_ > 100) { | ||
throw EnvoyException("abort percentage cannot be greater than 100"); | ||
} | ||
} | ||
|
||
// TODO: Throw error if invalid return code is provided | ||
if (abort.hasObject("http_status")) { | ||
http_status_ = static_cast<uint64_t>(abort.getInteger("http_status")); | ||
} else { | ||
throw EnvoyException("missing http_status in abort config"); | ||
} | ||
} | ||
|
||
if (json_config.hasObject("delay")) { | ||
const Json::Object& delay = json_config.getObject("delay"); | ||
const std::string type = delay.getString("type", "empty"); | ||
if (type == "fixed") { | ||
fixed_delay_percent_ = static_cast<uint64_t>(delay.getInteger("fixed_delay_percent", 0)); | ||
fixed_duration_ms_ = static_cast<uint64_t>(delay.getInteger("fixed_duration_ms", 0)); | ||
|
||
if (fixed_delay_percent_ > 0) { | ||
if (fixed_delay_percent_ > 100) { | ||
throw EnvoyException("delay percentage cannot be greater than 100"); | ||
} | ||
} | ||
if (0 == fixed_duration_ms_) { | ||
throw EnvoyException("delay duration must be greater than 0"); | ||
} | ||
} else { | ||
throw EnvoyException("delay type is either empty or invalid"); | ||
} | ||
} | ||
|
||
if (json_config.hasObject("headers")) { | ||
std::vector<Json::Object> config_headers = json_config.getObjectArray("headers"); | ||
for (const Json::Object& header_map : config_headers) { | ||
// Allow the header value to be empty so that matching can be based on header presence alone. | ||
fault_filter_headers_.emplace_back(Http::LowerCaseString(header_map.getString("name")), | ||
header_map.getString("value", EMPTY_STRING)); | ||
} | ||
} | ||
} | ||
|
||
FaultFilter::FaultFilter(FaultFilterConfigPtr config) : config_(config) {} | ||
|
||
FaultFilter::~FaultFilter() { ASSERT(!delay_timer_); } | ||
|
||
// Delays and aborts are independent events. One can inject a delay | ||
// followed by an abort or inject just a delay or abort. In this callback, | ||
// if we inject a delay, then we will inject the abort in the delay timer | ||
// callback. | ||
FilterHeadersStatus FaultFilter::decodeHeaders(HeaderMap& headers, bool) { | ||
// Check for header matches first | ||
if (!Router::ConfigUtility::matchHeaders(headers, config_->filterHeaders())) { | ||
return FilterHeadersStatus::Continue; | ||
} | ||
|
||
if (config_->runtime().snapshot().featureEnabled("fault.http.delay.fixed_delay_percent", | ||
config_->delayPercent())) { | ||
uint64_t duration_ms = config_->runtime().snapshot().getInteger( | ||
"fault.http.delay.fixed_duration_ms", config_->delayDuration()); | ||
|
||
// Delay only if the duration is >0ms | ||
if (0 != duration_ms) { | ||
delay_timer_ = | ||
callbacks_->dispatcher().createTimer([this]() -> void { postDelayInjection(); }); | ||
delay_timer_->enableTimer(std::chrono::milliseconds(duration_ms)); | ||
config_->stats().delays_injected_.inc(); | ||
return FilterHeadersStatus::StopIteration; | ||
} | ||
} | ||
|
||
if (config_->runtime().snapshot().featureEnabled("fault.http.abort.abort_percent", | ||
config_->abortPercent())) { | ||
abortWithHTTPStatus(); | ||
config_->stats().aborts_injected_.inc(); | ||
return FilterHeadersStatus::StopIteration; | ||
} | ||
|
||
return FilterHeadersStatus::Continue; | ||
} | ||
|
||
FilterDataStatus FaultFilter::decodeData(Buffer::Instance&, bool) { | ||
return FilterDataStatus::Continue; | ||
} | ||
|
||
FilterTrailersStatus FaultFilter::decodeTrailers(HeaderMap&) { | ||
return FilterTrailersStatus::Continue; | ||
} | ||
|
||
FaultFilterStats FaultFilter::generateStats(const std::string& prefix, Stats::Store& store) { | ||
std::string final_prefix = prefix + "fault."; | ||
return {ALL_FAULT_FILTER_STATS(POOL_COUNTER_PREFIX(store, final_prefix))}; | ||
} | ||
|
||
void FaultFilter::onResetStream() { resetTimerState(); } | ||
|
||
void FaultFilter::postDelayInjection() { | ||
resetTimerState(); | ||
// Delays can be followed by aborts | ||
if (config_->runtime().snapshot().featureEnabled("fault.http.abort.abort_percent", | ||
config_->abortPercent())) { | ||
abortWithHTTPStatus(); | ||
config_->stats().aborts_injected_.inc(); | ||
Review comment: should go in abortWithHTTPStatus()
Reply: The reason I left it out of that function was because in future, we might add
Reply: I would just move it for now. We can deal with the future when it happens. :) |
||
} else { | ||
// Continue request processing | ||
callbacks_->continueDecoding(); | ||
} | ||
} | ||
|
||
void FaultFilter::abortWithHTTPStatus() { | ||
// TODO: check http status codes obtained from runtime | ||
Http::HeaderMapPtr response_headers{new HeaderMapImpl{ | ||
{Headers::get().Status, std::to_string(config_->runtime().snapshot().getInteger( | ||
"fault.http.abort.http_status", config_->abortCode()))}}}; | ||
callbacks_->encodeHeaders(std::move(response_headers), true); | ||
} | ||
|
||
void FaultFilter::resetTimerState() { | ||
if (delay_timer_) { | ||
delay_timer_->disableTimer(); | ||
delay_timer_.reset(); | ||
} | ||
} | ||
|
||
void FaultFilter::setDecoderFilterCallbacks(StreamDecoderFilterCallbacks& callbacks) { | ||
callbacks_ = &callbacks; | ||
callbacks_->addResetStreamCallback([this]() -> void { onResetStream(); }); | ||
} | ||
|
||
} // Http |
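The interaction between delays and aborts in `decodeHeaders` and `postDelayInjection` can be summarized as a small decision function. The sketch below is an assumption-laden model rather than the filter itself: `FaultAction`, `percentHit`, and `decideFault` are hypothetical names, and a percentage check is assumed to pass when a roll in [0, 100) falls below the configured percentage, which may not be how Envoy's runtime snapshot samples.

```cpp
#include <cassert>
#include <cstdint>

// Possible outcomes for a request whose headers matched the filter config.
enum class FaultAction { None, DelayOnly, AbortOnly, DelayThenAbort };

// Assumed sampling rule: the check passes when roll < percent.
bool percentHit(uint64_t percent, uint64_t roll) {
  assert(percent <= 100 && roll < 100);
  return roll < percent;
}

// Delay and abort are decided independently. A delay is injected only when
// the delay check passes and the duration is non-zero; the abort check is
// then evaluated either immediately or after the delay has elapsed.
FaultAction decideFault(uint64_t delay_percent, uint64_t duration_ms, uint64_t abort_percent,
                        uint64_t delay_roll, uint64_t abort_roll) {
  const bool delayed = percentHit(delay_percent, delay_roll) && duration_ms != 0;
  const bool aborted = percentHit(abort_percent, abort_roll);
  if (delayed) {
    return aborted ? FaultAction::DelayThenAbort : FaultAction::DelayOnly;
  }
  return aborted ? FaultAction::AbortOnly : FaultAction::None;
}
```

For example, with both percentages at 100 and a non-zero duration, every matching request is delayed and then aborted; with a zero duration the delay is skipped and the request is aborted straight away.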
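The review suggestion, moving the `aborts_injected_` increment into `abortWithHTTPStatus()`, might look roughly like the sketch below. `Counter`, `FaultStats`, and `FaultFilterSketch` are hypothetical stand-ins so that the example compiles on its own, and sending the response is reduced to setting a flag. It only illustrates that both abort paths then update the counter in one place.

```cpp
#include <cstdint>

// Hypothetical minimal counter; Envoy's stats counters are richer.
struct Counter {
  uint64_t value = 0;
  void inc() { ++value; }
};

struct FaultStats {
  Counter delays_injected_;
  Counter aborts_injected_;
};

class FaultFilterSketch {
public:
  explicit FaultFilterSketch(FaultStats& stats) : stats_(stats) {}

  // Immediate abort path (no delay was injected).
  void onHeaders(bool abort_enabled) {
    if (abort_enabled) {
      abortWithHTTPStatus();
    }
  }

  // Abort path taken from the delay timer callback.
  void postDelayInjection(bool abort_enabled) {
    if (abort_enabled) {
      abortWithHTTPStatus();
    } else {
      continued_ = true; // stands in for continueDecoding()
    }
  }

  bool aborted() const { return aborted_; }
  bool continued() const { return continued_; }

private:
  // The stat is incremented here, so callers cannot forget it.
  void abortWithHTTPStatus() {
    aborted_ = true; // stands in for encoding the error response
    stats_.aborts_injected_.inc();
  }

  FaultStats& stats_;
  bool aborted_ = false;
  bool continued_ = false;
};
```

The trade-off raised in the thread still applies: if other abort mechanisms are added later, they would need their own counters or a shared helper.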