Redis fault injection #10784
Signed-off-by: FAYiEKcbD0XFqF2QK2E4viAHg8rMm2VbjYKdjTg <nflacco@lyft.com>
b9e5de9 to 827f332
cfdddb0 to 4d83a32
Waiting on #10794 so that I can push my branch with the latest master/updated version history. Not sure why the coverage build fails; it looks unrelated to my changes and will presumably pass on the latest master.
Please use
Sorry, the logic looks good, but I don't understand C++.
The best thing for you to review is the logic in the fault manager where we get the fault for a command, specifically the amortization strategy.
api/envoy/config/filter/network/redis_proxy/v2/redis_proxy.proto
HenryYYang left a comment:
Some initial comments.
for (FaultMapType::iterator it = range.first; it != range.second; ++it) {
  envoy::extensions::filters::network::redis_proxy::v3::RedisProxy_RedisFault fault = it->second;
  const uint64_t fault_injection_percentage = calculateFaultInjectionPercentage(fault);
  if (random_number % (100 - amortized_fault) < fault_injection_percentage) {
Why are we doing the (100 - amortized_fault) here? This doesn't seem right to me. If we have three faults of 30%, 30%, and 40%, and the random number is r, we'll be doing these checks:
r % 100 < 30
r % 70 < 30
r % 40 < 40
Those differ from the correct checks, which are:
r % 100 < 30
r % 100 < 60
r % 100 < 100
or
r % 100 < 30
(r % 100) - 30 < 30
(r % 100) - 60 < 40
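The cumulative checks above can be sketched in C++ roughly as follows. This is a hedged illustration rather than the PR's actual fix: FaultSketch and pickFault are hypothetical names, and the sketch assumes each fault's percentage is a whole number and that the percentages for one command sum to at most 100.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a configured fault (not Envoy's Fault class).
struct FaultSketch {
  std::string name;
  uint64_t percentage; // probability in whole percent, 0..100
};

// Picks at most one fault from a single random draw reduced to [0, 100).
// Each fault owns a contiguous slice: the first owns [0, p1), the second
// [p1, p1 + p2), and so on, matching "r % 100 < 30", "r % 100 < 60",
// "r % 100 < 100" from the review.
std::optional<std::string> pickFault(const std::vector<FaultSketch>& faults,
                                     uint64_t random_value) {
  const uint64_t r = random_value % 100;
  uint64_t cumulative = 0;
  for (const FaultSketch& fault : faults) {
    cumulative += fault.percentage;
    if (r < cumulative) {
      return fault.name;
    }
  }
  return std::nullopt; // r fell past every slice: inject nothing
}
```

Walking cumulative thresholds over one draw gives each fault exactly its configured share of the 100 possible values, which is the property the reviewer's corrected checks express.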
Yeah, that was wrong. I've fixed it and updated the test.
int FaultManagerImpl::numberOfFaults() { return fault_map_.size(); }
uint64_t FaultManagerImpl::calculateFaultInjectionPercentage(
It seems like this duplicates methods already in the snapshot class.
Good call. I was worried about null/empty inputs to the snapshot method, but if I precompute the adjusted numerator when the fault manager is constructed, I don't need to worry about this.
source/extensions/filters/network/redis_proxy/command_splitter_impl.cc
source/extensions/filters/network/redis_proxy/command_splitter_impl.cc
// To support delay faults, we allow faults to override the regular command latency
// recording behavior.
void delayLatencyMetric() { delay_command_latency_ = true; }
Why do we need this new interface on the base class instead of just in the delay fault?
The request being delayed records its latency before calling the onResponse() callback. So unless the request knows that it should not record its latency, and that whatever is on the other side of the callback will, we will generate the wrong latency data.
In a request, the updateStats() call always precedes callbacks.onResponse():
void SplitRequestBase::updateStats(const bool success) {
  if (success) {
    command_stats_.success_.inc();
  } else {
    command_stats_.error_.inc();
  }
  if (!delay_command_latency_) {
    command_latency_->complete();
  }
}
Does that make sense?
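A minimal sketch of the hand-off the author describes, assuming a fake millisecond clock so the timings are deterministic. FakeClock, LatencyTimer, RequestSketch, and completeDelayedLatency() are hypothetical; only delayLatencyMetric(), updateStats(), and delay_command_latency_ come from the snippets above, and the success/error counters are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical deterministic clock for illustration.
struct FakeClock {
  uint64_t now_ms = 0;
};

// Hypothetical timer: records elapsed time since construction into a sink.
class LatencyTimer {
public:
  LatencyTimer(const FakeClock& fake_clock, std::vector<uint64_t>& sink)
      : clock_(fake_clock), sink_(sink), start_ms_(fake_clock.now_ms) {}
  void complete() { sink_.push_back(clock_.now_ms - start_ms_); }

private:
  const FakeClock& clock_;
  std::vector<uint64_t>& sink_;
  uint64_t start_ms_;
};

// Mirrors the shape of SplitRequestBase::updateStats(): when latency
// recording is delegated, the request leaves the timer alone.
class RequestSketch {
public:
  RequestSketch(const FakeClock& fake_clock, std::vector<uint64_t>& sink)
      : latency_(fake_clock, sink) {}
  void delayLatencyMetric() { delay_command_latency_ = true; }
  void updateStats() {
    if (!delay_command_latency_) {
      latency_.complete();
    }
  }
  // Called by whatever owns the delay once the delayed reply is delivered.
  void completeDelayedLatency() { latency_.complete(); }

private:
  LatencyTimer latency_;
  bool delay_command_latency_ = false;
};
```

For example, if the upstream reply arrives at 5 ms and the injected delay ends at 105 ms, the delayed request records 105 ms instead of the 5 ms updateStats() would have recorded on its own, consistent with the PR description's note that the delay is included in the command latency.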
docs/root/configuration/listeners/network_filters/redis_proxy_filter.rst
if (base_fault.fault_enabled().has_default_value()) {
  if (base_fault.fault_enabled().default_value().denominator() ==
      envoy::type::v3::FractionalPercent::HUNDRED) {
    default_value_ = base_fault.fault_enabled().default_value().numerator();
  } else {
    auto denominator = ProtobufPercentHelper::fractionalPercentDenominatorToInt(
        base_fault.fault_enabled().default_value().denominator());
    default_value_ =
        (base_fault.fault_enabled().default_value().numerator() * 100) / denominator;
  }
}
runtime_key_ = base_fault.fault_enabled().runtime_key();
I don't think this code is needed; you should just use the FractionalPercent that RuntimeFractionalPercent wraps.
runtime_.snapshot().getInteger(...) takes the default value as an integer, like most of the other featureEnabled(...) methods. The featureEnabled(...) methods that take a FractionalPercent default value return a boolean, which doesn't tell us what our amortized probability is. We would then still need to calculate the amortized percentage, with a check on the denominator.
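To make the conversion in the reviewed snippet concrete, here is a small sketch of the numerator-to-percent arithmetic. It assumes a hypothetical Denominator enum mirroring FractionalPercent's HUNDRED, TEN_THOUSAND, and MILLION values; the real code uses the protobuf enum and ProtobufPercentHelper::fractionalPercentDenominatorToInt.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of FractionalPercent's denominator enum.
enum class Denominator { Hundred, TenThousand, Million };

uint64_t denominatorToInt(Denominator d) {
  switch (d) {
  case Denominator::Hundred:
    return 100;
  case Denominator::TenThousand:
    return 10000;
  case Denominator::Million:
    return 1000000;
  }
  return 100; // not reached for valid enum values
}

// Same formula as the reviewed code: (numerator * 100) / denominator.
// Integer division truncates toward zero.
uint64_t toWholePercent(uint64_t numerator, Denominator d) {
  return (numerator * 100) / denominatorToInt(d);
}
```

Because the result is a whole percent, fractions of a percent are truncated (1 in 10,000 becomes 0%), which is one trade-off of passing an integer default to getInteger(...).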
  }
};
std::vector<std::string> Fault::getCommands(
nit: I think this is a redundant copy. Is this only used for initializing the commands_ field?
Correct, this is only for initialization. We want to make all the commands lowercase, hence the method. Would you prefer the lowercasing be a linter check?
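One possible way to address the redundant-copy nit while keeping commands_ const is to build the lowercased vector in a helper that takes the configured commands by const reference and returns the result by value, so the return value initializes the member directly. This is a hedged sketch: lowercaseCommands and FaultCommandsSketch are hypothetical names, and command names are assumed to be plain ASCII.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns a lowercased copy of the configured commands.
std::vector<std::string> lowercaseCommands(const std::vector<std::string>& commands) {
  std::vector<std::string> result;
  result.reserve(commands.size());
  for (const std::string& command : commands) {
    std::string lower = command;
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    result.push_back(std::move(lower));
  }
  return result; // returned by value; NRVO or a move applies
}

class FaultCommandsSketch {
public:
  explicit FaultCommandsSketch(const std::vector<std::string>& configured)
      : commands_(lowercaseCommands(configured)) {}
  const std::vector<std::string>& commands() const { return commands_; }

private:
  const std::vector<std::string> commands_;
};
```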
// 2. For each fault, calculate the amortized fault injection percentage.
absl::optional<std::pair<FaultType, std::chrono::milliseconds>>
FaultManagerImpl::getFaultForCommandInternal(std::string command) {
  auto random_number = random_.random() % 100;
Create the random number only if we have a matching command. Also, what do we do if the fault_injection_percentage values add up to more than 100?
OK. I was under the impression we were going to treat that as a user error. I'll add a note to the documentation section.
Are we checking for this user error anywhere?
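If the answer is to reject such a configuration up front, a check might look like the sketch below. It is hypothetical (validateFaultPercentages is not in the PR) and assumes the caller has already merged any all-commands faults into each command's list, since all faults that apply to a command share one random draw.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical config-time check: throws if the percentages for any command
// sum to more than 100.
void validateFaultPercentages(
    const std::map<std::string, std::vector<uint64_t>>& percentages_by_command) {
  for (const auto& [command, percentages] : percentages_by_command) {
    uint64_t total = 0;
    for (uint64_t p : percentages) {
      total += p;
    }
    if (total > 100) {
      throw std::invalid_argument("fault percentages for '" + command +
                                  "' sum to more than 100");
    }
  }
}
```

A load-time check like this only covers configured defaults; if percentages can be overridden through runtime keys, the total could still exceed 100 at request time, so documenting the behavior remains useful.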
if (it_outer != fault_map_.end()) {
  for (auto fault_ptr : it_outer->second) {
    auto fault_injection_percentage = runtime_.snapshot().getInteger(
        fault_ptr->runtime_key_.value(), fault_ptr->default_value_.value());
Would calling value() on an unset optional throw an exception here?
Good call, I'll add a check.
I'll also add a test.
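For reference, std::optional::value() throws std::bad_optional_access when the optional is empty, and absl::optional behaves the same way when exceptions are enabled. The sketch below guards the access instead of calling value() unconditionally. RuntimeSketch and faultPercentage are hypothetical stand-ins for the runtime snapshot and the getInteger(key, default) lookup; the plain integer default mirrors the later decision to make default_value_ non-optional with a zero default.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a runtime snapshot: key -> overridden value.
using RuntimeSketch = std::map<std::string, uint64_t>;

// Returns the runtime override if a key is set and present, else the default.
// Never calls value() on an empty optional.
uint64_t faultPercentage(const RuntimeSketch& runtime,
                         const std::optional<std::string>& runtime_key,
                         uint64_t default_value) {
  if (runtime_key.has_value()) {
    const auto it = runtime.find(*runtime_key);
    if (it != runtime.end()) {
      return it->second;
    }
  }
  return default_value;
}
```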
std::chrono::milliseconds delay_ms_;
const std::vector<std::string> commands_;
bool has_fault_enabled_;
absl::optional<int> default_value_;
What do we do with an unset default_value_ and runtime_key_?
I'm changing default_value_ so that it is not optional and defaults to zero.
 * all commands, we use a special ALL_KEYS entry in the map.
 */
class FaultManagerImpl : public FaultManager {
  typedef std::unordered_map<std::string, std::vector<FaultPtr>> FaultMap;
Use a using alias instead of typedef. Also, should the fault map be immutable?
The fault map gets built at runtime, so it's not immutable. Noted on using.
I don't think it can be changed after construction, so it should be immutable. Also, the getFaultForCommand method should be marked const.
OK, done.
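A hedged sketch of the immutable-map suggestion: the map type uses a using alias, is built once in the constructor's initializer list, is stored as a const member, and is read through a const lookup. SimpleFault, FaultManagerSketch, and faultsForCommand are hypothetical simplifications, not the PR's FaultManagerImpl.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical simplified fault, for illustration only.
struct SimpleFault {
  std::string name;
  uint64_t percentage;
};
using SimpleFaultSharedPtr = std::shared_ptr<const SimpleFault>;
using FaultConfig = std::vector<std::pair<std::string, SimpleFault>>;

class FaultManagerSketch {
public:
  // A using alias instead of typedef.
  using FaultMap = std::unordered_map<std::string, std::vector<SimpleFaultSharedPtr>>;

  // The map is built once here; being const, it cannot change afterwards.
  explicit FaultManagerSketch(const FaultConfig& config) : fault_map_(buildFaultMap(config)) {}

  // const: lookups never mutate the manager. Returns nullptr if no faults match.
  const std::vector<SimpleFaultSharedPtr>* faultsForCommand(const std::string& command) const {
    const auto it = fault_map_.find(command);
    return it == fault_map_.end() ? nullptr : &it->second;
  }

private:
  static FaultMap buildFaultMap(const FaultConfig& config) {
    FaultMap faults;
    for (const auto& [command, fault] : config) {
      faults[command].push_back(std::make_shared<SimpleFault>(fault));
    }
    return faults;
  }

  const FaultMap fault_map_;
};
```

A const data member also deletes the class's implicit copy and move assignment operators, which is usually acceptable for an object built once from configuration.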
    envoy::extensions::filters::network::redis_proxy::v3::RedisProxy_RedisFault base_fault);
public:
  FaultType fault_type_;
Just noticed these fields are public; this feels like an antipattern.
I'll add getters.
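Adding getters might look roughly like the following. FaultWithGetters, FaultKind, and the getter names are hypothetical; they echo the fault_type_, delay_ms_, and commands_ fields from the snippets above, with the fields made private and const because they are only set during construction.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical enum; the real fault types come from the RedisFault proto.
enum class FaultKind { Error, Delay };

// Fields are private and set once in the constructor; callers read them
// through const getters instead of touching members directly.
class FaultWithGetters {
public:
  FaultWithGetters(FaultKind kind, uint64_t default_percentage,
                   std::chrono::milliseconds delay, std::vector<std::string> commands)
      : kind_(kind), default_percentage_(default_percentage), delay_(delay),
        commands_(std::move(commands)) {}

  FaultKind faultType() const { return kind_; }
  uint64_t defaultPercentage() const { return default_percentage_; }
  std::chrono::milliseconds delayMs() const { return delay_; }
  const std::vector<std::string>& commands() const { return commands_; }

private:
  const FaultKind kind_;
  const uint64_t default_percentage_;
  const std::chrono::milliseconds delay_;
  const std::vector<std::string> commands_;
};
```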
This reverts commit ef0545a. Signed-off-by: FAYiEKcbD0XFqF2QK2E4viAHg8rMm2VbjYKdjTg <nflacco@lyft.com>
e099fc4 to 3c33eb1
The master branch format check is broken due to an HTTP codec issue, so I've removed the master merge.
@FAYiEKcbD0XFqF2QK2E4viAHg8rMm2VbjYKdjTg please never force push. It makes reviews very difficult. Thank you!
Sorry! I was trying to get further back on master and broke the cardinal rule!
This reverts commit 048583b. Signed-off-by: FAYiEKcbD0XFqF2QK2E4viAHg8rMm2VbjYKdjTg <nflacco@lyft.com>
This reverts commit 77e9ed7. Signed-off-by: FAYiEKcbD0XFqF2QK2E4viAHg8rMm2VbjYKdjTg <nflacco@lyft.com>
Description:
This PR implements fault injection for Redis, specifically delay and error faults (error faults can themselves have delays added). After discussing with Henry, I chose not to implement a separate filter; we concluded that the faults we felt were useful didn't need many levels, just a delay on top of the original fault, if any. In addition, the Redis protocol doesn't support headers, which further distinguishes this from Envoy's HTTP fault injection.
Faults record metrics on the original request, and the extra latency added by a delay fault is included in the command latency for that request. Faults can also be restricted to specific commands.
Future work: Add several other faults, including cache misses and connection failures.
Risk Level: Medium
Testing:
Docs Changes: Yes. Updated the Redis configuration section in docs/root/configuration/listeners/network_filters/redis_proxy_filter.rst with notes on metrics and on fault injection itself.
Release Notes: Yes, under 1.15 (pending).
@HenryYYang for first pass.