fault: use FractionalPercent for percent#3978
fault: use FractionalPercent for percent#3978mattklein123 merged 16 commits intoenvoyproxy:masterfrom
Conversation
The existing FaultDelay config used uint32 for the percent field which limited the user to configure percentages only in terms of whole numbers between 0 and 100. FractionalPercent allows finer control over the percent values by allowing fractions to be specified in the configuration, for example 0.0001%. Signed-off-by: Venil Noronha <veniln@vmware.com>
mattklein123
left a comment
There was a problem hiding this comment.
Not a full review but a quick drive by to get started. Thank you!
| // on which the delay will be injected. | ||
| uint32 percent = 2 [(validate.rules).uint32.lte = 100]; | ||
| // The percentage of operations/connection requests on which the delay will be injected. | ||
| envoy.type.FractionalPercent percent = 2; |
There was a problem hiding this comment.
We can't just change this. You will need to deprecate the old percent and add a new one.
There was a problem hiding this comment.
call the new one as percentage. And i think there is also some protoc validation with it
| // 10,000. | ||
| // | ||
| // **Example**: 1/10000 = 0.01%. | ||
| TEN_THOUSAND = 1; |
There was a problem hiding this comment.
These can't be changed. You will need to add new ones at the end. Are they really necessary though? Seems unnecessary to me with TEN_THOUSAND and MILLION. I would revert these changes personally.
| { | ||
| "type" : "...", | ||
| "fixed_delay_percent" : "...", | ||
| "fixed_delay_percent" : { |
There was a problem hiding this comment.
We don't accept v1 changes anymore. Please revert all changes to v1 code, schema, docs, etc.
|
@mattklein123 thanks for the quick review! I'll address the issues you mentioned. |
Signed-off-by: Venil Noronha <veniln@vmware.com>
|
@rshriram do you mind doing first pass review on this? |
| values range from 0 to 100. | ||
| *(required, envoy.type.FractionalPercent)* The percentage of requests that will be delayed for | ||
| the duration specified by *fixed_duration_ms*. | ||
|
|
| *(required, integer)* Probability of an eligible MongoDB operation to be affected by the | ||
| injected fault when there is no active fault. Valid values are integers in a range of [0, 100]. | ||
| *(required, envoy.type.FractionalPercent)* Probability of an eligible MongoDB operation to be | ||
| affected by the injected fault when there is no active fault. |
include/envoy/runtime/runtime.h
Outdated
| virtual bool featureEnabled(const std::string& key, uint64_t default_value, uint64_t random_value, | ||
| uint64_t num_buckets) const PURE; | ||
| virtual bool sampleFeatureEnabled(const std::string& key, uint64_t default_value, | ||
| uint64_t random_value, uint64_t num_buckets) const PURE; |
There was a problem hiding this comment.
@mattklein123 how are you guys doing fractional percent with runtimes? IOW, do you need runtime support for fractional percent? If not, all changes to this file can be eliminated
There was a problem hiding this comment.
I think access_log_impl uses runtime for sampling the feature status as seen here. Should we just stick to a similar approach for faults?
There was a problem hiding this comment.
Looking again, why is this being renamed ?
There was a problem hiding this comment.
Because of a signature conflict between the routines highlighted here.
| "percent" : { | ||
| "type" : "integer", | ||
| "minimum" : 0, | ||
| "maximum" : 100 |
rshriram
left a comment
There was a problem hiding this comment.
Thanks for doing this. Left comments inline.
* Revert v1 code, schema & doc changes * Deprecate percent field in FaultDelay * Introduce percentage field in FaultDelay of type FractionalPercent Signed-off-by: Venil Noronha <veniln@vmware.com>
4fd33a0 to
4daaac1
Compare
rshriram
left a comment
There was a problem hiding this comment.
You need to do this for both delay and aborts.
Follow the convert old to new approach instead of a if old_set do x or if new_set do y.
source/common/config/filter_json.cc
Outdated
| if (json_config_delay->hasObject("fixed_delay_percentage")) { | ||
| const auto json_percentage = json_config_delay->getObject("fixed_delay_percentage"); | ||
| JSON_UTIL_SET_FRACTIONALPERCENT(*json_percentage, *delay, percentage); | ||
| } |
There was a problem hiding this comment.
You don’t need this. This is for translation of legacy config only.
source/common/config/filter_json.cc
Outdated
| if (json_fault->hasObject("percentage")) { | ||
| const auto json_percentage = json_fault->getObject("percentage"); | ||
| JSON_UTIL_SET_FRACTIONALPERCENT(*json_percentage, *delay, percentage); | ||
| } |
| return sampleFeatureEnabled(key, default_value, generator_.random(), num_buckets); | ||
| } | ||
|
|
||
| bool SnapshotImpl::sampleFeatureEnabled(const std::string& key, uint64_t default_value, |
There was a problem hiding this comment.
Skip renaming and overload old function name ?
| enabled |= config_->runtime().snapshot().featureEnabled(downstream_cluster_delay_percent_key_, | ||
| bool enabled = false; | ||
| if (fault_settings_->delayPercent() != 0) { | ||
| enabled |= config_->runtime().snapshot().featureEnabled(DELAY_PERCENT_KEY, |
There was a problem hiding this comment.
This is not right. Runtime overrides config. If there is no runtime the config value becomes the default.
| downstream_cluster_delay_percent_key_, fault_settings_->delayPercentage().numerator(), | ||
| ProtobufPercentHelper::fractionalPercentDenominatorToInt( | ||
| fault_settings_->delayPercentage())); | ||
| } |
There was a problem hiding this comment.
Flip the logic. Read new runtime or percentage. If not present read old runtime.
| uint64_t abort_percent_{}; // 0-100 | ||
| uint64_t http_status_{}; // HTTP or gRPC return codes | ||
| uint64_t fixed_delay_percent_{}; // 0-100 | ||
| envoy::type::FractionalPercent fixed_delay_percentage_{}; // 0-100 |
There was a problem hiding this comment.
Here is a simpler way. Convert both percents to fractional (delay and abort).
Then when reading/translating config, if there is no fractional but old percent field is set, read it and convert it to fractions (out of 100).
Then in fault filter you simply have to check for new runtime fields for fractional percentage. If absent check old field. If that’s also absent use the value provided in the config.
There was a problem hiding this comment.
I'd prefer to implement the change to abort in a separate PR.
* Revert v1 changes * Improve the logic that determines whether delay should be enabled Signed-off-by: Venil Noronha <veniln@vmware.com>
|
|
||
| if (fault.has_delay()) { | ||
| fixed_delay_percent_ = fault.delay().percent(); | ||
| fixed_delay_percentage_ = fault.delay().percentage(); |
There was a problem hiding this comment.
do you have validtion set to ensure that this field is not null ?
Also, fold fixed_delay_percent_ into fixed_delay_percentage.. i.e. convert the old percent (if set) to new form..
The problem though is that you can't differentiate between a 0 percent and unset percent in hte old one as its uint32.. Sigh..
Signed-off-by: Venil Noronha <veniln@vmware.com>
* Convert uint32/uint64 percentages to FractionalPercent * Add null validation for FractionalPercent * Document proto changes in version_history Signed-off-by: Venil Noronha <veniln@vmware.com>
rshriram
left a comment
There was a problem hiding this comment.
This is looking very good. A few minor nits related to duplicated tests. Also, please do the same change for aborts as well (abort percentage instead of integer percent)
| // | ||
| // .. attention:: | ||
| // | ||
| // This field is deprecated and `percentage` should be used instead. |
There was a problem hiding this comment.
Use of integer percent value is deprecated. Use fractional percentage field instead
docs/root/intro/version_history.rst
Outdated
| * config: v1 disabled by default. v1 support remains available until October via flipping --v2-config-only=false. | ||
| * config: v1 disabled by default. v1 support remains available until October via setting :option:`--allow-deprecated-v1-api`. | ||
| * fault: deprecated :ref:`percent <envoy_api_field_config.filter.fault.v2.FaultDelay.percent>` in | ||
| favor of :ref:`fractional percentages <envoy_api_field_config.filter.fault.v2.FaultDelay.percentage>` for fault injections. |
There was a problem hiding this comment.
- fault: Added support for fractional percentages...
| std::string upstream_cluster_; // restrict faults to specific upstream cluster | ||
| uint64_t abort_percent_{}; // 0-100 | ||
| uint64_t http_status_{}; // HTTP or gRPC return codes | ||
| envoy::type::FractionalPercent fixed_delay_percentage_{}; // 0-100 |
There was a problem hiding this comment.
fix comment to indicate that this is fractional
| uint64_t fixed_delay_percent_{}; // 0-100 | ||
| uint64_t fixed_duration_ms_{}; // in milliseconds | ||
| std::string upstream_cluster_; // restrict faults to specific upstream cluster | ||
| uint64_t abort_percent_{}; // 0-100 |
There was a problem hiding this comment.
Since you are doing this for delays, why don't you change the aborts as well to fractional ? It makes the implementation streamlined.
There was a problem hiding this comment.
Since this PR is big in itself, should we do it in another PR?
There was a problem hiding this comment.
The abort stuff is small. Confined to http only. If done right, it should be touching same files and not much.
There was a problem hiding this comment.
Yes, I'm on it right now. Figured out it wasn't as big of a change as I thought it would be.
| } | ||
|
|
||
| // The percentage of operations/connection requests on which the delay will be injected. | ||
| envoy.type.FractionalPercent percentage = 4; |
There was a problem hiding this comment.
Can you do the same for aborts as well ?
| EXPECT_EQ(0UL, config_->stats().aborts_injected_.value()); | ||
| } | ||
|
|
||
| TEST_F(FaultFilterTest, V2FixedDelayNonZeroDuration) { |
There was a problem hiding this comment.
rather than copying, modify the test case below directly. JSON will soon be out.
|
|
||
| TEST_F(MongoProxyFilterTest, DelayFaults) { | ||
| setupDelayFault(true); | ||
| initializeFilter(); |
There was a problem hiding this comment.
same comment as above. Simply change this function. You don't need a new function for this.. All you need to test is whether old percent is being respected or not.
You could even change this into DelayFaultWithDeprecatedIntegerPercent and DelayFaultWithFractionalPercent
Signed-off-by: Venil Noronha <veniln@vmware.com>
|
Same pr please. It’s a mass search and replace. Makes it easy to follow patterns |
* Deprecate int percent in FaultAbort proto * Introduce FractionalPercent percentages for aborts * Update abort tests Signed-off-by: Venil Noronha <veniln@vmware.com>
|
Implemented the changes to abort in 3d56b3f. |
|
ping @mattklein123 .. this is ready for a final pass. |
mattklein123
left a comment
There was a problem hiding this comment.
LGTM, some small comments. Thank you!
docs/root/intro/version_history.rst
Outdated
| `google.api.HttpBody <https://github.com/googleapis/googleapis/blob/master/google/api/httpbody.proto>`_. | ||
| * config: v1 disabled by default. v1 support remains available until October via flipping --v2-config-only=false. | ||
| * config: v1 disabled by default. v1 support remains available until October via setting :option:`--allow-deprecated-v1-api`. | ||
| * fault: added support for :ref:`fractional percentages <envoy_api_field_config.filter.fault.v2.FaultDelay.percentage>`. |
There was a problem hiding this comment.
Please note the deprecations in deprecated.md
include/envoy/runtime/runtime.h
Outdated
| * of [0, num_buckets). | ||
| * @return true if the feature is enabled. | ||
| */ | ||
| virtual bool sampleFeatureEnabled(const std::string& key, uint64_t default_value, |
There was a problem hiding this comment.
nit: I see why you had to change the name here, but IMO "sample" is a bit strange in the sense that the other featureEnabled() functions also do sampling. A few options here:
- Get rid of the versions of featureEnabled() that don't require specifying number of buckets. Larger change.
- Some other name.
featureEnabledEx? (Bad, but not sure what would be better).
There was a problem hiding this comment.
@mattklein123 thanks for reviewing! I've addressed your other comments in f197fe2, and I'm working on this one now.
|
|
||
| if (fault.has_abort()) { | ||
| abort_percent_ = fault.abort().percent(); | ||
| if (fault.abort().has_percentage()) { |
There was a problem hiding this comment.
extract to helper function for duplicated code here and below.
| uint64_t fixed_delay_percent_{}; // 0-100 | ||
| uint64_t fixed_duration_ms_{}; // in milliseconds | ||
| std::string upstream_cluster_; // restrict faults to specific upstream cluster | ||
| envoy::type::FractionalPercent abort_percentage_{}; // 0.0-100.0 |
There was a problem hiding this comment.
{} not needed. Also, comment is no longer relevant I think. Same below.
| duration_ms_(PROTOBUF_GET_MS_REQUIRED(fault_config, fixed_delay)) {} | ||
| uint32_t delayPercent() const { return delay_percent_; } | ||
| : duration_ms_(PROTOBUF_GET_MS_REQUIRED(fault_config, fixed_delay)) { | ||
| if (fault_config.has_percentage()) { |
There was a problem hiding this comment.
same utility function extracted above
* Update DEPRECATED and version_history docs * Fix FractionalPercent variable declaration issue * Extract repeating logic into a macro Signed-off-by: Venil Noronha <veniln@vmware.com>
Signed-off-by: Venil Noronha <veniln@vmware.com>
Signed-off-by: Venil Noronha <veniln@vmware.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks for the cleanup, LGTM. @rshriram can you take one last pass given the new changes?
|
|
||
| if (!runtime.snapshot().featureEnabled("tracing.global_enabled", | ||
| config.tracingConfig()->overall_sampling_, result)) { | ||
| config.tracingConfig()->overall_sampling_, result, 100)) { |
|
@venilnoronha I think you lost some of your changes. You haven't implemented either option that matt suggested but the call sites seem to have changed |
include/envoy/runtime/runtime.h
Outdated
| */ | ||
| virtual bool featureEnabled(const std::string& key, uint64_t default_value, | ||
| uint64_t random_value) const PURE; | ||
| uint64_t num_buckets) const PURE; |
There was a problem hiding this comment.
did you intend to delete this function and simply use the one below everywhere?
There was a problem hiding this comment.
I wanted to use the function below whenever it was possible to supply a random value. However, there are a few cases where there isn't a random number available at call time, and that's where I use this.
| uint64_t random_value) const { | ||
| return featureEnabled(key, default_value, random_value, 100); | ||
| } | ||
|
|
There was a problem hiding this comment.
These two functions are not equivalent.. The stable random value is not same as number of buckets above. Am I missing something here>?
There was a problem hiding this comment.
random_value and num_buckets aren't being used interchangeably. Either we pass both random_value and num_buckets to featureEnabled, or we only pass num_buckets. There's no way to pass random_value without num_buckets.
|
So I suggest this.. Replace all call sites of featureEnabled() with the longer version that always passes the number of buckets - But do this in a separate PR. This should essentially be a mass search-and-replace. Should have no issues with any reviewer. Then this PR will automatically reduce to the changes related to fault injection. Alternatively, simply rename the sampleFeatureEnabled to something else like featureEnabledEx and retain changes before you followed matt's option |
Signed-off-by: Venil Noronha <veniln@vmware.com>
|
@rshriram I've reverted the changes and have renamed @mattklein123 as @rshriram suggested, I'll take up option 1 in a separate PR. Thanks, |
I would rather do one set of changes. If you want to do option 1, can you just do that PR first, we can review/merge, and then we can merge master into this? |
|
On a second thought, I'd prefer keeping |
include/envoy/runtime/runtime.h
Outdated
| * @return true if the feature is enabled. | ||
| */ | ||
| virtual bool featureEnabledEx(const std::string& key, uint64_t default_value, | ||
| uint64_t num_buckets) const PURE; |
There was a problem hiding this comment.
Is there any callsite that is calling the 3 arg form of featureEnabledEx supplying buckets instead of random value? If not, lets get rid of this function.
|
So here is another easier way to resolve this: Add a random() function to the Loader interface. |
Signed-off-by: Venil Noronha <veniln@vmware.com>
mattklein123
left a comment
There was a problem hiding this comment.
I'm fine with the Ex change, but will defer to @rshriram and you to decide if you want to do something different.
include/envoy/runtime/runtime.h
Outdated
| * @return true if the feature is enabled. | ||
| */ | ||
| virtual bool featureEnabledEx(const std::string& key, uint64_t default_value, | ||
| uint64_t num_buckets) const PURE; |
This commit removes the 3 param sampleFeatureEnabled routine and renames the 4 param sampleFeatureEnabled routine back to featureEnabled. It also exports the RandomGenerator via the Loader in order to provide random values for featureEnabled calls. Signed-off-by: Venil Noronha <veniln@vmware.com>
include/envoy/runtime/runtime.h
Outdated
| /** | ||
| * @return RandomGenerator& the random generator. | ||
| */ | ||
| virtual RandomGenerator& random() PURE; |
There was a problem hiding this comment.
Do we really need to add this to this interface? I guess it's OK, but can't you just pass the random generator to where you need it? It should be in the filter config factory? This is what we do elsewhere.
There was a problem hiding this comment.
Thought that I'd have to pass the RandomGenerator instance a long way, but I was wrong. I've fixed this in f167d92.
54c6ac0 to
ea1d5ea
Compare
This commit removes the random() routine from the Loader and instead passes the RandomGenerator via the constructor to FaultFilterConfig and the ProxyFilter. Signed-off-by: Venil Noronha <veniln@vmware.com>
ea1d5ea to
f167d92
Compare
|
@mattklein123 @rshriram thanks for reviewing/merging. |
* origin/master: (38 commits) test: add tests for corner-cases around sending requests before run() starts or after run() ends. (envoyproxy#4114) perf: reduce the memory usage of LC Trie construction (envoyproxy#4117) test: moving redundant code in websocket_integration_test to utilities (envoyproxy#4127) test: make YamlLoadFromStringFail less picky about error msg. (envoyproxy#4141) rbac: add rbac network filter. (envoyproxy#4083) fuzz: route lookup and header finalization fuzzer. (envoyproxy#4116) Set content-type and content-length (envoyproxy#4113) fault: use FractionalPercent for percent (envoyproxy#3978) test: Fix inverted exact match logic in IntegrationTcpClient::waitForData() (envoyproxy#4134) Added cluster_name to load assignment config for static cluster (envoyproxy#4123) ssl: refactor ContextConfig to use TlsCertificateConfig (envoyproxy#4115) syscall: refactor OsSysCalls for deeper errno latching (envoyproxy#4111) thrift_proxy: fix oneway bugs (envoyproxy#4025) Do not crash when converting YAML to JSON fails (envoyproxy#4110) config: allow unknown fields flag (take 2) (envoyproxy#4096) Use a jittered backoff strategy for handling HdsDelegate stream/connection failures (envoyproxy#4108) bazel: use GCS remote cache (envoyproxy#4050) Add thread local cache of overload action states (envoyproxy#4090) Added TCP healthcheck capabilities to the HdsDelegate (envoyproxy#4079) secret: add secret provider interface and use it for TlsCertificates (envoyproxy#4086) ... Signed-off-by: Snow Pettersen <snowp@squareup.com>
Description:
The existing
FaultDelayconfig useduint32for thepercentfield which limited the user to configure percentages only in terms of whole numbers between 0 and 100.FractionalPercentallows finer control over the percent values by allowing fractions to be specified in the configuration, for example, 0.0001%.Signed-off-by: Venil Noronha veniln@vmware.com
Risk Level: Med
Testing: Added 2 new tests and updated existing tests
Docs Changes: N/A
Release Notes: Added deprecation note and introduced
FractionalPercentpercentages.Fixes #3904
/cc @rshriram