-
Notifications
You must be signed in to change notification settings - Fork 5.3k
runtime: codifying runtime guarded features #6134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
5f368a3
0147b76
7f3fc25
3ac22ff
33bbf5e
ad5d657
5b838e8
26bdbd6
1cc4738
b5cf76c
e063208
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -137,6 +137,61 @@ maximize the chances of your PR being merged. | |
| [envoy-filter-example](https://github.com/envoyproxy/envoy-filter-example) (for example making a new | ||
| branch so that CI can pass) it is your responsibility to follow through with merging those | ||
| changes back to master once the CI dance is done. | ||
| * If your PR is a high risk change, the reviewer may ask that you runtime guard | ||
| it. See the section on runtime guarding below. | ||
|
|
||
|
|
||
| # Runtime guarding | ||
|
|
||
| Some high risk changes in Envoy are deemed worthy of runtime guarding. Instead of just replacing | ||
| old code with new code, both code paths are supported for between one Envoy release (if it is | ||
| guarded due to performance concerns) and a full deprecation cycle (if it is a high risk behavioral | ||
| change). | ||
|
|
||
| The canonical way to runtime guard a feature is | ||
| ``` | ||
| if (Runtime::runtimeFeatureEnabled("envoy.reloadable_features.my_feature_name")) { | ||
| [new code path] | ||
| } else { | ||
| [old_code_path] | ||
| } | ||
| ``` | ||
| Runtime guarded features named with the "envoy.reloadable_features." prefix must be safe to flip | ||
| true or false on running Envoy instances. In some situations it may make more sense to | ||
| latch the value in a member variable on class creation, for example: | ||
|
|
||
| ``` | ||
| bool use_new_code_path_ = | ||
| Runtime::runtimeFeatureEnabled("envoy.reloadable_features.my_feature_name") | ||
| ``` | ||
|
|
||
| This should only be done if the lifetime of the object in question is relatively short compared to | ||
| the lifetime of most Envoy instances, i.e. latching state on creation of the | ||
| Http::ConnectionManagerImpl or all Network::ConnectionImpl classes, to ensure that the new behavior | ||
| will be exercised as the runtime value is flipped, and that the old behavior will trail off over | ||
| time. | ||
|
|
||
| Runtime guarded features may either set true (running the new code by default) in the initial PR, | ||
| after a testing interval, or during the next release cycle, at the PR author's and reviewing | ||
| maintainer's discretion. Generally all runtime guarded features will be set true when a | ||
| release is cut, and the old code path will be deprecated at that time. Runtime features | ||
| are set true by default by inclusion in | ||
| [source/common/runtime/runtime_features.h](https://github.com/envoyproxy/envoy/blob/master/source/common/runtime/runtime_features.h) | ||
|
|
||
| There are four suggested options for testing new runtime features: | ||
|
|
||
| 1. Create a per-test Runtime::LoaderSingleton as done in [DeprecatedFieldsTest.IndividualFieldDisallowedWithRuntimeOverride](https://github.com/envoyproxy/envoy/blob/master/test/common/protobuf/utility_test.cc) | ||
| 2. Create a [parameterized test](https://github.com/google/googletest/blob/master/googletest/docs/advanced.md#how-to-write-value-parameterized-tests) | ||
| where the set up of the test sets the new runtime value explicitly to | ||
| GetParam() as outlined in (1). | ||
| 3. Set up integration tests with custom runtime defaults as documented in the | ||
| [integration test README](https://github.com/envoyproxy/envoy/blob/master/test/integration/README.md) | ||
| 4. Run a given unit test with the new runtime value explicitly set true as done | ||
| for [runtime_flag_override_test](https://github.com/envoyproxy/envoy/blob/master/test/common/runtime/BUILD) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it also worth specifically calling out parameterized tests as a useful tool here? I find that particularly useful when the logical behavior is supposed to be the same regardless of flag value (ie, goal is pure refactoring or performance improvement). |
||
|
|
||
| Runtime code is held to the same standard as regular Envoy code, so both the old | ||
| path and the new should have 100% coverage both with the feature defaulting true | ||
| and false. | ||
|
|
||
| # PR review policy for maintainers | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -83,6 +83,13 @@ class Snapshot { | |
| // configuration of "false" in runtime config. | ||
| virtual bool deprecatedFeatureEnabled(const std::string& key) const PURE; | ||
|
|
||
| // Returns true if a runtime feature is enabled. | ||
| // | ||
| // Runtime features are used to easily allow switching between old and new code paths for high | ||
| // risk changes. The intent is for the old code path to be short lived - the old code path is | ||
| // deprecated as the feature is defaulted true, and removed with the following Envoy release. | ||
| virtual bool runtimeFeatureEnabled(const std::string& key) const PURE; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. use
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As written I think someone could If they do the inefficient If I switch to string_view I think the map lookup will convert the string piece to the string on every single call, and there's no way to avoid it, so I think that's strictly less performant? I could update the instructions to encourage static string on the canonical non-latch path if we think that makes more sense
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As long as it is an
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TIL! Ok, got that to work for the flat hash, but unfortunately to string_view all the way up the API we need to also change Runtime::EntryMap in include/envoy/runtime/runtime.h to have a comparator. I'd prefer to land this as-is and do a separate PR converting to string piece, especially as none of the call sites will need to change. What do you think of a TODO and a follow-up?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah fine to leave as TODO.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That sounds fine to me. |
||
|
|
||
| /** | ||
| * Test if a feature is enabled using the built in random generator. This is done by generating | ||
| * a random number in the range 0-99 and seeing if this number is < the value stored in the | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,62 @@ | ||
| #include "common/runtime/runtime_features.h" | ||
|
|
||
| namespace Envoy { | ||
| namespace Runtime { | ||
|
|
||
| // Add additional features here to enable the new code paths by default. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry to harp on this, but I think what would help me/readers here is a short description of the behavior of what an entry in runtime_features and disallowed_features do. I know this is described elsewhere, but it's hard enough to keep track of (and important enough) that IMO we should replicate a short version here? |
||
| // | ||
| // Per documentation in CONTRIBUTING.md is is expected that new high risk code paths be guarded | ||
| // by runtime feature guards, i.e | ||
| // | ||
| // if (Runtime::runtimeFeatureEnabled("envoy.reloadable_features.my_feature_name")) { | ||
| // [new code path] | ||
| // else { | ||
| // [old_code_path] | ||
| // } | ||
| // | ||
| // Runtime features are false by default, so the old code path is exercised. | ||
| // To make a runtime feature true by default, add it to the array below. | ||
| // New features should be true-by-default for an Envoy release cycle before the | ||
| // old code path is removed. | ||
| // | ||
| // If issues are found that that require a runtime feature to be disabled, it should be reported | ||
| // ASAP by filing a bug on github. Overriding non-buggy code is strongly discouraged to avoid the | ||
| // problem of the bugs being found after the old code path has been removed. | ||
| constexpr const char* runtime_features[] = { | ||
| // Enabled | ||
| "envoy.reloadable_features.test_feature_true", | ||
| }; | ||
|
|
||
| // This is a list of configuration fields which are disallowed by default in Envoy | ||
| // | ||
| // By default, use of proto fields marked as deprecated in their api/.../*.proto file will result | ||
| // in a logged warning, so that Envoy users have a warning that they are using deprecated fields. | ||
| // | ||
| // During the Envoy release cycle, the maintainer team runs a script which will upgrade currently | ||
| // deprecated features to be disallowed (adding them to the list below) at which point use of said | ||
| // feature will cause a hard-failure (ProtoValidationException) instead of a logged warning. | ||
| // | ||
| // The release cycle after a feature has been marked disallowed, it is officially removable, and | ||
| // the maintainer team will run a script creating a tracking issue for proto and code clean up. | ||
| // | ||
| // TODO(alyssawilk) handle deprecation of reloadable_features and update the above comment. Ideally | ||
| // runtime override of a deprecated feature will log(warn) on runtime-load if not deprecated | ||
| // and hard-fail once it has been deprecated. | ||
|
|
||
| constexpr const char* disallowed_features[] = { | ||
| // Acts as both a test entry for deprecated.proto and a marker for the Envoy | ||
| // deprecation scripts. | ||
| "envoy.deprecated_features.deprecated.proto:is_deprecated_fatal", | ||
| }; | ||
|
|
||
| RuntimeFeatures::RuntimeFeatures() { | ||
| for (auto& feature : disallowed_features) { | ||
| disallowed_features_.insert(feature); | ||
| } | ||
| for (auto& feature : runtime_features) { | ||
| enabled_features_.insert(feature); | ||
| } | ||
| } | ||
|
|
||
| } // namespace Runtime | ||
| } // namespace Envoy | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,6 +3,7 @@ licenses(["notice"]) # Apache 2 | |
| load( | ||
| "//bazel:envoy_build_system.bzl", | ||
| "envoy_cc_test", | ||
| "envoy_cc_test_library", | ||
| "envoy_package", | ||
| ) | ||
|
|
||
|
|
@@ -15,6 +16,16 @@ filegroup( | |
| srcs = glob(["test_data/**"]), | ||
| ) | ||
|
|
||
| envoy_cc_test_library( | ||
| name = "utility_lib", | ||
| hdrs = [ | ||
| "utility.h", | ||
| ], | ||
| deps = [ | ||
| "//source/common/runtime:runtime_lib", | ||
| ], | ||
| ) | ||
|
|
||
| envoy_cc_test( | ||
| name = "runtime_impl_test", | ||
| srcs = ["runtime_impl_test.cc"], | ||
|
|
@@ -31,6 +42,18 @@ envoy_cc_test( | |
| ], | ||
| ) | ||
|
|
||
| envoy_cc_test( | ||
| name = "runtime_flag_override_test", | ||
| srcs = ["runtime_flag_override_test.cc"], | ||
| args = [ | ||
| "--runtime-feature-override-for-tests=envoy.reloadable_features.test_feature_false", | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FWIW, this seems like a generally useful feature for runtime configuration where people don't want to use the filesystem (CLI/gflags implementation like we discussed). Is it worth adding a TODO or opening a help wanted issue on that?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, if possible can you rename
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you think other folks would use this? I thought from prior discussions you thought google was flag happy but most folks wanted to use the existing runtime. Not sure if removing test_feature_false helps - I've clarified the language in both tests to hopefully clarify things.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do think that Google is an outlier in using flags for this type of stuff, but I was just suggesting that this code might be generally useful to someone, so probably worth tracking somewhere in case it comes up again. I think it's totally fine to leave as test only code for now. Thanks for all the comments, it's much easier to understand for me now. |
||
| ], | ||
| coverage = False, | ||
| deps = [ | ||
| "//source/common/runtime:runtime_lib", | ||
| ], | ||
| ) | ||
|
|
||
| envoy_cc_test( | ||
| name = "uuid_util_test", | ||
| srcs = ["uuid_util_test.cc"], | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| #include "common/runtime/runtime_impl.h" | ||
|
|
||
| #include "gmock/gmock.h" | ||
|
|
||
| namespace Envoy { | ||
| namespace Runtime { | ||
|
|
||
| // Features not in runtime_features.cc are false by default (and this particular one is verified to | ||
| // be false in runtime_impl_test.cc). However in in the envoy_cc_test declaration, the flag is set | ||
| // "--runtime-feature-override-for-tests=envoy.reloadable_features.test_feature_false" | ||
| // to override the return value of runtimeFeatureEnabled to true. | ||
| TEST(RuntimeFlagOverrideTest, OverridesWork) { | ||
| EXPECT_TRUE(Runtime::runtimeFeatureEnabled("envoy.reloadable_features.test_feature_false")); | ||
| } | ||
|
|
||
| } // namespace Runtime | ||
| } // namespace Envoy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a different namespace for this type of thing?
envoy.startup_features.or similar? It's a little confusing that the docs point out that everything named like this is reloadable at runtime and then we talk about how it's not always true. :)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I need to remove the buffer mentions on this one. My thought for latching is if we latched at the hcm, or at a network::connection or some such, where the lifetime were transient. As called out on the buffer PR that's truly restart-only not reloadable.
I do want to support envoy.restart_features but I also need to talk to you about how to do that. In my priors, restart flags were done on the command line (only when the binary was brought up) and it was illegal to change then when pushing config (would cause that config to get rejected). Otherwise you do a config push with a new-but-buggy restart flag, the config "passes", you hot restart, and your fleet is down. Only with Envoy and hot restarts I'm less clear on how we push config which is restart only. My default plan would be to that test-only runtime tweaking flag to be a actual production runtime tweaking flag, so folks can change restart values at start-up and we can hard-fail attempts to reload them at runtime, but I recall you saying that most people don't do flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK yeah let's chat about that offline and maybe figure out in a follow up? I think for now maybe just clarify somehow that this should only be used for truly reloadable features and if there are features that should be startup configured it should be discussed? (Up to you how to phrase)