Skip to content

Test server dynamic delay#390

Merged
mum4k merged 24 commits intoenvoyproxy:masterfrom
oschaaf:test-server-dynamic-delay
Jul 14, 2020
Merged

Test server dynamic delay#390
mum4k merged 24 commits intoenvoyproxy:masterfrom
oschaaf:test-server-dynamic-delay

Conversation

@oschaaf
Copy link
Copy Markdown
Member

@oschaaf oschaaf commented Jun 29, 2020

Add an extension which implements dynamic delay injection based on the number
of concurrent requests being handled, as well as static delays.

Fixes #389 and also #82. We may consider removing the direct fault filter configs
from our docs as well as emit a log warning when detecting usage to deprecate.
It would be nice if we could add the fault filter extension in code instead of imposing
configuration of that upon the end user, making that an implementation detail.

The following shows a per-request config example.
This header-provided configuration gets merged with any provided static configuration to
determine a final configuration for each request:

curl -H "x-nighthawk-test-server-config:{static_delay:\"1s\"}"  -vv 127.0.0.1:10000

Full yaml config for the http_filters section:

                http_filters:
                  - name: dynamic-delay
                    config:
                      concurrency_based_delay:
                        minimal_delay: 0.05s
                        concurrency_delay_factor: 0.010s                  
                  - name: test-server
                    config:
                      response_body_size: 10
                      response_headers:
                        - { header: { key: "x-nh", value: "1" } }

Signed-off-by: Otto van der Schaaf oschaaf@we-amp.com


State: first stab, manually verified to work. needs

  • discussion
  • unit tests
  • a bit of cleanup and
  • end to end test, to prove the number of active filter instances is tracked correctly and interaction with the fault filter works as intended (filed dynamic-delay filter: end to end test #393 to track).
  • docs.

oschaaf added 4 commits June 29, 2020 12:29
Add an extension which implements dynamic delay injection
based on the delta between target and current guage values.

State: manually verified to work. needs tests, a bit of cleanup and docs.

Fixes envoyproxy#389 and also envoyproxy#82, we may want to remove the direct fault filter configs
from our docs as well as emit a log warning when detecting usage to deprecate.

Per-request config example, this gets merged with any static configuratio
to determine the active configuration:

```
curl -H "x-nighthawk-test-server-config:{static_delay:\"1s\"}"  -vv 127.0.0.1:10000
```

Full yaml config for the http_filters section:

```yaml
                http_filters:
                  - name: dynamic-delay
                    config:
                      stats_based_delay:
                        guage_name: "http.ingress_http.downstream_rq_active"
                        guage_target_value: 10
                        delay_delta: 0.1s
                  - name: envoy.fault
                    config:
                      max_active_faults: 100
                      delay:
                        header_delay: {}
                        percentage:
                          numerator: 100
                  - name: test-server
                    config:
                      response_body_size: 10
                      response_headers:
                        - { header: { key: "x-nh", value: "1" } }
```

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Jun 29, 2020

@mum4k @jmarantz opening this up as a draft to allow for early review. (See the issue description for state), but prevent merging (or the impression of this being in that shape yet).

I would like to invite you to assess the high level approach as well as the algorithm/parameterization to achieve the goal we're after here: being able to induce a sustained/fixed level of parallelism on the downstream side (e.g. a proxy running in front of the test server).

While working on this PR I was mostly focussed on wiring in the overall mechanics of the desired type of functionality,
but not so much the algorithm itself.

@oschaaf oschaaf requested review from jmarantz and mum4k June 29, 2020 13:11
@oschaaf oschaaf added the waiting-for-review A PR waiting for a review. label Jun 29, 2020
Copy link
Copy Markdown

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't been through all of it yet, but using arbitrary gauges brings a bit of complexity.

oschaaf added 2 commits June 30, 2020 00:46
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Jun 29, 2020

@jmarantz @jc1e100 I pushed changes to partially address comments in 277c786, and updated the PR description.
I couldn't directly stick an atomic into the shared filter config, as that implicitly deletes the copy constructor, and prevents the build from succeeding with that. So it is static at this time. Let me know if this looks better.

Copy link
Copy Markdown

@jc1e100 jc1e100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the new simplified config format LGTM.

std::atomic<uint64_t> HttpDynamicDelayDecoderFilterConfig::instances_(0);

HttpDynamicDelayDecoderFilterConfig::HttpDynamicDelayDecoderFilterConfig(
nighthawk::server::ResponseOptions proto_config)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nighthawk::server::ResponseOptions&& ?

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf oschaaf marked this pull request as ready for review July 1, 2020 15:11
@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Jul 1, 2020

This is ready for another pass; I punted a bug and a missing functional test to #392 and #393.

@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Jul 1, 2020

Linking the CI coverage report for convenience, this seems ok to me (combined with the issue that tracks that we need covering actual end-to-end functionality).

@mum4k mum4k requested a review from dubious90 July 1, 2020 19:38
@mum4k
Copy link
Copy Markdown
Collaborator

mum4k commented Jul 1, 2020

@dubious90 please review and assign back to me once done.

oschaaf added 2 commits July 2, 2020 12:59
Remove the python-based error flow test in favor of a c++ one.

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
jmarantz
jmarantz previously approved these changes Jul 2, 2020
Copy link
Copy Markdown

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good but someone more expert in the NH code should also review.

@oschaaf oschaaf requested a review from htuch July 2, 2020 20:05
- `response_headers` - list of headers to add to response. If `append` is set to
`true`, then the header is appended.
* `echo_request_headers` - if set to `true`, then append the dump of request headers to the response
- `echo_request_headers` - if set to `true`, then append the dump of request headers to the response
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add a bullet here for oneof_delay_options, which is your new field in ResponseOptions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you missed this, unless this is dependent on the result of the larger thread about the use of the fault filter.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The later revisions have the new options documented in a new section right below this one, does that work for you?

Comment on lines +49 to +62
if (base_config_.has_static_delay()) {
delay_ms = Envoy::Protobuf::util::TimeUtil::DurationToMilliseconds(base_config_.static_delay());
} else if (base_config_.has_concurrency_based_delay()) {
const nighthawk::server::ConcurrencyBasedDelay& concurrency_config =
base_config_.concurrency_based_delay();
const uint64_t current_value = config_->approximateInstances();
delay_ms = computeDelayMilliseconds(current_value, concurrency_config.minimal_delay(),
concurrency_config.concurrency_delay_factor());
}
if (delay_ms.has_value() && delay_ms > 0) {
// Emit header to communicate the delay we desire to the fault filter extension.
const Envoy::Http::LowerCaseString key("x-envoy-fault-delay-request");
headers.setCopy(key, absl::StrCat(*delay_ms));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd help the readability here if you abstracted this section into its own function, as this is the part of the function that is actually setting the delays, rather than just decoding headers.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, let me know if that looks like what you had in mind (see 8ee78db)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking grab lines 49 - 62 here, so including the part where it determines what the delay is. But I defer to you on what makes the most sense.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, let me know if that looks better.

oschaaf added 2 commits July 6, 2020 23:34
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
oschaaf added 3 commits July 8, 2020 11:29
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
- `response_headers` - list of headers to add to response. If `append` is set to
`true`, then the header is appended.
* `echo_request_headers` - if set to `true`, then append the dump of request headers to the response
- `echo_request_headers` - if set to `true`, then append the dump of request headers to the response
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you missed this, unless this is dependent on the result of the larger thread about the use of the fault filter.

@dubious90
Copy link
Copy Markdown
Contributor

LGTM, after the two quick comments are resolved. @mum4k to approve.

oschaaf added 3 commits July 9, 2020 11:02
…ver-dynamic-delay

Partially moves to using the fault filter extension as a base class

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Copy link
Copy Markdown
Member Author

@oschaaf oschaaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some discussion via Slack with @htuch and @dubious90 I changed this to take the fault filter as a base class, so there are some new changes in here.

- `response_headers` - list of headers to add to response. If `append` is set to
`true`, then the header is appended.
* `echo_request_headers` - if set to `true`, then append the dump of request headers to the response
- `echo_request_headers` - if set to `true`, then append the dump of request headers to the response
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The later revisions have the new options documented in a new section right below this one, does that work for you?

Copy link
Copy Markdown
Collaborator

@mum4k mum4k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note - I started reviewing the implementation only after I commented my preference on the inherit / copy / ... the fault injection filter, so I only now noticed that we already invested into the inheritance approach. I would like us to minimize the churn on this PR, since it is already quite large, so let's leave the current approach in.

If you do feel this will add enough maintenance overhead for us to feel it, I think we should consider filing an issue to follow-up on this decision and address it later if we reach a different conclusion.

Since this is a fairly large PR, stopping the review mid stream and providing partial set of comments to avoid blocking you.


} // namespace TestServer

class Utility {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we can improve the naming and structure of this new library. Both "common" and "Utility" are very generic names that a) don't tell the reader much and b) attract pretty much any helper / utility function. After all everything is an utility.

Secondly it doesn't seem necessary for this to be a class at all, since the two methods on it don't really need any state. How would you feel about changing these into regular functions and renaming the library into something more specific.

E.g. response_options.h

Feel free to choose any other name that is specific enough not to attract unrelated functions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 9be5334, let me know if that looks better.


namespace TestServer {

class HeaderNameValues {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the comment below (ideally, please read that one first) - are we moving this definition here just because we don't have a better place for it and this file happens to have a generic enough name? And a similar question - does this have to be a class?

Alternative approach - this currently only seems to be used in a single location - how about we just inline the literal there, is it worth having a this defined as a constant?

If yes, can we define a new header specifically designed for this? And if we are doing that - is attaching these onto a class an idiom used in Envoy, or can we can try to figure out an approach that doesn't require a class? Even an ordinary function returning static constexpr might be preferred.

inline absl::string_view MyString() {
  static constexpr char kHello[] = "Hello";
  return kHello;
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like for above, see 9be5334. However, I kept the Envoy construct, because this is shared/reused across the two extensions, and having the LowerCaseString is convenient -- that is the form that consumers are probably going to end up needing when they want access to this.

@@ -0,0 +1,50 @@
#pragma once
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add unit tests for this library?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are pre-existing unit tests for this. For example, see

TEST_F(HttpTestServerDecoderFilterTest, HeaderMerge) {

@@ -0,0 +1,50 @@
#pragma once
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(for the future) Looks like the changes in this library may have been just indirectly related to the work in this PR. If reasonable, can we try to split such changes into their own PR in the future to minimize the size of PRs and the review complexity?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right, sorry about that. Will extract these mechanical refactors out into a separate PR next time.

namespace Server {

/**
* Filter configuration container class for the dynamic delay extension.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we expand the class doc comment with a few more information, ideally at least an example of use to help the reader put this in context?

The public API of this class has a few functions that aren't obvious (or their use case isn't), so this can help with that too.

class HttpDynamicDelayDecoderFilterConfig {

public:
HttpDynamicDelayDecoderFilterConfig(nighthawk::server::ResponseOptions proto_config,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document the constructor and the arguments it takes?

namespace Server {

/**
* Filter configuration container class for the dynamic delay extension.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we indicate the thread-safety of this class in the comment?

/**
* Increments the number of active instances.
*/
void incrementInstanceCount() { instances()++; }
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand what these are for and why is the getter getting an approximate amount rather than exact?

*/
uint64_t approximateInstances() const { return instances(); }

Envoy::Runtime::Loader& runtime() { return runtime_; }
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document all public methods?

@mum4k mum4k added waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. and removed waiting-for-review A PR waiting for a review. labels Jul 9, 2020
oschaaf added 2 commits July 13, 2020 20:27
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Jul 13, 2020

@mum4k I pushed a enhancements based on your feedback in 9be5334 and
ad5e34c. Let me know of those two commits sufficiently address it.

@oschaaf oschaaf added waiting-for-review A PR waiting for a review. and removed waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. labels Jul 13, 2020
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
mum4k
mum4k previously approved these changes Jul 13, 2020
Copy link
Copy Markdown
Collaborator

@mum4k mum4k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just waiting for the CI checks to finish.

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

waiting-for-review A PR waiting for a review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test server: controlling parallelism via dynamic delays

6 participants