Skip to content

Finalize emission and tracking of latencies in response headers#500

Merged
dubious90 merged 42 commits intoenvoyproxy:masterfrom
oschaaf:origin-timings-tracking-option
Sep 9, 2020
Merged

Finalize emission and tracking of latencies in response headers#500
dubious90 merged 42 commits intoenvoyproxy:masterfrom
oschaaf:origin-timings-tracking-option

Conversation

@oschaaf
Copy link
Copy Markdown
Member

@oschaaf oschaaf commented Aug 31, 2020

Adds a client option and wires it through in TCLAP. Amends tests and code to work with that
instead of the hard coded value. Adds end to end test, and enables test-server extension build.

Follow up to #477

Fixes #360: with this the feature is ready to use.

Signed-off-by: Otto van der Schaaf oschaaf@we-amp.com


TODO:

  • Land Echo back and track origin request-receipt timing deltas #477 and merge it in here
  • Wire up TCLAP
  • Tests for the new option parsing/handling
  • Enable build of the new time-tracking extension into nighthawk_test_server
  • End-to-end tests to prove the new client-side feature can be used to track latencies delivered by response header emitted by the test-server's new feature.
  • CLI & proto description / comments, regen docs.
  • Replace Stopwatch shared_ptr with unique_ptr

oschaaf added 28 commits June 15, 2020 17:31
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Will still fail the test because of mismatched expectations,
but this will help merging master periodically in here to
verify (almost) all is well.

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf oschaaf marked this pull request as ready for review September 1, 2020 18:00
@oschaaf oschaaf added the waiting-for-review A PR waiting for a review. label Sep 1, 2020
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@dubious90 dubious90 requested a review from qqustc September 1, 2020 19:50
@dubious90
Copy link
Copy Markdown
Contributor

@qqustc please review and assign to me once done. Thanks!

@qqustc
Copy link
Copy Markdown
Contributor

qqustc commented Sep 3, 2020

Thanks Otto!
LGTM. Assign back to @dubious90 .

qqustc
qqustc previously approved these changes Sep 3, 2020
…acking-option

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Copy link
Copy Markdown
Contributor

@dubious90 dubious90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature looks cool, Otto. Thanks for your efforts.

README.md Outdated

bazel-bin/nighthawk_client [--stats-flush-interval <uint32_t>]
[--stats-sinks <string>] ...
bazel-bin/nighthawk_client [--response-header-with-latency-input
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No changes here requested, and I think we're in agreement here, but eventually I think we need to move away from TCLAP automating this, because there are now so many flags that it's become difficult to find what you need (e.g. rps, duration).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree.

README.md Outdated

Where:

--response-header-with-latency-input <string>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the meaning here is a little hard to parse from the name / comment combo.

How do you feel about response_latency_header_name?

README.md Outdated
Where:

--response-header-with-latency-input <string>
Set an optional response header name, whose values will be tracked in
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the description, can I suggest this edit to the first sentence:
Set an optional header name that will be returned in responses, whose values will be tracked in a latency histogram if set.

README.md Outdated

--response-header-with-latency-input <string>
Set an optional response header name, whose values will be tracked in
a latency histogram if set. Can be used in tandem with the test server
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the second sentence, can we edit to something like:
"... if set. e.g. Can be used in tandem with the test server's response option "emit_previous_request_delta_in_response_header" to record elapsed time between request arrivals."

(Adding in the word response option clarifies where the reader should look for the definition of that)

Envoy::Random::RandomGenerator& random_generator,
Envoy::Tracing::HttpTracerSharedPtr& http_tracer)
Envoy::Tracing::HttpTracerSharedPtr& http_tracer,
std::string response_header_with_latency_input)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised by your use of a std::string copy with std::move here, as opposed to having this be an absl::string_view (or const std::string&, whicever is more common within our code).

And then having doing response_header_with_latency_input_(response_header_with_latency_input), which I'm pretty sure just works.

This suggestion is more in line with the style I'm familiar with. Is there a strong reason not to do that?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I'm surprised too, I must have grown tired of the tried and proven path in this repo :-) Using absl::string_view now.

client_ = std::make_unique<Client::BenchmarkClientHttpImpl>(
*api_, *dispatcher_, store_, statistic_, false, cluster_manager_, http_tracer_, "benchmark",
request_generator, true);
request_generator, true, "");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of arguments here. Would it be okay if we added comments helping users track the corresponding parameter under use? For the new one, it'd be
/*response_header_with_latency_input=*/""
And then, the two boolean parameters could also benefit from this.

@@ -0,0 +1,34 @@
admin:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment at the top of this, explaining its purpose in just a sentence, and draw attention to the relevant/unique part of the configuration (which I think is the time-tracking filter)

"""Test emission and tracking of response header latencies.

Run the CLI configured to track latencies delivered by response header from the test-server
which is set up emit those. Ensure the expected histogram is observed.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: set up to emit those

"""Test emission and tracking of response header latencies.

Run the CLI configured to track latencies delivered by response header from the test-server
which is set up emit those. Ensure the expected histogram is observed.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we instead say ensure the histogram receives the correct number of inputs or something?

])
global_histograms = http_test_server_fixture.getNighthawkGlobalHistogramsbyIdFromJson(parsed_json)
asserts.assertEqual(
int(global_histograms["benchmark_http_client.origin_latency_statistic"]["count"]), 99)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a second integration test that works as a crash test of sorts, for adding in the header but receiving no responses that include that header?

Not sure of the exact correct assertions. We would expect it to work, but for origin_latency_statistic to equal 0, I think. Working could be defined as the responses_xxx counter increasing?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done; we now test using two server side configurations, one with the new time-tracking extension enabled, and one that doesn't have it set up. We check expectations based on which one we're testing. Let me know if that looks good to you. (68ed58a)

@dubious90 dubious90 added waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. and removed waiting-for-review A PR waiting for a review. labels Sep 8, 2020
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Sep 9, 2020

@dubious90 comments addressed in 68ed58a.

@oschaaf oschaaf added waiting-for-review A PR waiting for a review. and removed waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. labels Sep 9, 2020
…acking-option

Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf
Copy link
Copy Markdown
Member Author

oschaaf commented Sep 9, 2020

@dubious90 comments addressed in 68ed58a.

Also, merged master in here and resolved conflicts. Thanks for the review!

Copy link
Copy Markdown
Contributor

@dubious90 dubious90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo the one naming comment. Thanks!

README.md Outdated
USAGE:

bazel-bin/nighthawk_client [--stats-flush-interval <uint32_t>]
bazel-bin/nighthawk_client [--response-latency-header-name <string>]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for extra churn here. Realized I gave you a misleading name (implies this would be response latency, which, while possible to do with this, is not the only usecase).

How do you feel about --latency-response-header-name (just inverting those two words).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, done in 594bbbe

])
global_histograms = http_test_server_fixture.getNighthawkGlobalHistogramsbyIdFromJson(parsed_json)
asserts.assertEqual(int(global_histograms["benchmark_http_client.latency_2xx"]["count"]), 100)
# Verify behavior is correct both with and without the timing filter enabled.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very helpful comment here. Thanks

@dubious90 dubious90 added waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. and removed waiting-for-review A PR waiting for a review. labels Sep 9, 2020
Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
@oschaaf oschaaf added waiting-for-review A PR waiting for a review. and removed waiting-for-changes A PR waiting for comments to be resolved and changes to be applied. labels Sep 9, 2020
@dubious90 dubious90 merged commit 365b97a into envoyproxy:master Sep 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

waiting-for-review A PR waiting for a review.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

idea: track and propagate test server timings for comparison

3 participants