Merged
63 changes: 63 additions & 0 deletions docs/root/statistics.md
@@ -0,0 +1,63 @@
# Nighthawk Statistics

## Background
Currently Nighthawk only outputs metrics at the end of a test run; no metrics are streamed while a run is in progress. To make all of Nighthawk's useful metrics collectable, we plan to instrument Nighthawk so that it can stream its metrics during a run.


## Statistics in BenchMarkClient
Per-worker statistics are defined in [benchmark_client_impl.h](https://github.com/envoyproxy/nighthawk/blob/master/source/client/benchmark_client_impl.h).

Name | Type | Description
-----| ----- | ----------------
total_req_sent | Counter | Total number of requests sent by Nighthawk
http_xxx | Counter | Total number of responses with a status code in class xxx (e.g. `http_2xx`)
stream_resets | Counter | Total number of stream resets
pool_overflow | Counter | Total number of times the connection pool overflowed
pool_connection_failure | Counter | Total number of pool connection failures
latency_on_success_req | Histogram | Latency histogram (in microseconds) of successful requests (2xx response codes)
latency_on_error_req | Histogram | Latency histogram (in microseconds) of failed requests (response codes other than 2xx)

## Envoy Metrics Model
Since Nighthawk is built on top of [Envoy](https://github.com/envoyproxy/envoy), and Envoy already exports similar metrics, it is natural to follow Envoy's example. Furthermore, *Envoy typed metrics are already being used in Nighthawk* ([example](https://github.com/envoyproxy/nighthawk/blob/master/source/client/benchmark_client_impl.h#L33-L46)).


Envoy has three types of metrics:
- Counters: Unsigned integers that can only increase, representing how many times an event has happened, e.g. the total number of requests.
- Gauges: Unsigned integers that can both increase and decrease, e.g. the number of active connections.
- Histograms: Unsigned integer samples that yield summarized percentile values, e.g. latency distributions.
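As a toy illustration of the three update semantics (these structs are stand-ins invented for this sketch, not Envoy's actual `Stats` interfaces):

```cpp
#include <cstdint>
#include <map>

// Counter: can only go up; answers "how many times did X happen?".
struct Counter {
  uint64_t value = 0;
  void inc(uint64_t delta = 1) { value += delta; }
};

// Gauge: an instantaneous value that can move both ways,
// e.g. the number of currently active connections.
struct Gauge {
  uint64_t value = 0;
  void inc() { ++value; }
  void dec() { --value; }
};

// Histogram: individual samples, later summarized into percentiles.
struct Histogram {
  std::map<uint64_t, uint64_t> samples; // sample value -> occurrences
  void recordValue(uint64_t v) { ++samples[v]; }
};
```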


In Envoy, the stat [Store](https://github.com/envoyproxy/envoy/blob/74530c92cfa3682b49b540fddf2aba40ac10c68e/include/envoy/stats/store.h#L29) is a singleton that provides a simple interface through which the rest of the code obtains handles to scopes, counters, gauges, and histograms. Envoy counters and gauges are periodically flushed to the sinks (at a configurable interval, roughly 5 seconds by default); note that histogram values are currently sent directly to the sinks. A stat [Sink](https://github.com/envoyproxy/envoy/blob/74530c92cfa3682b49b540fddf2aba40ac10c68e/include/envoy/stats/sink.h#L48) is an interface that takes generic stat data and translates it into a backend-specific wire format. Envoy currently supports the TCP and UDP [statsd](https://github.com/b/statsd_spec) protocol (implemented in [statsd.h](https://github.com/envoyproxy/envoy/blob/master/source/extensions/stat_sinks/common/statsd/statsd.h)). Users can create their own Sink subclass to translate Envoy metrics into another backend's format.
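To make the sink's job concrete: in the statsd plain-text wire format (per the statsd spec linked above), each metric becomes a `name:value|type` line. The helpers below sketch that protocol; they are not Envoy's statsd sink code:

```cpp
#include <cstdint>
#include <string>

// statsd wire format: counters are "<name>:<value>|c", gauges
// "<name>:<value>|g", timers/histograms "<name>:<value>|ms".
std::string statsdCounter(const std::string& name, uint64_t value) {
  return name + ":" + std::to_string(value) + "|c";
}
std::string statsdGauge(const std::string& name, uint64_t value) {
  return name + ":" + std::to_string(value) + "|g";
}
std::string statsdTimer(const std::string& name, uint64_t ms) {
  return name + ":" + std::to_string(ms) + "|ms";
}
```

A real Sink subclass would format lines like these and write them to a TCP or UDP socket on every flush.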

Envoy metrics can be defined using a macro, e.g.
```
#define ALL_CLUSTER_STATS(COUNTER, GAUGE, HISTOGRAM)                           \
  COUNTER(upstream_cx_total)                                                   \
  GAUGE(upstream_cx_active, NeverImport)                                       \
  HISTOGRAM(upstream_cx_length, Milliseconds)

struct ClusterStats {
  ALL_CLUSTER_STATS(GENERATE_COUNTER_STRUCT, GENERATE_GAUGE_STRUCT,
                    GENERATE_HISTOGRAM_STRUCT)
};
```

## Envoy Metrics Limitation
Currently, Envoy metrics do not support key-value labels. As a result, to break a metric down by some dimension, we need to define a separate metric per dimension value. For example, Nighthawk currently defines separate [counters](https://github.com/envoyproxy/nighthawk/blob/master/source/client/benchmark_client_impl.h#L35-L40) to track the number of responses per response-code class.

Due to this limitation, we split latency into two metrics: *latency_on_success_req* (latency of successful requests, 2xx) and *latency_on_error_req* (latency of failed requests, any code other than 2xx).
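The resulting routing rule is a simple check on the response-code class; a self-contained sketch (the real implementation is `BenchmarkClientHttpImpl::exportLatency`, shown in the diff below):

```cpp
#include <cstdint>
#include <string>

// A 2xx response routes its latency sample to latency_on_success_req;
// everything else goes to latency_on_error_req.
bool isSuccessCode(uint32_t response_code) {
  return response_code >= 200 && response_code <= 299;
}

std::string latencyHistogramName(uint32_t response_code) {
  return isSuccessCode(response_code) ? "latency_on_success_req"
                                      : "latency_on_error_req";
}
```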


## Envoy Metrics Flush
Envoy uses a flush timer to periodically flush metrics to the stat sinks ([here](https://github.com/envoyproxy/envoy/blob/74530c92cfa3682b49b540fddf2aba40ac10c68e/source/server/server.cc#L479-L480)) at a configured interval (defaulting to 5 seconds). On every flush, Envoy calls [flushMetricsToSinks()](https://github.com/envoyproxy/envoy/blob/74530c92cfa3682b49b540fddf2aba40ac10c68e/source/server/server.cc#L175) to create a metric snapshot from the stat store and deliver that snapshot to every sink via `sink->flush(snapshot)`.
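The flush path amounts to "snapshot once, deliver to every sink". The types below are simplified stand-ins invented for this sketch, not Envoy's `MetricSnapshot` and `Sink` interfaces:

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a metric snapshot: a point-in-time copy of
// the counters to be flushed.
struct Snapshot {
  std::vector<std::pair<std::string, uint64_t>> counters;
};

// Simplified stand-in for a stat sink; a real sink would translate the
// snapshot into its backend's wire format instead of counting calls.
struct Sink {
  int flush_count = 0;
  void flush(const Snapshot& /*snapshot*/) { ++flush_count; }
};

// Mirrors the flushMetricsToSinks() shape: one snapshot, every sink.
void flushMetricsToSinks(const Snapshot& snapshot, std::vector<Sink>& sinks) {
  for (Sink& sink : sinks) {
    sink.flush(snapshot);
  }
}
```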


## Metrics Export in Nighthawk
Currently a single Nighthawk instance can run with multiple workers. In the future, Nighthawk will be extended so that multiple instances can run together. Since each Nighthawk worker sends requests independently, we decided to export metrics at the per-worker level, which has several advantages over global metrics aggregated across all workers. Note that *some existing Nighthawk metrics are already defined per worker* ([example](https://github.com/envoyproxy/nighthawk/blob/bc72a52efdc489beaa0844b34f136e03394bd355/source/client/benchmark_client_impl.cc#L61)).
- Per-worker metrics expose the performance of each individual worker, which global metrics would hide.
- They keep workers independent, which makes it easier and more efficient to scale up to multiple Nighthawk instances with large numbers of workers.

Metrics can be defined at the per-worker level using a [Scope](https://github.com/envoyproxy/envoy/blob/e9c2c8c4a0141c9634316e8283f98f412d0dd207/include/envoy/stats/scope.h#L35) (e.g. `cluster.<worker_id>.total_req_sent`): the dynamic portions of a metric name (e.g. `worker_id`) are embedded in the name itself. A [TagSpecifier](https://github.com/envoyproxy/envoy/blob/7a652daf35d7d4a6a6bad5a010fe65947ee4411a/api/envoy/config/metrics/v3/stats.proto#L182) can then be specified in the bootstrap configuration to transform those dynamic portions into tags. When per-worker metrics are exported from Nighthawk, the stat Sink can fold multiple per-worker metrics into a single metric carrying a `worker_id` label, provided the backend supports key-value labels.
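A simplified string-based sketch of that round trip: the scope bakes the worker id into the stat name, and a tag-extraction step later splits it back out. Envoy's real TagSpecifier matching is regex-driven; the helper names here are invented for illustration:

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Build a per-worker stat name: worker 3's request counter becomes
// "cluster.3.total_req_sent" (the naming scheme described above).
std::string perWorkerStatName(uint32_t worker_id, const std::string& stat) {
  return "cluster." + std::to_string(worker_id) + "." + stat;
}

// The reverse step a tag extractor performs: pull the dynamic portion
// out as the worker_id tag value, returned with the de-parameterized
// metric name. Simplified; Envoy's tag extraction is regex-driven.
std::pair<std::string, std::string> extractWorkerTag(const std::string& name) {
  const std::string prefix = "cluster.";
  const std::string::size_type id_end = name.find('.', prefix.size());
  if (id_end == std::string::npos) {
    return {"", name}; // no embedded worker id
  }
  const std::string worker_id =
      name.substr(prefix.size(), id_end - prefix.size());
  return {worker_id, "cluster." + name.substr(id_end + 1)};
}
```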

## Reference
- [Nighthawk: architecture and key concepts](https://github.com/envoyproxy/nighthawk/blob/master/docs/root/overview.md)
- [Envoy Stats System](https://github.com/envoyproxy/envoy/blob/master/source/docs/stats.md)
- [Envoy Stats blog](https://blog.envoyproxy.io/envoy-stats-b65c7f363342)
15 changes: 13 additions & 2 deletions source/client/benchmark_client_impl.cc
@@ -58,7 +58,8 @@ BenchmarkClientHttpImpl::BenchmarkClientHttpImpl(
response_statistic_(std::move(response_statistic)),
response_header_size_statistic_(std::move(response_header_size_statistic)),
response_body_size_statistic_(std::move(response_body_size_statistic)), use_h2_(use_h2),
benchmark_client_stats_({ALL_BENCHMARK_CLIENT_STATS(POOL_COUNTER(*scope_))}),
benchmark_client_stats_(
{ALL_BENCHMARK_CLIENT_STATS(POOL_COUNTER(*scope_), POOL_HISTOGRAM(*scope_))}),
cluster_manager_(cluster_manager), http_tracer_(http_tracer),
cluster_name_(std::string(cluster_name)), request_generator_(std::move(request_generator)),
provide_resource_backpressure_(provide_resource_backpressure) {
@@ -123,6 +124,7 @@ bool BenchmarkClientHttpImpl::tryStartRequest(CompletionCallback caller_completi
*response_body_size_statistic_, request->header(), shouldMeasureLatencies(), content_length,
generator_, http_tracer_);
requests_initiated_++;
benchmark_client_stats_.total_req_sent_.inc();
pool_ptr->newStream(*stream_decoder, *stream_decoder);
return true;
}
@@ -168,5 +170,14 @@ void BenchmarkClientHttpImpl::onPoolFailure(Envoy::Http::ConnectionPool::PoolFai
}
}

void BenchmarkClientHttpImpl::exportLatency(const uint32_t response_code,
const uint64_t latency_us) {
if (response_code > 199 && response_code <= 299) {
benchmark_client_stats_.latency_on_success_req_.recordValue(latency_us);
} else {
benchmark_client_stats_.latency_on_error_req_.recordValue(latency_us);
}
}

} // namespace Client
} // namespace Nighthawk
} // namespace Nighthawk
12 changes: 8 additions & 4 deletions source/client/benchmark_client_impl.h
@@ -30,7 +30,7 @@ using namespace std::chrono_literals;

using namespace Envoy; // We need this because of macro expectations.

#define ALL_BENCHMARK_CLIENT_STATS(COUNTER) \
#define ALL_BENCHMARK_CLIENT_STATS(COUNTER, HISTOGRAM) \
COUNTER(stream_resets) \
COUNTER(http_1xx) \
COUNTER(http_2xx) \
@@ -39,10 +39,13 @@ using namespace Envoy; // We need this because of macro expectations.
COUNTER(http_5xx) \
COUNTER(http_xxx) \
COUNTER(pool_overflow) \
COUNTER(pool_connection_failure)
COUNTER(pool_connection_failure) \
COUNTER(total_req_sent) \
HISTOGRAM(latency_on_success_req, Microseconds) \
HISTOGRAM(latency_on_error_req, Microseconds)

struct BenchmarkClientStats {
ALL_BENCHMARK_CLIENT_STATS(GENERATE_COUNTER_STRUCT)
ALL_BENCHMARK_CLIENT_STATS(GENERATE_COUNTER_STRUCT, GENERATE_HISTOGRAM_STRUCT)
};

class Http1PoolImpl : public Envoy::Http::Http1::ProdConnPoolImpl {
@@ -104,6 +107,7 @@ class BenchmarkClientHttpImpl : public BenchmarkClient,
// StreamDecoderCompletionCallback
void onComplete(bool success, const Envoy::Http::ResponseHeaderMap& headers) override;
void onPoolFailure(Envoy::Http::ConnectionPool::PoolFailureReason reason) override;
void exportLatency(const uint32_t response_code, const uint64_t latency_us) override;

// Helpers
Envoy::Http::ConnectionPool::Instance* pool() {
@@ -142,4 +146,4 @@
};

} // namespace Client
} // namespace Nighthawk
} // namespace Nighthawk
9 changes: 8 additions & 1 deletion source/client/stream_decoder.cc
@@ -48,7 +48,14 @@ void StreamDecoder::decodeTrailers(Envoy::Http::ResponseTrailerMapPtr&& headers)
void StreamDecoder::onComplete(bool success) {
ASSERT(!success || complete_);
if (success && measure_latencies_) {
latency_statistic_.addValue((time_source_.monotonicTime() - request_start_).count());
const auto dur = time_source_.monotonicTime() - request_start_;
latency_statistic_.addValue(dur.count());
// At this point StreamDecoder::decodeHeaders() should have been called.
if (stream_info_.response_code_.has_value()) {
decoder_completion_callback_.exportLatency(
stream_info_.response_code_.value(),
std::chrono::duration_cast<std::chrono::microseconds>(dur).count());
}
}
upstream_timing_.onLastUpstreamRxByteReceived(time_source_);
response_body_sizes_statistic_.addValue(stream_info_.bytesSent());
1 change: 1 addition & 0 deletions source/client/stream_decoder.h
@@ -24,6 +24,7 @@ class StreamDecoderCompletionCallback {
virtual ~StreamDecoderCompletionCallback() = default;
virtual void onComplete(bool success, const Envoy::Http::ResponseHeaderMap& headers) PURE;
virtual void onPoolFailure(Envoy::Http::ConnectionPool::PoolFailureReason reason) PURE;
virtual void exportLatency(const uint32_t response_code, const uint64_t latency_us) PURE;
};

// TODO(oschaaf): create a StreamDecoderPool?
2 changes: 1 addition & 1 deletion test/BUILD
@@ -18,9 +18,9 @@ envoy_cc_test(
"//test/test_common:environment_lib",
"@envoy//source/common/http:header_map_lib_with_external_headers",
"@envoy//source/common/network:utility_lib_with_external_headers",
"@envoy//source/common/stats:isolated_store_lib_with_external_headers",
"@envoy//source/exe:process_wide_lib_with_external_headers",
"@envoy//test/mocks/runtime:runtime_mocks",
"@envoy//test/mocks/stats:stats_mocks",
"@envoy//test/mocks/stream_info:stream_info_mocks",
"@envoy//test/mocks/thread_local:thread_local_mocks",
"@envoy//test/mocks/upstream:upstream_mocks",
50 changes: 41 additions & 9 deletions test/benchmark_http_client_test.cc
@@ -3,10 +3,10 @@
#include "external/envoy/source/common/http/header_map_impl.h"
#include "external/envoy/source/common/network/utility.h"
#include "external/envoy/source/common/runtime/runtime_impl.h"
#include "external/envoy/source/common/stats/isolated_store_impl.h"
#include "external/envoy/source/exe/process_wide.h"
#include "external/envoy/test/mocks/common.h"
#include "external/envoy/test/mocks/runtime/mocks.h"
#include "external/envoy/test/mocks/stats/mocks.h"
#include "external/envoy/test/mocks/stream_info/mocks.h"
#include "external/envoy/test/mocks/thread_local/mocks.h"
#include "external/envoy/test/mocks/upstream/mocks.h"
@@ -124,7 +124,7 @@ class BenchmarkClientHttpTest : public Test {

void setupBenchmarkClient() {
client_ = std::make_unique<Client::BenchmarkClientHttpImpl>(
*api_, *dispatcher_, store_, std::make_unique<StreamingStatistic>(),
*api_, *dispatcher_, mock_store_, std::make_unique<StreamingStatistic>(),
std::make_unique<StreamingStatistic>(), std::make_unique<StreamingStatistic>(),
std::make_unique<StreamingStatistic>(), false, cluster_manager_, http_tracer_, "benchmark",
request_generator_, true);
@@ -143,7 +143,8 @@ }
}

Envoy::Event::TestRealTimeSystem time_system_;
Envoy::Stats::IsolatedStoreImpl store_;
// deliverHistogramToSinks() is not implemented in IsolatedStoreImpl so test with a mock store.
Envoy::Stats::MockIsolatedStatsStore mock_store_;
Envoy::Api::ApiPtr api_;
Envoy::Event::DispatcherPtr dispatcher_;
Envoy::Runtime::RandomGeneratorImpl generator_;
@@ -166,45 +167,76 @@ TEST_F(BenchmarkClientHttpTest, BasicTestH1200) {
response_code_ = "200";
testBasicFunctionality(2, 3, 10);
EXPECT_EQ(5, getCounter("http_2xx"));
EXPECT_EQ(5, getCounter("total_req_sent"));
}

TEST_F(BenchmarkClientHttpTest, BasicTestH1300) {
response_code_ = "300";
testBasicFunctionality(0, 11, 10);
EXPECT_EQ(10, getCounter("http_3xx"));
EXPECT_EQ(10, getCounter("total_req_sent"));
}

TEST_F(BenchmarkClientHttpTest, BasicTestH1404) {
response_code_ = "404";
testBasicFunctionality(0, 1, 10);
EXPECT_EQ(1, getCounter("http_4xx"));
EXPECT_EQ(1, getCounter("total_req_sent"));
}

TEST_F(BenchmarkClientHttpTest, WeirdStatus) {
response_code_ = "601";
testBasicFunctionality(0, 1, 10);
EXPECT_EQ(1, getCounter("http_xxx"));
EXPECT_EQ(1, getCounter("total_req_sent"));
}

TEST_F(BenchmarkClientHttpTest, EnableLatencyMeasurement) {
setupBenchmarkClient();
EXPECT_EQ(false, client_->shouldMeasureLatencies());
EXPECT_CALL(mock_store_, deliverHistogramToSinks(_, _)).Times(0);
testBasicFunctionality(10, 1, 10);
EXPECT_EQ(0, client_->statistics()["benchmark_http_client.queue_to_connect"]->count());
EXPECT_EQ(0, client_->statistics()["benchmark_http_client.request_to_response"]->count());
EXPECT_EQ(10, getCounter("total_req_sent"));
client_->setShouldMeasureLatencies(true);
EXPECT_CALL(mock_store_, deliverHistogramToSinks(_, _)).Times(10);
testBasicFunctionality(10, 1, 10);
EXPECT_EQ(10, client_->statistics()["benchmark_http_client.queue_to_connect"]->count());
EXPECT_EQ(10, client_->statistics()["benchmark_http_client.request_to_response"]->count());
EXPECT_EQ(20, getCounter("total_req_sent"));
}

TEST_F(BenchmarkClientHttpTest, ExportSuccessLatency) {
setupBenchmarkClient();
uint64_t latency = 10;
EXPECT_CALL(mock_store_, deliverHistogramToSinks(Property(&Envoy::Stats::Metric::name,
"benchmark.latency_on_success_req"),
latency))
.Times(1);
EXPECT_CALL(mock_store_,
deliverHistogramToSinks(
Property(&Envoy::Stats::Metric::name, "benchmark.latency_on_error_req"), latency))
.Times(0);
client_->exportLatency(/*response_code=*/200, /*latency_us=*/latency);
}

TEST_F(BenchmarkClientHttpTest, ExportErrorLatency) {
setupBenchmarkClient();
uint64_t latency = 10;
EXPECT_CALL(mock_store_, deliverHistogramToSinks(Property(&Envoy::Stats::Metric::name,
"benchmark.latency_on_success_req"),
latency))
.Times(0);
EXPECT_CALL(mock_store_,
deliverHistogramToSinks(
Property(&Envoy::Stats::Metric::name, "benchmark.latency_on_error_req"), latency))
.Times(1);
client_->exportLatency(/*response_code=*/500, /*latency_us=*/latency);
}

TEST_F(BenchmarkClientHttpTest, StatusTrackingInOnComplete) {
auto store = std::make_unique<Envoy::Stats::IsolatedStoreImpl>();
client_ = std::make_unique<Client::BenchmarkClientHttpImpl>(
*api_, *dispatcher_, *store, std::make_unique<StreamingStatistic>(),
std::make_unique<StreamingStatistic>(), std::make_unique<StreamingStatistic>(),
std::make_unique<StreamingStatistic>(), false, cluster_manager_, http_tracer_, "foo",
request_generator_, true);
setupBenchmarkClient();
Envoy::Http::ResponseHeaderMapPtr header = Envoy::Http::ResponseHeaderMapImpl::create();

header->setStatus(1);
12 changes: 8 additions & 4 deletions test/integration/test_integration_basics.py
@@ -34,6 +34,7 @@ def test_http_h1(http_test_server_fixture):
assertCounterEqual(counters, "upstream_rq_pending_total", 1)
assertCounterEqual(counters, "upstream_rq_total", 25)
assertCounterEqual(counters, "default.total_match_count", 1)
assertCounterEqual(counters, "benchmark.total_req_sent", 25)

global_histograms = http_test_server_fixture.getNighthawkGlobalHistogramsbyIdFromJson(parsed_json)
assertEqual(int(global_histograms["benchmark_http_client.response_body_size"]["count"]), 25)
@@ -47,7 +48,7 @@ def test_http_h1(http_test_server_fixture):
assertEqual(int(global_histograms["benchmark_http_client.response_body_size"]["raw_pstdev"]), 0)
assertEqual(int(global_histograms["benchmark_http_client.response_header_size"]["raw_pstdev"]), 0)

assertEqual(len(counters), 12)
assertEqual(len(counters), 13)


def mini_stress_test(fixture, args):
@@ -179,7 +180,8 @@ def test_http_h2(http_test_server_fixture):
assertCounterEqual(counters, "upstream_rq_pending_total", 1)
assertCounterEqual(counters, "upstream_rq_total", 25)
assertCounterEqual(counters, "default.total_match_count", 1)
assertEqual(len(counters), 12)
assertCounterEqual(counters, "benchmark.total_req_sent", 25)
assertEqual(len(counters), 13)


def test_http_concurrency(http_test_server_fixture):
@@ -224,7 +226,8 @@ def test_https_h1(https_test_server_fixture):
assertCounterEqual(counters, "ssl.sigalgs.rsa_pss_rsae_sha256", 1)
assertCounterEqual(counters, "ssl.versions.TLSv1.2", 1)
assertCounterEqual(counters, "default.total_match_count", 1)
assertEqual(len(counters), 17)
assertCounterEqual(counters, "benchmark.total_req_sent", 25)
assertEqual(len(counters), 18)

server_stats = https_test_server_fixture.getTestServerStatisticsJson()
assertEqual(
@@ -258,7 +261,8 @@ def test_https_h2(https_test_server_fixture):
assertCounterEqual(counters, "ssl.sigalgs.rsa_pss_rsae_sha256", 1)
assertCounterEqual(counters, "ssl.versions.TLSv1.2", 1)
assertCounterEqual(counters, "default.total_match_count", 1)
assertEqual(len(counters), 17)
assertCounterEqual(counters, "benchmark.total_req_sent", 25)
assertEqual(len(counters), 18)


def test_https_h2_multiple_connections(https_test_server_fixture):