Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions api/envoy/config/bootstrap/v2/bootstrap.proto
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,14 @@ message Bootstrap {

// Optional overload manager configuration.
envoy.config.overload.v2alpha.OverloadManager overload_manager = 15;

// Enable :ref:`stats for event dispatcher <operations_performance>`, defaults to false.
// Note that this records a value for each iteration of the event loop on every thread. This
// should normally be minimal overhead, but when using
// :ref:`statsd <envoy_api_msg_config.metrics.v2.StatsdSink>`, it will send each observed value

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it just statsd or also other stats sinks?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just statsd. The Monarch sink appears to just dump histograms on the floor, and the Hystrix and Metrics sinks both do the right thing with parent histograms.

// over the wire individually because the statsd protocol doesn't have any way to represent a
// histogram summary. Be aware that this can be a very large volume of data.
bool enable_dispatcher_stats = 16;
}

// Administration interface :ref:`operations documentation
Expand Down
10 changes: 7 additions & 3 deletions bazel/repository_locations.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -129,9 +129,13 @@ REPOSITORY_LOCATIONS = dict(
urls = ["https://github.com/google/benchmark/archive/505be96ab23056580a3a2315abba048f4428b04e.tar.gz"],
),
com_github_libevent_libevent = dict(
sha256 = "53d4bb49b837944893b7caf9ae8eb43e94690ee5babea6469cc4a928722f99b1",
strip_prefix = "libevent-c4fbae3ae6166dddfa126734edd63213afa14dce",
urls = ["https://github.com/libevent/libevent/archive/c4fbae3ae6166dddfa126734edd63213afa14dce.tar.gz"],
sha256 = "ab3af422b7e4c6d9276b3637d87edb6cf628fd91c9206260b759778c3a28b330",
# This SHA includes the new "prepare" and "check" watchers, used for event loop performance
# stats (see https://github.com/libevent/libevent/pull/793) and the fix for a race condition
# in the watchers (see https://github.com/libevent/libevent/pull/802).
# TODO(mergeconflict): Update to v2.2 when it is released.
strip_prefix = "libevent-1cd8830de27c30c5324c75bfb6012c969c09ca2c",
urls = ["https://github.com/libevent/libevent/archive/1cd8830de27c30c5324c75bfb6012c969c09ca2c.tar.gz"],
),
com_github_madler_zlib = dict(
sha256 = "629380c90a77b964d896ed37163f5c3a34f6e6d897311f1df2a7016355c45eff",
Expand Down
1 change: 1 addition & 0 deletions docs/root/intro/version_history.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ Version history
* access log: added a new field for response code details in :ref:`file access logger<config_access_log_format_response_code_details>` and :ref:`gRPC access logger<envoy_api_field_data.accesslog.v2.HTTPResponseProperties.response_code_details>`.
* dubbo_proxy: support the :ref:`Dubbo proxy filter <config_network_filters_dubbo_proxy>`.
* eds: added support to specify max time for which endpoints can be used :ref:`gRPC filter <envoy_api_msg_ClusterLoadAssignment.Policy>`.
* event: added :ref:`loop duration and poll delay statistics <operations_performance>`.
* ext_authz: added option to `ext_authz` that allows the filter clearing route cache.
* http: mitigated a race condition with the :ref:`delayed_close_timeout<envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.delayed_close_timeout>` where it could trigger while actively flushing a pending write buffer for a downstream connection.
* redis: added :ref:`prefix routing <envoy_api_field_config.filter.network.redis_proxy.v2.RedisProxy.prefix_routes>` to enable routing commands based on their key's prefix to different upstream.
Expand Down
1 change: 1 addition & 0 deletions docs/root/operations/operations.rst
Original file line number Diff line number Diff line change
Expand Up @@ -13,3 +13,4 @@ Operations and administration
runtime
fs_flags
traffic_tapping
performance
51 changes: 51 additions & 0 deletions docs/root/operations/performance.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
.. _operations_performance:

Performance
===========

Envoy is architected to optimize scalability and resource utilization by running an event loop on a
:ref:`small number of threads <arch_overview_threading>`. The "main" thread is responsible for
control plane processing, and each "worker" thread handles a portion of the data plane processing.
Envoy exposes two statistics to monitor performance of the event loops on all these threads.

* **Loop duration:** Some amount of processing is done on each iteration of the event loop. This
amount will naturally vary with changes in load. However, if one or more threads have an unusually
long-tailed loop duration, it may indicate a performance issue. For example, work might not be
distributed fairly across the worker threads, or there may be a long blocking operation in an
extension that's impeding progress.

* **Poll delay:** On each iteration of the event loop, the event dispatcher polls for I/O events
and "wakes up" either when some I/O events are ready to be processed or when a timeout fires,
whichever occurs first. In the case of a timeout, we can measure the difference between the
expected wakeup time and the actual wakeup time after polling; this difference is called the "poll
delay." It's normal to see some small poll delay, usually equal to the kernel scheduler's "time
slice" or "quantum"---this depends on the specific operating system on which Envoy is
running---but if this number elevates substantially above its normal observed baseline, it likely
indicates kernel scheduler delays.

These statistics can be enabled by setting :ref:`enable_dispatcher_stats <envoy_api_field_config.bootstrap.v2.Bootstrap.enable_dispatcher_stats>`

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit is new.

to true.

.. warning::

Note that enabling dispatcher stats records a value for each iteration of the event loop on every
thread. This should normally be minimal overhead, but when using
:ref:`statsd <envoy_api_msg_config.metrics.v2.StatsdSink>`, it will send each observed value over
the wire individually because the statsd protocol doesn't have any way to represent a histogram
summary. Be aware that this can be a very large volume of data.

Statistics
----------

The event dispatcher for the main thread has a statistics tree rooted at *server.dispatcher.*, and
the event dispatcher for each worker thread has a statistics tree rooted at
*listener_manager.worker_<id>.dispatcher.*, each with the following statistics:

.. csv-table::
:header: Name, Type, Description
:widths: 1, 1, 2

loop_duration_us, Histogram, Event loop durations in microseconds
poll_delay_us, Histogram, Polling delays in microseconds

Note that any auxiliary threads are not included here.
27 changes: 27 additions & 0 deletions include/envoy/event/dispatcher.h
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,29 @@
#include "envoy/network/listen_socket.h"
#include "envoy/network/listener.h"
#include "envoy/network/transport_socket.h"
#include "envoy/stats/scope.h"
#include "envoy/stats/stats_macros.h"
#include "envoy/thread/thread.h"

namespace Envoy {
namespace Event {

/**
* All dispatcher stats. @see stats_macros.h
*/
// clang-format off
#define ALL_DISPATCHER_STATS(HISTOGRAM) \
HISTOGRAM(loop_duration_us) \
HISTOGRAM(poll_delay_us)
// clang-format on

/**
* Struct definition for all dispatcher stats. @see stats_macros.h
*/
struct DispatcherStats {
ALL_DISPATCHER_STATS(GENERATE_HISTOGRAM_STRUCT)
};

/**
* Callback invoked when a dispatcher post() runs.
*/
Expand All @@ -39,6 +57,15 @@ class Dispatcher {
*/
virtual TimeSource& timeSource() PURE;

/**
* Initialize stats for this dispatcher. Note that this can't generally be done at construction
* time, since the main and worker thread dispatchers are constructed before
* ThreadLocalStoreImpl::initializeThreading.
* @param scope the scope to contain the new per-dispatcher stats created here.
* @param prefix the stats prefix to identify this dispatcher.
*/
virtual void initializeStats(Stats::Scope& scope, const std::string& prefix) PURE;

/**
* Clear any items in the deferred deletion queue.
*/
Expand Down
8 changes: 8 additions & 0 deletions include/envoy/server/worker.h
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,14 @@ class Worker {
*/
virtual void start(GuardDog& guard_dog) PURE;

/**
* Initialize stats for this worker's dispatcher, if available. The worker will output
* thread-specific stats under the given scope.
* @param scope the scope to contain the new per-dispatcher stats created here.
* @param prefix the stats prefix to identify this dispatcher.
*/
virtual void initializeStats(Stats::Scope& scope, const std::string& prefix) PURE;

/**
* Stop the worker thread.
*/
Expand Down
10 changes: 10 additions & 0 deletions source/common/event/dispatcher_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,13 @@ DispatcherImpl::DispatcherImpl(Buffer::WatermarkFactoryPtr&& factory, Api::Api&

DispatcherImpl::~DispatcherImpl() {}

void DispatcherImpl::initializeStats(Stats::Scope& scope, const std::string& prefix) {
stats_prefix_ = prefix + "dispatcher";
stats_ = std::make_unique<DispatcherStats>(
DispatcherStats{ALL_DISPATCHER_STATS(POOL_HISTOGRAM_PREFIX(scope, stats_prefix_ + "."))});
base_scheduler_.initializeStats(stats_.get());
}

void DispatcherImpl::clearDeferredDeleteList() {
ASSERT(isThreadSafe());
std::vector<DeferredDeletablePtr>* to_delete = current_to_delete_;
Expand Down Expand Up @@ -158,6 +165,9 @@ void DispatcherImpl::post(std::function<void()> callback) {

void DispatcherImpl::run(RunType type) {
run_tid_ = api_.threadFactory().currentThreadId();
if (!stats_prefix_.empty()) {
ENVOY_LOG(debug, "running {} on thread {}", stats_prefix_, run_tid_->debugString());
}

// Flush all post callbacks before we run the event loop. We do this because there are post
// callbacks that have to get run before the initial event loop starts running. libevent does
Expand Down
5 changes: 5 additions & 0 deletions source/common/event/dispatcher_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -3,13 +3,15 @@
#include <cstdint>
#include <functional>
#include <list>
#include <memory>
#include <vector>

#include "envoy/api/api.h"
#include "envoy/common/time.h"
#include "envoy/event/deferred_deletable.h"
#include "envoy/event/dispatcher.h"
#include "envoy/network/connection_handler.h"
#include "envoy/stats/scope.h"

#include "common/common/logger.h"
#include "common/common/thread.h"
Expand All @@ -36,6 +38,7 @@ class DispatcherImpl : Logger::Loggable<Logger::Id::main>, public Dispatcher {

// Event::Dispatcher
TimeSource& timeSource() override { return api_.timeSource(); }
void initializeStats(Stats::Scope& scope, const std::string& prefix) override;
void clearDeferredDeleteList() override;
Network::ConnectionPtr
createServerConnection(Network::ConnectionSocketPtr&& socket,
Expand Down Expand Up @@ -72,6 +75,8 @@ class DispatcherImpl : Logger::Loggable<Logger::Id::main>, public Dispatcher {
bool isThreadSafe() const { return run_tid_ == nullptr || run_tid_->isCurrentThreadId(); }

Api::Api& api_;
std::string stats_prefix_;
std::unique_ptr<DispatcherStats> stats_;
Thread::ThreadIdPtr run_tid_;
Buffer::WatermarkFactoryPtr buffer_factory_;
LibeventScheduler base_scheduler_;
Expand Down
58 changes: 58 additions & 0 deletions source/common/event/libevent_scheduler.cc
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,17 @@
#include "common/common/assert.h"
#include "common/event/timer_impl.h"

#include "event2/util.h"

namespace Envoy {
namespace Event {

namespace {
void recordTimeval(Stats::Histogram& histogram, const timeval& tv) {
histogram.recordValue(tv.tv_sec * 1000000 + tv.tv_usec);
}
} // namespace

LibeventScheduler::LibeventScheduler() : libevent_(event_base_new()) {
// The dispatcher won't work as expected if libevent hasn't been configured to use threads.
RELEASE_ASSERT(Libevent::Global::initialized(), "");
Expand Down Expand Up @@ -41,5 +49,55 @@ void LibeventScheduler::run(Dispatcher::RunType mode) {

void LibeventScheduler::loopExit() { event_base_loopexit(libevent_.get(), nullptr); }

void LibeventScheduler::initializeStats(DispatcherStats* stats) {
stats_ = stats;
// These are thread safe.
evwatch_prepare_new(libevent_.get(), &onPrepare, this);
evwatch_check_new(libevent_.get(), &onCheck, this);
}

void LibeventScheduler::onPrepare(evwatch*, const evwatch_prepare_cb_info* info, void* arg) {
// `self` is `this`, passed in from evwatch_prepare_new.
auto self = static_cast<LibeventScheduler*>(arg);

// Record poll timeout and prepare time for this iteration of the event loop. The timeout is the
// expected polling duration, whereas the actual polling duration will be the difference measured
// between the prepare time and the check time immediately after polling. These are compared in
// onCheck to compute the poll_delay stat.
self->timeout_set_ = evwatch_prepare_get_timeout(info, &self->timeout_);
evutil_gettimeofday(&self->prepare_time_, nullptr);

// If we have a check time available from a previous iteration of the event loop (that is, all but
// the first), compute the loop_duration stat.
if (self->check_time_.tv_sec != 0) {
timeval delta;
evutil_timersub(&self->prepare_time_, &self->check_time_, &delta);
recordTimeval(self->stats_->loop_duration_us_, delta);
}
}

void LibeventScheduler::onCheck(evwatch*, const evwatch_check_cb_info*, void* arg) {
// `self` is `this`, passed in from evwatch_check_new.
auto self = static_cast<LibeventScheduler*>(arg);

// Record check time for this iteration of the event loop. Use this together with prepare time
// from above to compute the actual polling duration, and store it for the next iteration of the
// event loop to compute the loop duration.
evutil_gettimeofday(&self->check_time_, nullptr);
if (self->timeout_set_) {
timeval delta, delay;
evutil_timersub(&self->check_time_, &self->prepare_time_, &delta);
evutil_timersub(&delta, &self->timeout_, &delay);

// Delay can be negative, meaning polling completed early. This happens in normal operation,
// either because I/O was ready before we hit the timeout, or just because the kernel was
// feeling saucy. Disregard negative delays in stats, since they don't indicate anything
// particularly useful.
if (delay.tv_sec >= 0) {
recordTimeval(self->stats_->poll_delay_us_, delay);
}
}
}

} // namespace Event
} // namespace Envoy
15 changes: 15 additions & 0 deletions source/common/event/libevent_scheduler.h
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
#include "common/event/libevent.h"

#include "event2/event.h"
#include "event2/watch.h"

namespace Envoy {
namespace Event {
Expand Down Expand Up @@ -40,8 +41,22 @@ class LibeventScheduler : public Scheduler {
*/
event_base& base() { return *libevent_; }

/**
* Start writing stats once thread-local storage is ready to receive them (see
* ThreadLocalStoreImpl::initializeThreading).
*/
void initializeStats(DispatcherStats* stats_);

private:
static void onPrepare(evwatch*, const evwatch_prepare_cb_info* info, void* arg);
static void onCheck(evwatch*, const evwatch_check_cb_info*, void* arg);

Libevent::BasePtr libevent_;
DispatcherStats* stats_{}; // stats owned by the containing DispatcherImpl
bool timeout_set_{}; // whether there is a poll timeout in the current event loop iteration
timeval timeout_{}; // the poll timeout for the current event loop iteration, if available
timeval prepare_time_{}; // timestamp immediately before polling
timeval check_time_{}; // timestamp immediately after polling
};

} // namespace Event
Expand Down
2 changes: 1 addition & 1 deletion source/server/config_validation/server.cc
Original file line number Diff line number Diff line change
Expand Up @@ -86,7 +86,7 @@ void ValidationInstance::initialize(const Options& options,
Configuration::InitialImpl initial_config(bootstrap);
overload_manager_ = std::make_unique<OverloadManagerImpl>(dispatcher(), stats(), threadLocal(),
bootstrap.overload_manager(), *api_);
listener_manager_ = std::make_unique<ListenerManagerImpl>(*this, *this, *this);
listener_manager_ = std::make_unique<ListenerManagerImpl>(*this, *this, *this, false);
thread_local_.registerThread(*dispatcher_, true);
runtime_loader_ = component_factory.createRuntime(*this, initial_config);
secret_manager_ = std::make_unique<Secret::SecretManagerImpl>();
Expand Down
17 changes: 11 additions & 6 deletions source/server/listener_manager_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -671,10 +671,13 @@ void ListenerImpl::setSocket(const Network::SocketSharedPtr& socket) {

ListenerManagerImpl::ListenerManagerImpl(Instance& server,
ListenerComponentFactory& listener_factory,
WorkerFactory& worker_factory)
: server_(server), factory_(listener_factory), stats_(generateStats(server.stats())),
WorkerFactory& worker_factory,
bool enable_dispatcher_stats)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I put this here, rather than as a param in startWorkers() or a new initializeStats() method, to avoid touching the ListenerManager interface. I could do this differently if anybody feels I should...

: server_(server), factory_(listener_factory),
scope_(server.stats().createScope("listener_manager.")), stats_(generateStats(*scope_)),
config_tracker_entry_(server.admin().getConfigTracker().add(
"listeners", [this] { return dumpListenerConfigs(); })) {
"listeners", [this] { return dumpListenerConfigs(); })),
enable_dispatcher_stats_(enable_dispatcher_stats) {
for (uint32_t i = 0; i < server.options().concurrency(); i++) {
workers_.emplace_back(worker_factory.createWorker(server.overloadManager()));
}
Expand Down Expand Up @@ -718,9 +721,7 @@ ProtobufTypes::MessagePtr ListenerManagerImpl::dumpListenerConfigs() {
}

ListenerManagerStats ListenerManagerImpl::generateStats(Stats::Scope& scope) {
const std::string final_prefix = "listener_manager.";
return {ALL_LISTENER_MANAGER_STATS(POOL_COUNTER_PREFIX(scope, final_prefix),
POOL_GAUGE_PREFIX(scope, final_prefix))};
return {ALL_LISTENER_MANAGER_STATS(POOL_COUNTER(scope), POOL_GAUGE(scope))};
}

bool ListenerManagerImpl::addOrUpdateListener(const envoy::api::v2::Listener& config,
Expand Down Expand Up @@ -1006,12 +1007,16 @@ void ListenerManagerImpl::startWorkers(GuardDog& guard_dog) {
ENVOY_LOG(info, "all dependencies initialized. starting workers");
ASSERT(!workers_started_);
workers_started_ = true;
uint32_t i = 0;
for (const auto& worker : workers_) {
ASSERT(warming_listeners_.empty());
for (const auto& listener : active_listeners_) {
addListenerToWorker(*worker, *listener);
}
worker->start(guard_dog);
if (enable_dispatcher_stats_) {
worker->initializeStats(*scope_, fmt::format("worker_{}.", i++));
}
}
}

Expand Down
4 changes: 3 additions & 1 deletion source/server/listener_manager_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -104,7 +104,7 @@ struct ListenerManagerStats {
class ListenerManagerImpl : public ListenerManager, Logger::Loggable<Logger::Id::config> {
public:
ListenerManagerImpl(Instance& server, ListenerComponentFactory& listener_factory,
WorkerFactory& worker_factory);
WorkerFactory& worker_factory, bool enable_dispatcher_stats);

void onListenerWarmed(ListenerImpl& listener);

Expand Down Expand Up @@ -177,9 +177,11 @@ class ListenerManagerImpl : public ListenerManager, Logger::Loggable<Logger::Id:
std::list<DrainingListener> draining_listeners_;
std::list<WorkerPtr> workers_;
bool workers_started_{};
Stats::ScopePtr scope_;
ListenerManagerStats stats_;
ConfigTracker::EntryOwnerPtr config_tracker_entry_;
LdsApiPtr lds_api_;
const bool enable_dispatcher_stats_{};
};

// TODO(mattklein123): Consider getting rid of pre-worker start and post-worker start code by
Expand Down
Loading