event: reintroduce dispatcher stats #6659
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -13,3 +13,4 @@ Operations and administration | |
| runtime | ||
| fs_flags | ||
| traffic_tapping | ||
| performance | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,51 @@ | ||
| .. _operations_performance: | ||
|
|
||
| Performance | ||
| =========== | ||
|
|
||
| Envoy is architected to optimize scalability and resource utilization by running an event loop on a | ||
| :ref:`small number of threads <arch_overview_threading>`. The "main" thread is responsible for | ||
| control plane processing, and each "worker" thread handles a portion of the data plane processing. | ||
| Envoy exposes two statistics to monitor performance of the event loops on all these threads. | ||
|
|
||
| * **Loop duration:** Some amount of processing is done on each iteration of the event loop. This | ||
| amount will naturally vary with changes in load. However, if one or more threads have an unusually | ||
| long-tailed loop duration, it may indicate a performance issue. For example, work might not be | ||
| distributed fairly across the worker threads, or there may be a long blocking operation in an | ||
| extension that's impeding progress. | ||
|
|
||
| * **Poll delay:** On each iteration of the event loop, the event dispatcher polls for I/O events | ||
| and "wakes up" either when some I/O events are ready to be processed or when a timeout fires, | ||
| whichever occurs first. In the case of a timeout, we can measure the difference between the | ||
| expected wakeup time and the actual wakeup time after polling; this difference is called the "poll | ||
| delay." It's normal to see some small poll delay, usually equal to the kernel scheduler's "time | ||
| slice" or "quantum"---this depends on the specific operating system on which Envoy is | ||
| running---but if this number rises substantially above its normal observed baseline, it likely | ||
| indicates kernel scheduler delays. | ||
|
|
||
| These statistics can be enabled by setting :ref:`enable_dispatcher_stats <envoy_api_field_config.bootstrap.v2.Bootstrap.enable_dispatcher_stats>` | ||
|
Author
This bit is new.
||
| to true. | ||
|
|
||
| .. warning:: | ||
|
|
||
| Note that enabling dispatcher stats records a value for each iteration of the event loop on every | ||
| thread. This normally adds minimal overhead, but when using | ||
| :ref:`statsd <envoy_api_msg_config.metrics.v2.StatsdSink>`, it will send each observed value over | ||
| the wire individually because the statsd protocol doesn't have any way to represent a histogram | ||
| summary. Be aware that this can be a very large volume of data. | ||
|
|
||
| Statistics | ||
| ---------- | ||
|
|
||
| The event dispatcher for the main thread has a statistics tree rooted at *server.dispatcher.*, and | ||
| the event dispatcher for each worker thread has a statistics tree rooted at | ||
| *listener_manager.worker_<id>.dispatcher.*, each with the following statistics: | ||
|
|
||
| .. csv-table:: | ||
| :header: Name, Type, Description | ||
| :widths: 1, 1, 2 | ||
|
|
||
| loop_duration_us, Histogram, Event loop durations in microseconds | ||
| poll_delay_us, Histogram, Polling delays in microseconds | ||
|
|
||
| Note that any auxiliary threads are not included here. | ||
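As a rough illustration of the two measurements described in the docs above, the sketch below computes a loop duration and a poll delay from timestamps. The helper names `loopDurationUs` and `pollDelayUs` are hypothetical and are not taken from this PR; the clamping of negative delays to zero is also an assumption made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: time spent processing one event loop iteration, in microseconds.
inline int64_t loopDurationUs(Clock::time_point iteration_start, Clock::time_point iteration_end) {
  assert(iteration_end >= iteration_start);
  return std::chrono::duration_cast<std::chrono::microseconds>(iteration_end - iteration_start)
      .count();
}

// Hypothetical helper: how late the loop woke up relative to the timer deadline,
// in microseconds. Only meaningful when the wakeup was caused by a timeout rather
// than by I/O readiness. Early wakeups are clamped to zero (an assumption here).
inline int64_t pollDelayUs(Clock::time_point expected_wakeup, Clock::time_point actual_wakeup) {
  const auto delay =
      std::chrono::duration_cast<std::chrono::microseconds>(actual_wakeup - expected_wakeup);
  return delay.count() < 0 ? 0 : delay.count();
}
```

With a timer due 10 ms after some reference point and an actual wakeup 10.04 ms after it, `pollDelayUs` would report 40. Each such value would be one sample in the `poll_delay_us` histogram.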
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -671,10 +671,13 @@ void ListenerImpl::setSocket(const Network::SocketSharedPtr& socket) { | |
|
|
||
| ListenerManagerImpl::ListenerManagerImpl(Instance& server, | ||
| ListenerComponentFactory& listener_factory, | ||
| - WorkerFactory& worker_factory) | ||
| - : server_(server), factory_(listener_factory), stats_(generateStats(server.stats())), | ||
| + WorkerFactory& worker_factory, | ||
| + bool enable_dispatcher_stats) | ||
|
Author
Note: I put this here, rather than as a param in
||
| + : server_(server), factory_(listener_factory), | ||
| + scope_(server.stats().createScope("listener_manager.")), stats_(generateStats(*scope_)), | ||
| config_tracker_entry_(server.admin().getConfigTracker().add( | ||
| - "listeners", [this] { return dumpListenerConfigs(); })) { | ||
| + "listeners", [this] { return dumpListenerConfigs(); })), | ||
| + enable_dispatcher_stats_(enable_dispatcher_stats) { | ||
| for (uint32_t i = 0; i < server.options().concurrency(); i++) { | ||
| workers_.emplace_back(worker_factory.createWorker(server.overloadManager())); | ||
| } | ||
|
|
@@ -718,9 +721,7 @@ ProtobufTypes::MessagePtr ListenerManagerImpl::dumpListenerConfigs() { | |
| } | ||
|
|
||
| ListenerManagerStats ListenerManagerImpl::generateStats(Stats::Scope& scope) { | ||
| - const std::string final_prefix = "listener_manager."; | ||
| - return {ALL_LISTENER_MANAGER_STATS(POOL_COUNTER_PREFIX(scope, final_prefix), | ||
| - POOL_GAUGE_PREFIX(scope, final_prefix))}; | ||
| + return {ALL_LISTENER_MANAGER_STATS(POOL_COUNTER(scope), POOL_GAUGE(scope))}; | ||
| } | ||
|
|
||
| bool ListenerManagerImpl::addOrUpdateListener(const envoy::api::v2::Listener& config, | ||
|
|
@@ -1006,12 +1007,16 @@ void ListenerManagerImpl::startWorkers(GuardDog& guard_dog) { | |
| ENVOY_LOG(info, "all dependencies initialized. starting workers"); | ||
| ASSERT(!workers_started_); | ||
| workers_started_ = true; | ||
| uint32_t i = 0; | ||
| for (const auto& worker : workers_) { | ||
| ASSERT(warming_listeners_.empty()); | ||
| for (const auto& listener : active_listeners_) { | ||
| addListenerToWorker(*worker, *listener); | ||
| } | ||
| worker->start(guard_dog); | ||
| if (enable_dispatcher_stats_) { | ||
| worker->initializeStats(*scope_, fmt::format("worker_{}.", i++)); | ||
| } | ||
| } | ||
| } | ||
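To make the resulting stat names concrete, here is a minimal sketch of how the prefixes in this diff appear to compose. `workerDispatcherStatName` is a hypothetical helper, not code from the PR. The `dispatcher.` segment is assumed from the documented stats tree (*listener_manager.worker_<id>.dispatcher.*) and is not visible in the hunks shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of the PR: builds the fully qualified name of a
// worker's dispatcher stat from the prefixes visible in the diff.
inline std::string workerDispatcherStatName(uint32_t worker_index, const std::string& stat) {
  assert(!stat.empty());
  // Matches createScope("listener_manager.") in the constructor.
  const std::string scope_prefix = "listener_manager.";
  // Matches fmt::format("worker_{}.", i++) in startWorkers().
  const std::string worker_prefix = "worker_" + std::to_string(worker_index) + ".";
  // Assumed from the documented stats tree; not shown in these hunks.
  const std::string dispatcher_prefix = "dispatcher.";
  return scope_prefix + worker_prefix + dispatcher_prefix + stat;
}
```

Under these assumptions, worker 0's loop duration histogram would be named `listener_manager.worker_0.dispatcher.loop_duration_us`.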
|
|
||
|
|
||
Is it just statsd or also other stats sinks?
Yes, just statsd. The Monarch sink appears to just dump histograms on the floor, and the Hystrix and Metrics sinks both do the right thing with parent histograms.
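The volume concern with statsd can be sketched as follows: with no histogram summary in the protocol, every recorded value becomes its own line on the wire. `toStatsdLines` is a hypothetical helper and does not reflect Envoy's sink implementation. The `|ms` timer type is an assumption about how samples would be tagged, and any configured stat prefix is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: emits one statsd timer line ("<name>:<value>|ms") per
// recorded histogram value, so output size grows linearly with the sample count.
inline std::vector<std::string> toStatsdLines(const std::string& name,
                                              const std::vector<uint64_t>& values) {
  assert(!name.empty());
  std::vector<std::string> lines;
  lines.reserve(values.size());
  for (const uint64_t value : values) {
    lines.push_back(name + ":" + std::to_string(value) + "|ms");
  }
  return lines;
}
```

Because a value is recorded on every event loop iteration on every thread, the number of lines produced this way scales with the loop rate multiplied by the thread count. That is the "very large volume of data" the warning refers to.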