rfc: Runtime stats #3845
### I/O and Timer implementations

Since, the two drivers (I/O and timer) that `tokio` provides are singtons within the runtime there is no need to iterate through their stats like the executor stats. In-addition, it is possible to stream the metrics directly from the driver events rather than needing to batch them like the executor.
When we are using a lot of atomics in this manner, we should be careful regarding false sharing of the atomics.
Can you expand on what you mean by false sharing?
If you have a bunch of atomic variables stored together (in the same cache line), with many threads writing to them concurrently, then this can impact performance quite a lot, even if the writes are affecting two different counters.
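To illustrate the false-sharing concern described above, here is a minimal sketch that gives each writer thread its own cache-line-aligned counter. The names `PaddedCounter`, `PerThreadCounters`, and `hammer` are hypothetical and are not part of Tokio, and the 64-byte line size is an assumption that varies by CPU. Padding only helps when different threads write to different counters; it does nothing for contention on a single shared atomic.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// One counter aligned (and therefore padded) to an assumed 64-byte cache line,
/// so that neighbouring counters never share a line.
#[derive(Default)]
#[repr(align(64))]
pub struct PaddedCounter(pub AtomicU64);

/// Hypothetical stats block: one padded slot per writer thread.
pub struct PerThreadCounters {
    slots: Vec<PaddedCounter>,
}

impl PerThreadCounters {
    pub fn new(threads: usize) -> Self {
        Self {
            slots: (0..threads).map(|_| PaddedCounter::default()).collect(),
        }
    }

    /// Each thread only ever writes its own slot.
    pub fn incr(&self, thread: usize) {
        self.slots[thread].0.fetch_add(1, Ordering::Relaxed);
    }

    /// A reader sums all slots; the result is not an atomic snapshot.
    pub fn total(&self) -> u64 {
        self.slots.iter().map(|c| c.0.load(Ordering::Relaxed)).sum()
    }
}

/// Spawns `threads` writers that each increment their own slot `iters` times.
pub fn hammer(threads: usize, iters: u64) -> u64 {
    let counters = Arc::new(PerThreadCounters::new(threads));
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let c = Arc::clone(&counters);
            thread::spawn(move || {
                for _ in 0..iters {
                    c.incr(t);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counters.total()
}
```

Whether the padding is worth the extra memory depends on how many threads actually write concurrently, which is the point the following comments debate.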
In this case the issue is true sharing (contention on atomics), so padding stuff out won't solve it either.
How is it a case of true sharing? My read is that there is a single driver and one thread polls it. It can store stats in an atomic and an arbitrary number of stats aggregators can load it.
Contention happens when there are concurrent mutations, which is not (afaik) the case here.
To avoid any extra overhead in the executor loop, each worker will batch metrics into a `Core` local struct. These values will be incremented or sampled during regular executor cycles when certain operations happen like a work steal attempt or a pop from one of the queues.

The batches will be streamed via atomics to the stats struct directly. This will reduce any cross CPU work while the executor is running and amortize the cost of having to do cross CPU work. Batches will be sent before the executor attempts to park the thread. This will happen either when there is no work to be done or when the executor has hit the maintance tick. At this point before the thread will park, the executor will submit the batch. Generally, since parking is more expensive then submitting batches there should not be any added latency to the executor cycle in this process.
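The batching scheme in the quoted paragraph could look roughly like the sketch below: plain integers are updated on the hot path, and the shared atomics are touched only when the batch is submitted (for example just before parking). `LocalBatch` and `SharedWorkerStats` are hypothetical names chosen for illustration; the RFC does not define these types.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical shared view that a collector thread would read.
#[derive(Debug, Default)]
pub struct SharedWorkerStats {
    pub steals: AtomicU64,
    pub polls: AtomicU64,
}

/// Hypothetical worker-local batch: plain integers, no atomics on the hot path.
#[derive(Debug, Default)]
pub struct LocalBatch {
    steals: u64,
    polls: u64,
}

impl LocalBatch {
    /// Called during normal executor cycles; purely thread-local work.
    pub fn record_steal(&mut self) {
        self.steals += 1;
    }

    pub fn record_poll(&mut self) {
        self.polls += 1;
    }

    /// Called once before the worker parks: publish the batch, then reset it.
    pub fn submit(&mut self, shared: &SharedWorkerStats) {
        shared.steals.fetch_add(self.steals, Ordering::Relaxed);
        shared.polls.fetch_add(self.polls, Ordering::Relaxed);
        *self = LocalBatch::default();
    }
}
```

With this shape the cross-CPU cost is paid once per batch rather than once per event, at the price of the shared values lagging behind until the next submit.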
How is the user supposed to hook up to the stats struct? Read it at regular intervals? Is there a mechanism for being notified when a batch update happens?
I think it is polling, and described in the next paragraph.
I think most of the technical aspects of the RFC make sense, but tightening up the grammar/structure/flow will strengthen the overall proposal, especially since this will end up as documentation read by users.
### Executor

The handle will provide stats for each worker thread or in the case of the single threaded executor will provide a single worker. This provides a detailed view into what each worker is doing. Providing the ability for the `tokio-metrics` crate to expose the stats aggregated as a single metric or as a per-worker metric.
nit: structure/flow
Suggested change:
- The handle will provide stats for each worker thread or in the case of the single threaded executor will provide a single worker. This provides a detailed view into what each worker is doing. Providing the ability for the `tokio-metrics` crate to expose the stats aggregated as a single metric or as a per-worker metric.
+ Statistics will be provided on a per-worker basis, whether using the single-threaded or multi-threaded executor. Aggregating and merging these per-worker statistics in a way that makes more sense when used from existing telemetry collection systems will be provided by crates like `tokio-metrics`.
To iterate on that:
Suggested change:
- The handle will provide stats for each worker thread or in the case of the single threaded executor will provide a single worker. This provides a detailed view into what each worker is doing. Providing the ability for the `tokio-metrics` crate to expose the stats aggregated as a single metric or as a per-worker metric.
+ Statistics will be provided on a per-worker basis, whether using the single-threaded or multi-threaded executor. Aggregated and merged per-worker statistics, which may be more amenable to existing telemetry collection systems, will be provided by crates like `tokio-metrics`.
The values will be updated in batch from the executor to avoid needing to stream the data on every action. This should amortize the cost by only needing to emit stats at a specific executor wall clock time. Where the executor wall clock time is determinetd by a single executor tick rather than actual system time. This allows the collectors to observe the time and the stats to determine how long certain executor cycles took. This removes the need to acquire the time during executor cycles.

Each worker will expose these stats, updated in batches:
nit: redundancy
You already mentioned in the above paragraph that these are batched.
The handle will provide stats for each worker thread or in the case of the single threaded executor will provide a single worker. This provides a detailed view into what each worker is doing. Providing the ability for the `tokio-metrics` crate to expose the stats aggregated as a single metric or as a per-worker metric.

The values will be updated in batch from the executor to avoid needing to stream the data on every action. This should amortize the cost by only needing to emit stats at a specific executor wall clock time. Where the executor wall clock time is determinetd by a single executor tick rather than actual system time. This allows the collectors to observe the time and the stats to determine how long certain executor cycles took. This removes the need to acquire the time during executor cycles.
I think you need to speak more to the time aspect. This will be important to deriving rates from monotonic counters.
In other words, I know what you're driving at by talking about the executor ticking at a predictable interval, but that needs to be made explicit here in order to drive home the point that it's being used, or could be used, as an invariant, specifically because it ties into the staleness guarantees around specific statistics.
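As a rough illustration of deriving a rate from a monotonic counter, the sketch below assumes the collector timestamps each sample with its own clock. `Sample` and `rate_per_sec` are hypothetical names and not part of the proposal.

```rust
use std::time::Duration;

/// One observation of a monotonic counter, stamped by the collector's clock
/// (time elapsed since the collector started).
#[derive(Debug, Clone, Copy)]
pub struct Sample {
    pub ticks: u64,
    pub at: Duration,
}

/// Ticks per second between two samples.
/// Returns `None` if no time elapsed, the samples are out of order,
/// or the counter went backwards (for example after a reset).
pub fn rate_per_sec(prev: Sample, cur: Sample) -> Option<f64> {
    let dt = cur.at.checked_sub(prev.at)?.as_secs_f64();
    if dt == 0.0 {
        return None;
    }
    let delta = cur.ticks.checked_sub(prev.ticks)?;
    Some(delta as f64 / dt)
}
```

Because stats are published in batches, a rate computed this way is only as fresh as the most recent batch, which is the staleness point raised in the comment above.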
I agree. The references to "wall clock time" are also confusing, since wall clock time is, by definition, "real" time and not something that happens in, say, ticks.
- Amount of executor ticks (loop iterations)
- Number of `block_in_place` tasks entered

The main goal of this implementation is to allow a user to run this metrics collection at all times in production with minimal overhead to their application. This would allow users to alarm on any regressions and track how the runtime is performing.
nit: structure/flow
I think wording like this could go into the summary/motivation sections.
The main goal of this implementation is to allow a user to run this metrics collection at all times in production with minimal overhead to their application. This would allow users to alarm on any regressions and track how the runtime is performing.

Some of the stats include min/max (specifically the queue depth stats) this is because the depth of the queues changes throughout the stats batch window. The value could start low, spike up during the middle of the window then come back down. To understand this behavior the executor stats module will aggregate the depth values to reduce the need to stream the values.
nit: structure/flow
Suggested change:
- Some of the stats include min/max (specifically the queue depth stats) this is because the depth of the queues changes throughout the stats batch window. The value could start low, spike up during the middle of the window then come back down. To understand this behavior the executor stats module will aggregate the depth values to reduce the need to stream the values.
+ Some of these statistics, such as queue depth, include the minimum and maximum value measured during the given observation window. These statistics can rapidly change under heavy load, and this approach provides a middle ground between streaming all measurements/changes (expensive) and potentially not observing the spikes at all.
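A minimal sketch of the per-window aggregation described here, assuming the worker samples its queue depth locally and emits one summary per batch. `DepthWindow` and `DepthSummary` are hypothetical names used only for illustration.

```rust
/// Summary of queue depth over one batch window.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DepthSummary {
    pub min: usize,
    pub max: usize,
    pub avg: f64,
    /// Depth at the time the batch is emitted.
    pub last: usize,
}

/// Hypothetical worker-local accumulator for queue depth samples.
#[derive(Debug, Default)]
pub struct DepthWindow {
    min: usize,
    max: usize,
    sum: u64,
    count: u64,
    last: usize,
}

impl DepthWindow {
    /// Record one depth sample; cheap enough to call on every queue operation.
    pub fn observe(&mut self, depth: usize) {
        if self.count == 0 {
            self.min = depth;
            self.max = depth;
        } else {
            self.min = self.min.min(depth);
            self.max = self.max.max(depth);
        }
        self.sum += depth as u64;
        self.count += 1;
        self.last = depth;
    }

    /// Close the window: return its summary (if any samples were seen) and reset.
    pub fn take(&mut self) -> Option<DepthSummary> {
        if self.count == 0 {
            return None;
        }
        let summary = DepthSummary {
            min: self.min,
            max: self.max,
            avg: self.sum as f64 / self.count as f64,
            last: self.last,
        };
        *self = DepthWindow::default();
        Some(summary)
    }
}
```

A spike in the middle of the window shows up in `max` even though `last` has already dropped back, which is the case the suggested wording describes.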
Is there a particular reason why we choose min/max/avg over, say, percentiles?
I think it's a question of the runtime overhead caused by this. Percentiles will require more work while collecting data, and you might not be able to use atomic counters anymore.
An external aggregator that consumes the stats (e.g. once per second) could still perform aggregation and compute percentiles based on the number of occurrences inside the sampling period.
However, an external aggregator won't be able to capture min/max values if there are peaks inside that sampling period. E.g. if you want metrics for "maximum tasks polled inside an executor iteration" and "minimum tasks polled", you couldn't get them if you only have counters of
- eventloop iterations
- tasks polled
I guess for tasks where we find those values useful, it makes sense to add them.
Otherwise it's probably easiest to just add always-incrementing counters and let the external application do the diffing and aggregation. You could provide some helpers that allow something like:
// `stats.executor()` and `diff()` are the helpers being proposed here, not an existing API.
let mut last_stats = stats.executor();
loop {
    std::thread::sleep(sampling_time);
    let current = stats.executor();
    let delta_stats = current.diff(last_stats);
    my_favorite_metric_system.aggregate_and_emit(delta_stats); // or potentially also the raw stats
    last_stats = current;
}
E.g. we had issues in the past where some metrics that were only emitted once per minute didn't show BPS spikes that happened within a few seconds and caused excessive packet drops.
@LucioFranco Might be worthwhile to document that kind of periodic sampling system in the "guide" section, since there have been a few questions on how to use the thing.
### I/O and Timer implementations

Since, the two drivers (I/O and timer) that `tokio` provides are singtons within the runtime there is no need to iterate through their stats like the executor stats. In-addition, it is possible to stream the metrics directly from the driver events rather than needing to batch them like the executor.
`singtons` -> `singletons`
The batches will be streamed via atomics to the stats struct directly. This will reduce any cross CPU work while the executor is running and amortize the cost of having to do cross CPU work. Batches will be sent before the executor attempts to park the thread. This will happen either when there is no work to be done or when the executor has hit the maintance tick. At this point before the thread will park, the executor will submit the batch. Generally, since parking is more expensive then submitting batches there should not be any added latency to the executor cycle in this process.

This then allows the collector to poll the stats on any interval. Thus, allowing it to drive its own timer to understand an estimate of the duration that a certain amount of ticks too, or how many times the the executor entered the park state.
nit: structure/flow
This sentence has grammatical issues, but I think the bigger problem is that it potentially conflicts with the idea that the executor ticks on a predictable interval. Why would we need to track the duration vs ticks ratio ourselves?
### `tokio-metrics`

The `tokio-metrics` crate will provide aggregated metrics based on the `Stats` struct. This will include histograms and other useful aggregated forms of the stats that could be emitted by various metrics implementations. This crate is designed to provide the ability to expose the aggregated stats in an unstable `0.1` method outside of the runtime and allow the ability to iterate on how they are aggregated without the need to follow `tokio`'s strict versioning scheme.
It would help to expand on this. What specific aggregations will `tokio-metrics` expose? How do you expect this to be used in practice? What will trigger alerts, how are engineers expected to use the aggregations in their workflow, etc.?
- Min local queue depth
- Avg local queue depth
- Queue depth at time of batch emission
- Amount of executor ticks (loop iterations)
If my understanding is correct, you intend to provide information about how long executors spend busy in tasks by allowing collectors to observe the number of ticks which occur in a known time interval. However, I think you also need to measure the amount of time spent parked in order to avoid counting parked time as "busy" time.
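To make the busy-versus-parked distinction concrete, here is a small sketch that assumes the runtime also exposed the total time a worker spent parked during an observation window. That parked-time stat is an assumption taken from this comment, not something the RFC lists, and `busy_ratio` is a hypothetical helper.

```rust
use std::time::Duration;

/// Fraction of an observation window that a worker was not parked.
/// Returns `None` for an empty window or if `parked` exceeds `window`
/// (which would indicate inconsistent samples).
pub fn busy_ratio(window: Duration, parked: Duration) -> Option<f64> {
    if window.is_zero() {
        return None;
    }
    let busy = window.checked_sub(parked)?;
    Some(busy.as_secs_f64() / window.as_secs_f64())
}
```

Without the parked duration, a collector that only sees tick counts cannot tell an idle worker from one stuck in a long poll.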
The main goal of this implementation is to allow a user to run this metrics collection at all times in production with minimal overhead to their application. This would allow users to alarm on any regressions and track how the runtime is performing.

Some of the stats include min/max (specifically the queue depth stats) this is because the depth of the queues changes throughout the stats batch window. The value could start low, spike up during the middle of the window then come back down. To understand this behavior the executor stats module will aggregate the depth values to reduce the need to stream the values.
This will require a way to atomically capture the values in a single batch (and observe when a new batch is ready), I think.
I agree — unless we want consumers to poll these statistics, we'll need some kind of subscription/notify mechanism.
There is a question if atomicity is important to interpret metrics correctly, or whether you are ok that individual values don't match each other (e.g. the sum of tasks run per worker doesn't match the total tasks run metric).
Since it's "just metrics", I think one can be ok with the latter. It will simplify the implementation.
And polling metrics is reasonable. You can always increase the polling frequency to get more detail. Polling mostly isn't feasible if you are interested in every single event. But that won't work in this system anyway, if it makes use of internal batching.
There are two main types of stats that a runtime can expose. A per-runtime stats (eg `executor_load`, `fds_registered`) that are collected indepdently of the tasks running on the runtime. A per-task stats (eg `poll_duration`, `amount_polls`) that are collected and aggregated at the task level. This RFC will propose an implemenation for implementing per-runtime stats but will also mention methods to capture per-task stats.

A small note, the term `stats` is used instead of `metrics` because we are only concerned with exposing raw data rather than methods of aggregating and emiting that data.
This was odd to me. "stats" is short for "statistics", which are just as much aggregated as "metrics" are. Would "performance counters" be better? Or "performance events"? Or maybe "observations" or simply "data"?
+1 for "metrics"; I'm not sure the distinction here matters that much. People will want the metrics, try to figure out how it works, see they need an extra crate, and be on their way. Calling it "stats" doesn't imply that, we'll have to spell it out in the documentation. So just go with the more common term of "metrics," IMO.
I personally think all of those are fine (and like "performance counters" too). It should just be consistent.
## Motivation

When developing and writing Tokio applications, there are many forms of support, be it tutorials or discord. But when running these applications in production there is not much support. Users want to understand what is happening behind the scenes. What is my runtime up too? How can I optimize my application? This RFC intends to provide a foundation to answer these questions.
Suggested change:
- When developing and writing Tokio applications, there are many forms of support, be it tutorials or discord. But when running these applications in production there is not much support. Users want to understand what is happening behind the scenes. What is my runtime up too? How can I optimize my application? This RFC intends to provide a foundation to answer these questions.
+ When developing and writing Tokio applications, there are many forms of support, be it tutorials or discord. But when running these applications in production there is not much support. Users want to understand what is happening behind the scenes. What is my runtime up to? How can I optimize my application? This RFC intends to provide a foundation to answer these questions.
Runtime stats will be exposed via a struct that is attainable via the `tokio::runtime::Handle`. Calling `Handle::stats()` will return a reference counted struct that contains raw stat values. Through this, there will be a `tokio-metrics` crate that converts these raw stats into proper aggregated metrics that can be consumed by end user metrics collection systems like the `metrics` crate.

`` ```rust= ``
Is the `=` intentional here? It prevents highlighting.
The blocking pool already tracks the number of idle threads and the total number of threads. These values are currently within a shared mutex but can be moved to be `AtomicUsize` values and then shared with the `Stats` struct to be sampled by the collector. In addition, a counter that is incremented on each task execution will be included. All values will be streamed to the stats struct via atomics.

Stats from the blocking pool:
- Number of idle threads
This value also feels like it might vary wildly. Should it also present min/max/avg?
### Task

This implementation will avoid tracking stats/metrics at the task level due to the overhead required. This will instead be accomplished by the [tokio console](https://github.com/tokio-rs/console). This will allow the user to attach the console and take the performance hit when they want to explore issues in more detail.
Suggested change:
- This implementation will avoid tracking stats/metrics at the task level due to the overhead required. This will instead be accomplished by the [tokio console](https://github.com/tokio-rs/console). This will allow the user to attach the console and take the performance hit when they want to explore issues in more detail.
+ This RFC does not propose tracking stats/metrics at the task level due to the overhead required. Instead, this is left to projects like the [tokio console](https://github.com/tokio-rs/console), which allows the user to attach the console and take the performance hit when they want to explore issues in more detail.
### I/O driver

Unlike, the executor stats, stats coming from the I/O driver will be streamed directly to the `Stats` struct via atomics. Each value will be incremented (via `AtomicU64::fetch_add`) for each event.
Again, I don't know what "streaming ... via atomics" means.
Also, the `Stats` struct hasn't been defined.
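One possible reading of "streamed via atomics" is sketched below: the driver updates shared atomic fields on every event, and a collector loads them whenever it polls. The struct layout and method names are assumptions made for illustration only, since the RFC text quoted here does not define the `Stats` struct.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical I/O driver section of a stats struct.
#[derive(Debug, Default)]
pub struct IoDriverStats {
    /// Monotonic count of ready events ("token dispatches").
    ready_events: AtomicU64,
    /// Gauge of I/O resources currently registered with the driver.
    registered: AtomicU64,
}

impl IoDriverStats {
    /// Called by the driver after a poll that returned `n` ready events.
    pub fn on_ready_events(&self, n: u64) {
        self.ready_events.fetch_add(n, Ordering::Relaxed);
    }

    /// Called when an I/O resource is registered with the driver.
    pub fn on_register(&self) {
        self.registered.fetch_add(1, Ordering::Relaxed);
    }

    /// Called when an I/O resource is deregistered.
    pub fn on_deregister(&self) {
        self.registered.fetch_sub(1, Ordering::Relaxed);
    }

    /// Collector-side reads; each load is independent, not a consistent snapshot.
    pub fn ready_events(&self) -> u64 {
        self.ready_events.load(Ordering::Relaxed)
    }

    pub fn registered(&self) -> u64 {
        self.registered.load(Ordering::Relaxed)
    }
}
```

Relaxed ordering is used because the values are independent counters; a design that needed several fields to agree with each other would need something stronger.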
Unlike, the executor stats, stats coming from the I/O driver will be streamed directly to the `Stats` struct via atomics. Each value will be incremented (via `AtomicU64::fetch_add`) for each event.

List of stats provided from the I/O driver:
- Amount of compact
Here, too, "number" seems preferable to "amount".
What does "compact" mean here?
List of stats provided from the I/O driver:
- Amount of compact
- Amount of "token dispatches" (aka ready events)
- Amount of fd currently registered with `io::Driver`
Suggested change:
- - Amount of fd currently registered with `io::Driver`
+ - Number of file descriptors currently registered with `io::Driver`
How does this work on Windows?
Call it I/O handles or I/O resources?
This RFC proposes a new way to gather understanding from the Tokio runtime. Currently, the runtime does not expose any methods to understand what is happening under the hood. This provides a rough experience when deploying Tokio based applications into production where you would like to understand what is happening to your code. Via this RFC, we will propose a few methods to collect this data at different levels. Beyond what is proposed as implemenation in this RFC, we will also discuss other methods to gather the information a user might need to be successful with Tokio in a production environment.

There are two main types of stats that a runtime can expose. A per-runtime stats (eg `executor_load`, `fds_registered`) that are collected indepdently of the tasks running on the runtime. A per-task stats (eg `poll_duration`, `amount_polls`) that are collected and aggregated at the task level. This RFC will propose an implemenation for implementing per-runtime stats but will also mention methods to capture per-task stats.
nit: "A ... stats" seems grammatically weird --- i would just say
Suggested change:
- There are two main types of stats that a runtime can expose. A per-runtime stats (eg `executor_load`, `fds_registered`) that are collected indepdently of the tasks running on the runtime. A per-task stats (eg `poll_duration`, `amount_polls`) that are collected and aggregated at the task level. This RFC will propose an implemenation for implementing per-runtime stats but will also mention methods to capture per-task stats.
+ There are two main types of stats that a runtime can expose. Per-runtime stats (eg `executor_load`, `fds_registered`) that are collected indepdently of the tasks running on the runtime, and per-task stats (eg `poll_duration`, `amount_polls`) that are collected and aggregated at the task level. This RFC will propose an implemenation for implementing per-runtime stats but will also mention methods to capture per-task stats.
Plus `implemenation` -> `implementation`
There are two main types of stats that a runtime can expose. A per-runtime stats (eg `executor_load`, `fds_registered`) that are collected indepdently of the tasks running on the runtime. A per-task stats (eg `poll_duration`, `amount_polls`) that are collected and aggregated at the task level. This RFC will propose an implemenation for implementing per-runtime stats but will also mention methods to capture per-task stats.

A small note, the term `stats` is used instead of `metrics` because we are only concerned with exposing raw data rather than methods of aggregating and emiting that data.
nit:
Suggested change:
- A small note, the term `stats` is used instead of `metrics` because we are only concerned with exposing raw data rather than methods of aggregating and emiting that data.
+ A small note: the term "stats" is used instead of "metrics", because we are only concerned with exposing raw data rather than methods of aggregating and emitting that data.
Runtime stats will be exposed via a struct that is attainable via the `tokio::runtime::Handle`. Calling `Handle::stats()` will return a reference counted struct that contains raw stat values. Through this, there will be a `tokio-metrics` crate that converts these raw stats into proper aggregated metrics that can be consumed by end user metrics collection systems like the `metrics` crate.
```rust= |
```rust= | |
```rust |
Thanks for getting started on this! Looking forward to it.
A small note, the term `stats` is used instead of `metrics` because we are only concerned with exposing raw data rather than methods of aggregating and emiting that data.
I personally think all of those are fine (and like the "performance counters" too). It should just be consistent.
## Motivation

When developing and writing Tokio applications, there are many forms of support, be it tutorials or discord. But when running these applications in production there is not much support. Users want to understand what is happening behind the scenes. What is my runtime up too? How can I optimize my application? This RFC intends to provide a foundation to answer these questions.
What is my runtime up too? How can I optimize my application?
I would recommend making this a bit more concrete, because the "what is going on" question is repeated a couple of times in the doc without going much deeper.
For example:
- Why is the latency of the system higher than expected?
- Why does memory utilization grow over time?
- Why does the service run out of the file descriptor limit?
## Guide-level explanation

Runtime stats will be exposed via a struct that is attainable via the `tokio::runtime::Handle`. Calling `Handle::stats()` will return a reference counted struct that contains raw stat values. Through this, there will be a `tokio-metrics` crate that converts these raw stats into proper aggregated metrics that can be consumed by end user metrics collection systems like the `metrics` crate.
I think what is returned could be a reference-counted accessor for the raw values. But it doesn't have to store the stats itself. It simply can contain a `fn stats(&self) -> RealStats` function which returns a POD struct with just values in it. How the accessor handler gets those doesn't matter.
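One way to picture the accessor described in this comment is the minimal sketch below. Everything except the `fn stats(&self) -> RealStats` shape is an assumption made for illustration: `StatsHandle`, `Shared`, `record_poll`, and the two counter fields are hypothetical names, not part of the RFC or of tokio.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical shared counters that the runtime would update internally.
#[derive(Default)]
struct Shared {
    polls: AtomicU64,
    steals: AtomicU64,
}

/// Plain-old-data snapshot: only values, no references into the runtime.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RealStats {
    pub polls: u64,
    pub steals: u64,
}

/// Cheaply cloneable, reference-counted accessor (hypothetical name).
#[derive(Clone)]
pub struct StatsHandle {
    shared: Arc<Shared>,
}

impl StatsHandle {
    pub fn new() -> Self {
        Self { shared: Arc::new(Shared::default()) }
    }

    /// Stand-in for the runtime recording one poll.
    pub fn record_poll(&self) {
        self.shared.polls.fetch_add(1, Ordering::Relaxed);
    }

    /// Copies the current counter values into a POD struct.
    pub fn stats(&self) -> RealStats {
        RealStats {
            polls: self.shared.polls.load(Ordering::Relaxed),
            steals: self.shared.steals.load(Ordering::Relaxed),
        }
    }
}

fn main() {
    let handle = StatsHandle::new();
    let reader = handle.clone(); // shares the same counters
    handle.record_poll();
    handle.record_poll();
    println!("{:?}", reader.stats());
}
```

The caller only ever sees `RealStats`, so how the handle obtains the values (atomics here) could change without affecting users.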
Through this, there will be a `tokio-metrics` crate that converts these raw stats into proper aggregated metrics that can be consumed by end user metrics collection systems like the `metrics` crate.
This is more confusing than helpful to me at the moment. What are "proper aggregated metrics"? What is not proper about the other ones? Maybe it's easier to leave that detail out of this proposal, and just mention that metric submission is out of scope because it is application dependent?
let executor = stats.executor();

// per-worker stats via the executor.
for worker in executor.workers() {
Is that number even static? Is there a unique worker ID?
Each worker will expose these stats, updated in batches:

- Amount of futures executed
Amount of futures polled seems right to me. I don't think it should be "distinct". If the same future gets scheduled multiple times, it is also work.
### Task

This implementation will avoid tracking stats/metrics at the task level due to the overhead required. This will instead be accomplished by the [tokio console](https://github.com/tokio-rs/console). This will allow the user to attach the console and take the performance hit when they want to explore issues in more detail.
This is kind of confusing, since per-task stats seem to be mentioned in the intro?
A per-task stats (eg `poll_duration`, `amount_polls`) that are collected and aggregated at the task level. This RFC will propose an implemenation for implementing per-runtime stats but will also mention methods to capture per-task stats.
Apart from that I'm ok with them not being available in the beginning. Once users have global visibility and see some abnormalities they can always add custom instrumentation to their tasks/futures to figure out the details. The executor-level stats are more tricky because those details are not exposed to users.
List of stats provided from the I/O driver:
- Amount of compact
- Amount of "token dispatches" (aka ready events)
- Amount of fd currently registered with `io::Driver`
call it IO handles or IO resources?
List of stats provided from the I/O driver:
- Amount of compact
- Amount of "token dispatches" (aka ready events)
Sounds like a tokio/mio concept. "ready events" might be a better term to expose externally
To avoid any extra overhead in the executor loop, each worker will batch metrics into a `Core` local struct. These values will be incremented or sampled during regular executor cycles when certain operations happen like a work steal attempt or a pop from one of the queues.

The batches will be streamed via atomics to the stats struct directly. This will reduce any cross CPU work while the executor is running and amortize the cost of having to do cross CPU work. Batches will be sent before the executor attempts to park the thread. This will happen either when there is no work to be done or when the executor has hit the maintance tick. At this point before the thread will park, the executor will submit the batch. Generally, since parking is more expensive then submitting batches there should not be any added latency to the executor cycle in this process.
I think it is polling, and described in the next paragraph.
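The batching scheme in the quoted RFC text can be sketched roughly as follows. This is only an illustration under assumptions: `WorkerStats`, `Batch`, `submit`, and `run_until_park` are hypothetical names, and the use of `fetch_add` with `Relaxed` ordering is a guess at what "streamed via atomics" might mean, not the actual tokio implementation.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Shared, externally readable stats for one worker (hypothetical layout).
#[derive(Default)]
struct WorkerStats {
    polls: AtomicU64,
    steal_attempts: AtomicU64,
}

/// Worker-local batch: plain integers, so the hot loop touches no atomics.
#[derive(Default)]
struct Batch {
    polls: u64,
    steal_attempts: u64,
}

impl Batch {
    /// Publish the batch with one atomic add per counter, then reset it.
    fn submit(&mut self, shared: &WorkerStats) {
        shared.polls.fetch_add(self.polls, Ordering::Relaxed);
        shared.steal_attempts.fetch_add(self.steal_attempts, Ordering::Relaxed);
        *self = Batch::default();
    }
}

/// Stand-in for one executor cycle that ends with the worker parking.
fn run_until_park(batch: &mut Batch, shared: &WorkerStats, tasks: u64) {
    for _ in 0..tasks {
        batch.polls += 1; // local increment while polling a task
    }
    batch.steal_attempts += 1; // out of local work: one steal attempt
    batch.submit(shared); // flush right before the thread would park
}

fn main() {
    let shared = WorkerStats::default();
    let mut batch = Batch::default();
    run_until_park(&mut batch, &shared, 5);
    println!("polls = {}", shared.polls.load(Ordering::Relaxed));
}
```

A reader thread would poll `WorkerStats` at its own pace, which matches the polling model mentioned in the comment above.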
The main goal of this implementation is to allow a user to run this metrics collection at all times in production with minimal overhead to their application. This would allow users to alarm on any regressions and track how the runtime is performing.

Some of the stats include min/max (specifically the queue depth stats) this is because the depth of the queues changes throughout the stats batch window. The value could start low, spike up during the middle of the window then come back down. To understand this behavior the executor stats module will aggregate the depth values to reduce the need to stream the values.
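To make the min/max idea concrete, here is a small sketch of a per-window aggregate for queue depth. The type `DepthWindow` and its methods are hypothetical and only illustrate why a spike inside a window survives aggregation when min/max are tracked locally.

```rust
/// Tracks the lowest and highest queue depth seen in one batch window.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DepthWindow {
    min: usize,
    max: usize,
    samples: u64,
}

impl DepthWindow {
    fn new() -> Self {
        Self { min: usize::MAX, max: 0, samples: 0 }
    }

    /// Record the queue depth observed at one point in the window.
    fn observe(&mut self, depth: usize) {
        self.min = self.min.min(depth);
        self.max = self.max.max(depth);
        self.samples += 1;
    }

    /// Return (min, max) for the window and start a fresh one.
    /// Returns None if nothing was observed.
    fn take(&mut self) -> Option<(usize, usize)> {
        let out = if self.samples == 0 {
            None
        } else {
            Some((self.min, self.max))
        };
        *self = Self::new();
        out
    }
}

fn main() {
    let mut window = DepthWindow::new();
    // Depth starts low, spikes mid-window, then drops again.
    for depth in [2, 40, 3] {
        window.observe(depth);
    }
    println!("{:?}", window.take());
}
```

Only two values per window need to be published, instead of every individual depth sample.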
I think it's a question of the runtime overhead caused by this. Percentiles will require more work while collecting data, and you might not be able to use atomic counters anymore.
An external aggregator that consumes the stats (e.g. once per second) could still perform aggregation and do percentiles based on the amount of occurrences inside the sampling period.
However an external aggregator won't be able to capture min/max values in case there are peaks inside that sampling period. E.g. if you want to have a metric which is around "maximum tasks polled inside an executor iteration" and "minimum tasks polled", you couldn't get that if you just have counters of
- eventloop iterations
- tasks polled
I guess for tasks where we find those values useful, it makes sense to add them.
Otherwise it's probably easiest to just add always-incrementing counters and let the external application do the diffing and aggregation. You can provide some helpers that allow something like:
let mut last_stats = stats.executor();
loop {
    std::thread::sleep(sampling_time);
    let current = stats.executor();
    let delta_stats = current.diff(last_stats);
    my_favorite_metric_system.aggregate_and_emit(delta_stats); // or potentially also the raw stats
    last_stats = current;
}
E.g. we had issues in the past where some metrics that only had been emitted once per minute didn't show BPS spikes that happened inside some seconds and caused excessive packet drops.
@LucioFranco Might be worthwhile to document that kind of periodic sampling system in the "guide" section, since there had been a few questions on how to use the thing.
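A self-contained version of the periodic sampling loop suggested above might look like the sketch below. The names `ExecutorStats`, `Counters`, `snapshot`, and `diff` are assumptions for illustration (the RFC does not define them), and a background thread stands in for the runtime updating its counters.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Snapshot of monotonically increasing counters (hypothetical fields).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ExecutorStats {
    polls: u64,
    parks: u64,
}

impl ExecutorStats {
    /// Per-counter delta since an earlier snapshot; wrapping_sub keeps the
    /// result correct even if a counter wrapped around in between.
    fn diff(&self, earlier: &ExecutorStats) -> ExecutorStats {
        ExecutorStats {
            polls: self.polls.wrapping_sub(earlier.polls),
            parks: self.parks.wrapping_sub(earlier.parks),
        }
    }
}

/// Stand-in for the counters the runtime would maintain.
#[derive(Default)]
struct Counters {
    polls: AtomicU64,
    parks: AtomicU64,
}

impl Counters {
    fn snapshot(&self) -> ExecutorStats {
        ExecutorStats {
            polls: self.polls.load(Ordering::Relaxed),
            parks: self.parks.load(Ordering::Relaxed),
        }
    }
}

fn main() {
    let counters = Arc::new(Counters::default());
    let writer = Arc::clone(&counters);
    // Simulated worker: bumps the counters the way an executor might.
    let worker = thread::spawn(move || {
        for _ in 0..1000 {
            writer.polls.fetch_add(1, Ordering::Relaxed);
        }
        writer.parks.fetch_add(1, Ordering::Relaxed);
    });

    let mut last = counters.snapshot();
    for _ in 0..3 {
        thread::sleep(Duration::from_millis(10)); // sampling interval
        let now = counters.snapshot();
        let delta = now.diff(&last);
        println!("delta since last sample: {:?}", delta); // emit to a metrics system here
        last = now;
    }
    worker.join().unwrap();
}
```

The sampling interval bounds the resolution: a spike shorter than the interval shows up only as a larger delta, which is the limitation the comment describes for once-per-minute metrics.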
Each worker will expose these stats, updated in batches:

- Amount of futures executed
What about futures passed to `block_on`? Which thread are they on? What about futures polled in a `LocalSet`?
So localset is a good question...
For block_on I would say it doesn't run on the main executor so it doesn't count?
Please see the initial work in #4043 and provide feedback on the direction.
Thanks for the work. I'm going to close this due to inactivity. If you want to continue this patch, please open a new PR and reference this one.
Rendered
This RFC proposes a low-level stats implementation within tokio, to be used by metrics aggregators/collectors that feed dashboards such as Grafana. These low-level stats will be the foundation for tokio's future runtime observability goals. They do not present a complete story on their own, since they will mostly be raw, unaggregated values.