stats: metric expiry #40395
Conversation
Change-Id: If3a45283b13cfda7d4f9a7bb661a1573f552ed7e Signed-off-by: Kuat Yessenov <kuat@google.com>
|
Not sure who is the best person to review stats these days.
|
@envoyproxy/senior-maintainers assignee is @mattklein123
|
I have a nit question. The mechanism that the Istio stats filter uses should work perfectly; why is this new PR necessary? /wait-any |
|
@wbpcode Several issues with Istio scope rotation:
1. It uses an unsafe pointer when switching scopes; there is an inherent race, even if it never occurs in practice.
2. It busts the fast-data-path thread-local cache on a timer. Re-creation of these fast-path stats causes more CPU use and larger spikes than necessary.
3. It is only effective when combined with the cardinality limiter, which we would have to add directly to Scope (an overflow metric). It makes sense to do both expiry and limiting in the same place.
I plan to switch Istio to per-metric expiry with a single shared bounded scope rather than per-scope expiry. That will give better control over cardinality. If each filter holds a separate scope, we can't properly bound the overall metric size.
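For illustration only, a minimal sketch of what "per-metric expiry in a single shared bounded scope" could look like. `BoundedScope`, its `Counter`, and the overflow counter are all hypothetical names, not Envoy's actual stats API: each counter carries a used flag set on increment, a sweep evicts counters unused since the previous sweep, and creations beyond the cardinality bound are diverted to one shared overflow counter.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Toy model only: a bounded scope that owns counters, marks them used on
// increment, and sweeps out counters not used since the previous sweep.
class BoundedScope {
public:
  explicit BoundedScope(size_t max_counters) : max_counters_(max_counters) {}

  struct Counter {
    uint64_t value{0};
    bool used{false};
    void inc(uint64_t amount = 1) {
      value += amount;
      used = true;
    }
  };

  // Returns the named counter, or the shared overflow counter when the
  // cardinality bound would otherwise be exceeded.
  Counter& counter(const std::string& name) {
    auto it = counters_.find(name);
    if (it != counters_.end()) return it->second;
    if (counters_.size() >= max_counters_) return overflow_;
    return counters_[name];
  }

  // Mark-and-sweep pass: erase counters whose used flag was never set since
  // the last sweep, and clear the flag on the survivors.
  void sweep() {
    for (auto it = counters_.begin(); it != counters_.end();) {
      if (!it->second.used) {
        it = counters_.erase(it);
      } else {
        it->second.used = false;
        ++it;
      }
    }
  }

  size_t size() const { return counters_.size(); }
  uint64_t overflowValue() const { return overflow_.value; }

private:
  size_t max_counters_;
  std::unordered_map<std::string, Counter> counters_;
  Counter overflow_;
};
```

Note that holding the returned `Counter&` across a `sweep()` is exactly the dangling-reference hazard discussed later in this thread; the sketch sidesteps it only by convention.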
Hmm, that's reasonable. I was initially a little against adding this per-metric lifetime management because I thought it was complex and the benefit was not that big, but after checking your code it seems fine. Will take a deeper look at the code after a while. cc @jmarantz cc @paul-r-gall, any suggestions as the stats experts? |
|
I haven't looked at the PR yet, but from the description I just want to double-check that there is no way to implement the semantics you want without adding complexity to the base stats system. For example, you could create a ScopeSharedPtr with an empty name and use that to control the lifetime of all stats created in it, using whatever timers you want.
|
|
@jmarantz I do want to use a scope to control the lifetime of the metrics in it, but we also don't want to lose the benefit of thread-local cached stats. This is meant to be used in a shared proxy (backends are applications, labels are derived from them, and applications change over the life of the proxy). In a steady state we don't want the forced penalty of deleting the whole scope, so recently used stats should remain in the caches. I don't see another way to do it except extending the core system. We could utilize this approach in many places, e.g. Wasm (#14070) or Istio-like metrics (#30619). From my reading of OpenCensus, they also track usage per stat and expire them. It seems necessary for any dynamic plugin that does not have a predefined set of metrics. |
|
I don't see any reason why this won't work, but FWIW it seems pretty fragile to me in terms of how it relates to flushing. Why not just handle this logic during flushing? You can check whether a metric has been used between the last flush and the current flush and get rid of it at that point. I've implemented something similar here: https://github.com/bitdriftlabs/shared-core/blob/main/bd-client-stats-store/src/lib.rs |
|
Thanks, that's a great suggestion to combine flush and eviction. I realized there is a bug in this PR: histogram deltas are cleared during merge, so we cannot merge outside of flush, and we need the merge to detect usage for eviction.
I think the challenge is making eviction apply only to some counters and not others (because Envoy core keeps them as references and not shared pointers, unfortunately). Also, the default flush interval is too fast (5s), which would cause churn, but we could use a multiple of flush intervals with no problem. I'll work on the next iteration.
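A rough sketch of the "multiple of flush intervals" idea, with hypothetical names (`FlushingStore`, `max_idle_flushes`): eviction piggybacks on the periodic flush tick, and a metric is only dropped after going several consecutive flushes without a write, so a 5s flush with a threshold of 60 gives roughly a 5-minute idle timeout without a separate timer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Toy model: eviction combined with the periodic stats flush. A metric is
// evicted only after more than `max_idle_flushes` consecutive flushes
// without being written.
class FlushingStore {
public:
  explicit FlushingStore(uint32_t max_idle_flushes)
      : max_idle_flushes_(max_idle_flushes) {}

  void inc(const std::string& name, uint64_t amount = 1) {
    Entry& e = entries_[name];
    e.value += amount;
    e.idle_flushes = 0; // any write resets the idle generation count
  }

  // Called on every flush tick: age idle metrics and evict the stale ones.
  void flush() {
    for (auto it = entries_.begin(); it != entries_.end();) {
      if (++it->second.idle_flushes > max_idle_flushes_) {
        it = entries_.erase(it);
      } else {
        ++it;
      }
    }
  }

  bool contains(const std::string& name) const {
    return entries_.count(name) > 0;
  }

private:
  struct Entry {
    uint64_t value{0};
    uint32_t idle_flushes{0};
  };
  uint32_t max_idle_flushes_;
  std::unordered_map<std::string, Entry> entries_;
};
```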
|
|
That change suggested by @mattklein123 seems to clarify things -- I was trying (without reading the code) to reason about a state where we've decided we might want to evict a stat but want to leave it in the thread-local caches so it could be revived. This also feels a bit like what @stevenzzzz did for lazy stats (#23921 and #27899 are a good starting point); he may want to look at this. I still have questions about the desired semantics. Do you want stats that have not been recently updated to be removed at some point? I assume all stats are looked at via sinks or the Prometheus endpoint; in other words, nothing will remain unobserved. |
The desired semantics are to remove stats that have not been used in the last O(minutes) from memory. We are not going to use Prometheus since it does not support delta observations, but in OTLP you can send only the difference from the last flush, so we don't need to keep aggregate counters indefinitely. |
I think lazy creation of stats is somewhat orthogonal: we would always create stats on demand based on traffic. |
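To make the delta point concrete, a hypothetical `DeltaExporter` sketch (not the actual OTLP sink): each flush reports only the change since the previous flush and then forgets it, so the collector holds the long-lived cumulative state and the proxy is free to evict.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy delta exporter: on each flush it reports only the change since the
// previous flush, so a backend re-aggregating deltas into cumulative series
// keeps the state and the proxy needs no indefinite aggregate counters.
class DeltaExporter {
public:
  void record(const std::string& name, uint64_t amount) {
    current_[name] += amount;
  }

  // Returns (name, delta) pairs for metrics that changed since the last
  // flush, then resets the accumulated values.
  std::vector<std::pair<std::string, uint64_t>> flushDeltas() {
    std::vector<std::pair<std::string, uint64_t>> out;
    for (auto& [name, value] : current_) {
      if (value != 0) out.emplace_back(name, value);
    }
    current_.clear(); // deltas were handed off; nothing cumulative is kept
    return out;
  }

private:
  std::unordered_map<std::string, uint64_t> current_;
};
```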
|
When you say a stat has not been used, what does that mean exactly? |
For counters, no increment happened; for histograms, no samples were produced. I'm working with the managed OTLP implementation, where Google backends convert back to cumulative Monarch time series (fake deltas). The key is that the state is aggregated outside the proxy. |
|
@kyessenov one other thing that you might consider is creating some entirely new dynamic_stats thing that uses shared_ptr internally, so it would look pretty much exactly like the Rust code I have. Not sure how hard that would be to bolt on top, but it might be better in the long run. I'm sure there are many places that could use something like this. |
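One possible shape of that suggestion, sketched with hypothetical names (`DynamicStats`, `DynCounter`) rather than anything in Envoy today: the store holds `std::weak_ptr`s, filters hold `std::shared_ptr`s, and a flush-time sweep forgets counters whose last owner went away, much like the `Arc`-based Rust store linked above.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Toy "dynamic stats" store: ownership lives with the filters via
// shared_ptr, and the store only tracks weak references.
struct DynCounter {
  uint64_t value{0};
  void inc(uint64_t amount = 1) { value += amount; }
};

class DynamicStats {
public:
  // Returns the existing counter if any filter still owns it, otherwise
  // allocates a fresh one and remembers it weakly.
  std::shared_ptr<DynCounter> counter(const std::string& name) {
    auto& weak = counters_[name];
    if (auto existing = weak.lock()) return existing;
    auto fresh = std::make_shared<DynCounter>();
    weak = fresh;
    return fresh;
  }

  // Flush-time sweep: drop entries no filter is holding onto anymore.
  void sweep() {
    for (auto it = counters_.begin(); it != counters_.end();) {
      if (it->second.expired()) {
        it = counters_.erase(it);
      } else {
        ++it;
      }
    }
  }

  size_t size() const { return counters_.size(); }

private:
  std::unordered_map<std::string, std::weak_ptr<DynCounter>> counters_;
};
```

The appeal of this design is that "stale" falls out of ordinary reference counting instead of a used flag, at the cost of not fitting the existing reference-returning stats macros.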
It's fairly difficult because of the pervasive use of stats macros, but I think we can gradually transition scope by scope. Updated the code to evict during flush. |
```diff
 void sub(uint64_t amount) override {
   ASSERT(child_value_ >= amount);
-  ASSERT(used() || amount == 0);
+  // ASSERT(used() || amount == 0);
```
Reverted. I changed it so that only evictable scope metrics are marked as unused.
|
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
|
Added a changelog and an API to enable this feature. Tested with a modified Istio Proxy that puts a random number into the Istio mesh metrics. Ran it for 10 minutes with 64 cores and a load of 300 qps: allocated memory during the load is ~70MB, and after the load it drops to ~15MB. |
|
@wbpcode are you good with merging this? |
|
I just wanted to say that I looked over this and it looks really well done, thanks @kyessenov! I'm just wondering about the expected usage model and how to avoid holding stats by reference. |
The filter implementation would immediately apply a counter operation, e.g. https://github.com/istio/proxy/blob/master/source/extensions/filters/http/istio_stats/istio_stats.cc#L752. We can add a macro or special convenience functions, but we can't hide the existing interfaces, unfortunately. |
|
Looks like this landed in v1.36.0 |
Commit Message: Introduce mark and sweep eviction of stale metrics in a stats scope.
Additional Description: The intended use case is high-cardinality metrics generated from request data (e.g. Istio standard metrics). This, in combination with cardinality bounds (future PR), would ensure bounded metric resource usage. The algorithm works as follows:
1. Each metric in an evictable scope carries a "used" mark that is set whenever the metric is written.
2. On a flush, usage is collected on the main thread (histograms are merged first) and the marks are cleared.
3. Metrics that were not used since the previous flush are removed from the central cache and the thread-local caches.
4. The backing storage of the evicted metrics is deleted from the allocator.
There are several edge conditions that need to be explained to validate the correctness of this algorithm:
1. A worker attempting to use a stale metric after (3) but before (4) might have its data lost. It will not be lost if 1) the same metric is re-created in the central cache by another worker, since all metrics are uniquely indexed in the allocators; or 2) we implement deferred allocator deletions that wait for the flush operation.
2. A worker should not use a stored stale metric after (4). This requires that workers not store metrics by reference (hence, this solution will not work for most xDS metrics). Thread-local cache references are always deleted before the storage is deleted.
3. Histograms are handled slightly differently because the parent histogram needs to be "merged" to observe usage, and clearing the usage requires updating all "children" histograms. Because we do this during flush, merging is always done first.
4. A metric that is re-created after eviction keeps the start time of the original metric. This is a limitation of Envoy, since it does not store metric start times, but it is not an issue with delta aggregation in OTLP. Delta is the recommended protocol for handling high-cardinality or sparse metric data. We could add start_time in a follow-up.
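The histogram ordering constraint above can be modeled in a few lines. This is a toy illustration, not the actual Envoy `ParentHistogram`/`ThreadLocalHistogram` classes: the per-worker delta is the only evidence that a histogram was used, and the merge consumes it, so the flush must merge first and only then decide staleness and clear the mark.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy parent/child histogram pair illustrating why merge must happen
// before the usage check during flush.
struct ChildHistogram {
  std::vector<uint64_t> delta; // samples recorded since the last flush
  void recordValue(uint64_t v) { delta.push_back(v); }
};

struct ParentHistogram {
  std::vector<uint64_t> merged;
  bool used_since_last_flush{false};

  // Merge all children into the parent, observing usage as a side effect;
  // the children's deltas are consumed by the merge.
  void merge(std::vector<ChildHistogram*>& children) {
    for (ChildHistogram* child : children) {
      if (!child->delta.empty()) used_since_last_flush = true;
      merged.insert(merged.end(), child->delta.begin(), child->delta.end());
      child->delta.clear();
    }
  }
};

// One flush iteration: merge first, then decide eviction, then clear the
// used mark for the next cycle. Returns true if the histogram is stale.
bool flushAndCheckStale(ParentHistogram& parent,
                        std::vector<ChildHistogram*>& children) {
  parent.merge(children);
  bool stale = !parent.used_since_last_flush;
  parent.used_since_last_flush = false;
  return stale;
}
```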
Risk Level: low, requires explicit usage
Testing: unit and a load test with Istio Proxy
Docs Changes: none
Release Notes: none