stats: metric expiry #40395

Merged
wbpcode merged 12 commits into envoyproxy:main from kyessenov:stats_expiry
Aug 15, 2025

Conversation

@kyessenov (Contributor) commented Jul 23, 2025

Commit Message: Introduce mark and sweep eviction of stale metrics in a stats scope.

Additional Description: The intended use case is the high cardinality metrics generated from the request data (e.g. Istio standard metrics). This in combination with the cardinality bounds (future PR) would ensure bounded metric resource usage. The algorithm works as follows:

  1. An "evictable" scope is allocated by a filter.
  2. A delta stats sink is configured, e.g. OTLP.
  3. At every flush interval, each scope metric that was used (e.g. observed a data point) is re-marked as unused; a metric that has not been used since the last flush is deleted from the central caches.
  4. A notification is sent to all workers to purge scope stale metrics from their thread-local caches.
  5. Once all workers complete, the unused metrics are purged from the allocator.
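The mark-and-sweep cycle above can be sketched single-threaded (all names and structure here are illustrative, not Envoy's actual API; the real implementation additionally coordinates steps 4-5 across worker threads):

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical sketch: each metric carries a "used" flag that is set on any
// write and consumed at flush time.
struct Counter {
  uint64_t value = 0;
  bool used = false;
  void inc(uint64_t n) {
    value += n;
    used = true;  // mark: any write flags the metric as used
  }
};

struct EvictableScope {
  std::unordered_map<std::string, std::shared_ptr<Counter>> central_cache;

  std::shared_ptr<Counter> counter(const std::string& name) {
    auto& c = central_cache[name];
    if (!c) c = std::make_shared<Counter>();
    return c;
  }

  // Sweep, run at every stats flush: metrics used since the last flush are
  // re-marked as unused; metrics never touched are evicted.
  size_t sweep() {
    size_t evicted = 0;
    for (auto it = central_cache.begin(); it != central_cache.end();) {
      if (it->second->used) {
        it->second->used = false;  // clear the mark for the next interval
        ++it;
      } else {
        it = central_cache.erase(it);  // stale: delete from the central cache
        ++evicted;
      }
    }
    return evicted;
  }
};
```

A metric survives exactly as long as it keeps seeing writes between flushes; anything idle for one full interval is reclaimed.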

There are several edge conditions that need to be explained to validate correctness of this algorithm:

  1. A worker attempting to use a stale metric after (3) but before (4) might have its data lost. It will not be lost if 1) the same metric is recreated in the central cache by another worker, since all metrics are uniquely indexed in the allocators; or 2) we implement deferred allocator deletions to await the flush operation.

  2. A worker should not use a stored stale metric after (4). This requires that workers not store metrics by reference (hence, this solution will not work for most xDS metrics). Thread-local cache references are always deleted before the storage is deleted.

  3. Histograms are handled slightly differently because the parent histogram needs to be "merged" to observe usage, and clearing the usage requires updating all "children" histograms. Because we do this during flush, merging is always done first.

  4. A metric that is re-created after eviction would continue having its start time set as the original metric. This is a limitation of Envoy since it does not store the metric start times, but it is not an issue with delta aggregation in OTLP. Delta is the recommended protocol for handling high cardinality or sparse metric data. We could add start_time in a follow-up.
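Edge condition (3) can be illustrated with a minimal, single-threaded sketch (hypothetical names, not Envoy's API): the parent merges its per-worker children first, so its "used" view reflects all samples recorded since the last flush, and only then is the mark consulted and cleared.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sketch of merge-before-sweep for histograms.
struct ChildHistogram {
  uint64_t sample_count = 0;  // samples recorded by one worker thread
  void recordValue(uint64_t) { ++sample_count; }
};

struct ParentHistogram {
  std::vector<ChildHistogram*> children;
  uint64_t merged_count = 0;
  bool used = false;

  // Flush step 1: merge to observe usage across all workers.
  void merge() {
    for (auto* child : children) {
      if (child->sample_count > 0) used = true;
      merged_count += child->sample_count;
      child->sample_count = 0;
    }
  }

  // Flush step 2: clear the mark only after merging, so a sample recorded on
  // any worker during the interval keeps the histogram alive.
  bool sweepAndClear() {
    bool keep = used;
    used = false;
    return keep;
  }
};
```

Skipping the merge (or running it after the sweep) would drop histograms that were only used on worker threads, which is why flush always merges first.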

Risk Level: low, requires explicit usage
Testing: unit and a load test with Istio Proxy
Docs Changes: none
Release Notes: none

Change-Id: If3a45283b13cfda7d4f9a7bb661a1573f552ed7e
Signed-off-by: Kuat Yessenov <kuat@google.com>
@kyessenov kyessenov requested a review from paul-r-gall July 23, 2025 23:46
@kyessenov (Contributor, Author)

Not sure who is the best person to review stats these days.
/assign-from @envoyproxy/senior-maintainers

@repokitteh-read-only (Bot)

@envoyproxy/senior-maintainers assignee is @mattklein123

🐱

Caused by: a #40395 (comment) was created by @kyessenov.


Change-Id: I6748662507d4b540076381379a26f53a924cb815
Signed-off-by: Kuat Yessenov <kuat@google.com>
@wbpcode (Member) commented Jul 24, 2025

I have a nit question: the mechanism that Istio stats already uses should work well, so why is this new PR necessary?

/wait-any

@kyessenov (Contributor, Author) commented Jul 24, 2025

@wbpcode Several issues with Istio scope rotation:

  1. It uses an unsafe pointer when switching scopes; there is an inherent race, even if it never occurs in practice.
  2. It busts the fast-data-path thread-local cache on a timer. Re-creating these fast-path stats causes more CPU spikes and usage than necessary.
  3. It is only effective when combined with the cardinality limiter, which we would have to add directly to Scope (an overflow metric). It makes sense to do both expiry and limiting in the same place.

I plan to switch Istio to per-metric expiry with a single shared bounded scope rather than per-scope expiry. That will give better control over cardinality. If each filter holds a separate scope, we can't properly bound the overall metric size.

Change-Id: Ib8d17e94db3a92c211a00e506cf6ef2bf9066c5b
@wbpcode wbpcode self-assigned this Jul 25, 2025
@wbpcode (Member) commented Jul 25, 2025

> @wbpcode Several issues with Istio scope rotation:
>
>   1. It uses an unsafe pointer when switching scopes, there is some inherent race that never occurs.
>   2. It busts the fast data path thread local cache on a timer. Re-creation of these fast path stats causes more CPU spike and use than necessary.
>   3. It is only effective when combined with the cardinality limiter, which we would have to add directly to Scope (an overflow metric). It makes sense to do both expiry and limiter in the same place.
>
> I plan to switch Istio to per-metric expiry with a single shared bounded scope rather than a per-scope expiry. That will give better controls over cardinality. If each filter holds a separate scope, we can't properly bound overall metric size.

Hmm, that's reasonable. I was initially a little against adding per-metric lifetime management because I thought it was complex and the benefit wasn't that big, but after checking your code it seems fine. Will take a deeper look at the code after a while.

cc @jmarantz cc @paul-r-gall any suggestion as the stats expert?

@jmarantz (Contributor) commented Jul 25, 2025 via email

@kyessenov (Contributor, Author)

@jmarantz I do want to use scope to control the lifetime of metrics in it. But we also don't want to lose the benefit of thread local cached stats. This is meant to be used in a shared proxy (backends are applications and labels are derived from them, applications change through the life of the proxy). In a steady state, we don't want a forced penalty caused by the whole scope deletion, so recently used stats should remain in caches. I don't see another way to do it except extending the core system. We could utilize this approach in many places, e.g. Wasm (#14070), or Istio-like metrics (#30619).

From my reading of OpenCensus, they also track usage per stat and expire them. It seems necessary for any dynamic plugin that does not have a predefined set of metrics.

Change-Id: I9c50b09b748e3164f3520bbb0d506b4f0b4916e2
Signed-off-by: Kuat Yessenov <kuat@google.com>
Change-Id: I30ecee110600b394995aadecd4548107d283739e
@mattklein123 (Member)

I don't see any reason why this won't work, but FWIW it seems pretty fragile to me in terms of how it relates to flushing. Why not just handle this logic during flushing? You can see if a metric has been used between the last flush and the current flush and then just get rid of it at that point? I've implemented something similar here: https://github.com/bitdriftlabs/shared-core/blob/main/bd-client-stats-store/src/lib.rs


@kyessenov (Contributor, Author) commented Jul 29, 2025 via email

@jmarantz (Contributor)

That change suggested by @mattklein123 seems to clarify things -- I was trying to (without reading the code) reason about a state where we've decided we might want to evict a stat, but want to leave it in the thread-local caches so it could be revived.

This also feels a bit like what @stevenzzzz did for lazy stats (#23921 and #27899 are a good starting point). He may want to look at this stuff.

I still have questions about the desired semantics. Do you want some stats that have not been recently updated to be removed at some point? I assume all stats are looked at via sinks or the Prometheus endpoint; in other words nothing will remain unobserved.

@kyessenov (Contributor, Author)

> I still have questions about desired semantics. Do you want some stats that have not been recently updated to be removed at some point? I assume all stats are looked via sinks or the prom endpoint; in other words nothing will remain unobserved.

The desired semantics are to remove stats that have not been used in the last O(minutes) from memory. We are not going to use Prometheus since it does not support delta observations, but with OTLP you can send only the difference from the last flush, so we don't need to keep aggregate counters indefinitely.
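The delta behavior described here can be sketched as follows (an illustrative model, not Envoy's API): each flush reports only the change since the previous flush, so no cumulative state needs to outlive the metric in the proxy.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of why delta sinks pair well with eviction: the sink
// aggregates outside the proxy, so an evicted-and-recreated counter only
// loses a running total the backend never needed.
struct DeltaCounter {
  uint64_t value = 0;         // running total since creation
  uint64_t last_flushed = 0;  // total as of the previous flush

  void inc(uint64_t n) { value += n; }

  // Called by a delta stats sink (e.g. OTLP) at flush time.
  uint64_t flush_delta() {
    uint64_t delta = value - last_flushed;
    last_flushed = value;
    return delta;
  }
};
```

An idle interval flushes a delta of zero, which is exactly the signal the expiry logic uses to evict the metric.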

@kyessenov (Contributor, Author)

> This also feels a bit like what @stevenzzzz did for lazy stats (#23921 and #27899 are a good starting point). He may want to look at this stuff.

I think lazy creation of stats is somewhat orthogonal. We would always create stats on-demand based on traffic.

@jmarantz (Contributor)

When you say a stat has not been used you mean modified? Or in the OTLP flow is there also a way of registering observer interest in a subset of stats?

@kyessenov (Contributor, Author)

> When you say a stat has not been used you mean modified? Or in the OTLP flow is there also a way of registering observer interest in a subset of stats?

For counters, no increment happened, and for histograms, no samples were produced. I'm working with the managed OTLP implementation, where Google backends will convert back to cumulative Monarch time series (fake deltas). The key is that the state is aggregated outside the proxy.

@mattklein123 (Member)

@kyessenov one other thing that you might consider is creating some entirely new dynamic_stats thing that somehow uses shared_ptr internally so then it would look pretty much exactly like the Rust code I have. Not sure how hard that would be to bolt on top but might be better in the long run. I'm sure there are many places that might use something like this.
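The dynamic-stats idea above could look roughly like this hypothetical sketch (not Envoy's actual API, and only in the spirit of the Rust code referenced): callers hold shared_ptr handles, the store keeps only weak references, and flush-time cleanup drops entries whose handles have all been released.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Illustrative sketch of a shared_ptr-based dynamic stat store.
struct DynCounter {
  uint64_t value = 0;
  void inc() { ++value; }
};

class DynamicStats {
public:
  std::shared_ptr<DynCounter> counter(const std::string& name) {
    auto& weak = counters_[name];
    if (auto existing = weak.lock()) return existing;
    auto fresh = std::make_shared<DynCounter>();
    weak = fresh;
    return fresh;
  }

  // Flush-time cleanup: drop entries whose handles have all been released.
  size_t prune() {
    size_t removed = 0;
    for (auto it = counters_.begin(); it != counters_.end();) {
      if (it->second.expired()) {
        it = counters_.erase(it);
        ++removed;
      } else {
        ++it;
      }
    }
    return removed;
  }

  size_t size() const { return counters_.size(); }

private:
  // Weak references only, so the store never extends a metric's lifetime.
  std::unordered_map<std::string, std::weak_ptr<DynCounter>> counters_;
};
```

Here lifetime is tied to the caller's handle rather than to observed usage, which is the design trade-off against the mark-and-sweep approach in this PR.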

Change-Id: Iccf33fd3a693118560a7adb359dbc340fff69d06
Change-Id: Iacf3f56f29baa7e09fe443001066ce27114f812b
Change-Id: I8ee694725cd151baf80b6760cf687117c502546a
Signed-off-by: Kuat Yessenov <kuat@google.com>
@kyessenov (Contributor, Author)

> @kyessenov one other thing that you might consider is creating some entirely new dynamic_stats thing that somehow uses shared_ptr internally so then it would look pretty much exactly like the Rust code I have. Not sure how hard that would be to bolt on top but might be better in the long run. I'm sure there are many places that might use something like this.

It's fairly difficult because of the pervasive use of stats macros. I think we can gradually transition scope-by-scope though. Updated the code to evict during flush.

Change-Id: Ia79d583992e9760b0ae14e691019249c27c7528c
@mattklein123 (Member) left a comment

lgtm

/wait

Comment thread on source/common/stats/allocator_impl.cc (outdated):

      void sub(uint64_t amount) override {
        ASSERT(child_value_ >= amount);
    -   ASSERT(used() || amount == 0);
    +   // ASSERT(used() || amount == 0);
@mattklein123 (Member):
?

@kyessenov (Contributor, Author):

Reverted. I changed it so that only evictable scope metrics are marked as unused.

Change-Id: Iec9a7db939baed48cfcc52119d331002c4d662fe
Signed-off-by: Kuat Yessenov <kuat@google.com>
@repokitteh-read-only repokitteh-read-only Bot added api and removed waiting labels Aug 5, 2025
@repokitteh-read-only (Bot)

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @mattklein123
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #40395 was synchronize by kyessenov.


Change-Id: I11d3b854d1197d90482c7f8ef81768555edcd337
Signed-off-by: Kuat Yessenov <kuat@google.com>
@kyessenov (Contributor, Author)

Added a changelog and an API to enable this feature. Tested with a modified Istio Proxy that puts a random number into the Istio mesh metrics. Ran it for 10 minutes with 64 cores and a load of 300 qps; allocated memory use during the load is ~70MB and drops to ~15MB after the load ends.

Change-Id: Ie035c25d3900eab69b4317649e28f12c32189daa
Signed-off-by: Kuat Yessenov <kuat@google.com>
@kyessenov (Contributor, Author)

@wbpcode are you good with merging this?

@jmarantz (Contributor)

I just wanted to say that I looked over this and it looks really well done, thanks @kyessenov !

I'm just wondering about expected usage model and avoiding holding stats by reference.

@kyessenov (Contributor, Author)

> I just wanted to say that I looked over this and it looks really well done, thanks @kyessenov !
>
> I'm just wondering about expected usage model and avoiding holding stats by reference.

The filter implementation would immediately apply a counter operation, e.g. https://github.com/istio/proxy/blob/master/source/extensions/filters/http/istio_stats/istio_stats.cc#L752. We can add a macro or special convenience functions, but we can't hide existing interfaces unfortunately.
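The pattern of applying the operation immediately (instead of holding a stat by reference across requests) can be sketched like this; the names below are hypothetical, not the actual Istio/Envoy API:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Illustrative sketch: resolve the counter by name and increment in one
// step on every request. A cached Counter& held across requests could
// dangle once the metric is evicted from the scope.
struct Counter {
  uint64_t value = 0;
  void add(uint64_t n) { value += n; }
};

struct Scope {
  std::unordered_map<std::string, std::shared_ptr<Counter>> counters;
  Counter& counterFromString(const std::string& name) {
    auto& c = counters[name];
    if (!c) c = std::make_shared<Counter>();
    return *c;
  }
};

// Safe usage: look up and apply immediately, never store the reference.
void recordRequest(Scope& scope, const std::string& metric_name) {
  scope.counterFromString(metric_name).add(1);
}
```

The lookup cost per request is the trade-off for eviction safety, which is why the thread-local caches in this PR matter for keeping the fast path fast.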

@wbpcode wbpcode merged commit 45c3745 into envoyproxy:main Aug 15, 2025
26 checks passed
@ringerc commented Dec 3, 2025

Looks like this landed in v1.36.0
