Stats no vector by pradeepcrao · Pull Request #17909 · envoyproxy/envoy

pradeepcrao · 2021-08-30T16:36:07Z

Improve CPU and memory usage of the sink for counters, gauges, and text readouts by iterating over stats in the store to create a snapshot (instead of creating a vector by iterating over scopes).
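
A minimal sketch of the difference between the two shapes, assuming toy stand-in types: `ToyCounter`, `ToyScope`, `ToyStore`, `collectViaScopes`, and `addCounter` are hypothetical names for illustration and are not Envoy's real classes. It only contrasts "copy pointers out of every scope into a vector" with "visit each stored stat in place"; it does not reproduce the measured numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins; not Envoy's real types.
struct ToyCounter {
  std::string name;
  uint64_t value;
};

struct ToyScope {
  std::vector<const ToyCounter*> counters; // Non-owning views into the store.
};

class ToyStore {
public:
  explicit ToyStore(std::size_t num_scopes) : scopes_(num_scopes) {}

  void addCounter(std::size_t scope_index, const std::string& name, uint64_t value) {
    assert(scope_index < scopes_.size());
    counters_.push_back(ToyCounter{name, value});
    scopes_[scope_index].counters.push_back(&counters_.back());
  }

  // Old shape: walk every scope and copy pointers into a fresh vector.
  std::vector<const ToyCounter*> collectViaScopes() const {
    std::vector<const ToyCounter*> out;
    for (const ToyScope& scope : scopes_) {
      out.insert(out.end(), scope.counters.begin(), scope.counters.end());
    }
    return out;
  }

  // New shape: visit each stored counter in place, with no intermediate vector.
  void forEachCounter(const std::function<void(const ToyCounter&)>& fn) const {
    for (const ToyCounter& counter : counters_) {
      fn(counter);
    }
  }

private:
  std::deque<ToyCounter> counters_; // deque keeps element addresses stable on push_back.
  std::vector<ToyScope> scopes_;
};
```

Both paths see the same counters; the second one simply avoids allocating and filling a temporary vector on every flush.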

Commit Message:
Additional Description:
Risk Level: Low
Testing: Added benchmark test for stats sink.
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A

Benchmark test results of server_stats_flush_benchmark_test:

With change:
name                     cpu/op
bmFlushToSinks/10        778ns  ± 1%
bmFlushToSinks/100       2.83µs ± 3%
bmFlushToSinks/1000      39.5µs ± 1%
bmFlushToSinks/10000     409µs  ± 4%
bmFlushToSinks/100000    6.07ms ±19%
bmFlushToSinks/1000000   100ms  ± 4%

Without change:
name                     cpu/op
bmFlushToSinks/10        4.44µs ± 4%
bmFlushToSinks/100       31.4µs ± 2%
bmFlushToSinks/1000      376µs  ± 2%
bmFlushToSinks/10000     5.40ms ± 7%
bmFlushToSinks/100000    90.1ms ± 4%
bmFlushToSinks/1000000   1.59s  ± 4%
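
The speedup implied by the two tables above can be worked out directly. The sketch below copies the reported mean timings, converts them to nanoseconds, and divides; `FlushTiming`, `kTimings`, and `speedup` are hypothetical names, and the reported ± variation is ignored.

```cpp
#include <array>
#include <cassert>

// Mean cpu/op copied from the benchmark tables, converted to nanoseconds.
struct FlushTiming {
  long num_stats;
  double with_ns;    // "With change"
  double without_ns; // "Without change"
};

const std::array<FlushTiming, 6> kTimings = {{
    {10, 778.0, 4440.0},
    {100, 2830.0, 31400.0},
    {1000, 39500.0, 376000.0},
    {10000, 409000.0, 5400000.0},
    {100000, 6070000.0, 90100000.0},
    {1000000, 100000000.0, 1590000000.0},
}};

// Ratio of old time to new time; values above 1 mean the change is faster.
inline double speedup(const FlushTiming& t) {
  assert(t.with_ns > 0.0);
  return t.without_ns / t.with_ns;
}
```

On these means the ratios come out to roughly 5.7x, 11.1x, 9.5x, 13.2x, 14.8x, and 15.9x for 10 through 1,000,000 stats.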

readouts) sink by: 1. Iterating over stats in the store to create a snapshot (instead of creating a vector by iterating over scopes. 2. Adding an API to filter stats for sinking. Signed-off-by: Pradeep Rao <pcrao@google.com>

jmarantz · 2021-08-30T17:08:03Z

test/server/server_stats_flush_benchmark_test.cc

+}
+BENCHMARK(bmLarge)->Unit(::benchmark::kMillisecond);
+
+static void bmSmall(benchmark::State& state) {


Most tests of this sort are called "BM_xxx", which is a common benchmarking convention but violates the Envoy style guide and requires clang-tidy "nolint" comments. What you have might be better. We'll see if anything else has a problem with this.

jmarantz

This basically looks right to me, but I think @mattklein123 needs to weigh in. It so happens that on another PR discussion we were thinking we might want to use info we propose to store in the Scope during the sink process for Prometheus.

mattklein123

This LGTM modulo the remaining @jmarantz comments. I think if we decide to store something in the scope we can probably figure out a way to do that as a future change? (Since right now we don't provide scope information to the sinks as it is.)

jmarantz · 2021-09-01T12:29:07Z

/wait

jmarantz

looks great modulo minor nits!

jmarantz · 2021-09-01T13:02:38Z

source/common/stats/allocator_impl.cc

+  for (auto& counter : deleted_counters_) {
+    // Assert that there were no duplicates.
+    ASSERT(counters_.count(counter.get()) == 0);
+    counters_.insert(counter.get());


You can do this in one statement with ASSERT(counters_.insert(counter.get()).second), or in two with

auto insertion = counters_.insert(counter.get()); ASSERT(insertion.second);

The latter more closely mirrors the pattern in removeFromSetLockHeld, and is just as fast (it only does one map operation).
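
A minimal sketch of the two-statement form, using the standard `assert` macro and a `std::set` of raw pointers as stand-ins; `restoreDeleted` is a hypothetical helper, not the PR's code. With standard `assert`, an expression placed inside the macro is not evaluated when `NDEBUG` is defined, so keeping the `insert` outside the macro guarantees the insertion still happens in release builds. Whether Envoy's `ASSERT` behaves the same way is an assumption here; check the macro's definition.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: moves every pointer in `deleted` back into `live`,
// asserting in debug builds that none of them was already present.
inline void restoreDeleted(std::set<int*>& live, const std::vector<int*>& deleted) {
  for (int* stat : deleted) {
    // insert() returns {iterator, bool}; the bool is false for a duplicate,
    // so the duplicate check needs no separate count() lookup.
    const auto insertion = live.insert(stat);
    assert(insertion.second);
    (void)insertion; // Avoids an unused-variable warning when NDEBUG is set.
  }
}
```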

jmarantz · 2021-09-01T13:05:02Z

source/common/stats/allocator_impl.cc

    return;
  }
-  ASSERT(!hasStat(deleted_counters_, *iter));
+  // Duplicates are checked in ~AllocatorImpl.


s/checked/ASSERTed/ to make it clearer that it doesn't happen in production.

jmarantz · 2021-09-01T13:06:31Z

You still have to change expectations in the scope deleter test, right?

pradeepcrao · 2021-09-01T13:36:21Z

You still have to change expectations in the scope deleter test, right?

Are you referring to thread_local_store_test.cc ScopeDelete? I fixed that.

jmarantz

something went wrong in CI (check linux_x64 gcc) and I see this failure in your logs:

2021-09-02T00:45:00.5279225Z [ RUN      ] AllocatorImplTest.ForEachCounter
2021-09-02T00:45:00.5279626Z pure virtual method called
2021-09-02T00:45:00.5280002Z terminate called without an active exception

Not sure why this happens in that compiler variant. I'll let you sort it out :)

pradeepcrao · 2021-09-02T13:54:51Z

something went wrong in CI (check linux_x64 gcc) and I see this failure in your logs:
2021-09-02T00:45:00.5279225Z [ RUN      ] AllocatorImplTest.ForEachCounter
2021-09-02T00:45:00.5279626Z pure virtual method called
2021-09-02T00:45:00.5280002Z terminate called without an active exception
Not sure why this happens in that compiler variant. I'll let you sort it out :)

Fixed, and added comment wrt crash.

jmarantz

Great job! Silently makes everything faster.

@mattklein123 do you want to look any further? Or just merge?

mattklein123 · 2021-09-02T23:38:08Z

@mattklein123 do you want to look any further? Or just merge?

Let me take a pass through tomorrow if that is OK?

jmarantz · 2021-09-03T00:30:39Z

Tomorrow would be great!

jmarantz · 2021-09-03T12:10:33Z

source/common/stats/allocator_impl.cc

+  ASSERT(counter.get() == *iter);
+  // Duplicates are ASSERTed in ~AllocatorImpl.
+  deleted_counters_.emplace_back(*iter);
+  counters_.erase(iter);


I thought of a concern with this tactic for this class: if someone allocates a counter with the same name as one that was marked for deletion, we'll wind up with two different counter objects with the same name. That might be OK, but we might want to add a test.

It won't actually happen with ThreadLocalStore because it only calls markCounterForDeletion on stats that are rejected by the matcher, and the matcher can't be changed once it's added. The reason that this is needed is that the matcher can be added after some stats are created, and we need to effectively get rid of the now-rejected stats without causing references to freed memory.

Possible remedies I thought of include:

1. Add a test and accept that as a weird use-case for this class, one that won't be hit by ThreadLocalStore.

2. Prevent this by checking against deleted_counters_ (which could be a set) during allocation.

3. Declare this invalid by asserting against deleted_counters_ in debug builds.

4. Use a bit in flags_ to denote that a stat needs to be considered as marked for deletion, and skip that during forEach. You could report the correct size by keeping track of the number of deleted counters in the class. I think this is the biggest change from what you have, but would mean that if you try to allocate a stat with the same name as one that was deleted, you'll get back the previously deleted stat.
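
The last remedy (a deletion bit that forEach skips) could look roughly like the sketch below. `FlaggedAllocator` and its members are hypothetical names, a plain `bool` stands in for a bit in `flags_`, and there is no locking or reference counting; this is an illustration of the idea, not the allocator's real interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

class FlaggedAllocator {
public:
  struct Counter {
    std::string name;
    uint64_t value{0};
    bool marked_for_deletion{false}; // Stand-in for a bit in flags_.
  };

  // Returns the existing counter for `name`, even if it was marked deleted.
  Counter& makeCounter(const std::string& name) {
    auto it = counters_.find(name);
    if (it == counters_.end()) {
      auto counter = std::make_unique<Counter>();
      counter->name = name;
      it = counters_.emplace(name, std::move(counter)).first;
    }
    return *it->second;
  }

  void markForDeletion(const std::string& name) {
    auto it = counters_.find(name);
    if (it != counters_.end() && !it->second->marked_for_deletion) {
      it->second->marked_for_deletion = true;
      ++num_deleted_;
    }
  }

  // Reports only live counters by subtracting the tracked deleted count.
  std::size_t size() const {
    assert(num_deleted_ <= counters_.size());
    return counters_.size() - num_deleted_;
  }

  // Visits live counters only; marked ones stay allocated but are skipped.
  void forEachCounter(const std::function<void(const Counter&)>& fn) const {
    for (const auto& entry : counters_) {
      if (!entry.second->marked_for_deletion) {
        fn(*entry.second);
      }
    }
  }

private:
  std::map<std::string, std::unique_ptr<Counter>> counters_;
  std::size_t num_deleted_{0};
};
```

In this shape, asking again for a name that was marked returns the same object, which is the trade-off described in the fourth item.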

mattklein123

LGTM with one typo and one question, thanks.

/wait

mattklein123 · 2021-09-03T15:28:50Z

source/common/stats/allocator_impl.cc

+  ASSERT(counter.get() == *iter);
+  // Duplicates are ASSERTed in ~AllocatorImpl.
+  deleted_counters_.emplace_back(*iter);
+  counters_.erase(iter);


use a bit in flags_ to denote that a stat needs to be considered as marked for deletion, and skip that during forEach. You could report the correct size by keeping track of the number of deleted counters in the class. I think this is the biggest change from what you have, but would mean that if you try to allocate a stat with the same name as one that was deleted, you'll get back the previously deleted stat.

This approach sounds good to me, fwiw.

Stepping back though, I'm trying to understand why this logic was moved into the allocator versus where it used to be in thread local store? Can you help me understand that?

pradeepcrao · 2021-09-03T15:47:44Z

Hi Matt,

With this change we iterate over stats in the Allocator instead of scopes as this is much faster. However, the Allocator did not know anything about rejected stats. So, we ended up including rejected stats when iterating over all stats.

To avoid iterating over them, I removed these stats from the StatSets in the Allocator. There is logic that looks for the stat in the Allocator StatSet when its refcount goes to zero. To manage this cleanly, I moved the deleted stats from the Store to the Allocator.
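
The arrangement described above can be sketched as follows, under heavy simplification: `ToyAllocator` and its members are hypothetical, `std::shared_ptr` stands in for the real reference counting, and a name-keyed map stands in for the StatSet. Marking a counter moves it out of the live set into a side vector, so iteration no longer sees it while existing references stay valid.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

class ToyAllocator {
public:
  struct Counter {
    std::string name;
  };
  using CounterPtr = std::shared_ptr<Counter>;

  // Returns the live counter for `name`, creating one if none is live.
  CounterPtr makeCounter(const std::string& name) {
    auto it = counters_.find(name);
    if (it != counters_.end()) {
      return it->second;
    }
    auto counter = std::make_shared<Counter>(Counter{name});
    counters_.emplace(name, counter);
    return counter;
  }

  // Moves a rejected counter out of the live set; it stays allocated so
  // outstanding references remain safe to use.
  void markCounterForDeletion(const CounterPtr& counter) {
    auto it = counters_.find(counter->name);
    if (it == counters_.end()) {
      return;
    }
    assert(it->second == counter);
    deleted_counters_.push_back(it->second);
    counters_.erase(it);
  }

  // Iterates live counters only.
  void forEachCounter(const std::function<void(const Counter&)>& fn) const {
    for (const auto& entry : counters_) {
      fn(*entry.second);
    }
  }

private:
  std::map<std::string, CounterPtr> counters_;
  std::vector<CounterPtr> deleted_counters_;
};
```

Note that in this toy, calling `makeCounter` with a name that was already marked returns a new, distinct object, which is the edge case raised earlier in the review.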

Did that answer your question?

With regards to the issue that Josh mentioned, we decided that his first suggestion might be the best option, given that adding the deleted bit adds a lot of complexity and is only needed for an edge case that currently can't be hit in Envoy.
Does that sound acceptable to you?

mattklein123 · 2021-09-03T15:58:20Z

Did that answer your question?

Yup, thanks.

Does that sound acceptable to you?

Sure, sounds good.

jmarantz

looks great; just some minor comments.

jmarantz · 2021-09-03T18:14:19Z

envoy/stats/allocator.h


+  /**
+   * Mark rejected stats as deleted by moving them to a different vector, so they don't show up
+   * when iterating over stats, but prevent crashes when trying to access references to them.


Note here the surprising behavior that if you allocate a stat after having done this, you'll get a new one, and that callers should seek to avoid this situation (as ThreadLocalStore does).

jmarantz · 2021-09-03T18:16:26Z

test/common/stats/thread_local_store_test.cc

+  textReadout.set("fortytwo");
+
+  // Ask for the rejected stats again by name.
+  Counter& counter2 = store_->counterFromString("c1");


actually you can ASSERT_EQ(&counter1, &counter2)

jmarantz · 2021-09-03T18:17:28Z

test/common/stats/allocator_impl_test.cc

+  EXPECT_EQ(num_iterations, num_stats);
+}
+
+// Currently, if we ask for a stat from the  Allocator that has already been


extra space after "the"

pradeepcrao added 3 commits August 30, 2021 16:28

fix formatting.

9154c94

Signed-off-by: Pradeep Rao <pcrao@google.com>

Iterate over stats to be sinked instead of creating and returning a

8325266

vector of them. Signed-off-by: Pradeep Rao <pcrao@google.com>

junr03 assigned jmarantz Aug 30, 2021

jmarantz reviewed Aug 30, 2021

pradeepcrao force-pushed the stats_no_vector branch from 4e95834 to 520bec1 Compare August 30, 2021 19:45

jmarantz reviewed Aug 30, 2021

pradeepcrao force-pushed the stats_no_vector branch from 520bec1 to 94bdf2c Compare August 30, 2021 20:32

Revert added test for phase 2

d556414

Signed-off-by: Pradeep Rao <pcrao@google.com>

pradeepcrao force-pushed the stats_no_vector branch from 94bdf2c to d556414 Compare August 30, 2021 20:52

jmarantz mentioned this pull request Aug 31, 2021

stats: introduce CustomStatNamespaces. #17357

Merged

pradeepcrao added 5 commits August 31, 2021 02:43

Revert added test for phase 2

f91bb29

Return early in benchmark for unit tests to prevent timeout. Signed-off-by: Pradeep Rao <pcrao@google.com>

Merge remote-tracking branch 'upstream/main' into stats_no_vector

db248ad

Signed-off-by: Pradeep Rao <pcrao@google.com>

Merge branch 'stats_no_vector' of github.com:pradeepcrao/envoy into s…

5be8b36

…tats_no_vector Signed-off-by: Pradeep Rao <pcrao@google.com>

Add missing header

c847d1c

Signed-off-by: Pradeep Rao <pcrao@google.com>

Merge remote-tracking branch 'upstream/main' into stats_no_vector

6e38658

Signed-off-by: Pradeep Rao <pcrao@google.com>

jmarantz reviewed Aug 31, 2021

mattklein123 self-assigned this Aug 31, 2021

mattklein123 previously approved these changes Aug 31, 2021

pradeepcrao added 2 commits September 1, 2021 02:11

Keep rejected counters, gauges and text readouts in the allocator,

291a2ff

instead of the store. Signed-off-by: Pradeep Rao <pcrao@google.com>

Merge remote-tracking branch 'upstream/main' into stats_no_vector

b9039ac

Signed-off-by: Pradeep Rao <pcrao@google.com>

pradeepcrao dismissed mattklein123’s stale review via b9039ac September 1, 2021 02:13

repokitteh-read-only bot added the waiting label Sep 1, 2021

pradeepcrao added 2 commits September 1, 2021 12:55

Remove hasStat. Move deleted stats to stats set in ~AllocatorImpl.

e5a8a82

Signed-off-by: Pradeep Rao <pcrao@google.com>

Merge remote-tracking branch 'upstream/main' into stats_no_vector

b2e5d5f

Signed-off-by: Pradeep Rao <pcrao@google.com>

repokitteh-read-only bot removed the waiting label Sep 1, 2021

jmarantz reviewed Sep 1, 2021

jmarantz reviewed Sep 2, 2021

pradeepcrao added 3 commits September 2, 2021 13:52

Merge branch 'stats_no_vector' of github.com:pradeepcrao/envoy into s…

65b2c78

…tats_no_vector Signed-off-by: Pradeep Rao <pcrao@google.com>

Merge remote-tracking branch 'upstream/main' into stats_no_vector

f92e178

Signed-off-by: Pradeep Rao <pcrao@google.com>

Merge branch 'stats_no_vector' of github.com:pradeepcrao/envoy into s…

4a5572d

…tats_no_vector Signed-off-by: Pradeep Rao <pcrao@google.com>

jmarantz reviewed Sep 2, 2021

pradeepcrao added 2 commits September 2, 2021 14:56

Add test, fix crash.

7cad621

Signed-off-by: Pradeep Rao <pcrao@google.com>

Merge remote-tracking branch 'upstream/main' into stats_no_vector

b10d6e1

Signed-off-by: Pradeep Rao <pcrao@google.com>

jmarantz previously approved these changes Sep 2, 2021

jmarantz reviewed Sep 3, 2021

mattklein123 requested changes Sep 3, 2021

repokitteh-read-only bot added the waiting label Sep 3, 2021

pradeepcrao added 2 commits September 3, 2021 18:07

Add test to document behavior for rejected stats on store and allocator.

0f9ea87

Signed-off-by: Pradeep Rao <pcrao@google.com>

Merge remote-tracking branch 'upstream/main' into stats_no_vector

d6116e8

Signed-off-by: Pradeep Rao <pcrao@google.com>

pradeepcrao dismissed jmarantz’s stale review via d6116e8 September 3, 2021 18:08

repokitteh-read-only bot removed the waiting label Sep 3, 2021

jmarantz reviewed Sep 3, 2021

pradeepcrao added 2 commits September 3, 2021 19:12

Address comments.

263205e

Signed-off-by: Pradeep Rao <pcrao@google.com>

Merge remote-tracking branch 'upstream/main' into stats_no_vector

801190f

Signed-off-by: Pradeep Rao <pcrao@google.com>

jmarantz approved these changes Sep 3, 2021

mattklein123 approved these changes Sep 3, 2021

jmarantz merged commit cc1d41e into envoyproxy:main Sep 4, 2021

pradeepcrao mentioned this pull request Mar 18, 2026

stats: remove storm of refcount bumping during metrics snapshot. #43958

Draft
