Conversation
Improve cpu and memory usage of the (counters, gauges and text readouts) sink by: 1. Iterating over stats in the store to create a snapshot (instead of creating a vector by iterating over scopes). 2. Adding an API to filter stats for sinking. Signed-off-by: Pradeep Rao <pcrao@google.com>
vector of them. Signed-off-by: Pradeep Rao <pcrao@google.com>
```cpp
}
BENCHMARK(bmLarge)->Unit(::benchmark::kMillisecond);

static void bmSmall(benchmark::State& state) {
```
Most tests of this sort are called "BM_xxx", which is a benchmarking convention, but that violates the Envoy style guide and requires clang-tidy "nolint" comments. What you have might be better. We'll see if anything else has a problem with this.
Signed-off-by: Pradeep Rao <pcrao@google.com>
Return early in benchmark for unit tests to prevent timeout. Signed-off-by: Pradeep Rao <pcrao@google.com>
…tats_no_vector Signed-off-by: Pradeep Rao <pcrao@google.com>
jmarantz
left a comment
This basically looks right to me, but I think @mattklein123 needs to weigh in. It so happens that on another PR discussion we were thinking we might want to use info we propose to store in the Scope during the sink process for Prometheus.
mattklein123
left a comment
This LGTM modulo the remaining @jmarantz comments. I think if we decide to store something in the scope we can probably figure out a way to do that as a future change? (Since right now we don't provide scope information to the sinks as it is.)
instead of the store. Signed-off-by: Pradeep Rao <pcrao@google.com>
/wait
Signed-off-by: Pradeep Rao <pcrao@google.com>
jmarantz
left a comment
looks great modulo minor nits!
```cpp
for (auto& counter : deleted_counters_) {
  // Assert that there were no duplicates.
  ASSERT(counters_.count(counter.get()) == 0);
  counters_.insert(counter.get());
```
You can do this in one statement with `ASSERT(counters_.insert(counter.get()).second)`, or in two with

```cpp
auto insertion = counters_.insert(counter.get());
ASSERT(insertion.second);
```

The latter more closely mirrors the pattern in removeFromSetLockHeld, and is just as fast (it only does one map operation).
```cpp
  return;
}
ASSERT(!hasStat(deleted_counters_, *iter));
// Duplicates are checked in ~AllocatorImpl.
```
s/checked/ASSERTed/ to make it clearer that this doesn't happen in production.
You still have to change expectations in the scope deleter test, right?
Are you referring to thread_local_store_test.cc ScopeDelete? I fixed that.
jmarantz
left a comment
something went wrong in CI (check linux_x64 gcc) and I see this failure in your logs:
```
2021-09-02T00:45:00.5279225Z [ RUN      ] AllocatorImplTest.ForEachCounter
2021-09-02T00:45:00.5279626Z pure virtual method called
2021-09-02T00:45:00.5280002Z terminate called without an active exception
```
Not sure why this happens in that compiler variant. I'll let you sort it out :)
…tats_no_vector Signed-off-by: Pradeep Rao <pcrao@google.com>
…tats_no_vector Signed-off-by: Pradeep Rao <pcrao@google.com>
Fixed, and added comment wrt crash.
Signed-off-by: Pradeep Rao <pcrao@google.com>
jmarantz
left a comment
Great job! Silently makes everything faster.
@mattklein123 do you want to look any further? Or just merge?
Let me take a pass through tomorrow if that is OK?
Tomorrow would be great!
```cpp
ASSERT(counter.get() == *iter);
// Duplicates are ASSERTed in ~AllocatorImpl.
deleted_counters_.emplace_back(*iter);
counters_.erase(iter);
```
I thought of a concern with this tactic for this class: if someone allocates a counter with the same name as one that was marked for deletion, we'll wind up with two different counter objects with the same name. That might be OK, but we might want to add a test.
It won't actually happen with ThreadLocalStore because it only calls markCounterForDeletion on stats that are rejected by the matcher, and the matcher can't be changed once it's added. The reason that this is needed is that the matcher can be added after some stats are created, and we need to effectively get rid of the now-rejected stats without causing references to freed memory.
Possible remedies I thought of include:
- add a test and accept that as a weird use-case for this class, that won't be hit by ThreadLocalStore.
- prevent this by checking against deleted_counters_ (which could be a set) during allocation
- declare this invalid by asserting against deleted_counters_ for debug builds
- use a bit in flags_ to denote that a stat needs to be considered as marked for deletion, and skip that during forEach. You could report the correct size by keeping track of the number of deleted counters in the class. I think this is the biggest change from what you have, but would mean that if you try to allocate a stat with the same name as one that was deleted, you'll get back the previously deleted stat.
> use a bit in flags_ to denote that a stat needs to be considered as marked for deletion, and skip that during forEach. You could report the correct size by keeping track of the number of deleted counters in the class. I think this is the biggest change from what you have, but would mean that if you try to allocate a stat with the same name as one that was deleted, you'll get back the previously deleted stat.
This approach sounds good to me, fwiw.
Stepping back though, I'm trying to understand why this logic was moved into the allocator versus where it used to be in thread local store? Can you help me understand that?
mattklein123
left a comment
LGTM with one typo and one question, thanks.
/wait
Hi Matt,

With this change we iterate over stats in the Allocator instead of over scopes, as this is much faster. However, the Allocator did not know anything about rejected stats, so we ended up including rejected stats when iterating over all stats. To avoid iterating over them, I removed these stats from the StatSets in the Allocator. There is logic that looks for the stat in the Allocator StatSet when its refcount goes to zero. To manage this cleanly, I moved the deleted stats from the Store to the Allocator. Did that answer your question?

With regards to the issue that Josh mentioned, we decided that his first suggestion might be the best option, given that adding the deleted bit adds a lot of complexity and is only needed for an edge case that currently can't be hit in Envoy.
Yup, thanks.
Sure, sounds good.
Signed-off-by: Pradeep Rao <pcrao@google.com>
jmarantz
left a comment
looks great; just some minor comments.
```cpp
/**
 * Mark rejected stats as deleted by moving them to a different vector, so they don't show up
 * when iterating over stats, but prevent crashes when trying to access references to them.
```
Note here the surprising behavior that if you allocate a stat after having done this, you'll get a new one, and that callers should seek to avoid this situation (as ThreadLocalStore does).
```cpp
textReadout.set("fortytwo");

// Ask for the rejected stats again by name.
Counter& counter2 = store_->counterFromString("c1");
```
actually you can ASSERT_EQ(&counter1, &counter2)
```cpp
EXPECT_EQ(num_iterations, num_stats);
}

// Currently, if we ask for a stat from the Allocator that has already been
```
Signed-off-by: Pradeep Rao <pcrao@google.com>
Commit Message: Improve cpu and memory usage of the sink for counters, gauges and text readouts by iterating over stats in the store to create a snapshot (instead of creating a vector by iterating over scopes).
Additional Description:
Risk Level: Low
Testing: Added benchmark test for stats sink.
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A
Benchmark test results of server_stats_flush_benchmark_test (cpu/op):

| name | with change | without change |
| --- | --- | --- |
| bmFlushToSinks/10 | 778ns ± 1% | 4.44µs ± 4% |
| bmFlushToSinks/100 | 2.83µs ± 3% | 31.4µs ± 2% |
| bmFlushToSinks/1000 | 39.5µs ± 1% | 376µs ± 2% |
| bmFlushToSinks/10000 | 409µs ± 4% | 5.40ms ± 7% |
| bmFlushToSinks/100000 | 6.07ms ±19% | 90.1ms ± 4% |
| bmFlushToSinks/1000000 | 100ms ± 4% | 1.59s ± 4% |

Signed-off-by: Pradeep Rao <pcrao@google.com>
Signed-off-by: pradeepcrao <84025829+pradeepcrao@users.noreply.github.com>