Skip to content

Stats no vector#17909

Merged
jmarantz merged 35 commits intoenvoyproxy:mainfrom
pradeepcrao:stats_no_vector
Sep 4, 2021
Merged

Stats no vector#17909
jmarantz merged 35 commits intoenvoyproxy:mainfrom
pradeepcrao:stats_no_vector

Conversation

@pradeepcrao
Copy link
Copy Markdown
Contributor

@pradeepcrao pradeepcrao commented Aug 30, 2021

Improve cpu and memory usage of the sink for counters, gauges and text readouts by:

Iterating over stats in the store to create a snapshot (instead of
creating a vector by iterating over scopes.)

Commit Message:
Additional Description:
Risk Level: Low
Testing: Added benchmark test for stats sink.
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A

Benchmark test results of server_stats_flush_benchmark_test:

With change:
name cpu/op
bmFlushToSinks/10 778ns ± 1%
bmFlushToSinks/100 2.83µs ± 3%
bmFlushToSinks/1000 39.5µs ± 1%
bmFlushToSinks/10000 409µs ± 4%
bmFlushToSinks/100000 6.07ms ±19%
bmFlushToSinks/1000000 100ms ± 4%

Without change:
name cpu/op
bmFlushToSinks/10 4.44µs ± 4%
bmFlushToSinks/100 31.4µs ± 2%
bmFlushToSinks/1000 376µs ± 2%
bmFlushToSinks/10000 5.40ms ± 7%
bmFlushToSinks/100000 90.1ms ± 4%
bmFlushToSinks/1000000 1.59s ± 4%

readouts) sink by:
1. Iterating over stats in the store to create a snapshot (instead of
creating a vector by iterating  over scopes.
2. Adding an API to filter stats for sinking.

Signed-off-by: Pradeep Rao <pcrao@google.com>
Signed-off-by: Pradeep Rao <pcrao@google.com>
vector of them.

Signed-off-by: Pradeep Rao <pcrao@google.com>
}
BENCHMARK(bmLarge)->Unit(::benchmark::kMillisecond);

static void bmSmall(benchmark::State& state) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most tests of this sort are called "BM_xxx" which is some benchmarking convention, but violates Envoy style guide, requiring clang-tidy "nolint" comments. What you have might be better. We'll see if anything else has a problem with this.

Signed-off-by: Pradeep Rao <pcrao@google.com>
Return early in benchmark for unit tests to prevent timeout.

Signed-off-by: Pradeep Rao <pcrao@google.com>
Signed-off-by: Pradeep Rao <pcrao@google.com>
…tats_no_vector

Signed-off-by: Pradeep Rao <pcrao@google.com>
Signed-off-by: Pradeep Rao <pcrao@google.com>
Signed-off-by: Pradeep Rao <pcrao@google.com>
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically looks right to me, but I think @mattklein123 needs to weigh in. It so happens that on another PR discussion we were thinking we might want to use info we propose to store in the Scope during the sink process for Prometheus.

@mattklein123 mattklein123 self-assigned this Aug 31, 2021
mattklein123
mattklein123 previously approved these changes Aug 31, 2021
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM modulo the remaining @jmarantz comments. I think if we decide to store something in the scope we can probably figure out a way to do that as a future change? (Since right now we don't provide scope information to the sinks as it is.)

instead of the store.

Signed-off-by: Pradeep Rao <pcrao@google.com>
Signed-off-by: Pradeep Rao <pcrao@google.com>
@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Sep 1, 2021

/wait

Signed-off-by: Pradeep Rao <pcrao@google.com>
Signed-off-by: Pradeep Rao <pcrao@google.com>
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great modulo minor nits!

for (auto& counter : deleted_counters_) {
// Assert that there were no duplicates.
ASSERT(counters_.count(counter.get()) == 0);
counters_.insert(counter.get());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do this in one statement with ASSERT(counters.insert(counter.get()).second), or in two with

auto insertion = counters.insert(counter.get());
ASSERT(insertion.second);

The latter feels more like it more closely mirrors the pattern in removeFromSetLockHeld, and is just as fast (only does one map operation).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return;
}
ASSERT(!hasStat(deleted_counters_, *iter));
// Duplicates are checked in ~AllocatorImpl.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/checked/ASSERTed/ to make it clearer that doesn't happen in production.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Sep 1, 2021

You still have to change expectations in a the scope deleter test, right?

@pradeepcrao
Copy link
Copy Markdown
Contributor Author

You still have to change expectations in a the scope deleter test, right?

Are you referring to thread_local_store_test.cc ScopeDelete? I fixed that.

Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something went wrong in CI (check linux_x64 gcc) and I see this failure in your logs:

2021-09-02T00:45:00.5279225Z [ RUN      ] AllocatorImplTest.ForEachCounter
2021-09-02T00:45:00.5279626Z pure virtual method called
2021-09-02T00:45:00.5280002Z terminate called without an active exception

Not sure why this happens in that compiler variant. I'll let you sort it out :)

…tats_no_vector

Signed-off-by: Pradeep Rao <pcrao@google.com>
Signed-off-by: Pradeep Rao <pcrao@google.com>
…tats_no_vector

Signed-off-by: Pradeep Rao <pcrao@google.com>
@pradeepcrao
Copy link
Copy Markdown
Contributor Author

something went wrong in CI (check linux_x64 gcc) and I see this failure in your logs:

2021-09-02T00:45:00.5279225Z [ RUN      ] AllocatorImplTest.ForEachCounter
2021-09-02T00:45:00.5279626Z pure virtual method called
2021-09-02T00:45:00.5280002Z terminate called without an active exception

Not sure why this happens in that compiler variant. I'll let you sort it out :)

Fixed, and added comment wrt crash.

Signed-off-by: Pradeep Rao <pcrao@google.com>
Signed-off-by: Pradeep Rao <pcrao@google.com>
jmarantz
jmarantz previously approved these changes Sep 2, 2021
Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! Silently makes everything faster.

@mattklein123 do you want to look any further? Or just merge?

@mattklein123
Copy link
Copy Markdown
Member

@mattklein123 do you want to look any further? Or just merge?

Let me take a pass through tomorrow if that is OK?

@jmarantz
Copy link
Copy Markdown
Contributor

jmarantz commented Sep 3, 2021

Tomorrow would be great!

ASSERT(counter.get() == *iter);
// Duplicates are ASSERTed in ~AllocatorImpl.
deleted_counters_.emplace_back(*iter);
counters_.erase(iter);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of a concern with this tactic for this class: if someone allocates a counter with the same name as one that was marked for deletion, we'll wind with two different counter objects with the same name. That might be OK but we might want to add a test.

It won't actually happen with ThreadLocalStore because it only calls markCounterForDeletion on stats that are rejected by the matcher, and the matcher can't be changed once it's added. The reason that this is needed is that the matcher can be added after some stats are created, and we need to effectively get rid of the now-rejected stats without causing references to freed memory.

Possible remedies I thought of include:

  • add a test and accept that as a weird use-case for this class, that won't be hit by ThreadLocalStore.
  • prevent this by checking against deleted_counters_ (which could be a set) during allocation
  • declare this invalid by asserting against deleted_counters_ for debug builds
  • use a bit in flags_ to denote that a stat needs be considered as marked for deletion, and skip that during forEach. You could report the correct size by keeping track of the number of deleted counters in the class. I think this is the biggest change from what you have, but would mean that if you try to allocate a stat with the same name as one that was deleted, you'll get back the previously deleted stat.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a bit in flags_ to denote that a stat needs be considered as marked for deletion, and skip that during forEach. You could report the correct size by keeping track of the number of deleted counters in the class. I think this is the biggest change from what you have, but would mean that if you try to allocate a stat with the same name as one that was deleted, you'll get back the previously deleted stat.

This approach sounds good to me, fwiw.

Stepping back though, I'm trying to understand why this logic was moved into the allocator versus where it used to be in thread local store? Can you help me understand that?

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one typo and one question, thanks.

/wait

ASSERT(counter.get() == *iter);
// Duplicates are ASSERTed in ~AllocatorImpl.
deleted_counters_.emplace_back(*iter);
counters_.erase(iter);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a bit in flags_ to denote that a stat needs be considered as marked for deletion, and skip that during forEach. You could report the correct size by keeping track of the number of deleted counters in the class. I think this is the biggest change from what you have, but would mean that if you try to allocate a stat with the same name as one that was deleted, you'll get back the previously deleted stat.

This approach sounds good to me, fwiw.

Stepping back though, I'm trying to understand why this logic was moved into the allocator versus where it used to be in thread local store? Can you help me understand that?

@pradeepcrao
Copy link
Copy Markdown
Contributor Author

Hi Matt,

With this change we iterate over stats in the Allocator instead of scopes as this is much faster. However, the Allocator did not know anything about rejected stats. So, we ended up including rejected stats when iterating over all stats.

To avoid iterating over them, I removed these stats from the StatSets in the Allocator. There is logic that looks for the stat in the Allocator StatSet when it's refcount goes to zero. To manage this cleanly, I moved the deleted stats from the Store to the Allocator.

Did that answer your question?

With regards to the issue that Josh mentioned, we decided that his first suggestion might be the best option, given that adding the deleted bit adds a lot of complexity and is only needed for an edge case that currently can't be hit in Envoy.
Does that sound acceptable to you?

@mattklein123
Copy link
Copy Markdown
Member

Did that answer your question?

Yup, thanks.

Does that sound acceptable to you?

Sure, sounds good.

Copy link
Copy Markdown
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great; just some minor comments.


/**
* Mark rejected stats as deleted by moving them to a different vector, so they don't show up
* when iterating over stats, but prevent crashes when trying to access references to them.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note here the surprising behavior that if you allocate a stat after having done this, you'll get a new one, and that callers should seek to avoid this situation (as ThreadLocalStore does).

textReadout.set("fortytwo");

// Ask for the rejected stats again by name.
Counter& counter2 = store_->counterFromString("c1");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually you can ASSERT_EQ(&counter1, &counter2)

EXPECT_EQ(num_iterations, num_stats);
}

// Currently, if we ask for a stat from the Allocator that has already been
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra space after "the"

Signed-off-by: Pradeep Rao <pcrao@google.com>
Signed-off-by: Pradeep Rao <pcrao@google.com>
@jmarantz jmarantz merged commit cc1d41e into envoyproxy:main Sep 4, 2021
tyxia pushed a commit to tyxia/envoy that referenced this pull request Sep 21, 2021
Commit Message: Improve cpu and memory usage of the sink for counters, gauges and text readouts by:

Iterating over stats in the store to create a snapshot (instead of
creating a vector by iterating over scopes.)
Additional Description:
Risk Level: Low
Testing: Added benchmark test for stats sink.
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A

Benchmark test results of server_stats_flush_benchmark_test:

With change:
name cpu/op
bmFlushToSinks/10 778ns ± 1%
bmFlushToSinks/100 2.83µs ± 3%
bmFlushToSinks/1000 39.5µs ± 1%
bmFlushToSinks/10000 409µs ± 4%
bmFlushToSinks/100000 6.07ms ±19%
bmFlushToSinks/1000000 100ms ± 4%

Without change:
name cpu/op
bmFlushToSinks/10 4.44µs ± 4%
bmFlushToSinks/100 31.4µs ± 2%
bmFlushToSinks/1000 376µs ± 2%
bmFlushToSinks/10000 5.40ms ± 7%
bmFlushToSinks/100000 90.1ms ± 4%
bmFlushToSinks/1000000 1.59s ± 4%

Signed-off-by: Pradeep Rao <pcrao@google.com>

Signed-off-by: pradeepcrao <84025829+pradeepcrao@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants