stats: Factor out creation of cluster-stats StatNames from creation of the stats, to save CPU during xDS updates#14028
Conversation
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
Hey Matt -- PTAL briefly at this new pattern of using macros to declare StatNames once, then the actual counters, gauges, etc referencing the StatNames. If you think this direction is reasonable I can flesh out with a more complete implementation and specific tests for the new macros. I just wanted to show a proof-of-concept, handling counters and gauges, and not histograms or text-readouts or unit tests for all that yet. I believe this will save some CPU in xDS handling in cases with large cluster sizes. There may be other places where this is needed. This really needs a benchmark to prove it will help, which is tracked in #14005 . See also #8831 which addresses the same issue. The regex overhead that I think that one saves (or at least provides the pattern to save) is quite obvious on the flame graphs we've seen, but this is showing up as well. |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
| #define MAKE_STATS_STRUCT_STATNAME_HELPER_(name) | ||
| #define GENERATE_STATNAME_STRUCT(name) | ||
|
|
||
| /** |
There was a problem hiding this comment.
Note that MAKE_STAT_NAMES_STRUCT and MAKE_STATS_STRUCT do not follow the pattern of the existing macros as they require distinct initializers based on data constructed from the ctor params. There's also a bit of hackery where the HELPER_ functions above depend on formal-param names established in other macros. I couldn't figure out how to make something like your FINISH_DECL macros work for this.
mattklein123
left a comment
There was a problem hiding this comment.
At a high level this makes sense to me. LMK when ready to review!
/wait
include/envoy/stats/stats_macros.h
Outdated
| : scope_(scope) \ | ||
| ALL_STATS(MAKE_STATS_STRUCT_COUNTER_HELPER_, MAKE_STATS_STRUCT_GAUGE_HELPER_, \ | ||
| MAKE_STATS_STRUCT_STATNAME_HELPER_) {} \ | ||
| Envoy::Stats::Scope& scope_; \ |
There was a problem hiding this comment.
Does holding the scope reference here actually do anything?
There was a problem hiding this comment.
No. this was done for syntactic purposes only as I needed the macro to have a leading comma. I should add comment about that unfortunate hack.
There was a problem hiding this comment.
I wound up replacing the scope here (which we do not need) with a ref to the StatNames struct, which turns out to be useful (though not essential).
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
Thanks @mattklein123 -- I may back-burner this for a little while as I went back to the flame-graph and am not convinced yet whether the overhead from the original macros is material. Also I was able to get this to work ok for one example, but I don't have a good answer yet for how to structure the class and macros to elegantly handle the 15 possible combinations of a subsystem needing counters, gauges, histograms, or text-readouts. I will circle back after we get some flame-graphs with a hot regex converted to RE2 (#8831 merged now) and maybe also a CDS speed test (#14005) |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…d locks in worker threads on updates. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
… doesn't have to be resymbolized each time Router::FilterConfig is instantiated. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
@mattklein123 this is worth a pass. I should add more info about the macros to stats.md but I can do this in parallel to taking a look at the code. I was motivated to pick this up again by a flame graph we found in a high-thread-count system that exposed a bottleneck on counterFromString on the main thread in Router::FilterConfig's stats. |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
| * | ||
| * @return the stat names. | ||
| */ | ||
| virtual const Router::RouterStatNames& routerStatNames() const PURE; |
There was a problem hiding this comment.
we hang the routerStatNames off the clusterManagerFactory because we happen to have that available in the place where we construct the router's FilterConfig. It's arguably not too clean. I could also just add a RouterContext and plumb that through the chain of constructors.
There was a problem hiding this comment.
From a cleanliness/tech debt perspective, we are trying to remove protocol specific stuff from cluster manager (@alyssawilk just started on some of this in terms of moving hard coded HTTP logic out to extensions).
It seems to me the best way to do this would be create a singleton somewhere where the router config is created? Then it would just be created once and lookup?
There was a problem hiding this comment.
I think I have a better place to put it: the Http::Context. I thought of that because of your comment about 'hard coded HTTP logic". Does that seem reasonable?
And I have an Http::Context available to me where I needed the router stat names without excess plumbing.
There was a problem hiding this comment.
This is very router filter specific though. Can't it just be a singleton provided by the router filter factory?
There was a problem hiding this comment.
Per slack, this is not the router filter, it was in source/common/router.
I earlier had this PR in a state where I had put the names needed for Router::FilterStats into Http::Context, and that was pretty straightforward, but wasn't great for separation. Now I have a Router::Context. Plumbing that through was trivial but involved a lot of file edits due to parallel universes for mocks, validation servers, integration test servers, and whatnot.
LMK what you think; I'm OK trying this out as a prod singleton too, but I think it would involve a few implicit dependencies that are now explicit (if verbose). For example in every test method we'd need to re-init the singleton with the test's symbol table.
There was a problem hiding this comment.
Yeah sorry I had forgotten the AsyncClient angle to this. Given that this is basically part of the core I'm fine putting it wherever you think is easiest.
There was a problem hiding this comment.
I think it's a bit cleaner to have the router stats in their won context, though it expanded the number of files in the PR a bit.
mattklein123
left a comment
There was a problem hiding this comment.
Thanks, flushing out a few comments.
/wait
| * ClusterManager. This helps factor out some relatively heavy name | ||
| * construction which occur when there is a large CDS update during operation, | ||
| * relative to recreating all stats from strings on-the-fly. |
There was a problem hiding this comment.
Wouldn't these be Cluster/upstream stats and not cluster manager stats which I think would be global and created once?
There was a problem hiding this comment.
I reverted the ClusterManager stats to their previous state.
| * | ||
| * @return the stat names. | ||
| */ | ||
| virtual const Router::RouterStatNames& routerStatNames() const PURE; |
There was a problem hiding this comment.
From a cleanliness/tech debt perspective, we are trying to remove protocol specific stuff from cluster manager (@alyssawilk just started on some of this in terms of moving hard coded HTTP logic out to extensions).
It seems to me the best way to do this would be create a singleton somewhere where the router config is created? Then it would just be created once and lookup?
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…uter stat names to pass to async clients. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ocess. Signed-off-by: Joshua Marantz <jmarantz@google.com>
…able. Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
I think this is now ready for another round. |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks LGTM with small questions and comments.
/wait
…ent.* naming works. Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
Thanks for the review of this PR that evolved into having so many files! |
* master: (41 commits) event: Remove a source of non-determinism by always running deferred deletion before post callbacks (envoyproxy#14293) Fix TSAN bug in integration test (envoyproxy#14327) tracing: Add hostname to Zipkin config. (envoyproxy#14186) (envoyproxy#14187) [conn_pool] fix use after free in H/1 connection pool (envoyproxy#14220) lua: update deprecated lua_open to luaL_newstate (envoyproxy#14297) extension: use bool_flag to control extension link (envoyproxy#14240) stats: Factor out creation of cluster-stats StatNames from creation of the stats, to save CPU during xDS updates (envoyproxy#14028) test: add scaled timer integration test (envoyproxy#14290) [Win32 Signals] Add term and ctrl-c signal handlers (envoyproxy#13954) config: v2 transport API fatal-by-default. (envoyproxy#14223) matcher: fix UB bug caused by dereferencing a bad optional (envoyproxy#14271) test: putting fake upstream config in a struct (envoyproxy#14266) wasm: use Bazel rules from Proxy-Wasm Rust SDK. (envoyproxy#14292) docs: fix typo (envoyproxy#14237) dependencies: allowlist CVE-2018-21270 to prevent false positives. (envoyproxy#14294) typo in redis doc (envoyproxy#14248) access_loggers: removed redundant dep (envoyproxy#14274) fix http2 flaky test (envoyproxy#14261) test: disable flaky xds_integration_test. (envoyproxy#14287) http: add functionality to configure kill header in KillRequest proto (envoyproxy#14288) ... Signed-off-by: Michael Puncel <mpuncel@squareup.com>
) Signed-off-by: Joshua Marantz <jmarantz@google.com> Commit Message: Capture StatNames in the factory for extra categories of cluster stats. This is a follow-up to #14028 which established the stat-macro infrastructure needed to make this easy most of the time. Additional Description: This still leaves behind the stats for circuit breakers, which are difficult to convert to the new stat-name macro-creation structure due to some pretty creative use of the existing macros. Resolving that can be a follow-up, which I think might require abandoning the macros for circuit-breaker stats and just using direct API calls. Risk Level: low Testing: //test/... Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a
Commit Message: Adds a new set of macros in stat_macros.h for separately declaring the names of stats, to hold in a factory, and then for creating the stats based on the symbolized names. This removes a symbol-table contention bottleneck for stats that are created by name during operation. Two such instances are resolved in this PR: Cluster stats and Route filter stats.
Additional Description: There were two infrastructural issues expanding the file-count in this PR, but hopefully not the conceptual complexity:
Risk Level: low
Testing: //test/...
Docs Changes: will update stats.md as part of this PR
Release Notes: n/a
Platform Specific Features: n/a