stats: Reorganize tags into a map<prefix, std::vector<tag>> to avoid initiating multiple regexes that don't match.#2582
Conversation
…ing multiple regexes that don't match. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
There was some computation in here of prefix strings were not necessarily "."-delimeted, which could potentially used for a prefix-filter independent of the map lookup. However that's not used currently, so just collect the prefixes we are actually going to use for the map. Signed-off-by: Joshua Marantz <jmarantz@google.com>
…void asan issues in test. Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
Assigning to @alyssawilk for final senior maintainer review. |
alyssawilk
left a comment
There was a problem hiding this comment.
Wow, 3x speed up - Fantastic! First review pass below.
Also please keep Release Notes in the PR description even if you think it's n/a. I suspect your entire set of PRs merits a release note but I'm fine if you land the release note in the final PR but having the field reminds us to check in :-)
include/envoy/stats/stats.h
Outdated
| * | ||
| * The storage for the prefix is owned by the TagExtractor. | ||
| * | ||
| * @return absl::string_view returnsthe prefix, or an empty string_view if none was found. |
| // Constructor to fill map. | ||
| TagNameValues() : name_regex_pairs_(getRegexMapping()) {} | ||
| // Returns the list of descriptors. | ||
| const std::vector<Descriptor>& descriptor_vec() const { return descriptor_vec_; } |
There was a problem hiding this comment.
I don't believe Envoy does snake case for accessors, so descriptorVec()
| : name_(name), prefix_(std::string(extractRegexPrefix(regex))), | ||
| regex_(RegexUtil::parseRegex(regex)) {} | ||
|
|
||
| std::string TagExtractorImpl::extractRegexPrefix(absl::string_view regex) { |
There was a problem hiding this comment.
This could totally use some explanatory comments for those of us who haven't been living in this code for a month :-)
There was a problem hiding this comment.
fair enough; I put them in the header though:
/**
* Examines a regex string, looking for the pattern: ^alphanumerics_with_underscores\.
* Returns "alphanumerics_with_underscores" if that pattern is found, empty-string otherwise.
* @param regex absl::string_view the regex to scan for prefixes.
* @return std::string the prefix, or "" if no prefix found.
*/
| throw EnvoyException(fmt::format( | ||
| "No regex specified for tag specifier and no default regex for name: '{}'", name)); | ||
| } | ||
| if (regex.empty()) { |
There was a problem hiding this comment.
It looks like you've moved the default lookup to the function calling createTagExtrator? I believe you should be updating the comments in the .h file accordingly.
There was a problem hiding this comment.
Also this is substantial enough refactor I'd be inclined to call it Medium but your call :-)
There was a problem hiding this comment.
ok, and added comments for all new functions (and some old ones).
source/common/stats/stats_impl.h
Outdated
| std::string produceTags(const std::string& metric_name, std::vector<Tag>& tags) const override; | ||
|
|
||
| void addExtractor(TagExtractorPtr extractor); | ||
| void addExtractorsMatching(absl::string_view name); |
There was a problem hiding this comment.
Please comment up all new functions.
Existing Envoy code has a pretty low bar for comments but I'd like to raise it :-)
It also makes things easier on your reviewers - I'll do one more pass with comments added as it'll be faster than you waiting on me reading and grokking all of the tag code :-)
…changed in this PR. Signed-off-by: Joshua Marantz <jmarantz@google.com>
…than awkward reverse iteration. Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
dnoe
left a comment
There was a problem hiding this comment.
Mostly I just have a bunch of nits. This is cool stuff!
| namespace absl { | ||
| class string_view; | ||
| } | ||
|
|
There was a problem hiding this comment.
Why do this instead of (presumably) including the string_view header?
There was a problem hiding this comment.
Dependency management in bazel, mainly. Let me look at that again.
|
|
||
| return name_regex_pairs; | ||
| void TagNameValues::addRegex(const std::string& name, const std::string& regex) { | ||
| // Descriptor& descriptor = descriptor_map_[name]; |
There was a problem hiding this comment.
yes. Sorry about all these. Internally I usually don't put the space after the "//" and depend on the linter to find these for me so I can clean them before mailing a review. But I've been just having envoy's formatter fix my code and then it just puts the space back in. I need a new pattern for things to clean up before review :)
| descriptor_vec_.emplace_back(); | ||
| Descriptor& descriptor = descriptor_vec_.back(); | ||
| descriptor.name = name; | ||
| descriptor.regex = regex; |
There was a problem hiding this comment.
If Descriptor had a constructor, I think you can eliminate a bunch of boilerplate here and also allow the members of Descriptor to be const.
|
|
||
| // Mapping from the names above to their respective regex strings. | ||
| // std::unordered_map<std::string, Descriptor> descriptor_map_; | ||
| // std::vector<const Descriptor* /* owned by descriptor_map_ */> descriptor_vec_; |
There was a problem hiding this comment.
More vestigial comments? Or hints of what's to come?
There was a problem hiding this comment.
done, just vestigial.
source/common/stats/stats_impl.cc
Outdated
| } | ||
|
|
||
| void TagProducerImpl::addExtractor(TagExtractorPtr extractor) { | ||
| absl::string_view prefix = extractor->prefixToken(); |
source/common/stats/stats_impl.cc
Outdated
|
|
||
| #include <algorithm> | ||
| #include <chrono> | ||
| #include <list> |
There was a problem hiding this comment.
not used. deleted.
test/common/stats/stats_impl_test.cc
Outdated
| std::vector<TagExtractorPtr> tag_extractors_; | ||
| /** | ||
| * Reimplements TagProducerImpl::produceTags, but extracts the tags in reverse order. | ||
| * His helps demonstrate that the order extractors does not matter to the end result, |
There was a problem hiding this comment.
typo: His -> This, and "order of extractors"
| * His helps demonstrate that the order extractors does not matter to the end result, | ||
| * assuming we don't care about tag-order. This is in large part correct by design because | ||
| * stat_name is not mutated until all the extraction is done. | ||
| * @param metric_name std::string a name of Stats::Metric (Counter, Gauge, Histogram). |
There was a problem hiding this comment.
There's no parameter called metric_name, did you mean stat_name?
There was a problem hiding this comment.
I renamed the var to metric_name for consistency with the interface in Stats.
| * stat_name is not mutated until all the extraction is done. | ||
| * @param metric_name std::string a name of Stats::Metric (Counter, Gauge, Histogram). | ||
| * @param tags std::vector a set of Stats::Tag. | ||
| */ |
There was a problem hiding this comment.
Nit: Add a @return doc string. Not normally required for tests, but since the input params are well documented might as well do it for return.
source/common/stats/stats_impl.cc
Outdated
| for (absl::string_view::size_type i = 1; i < regex.size(); ++i) { | ||
| if (!absl::ascii_isalnum(regex[i]) && (regex[i] != '_')) { | ||
| if (i > 1) { | ||
| bool last_char = i == regex.size() - 1; |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
Back to you, @alyssawilk :) |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
Please merge master to pick up #2613, this should fix CI TSAN. |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
mrice32
left a comment
There was a problem hiding this comment.
LGTM. Will probably do a final pass tomorrow just to make sure I didn't miss anything, but awesome work!
| addDefaultExtractors(const envoy::config::metrics::v2::StatsConfig& config); | ||
|
|
||
| /** | ||
| * Iterates over every tag extractor that might possibly match stat_name, calling |
There was a problem hiding this comment.
nit: "that might possibly match stat_name" seems a little unspecific. Do you think it might make sense to be specific about what exact criterion this method checks against (prefix for those that have it) to determine the set over which the function is called?
There was a problem hiding this comment.
I'm coy about this implementation detail in the interface doc because I am weighing whether it makes sense to also scan for substrings. In particular, the rq_xxx paterns are slow, and can be eliminated with a quick scan for "rq" on the input string, so it may not be prefixes in the future. I want to do some more targeted tests for that after more of the low hanging fruit is plucked.
There was a problem hiding this comment.
Hmmm, since the primary goal of the method is to select on certain criteria, I think that should be explicitly laid out in the documentation since the only other way to determine it is to check the implementation. Could we maybe create a bullet list of the criteria here, and for now it'll just show the prefix, and later we can add to that if things change? Also, I thought only public methods required doxygen documentation in Envoy. Has this changed?
There was a problem hiding this comment.
RE show what actually happens in the current impl, and update when that changes: SGTM. Coming.
RE doxygen: I would like to know what the actual post-processing flow is. I think there's a bunch of redundancy in the doxygen comments. In the past when I've used doxygen for C++ I've done it somewhat differently, commenting the vars in the formal-list, so you don't have to repeat the variable name or the type. So right now I'm just trying to be consistent but I wonder what the overall benefit is.
However @alyssawilk asked specifically for more function-doc here and in general in Envoy and I agree with that. It helps the reader to more easily follow unfamiliar code so I'm all over that. I wouldn't mind being somewhat terser though.
There was a problem hiding this comment.
I think the history around Doxygen usage in Envoy can be summed up as the fact that Doxygen style was adopted as a convention, but rather than actually running Doxygen on the source Envoy development has historically involved just human reading the Doxygen comments in the header files. So the quirks in usage of Doxygen style can generally be traced to this somewhat docs usage pattern.
| for (const auto& desc : Config::TagNames::get().descriptorVec()) { | ||
| if (desc.name == name) { | ||
| addExtractor(Stats::TagExtractorImpl::createTagExtractor(desc.name, desc.regex)); | ||
| ++num_found; |
There was a problem hiding this comment.
nit: Is this expected to never go above 1? If so, should we toss in an ASSERT at the end of this method? (Or if we're really confident, we could save some time and just break when found :) )
There was a problem hiding this comment.
It is expected to go above 1. That's one of the anticipated changes I have in the wings; my theory (which I need to isolate a test for) is that it's faster to do two different regexes when there are two different ways to generate a tag. I think some of the catastrophic backtracking was due to too many alternatives in one regex, but I haven't totally wrapped my arms around that yet as I was focused on trying to avoid regexes completely.
There was a problem hiding this comment.
I see - that makes sense. Since this is a future feature, can we toss an assert in for now (since it's only a one-liner), and remove if the implementation changes just to keep this PR self-contained?
| std::vector<Tag>& tags) const { | ||
| tags.insert(tags.end(), default_tags_.begin(), default_tags_.end()); | ||
| void TagProducerImpl::addExtractorsMatching(absl::string_view name) { | ||
| int num_found = 0; |
There was a problem hiding this comment.
nit: prefer fixed-size primitives (int64_t for example)
There was a problem hiding this comment.
My reading of https://google.github.io/styleguide/cppguide.html#Integer_Types is that small integers can be int, though I really don't care that much. I don't see an override for that in https://github.com/envoyproxy/envoy/blob/master/STYLE.md -- did I miss it?
There was a problem hiding this comment.
Nope, you're right. I just think it's preferable to use the explicitly sized integers everywhere (especially in non-test code) rather than making a somewhat subjective exception for small values. Not a big deal, and if you want to leave as is, that's fine.
There was a problem hiding this comment.
I don't feel strongly about this at all, but a recursive-grep through envoy source shows lots of existing 'int' usage. Some of them are related to system-calls for that type, but there's plenty of others that look like they are counting small things. So I'm inclined to keep this as 'int'.
source/common/stats/stats_impl.cc
Outdated
| } | ||
| if (num_found == 0) { | ||
| throw EnvoyException(fmt::format( | ||
| "No regex specified for tag specifier and no default regex for name: '{}'", name)); |
There was a problem hiding this comment.
nit: not sure how opinionated others are on this, but I generally prefer checking context-specific errors like this in the caller since this method has nothing to do with checking for the presence of a config-specified regex. WDYT?
There was a problem hiding this comment.
fair enough I can just return num_found.
| const absl::string_view token = absl::string_view(stat_name.data(), dot); | ||
| const auto iter = tag_extractor_prefix_map_.find(token); | ||
| if (iter != tag_extractor_prefix_map_.end()) { | ||
| for (const TagExtractorPtr& tag_extractor : iter->second) { |
There was a problem hiding this comment.
nit: prefer const auto & in range-based loops.
There was a problem hiding this comment.
IMO the fact that iter->second ranges over TagExtractorPtr is not locally obvious and it's more readable to be explicit. Moreover https://google.github.io/styleguide/cppguide.html#auto talks about when 'auto' being acceptable when the actual type name is cumbersome or obvious, and I think neither is the case here.
I don't see an override of this policy in https://github.com/envoyproxy/envoy/blob/master/STYLE.md
There was a problem hiding this comment.
That's a fair point. I didn't notice that the type of the vector wasn't spelled out nearby (as denoted in one of the allowed cases in the style guide).
source/common/stats/stats_impl.cc
Outdated
| tags.insert(tags.end(), default_tags_.begin(), default_tags_.end()); | ||
| IntervalSetImpl<size_t> remove_characters; | ||
| forEachExtractorMatching( | ||
| metric_name, [&remove_characters, &tags, metric_name](const TagExtractorPtr& tag_extractor) { |
There was a problem hiding this comment.
nit: why not &metric_name in the capture?
There was a problem hiding this comment.
Good question...I was thinking it made sense to capture the reference only if it was going to be modified in the lambda body.
The first example in https://google.github.io/styleguide/cppguide.html#Lambda_expressions supports my pattern but doesn't explicitly require it. I don't see any overriding commentary in https://github.com/envoyproxy/envoy/blob/master/STYLE.md
There was a problem hiding this comment.
Hmmmm, since metric_name is a const ref already, I think the only difference would be whether you make a copy or not, since the lambda should retain the constness of the ref. IIUC, this is a similar tradeoff to passing a string by value or by const ref in a function if the function is not going to modify it.
There was a problem hiding this comment.
I agree, there should be no performance difference here. I felt like prefixing a variable with "&" was a way for me to tell the reader "I'm going to be modifying this state in the lambda body". Since I'm not planning on modifying it, I felt like I should omit the &.
There was a problem hiding this comment.
Sorry, I don't think I worded that very well. I think there is a performance difference here. IIUC, without the &, you are making a local copy of the string for the lambda's use, which is unnecessary.
There was a problem hiding this comment.
TIL: const refs get value-ized by C++ in a lambda capture. Right you are. Switched to using & in the capture for metric_name.
…f 0. Also renames structure member variables in Descriptor in well_known_names.cc to end with _ per Envoy style guide (overriding the Google style guide which states that structs don't suffix member variables with _). Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…was a const ref to begin with. Signed-off-by: Joshua Marantz <jmarantz@google.com>
dnoe
left a comment
There was a problem hiding this comment.
LGTM, defer to @alyssawilk for senior maintainer review and merge.
alyssawilk
left a comment
There was a problem hiding this comment.
This looks great - now both fastrer and easier to understand :-) Only two little nits on my side.
| // I anticipate possibly changing the default tag regexes so that more than one regex can | ||
| // yield the same tag, on the theory that this will reduce regex backtracking. But at the | ||
| // moment, this doesn't happen, so the flow isn't well tested this. When we start exploiting | ||
| // this, and it's tested, we can simply remove this assert. |
There was a problem hiding this comment.
Should this be a TODO ? If not I'd suggest rewriting as a statement of what is, since speculation of what will happen may be more confusing than not.
Also
the flow isn't well tested this -> this flow isn't well tested?
| } | ||
| if (regex.empty()) { | ||
| throw EnvoyException(fmt::format( | ||
| "No regex specified for tag specifier and no default regex for name: '{}'", name)); |
There was a problem hiding this comment.
I know you just moved this, but as long as you're in here mind adding a unit test?
…pty regex. Signed-off-by: Joshua Marantz <jmarantz@google.com>
…o avoid initiating multiple regexes that don't match. (envoyproxy#2582)" This reverts commit 9c728a0.
|
Thanks for all the reviewing!
…On Thu, Feb 15, 2018 at 4:31 PM, alyssawilk ***@***.***> wrote:
Merged #2582 <#2582>.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#2582 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AB2kPXoPL7Uby9L1NMSs9BL4dilOMppYks5tVKIXgaJpZM4SBdks>
.
|
Description:
Improves startup speed from about 18 seconds to 7 seconds on the 10k example from #2063.
This is step 5 in the the plan to improve startup performance, and the first one (in time-order) to actually improve speed. This also removes a previously existing restriction that a tag can only be computed from one regex; now a tag could be computed by multiple different regexes, although no such change has occurred in this PR.
Note that the code which collects the regexes has been refactored to be more functional, and thus easier to change in the an upcoming PR, but the regexes have not been changed at all.
Risk Level: Medium -- this is startup critical but I think it's functionally bounded and well tested.
Testing:
//test/...
manually hacked main_common to exit(0) before loop and timed server startup with 10k clusters.
Release Notes:
None yet, will add release notes when the chain of PRs is done.