Refactor Stats::RawStatData into a StatsOptions struct#3629
Refactor Stats::RawStatData into a StatsOptions struct#3629mattklein123 merged 24 commits intoenvoyproxy:masterfrom
Conversation
e8bf899 to
d7ff9ba
Compare
source/common/stats/stats_impl.h
Outdated
There was a problem hiding this comment.
this really could be fixed in a separate PR, but the signature change makes it more obvious; IMO we should spend a few bytes and retain the actual length in the structure, rather than a nul byte, rather than having to call strlen on every access.
I think we need to support >256 characters so we should probably spend a few bytes and retain the length explicitly in the structure.
That will improve the speed (at a moderate cost of memory) and could be done in a very isolated way, reducing the # of changes in this PR.
There was a problem hiding this comment.
I'm kinda on-board with this. Can we not just pre-truncate the name? As in, why can't we just pass an optional length to which we want name_ truncated? As @jmarantz mentioned, this will require us to make name_ null-terminated.
There was a problem hiding this comment.
This gets truncated at write time in initialize() / safeInitialize(); i think we can remove the keyGivenStatsOptions entirely since name_ is already being null-terminated.
There was a problem hiding this comment.
Yes, this works! I will replace all instances of keyGivenStatsOptions with key().
include/envoy/stats/stats.h
Outdated
There was a problem hiding this comment.
Actually I think i'm going to get rid of this function altogether, since it doesn't actually get called anywhere. If we need it later we can put it back.
source/common/stats/stats_impl.h
Outdated
There was a problem hiding this comment.
I'm kinda on-board with this. Can we not just pre-truncate the name? As in, why can't we just pass an optional length to which we want name_ truncated? As @jmarantz mentioned, this will require us to make name_ null-terminated.
source/exe/main_common.cc
Outdated
There was a problem hiding this comment.
nit:
stats_store_ = std::make_unique<Stats::ThreadLocalStoreImpl>(restarter_->statsAllocator());
There was a problem hiding this comment.
nit: can we use std::make_unique here?
test/common/stats/stats_impl_test.cc
Outdated
There was a problem hiding this comment.
Do we have an equivalent test that tests the truncation wherever it happens now?
There was a problem hiding this comment.
Yes, HotRestartImplTest.truncateKey also tests this, but I'll put back in a native RawStatDataTest.Truncate test.
test/server/http/admin_test.cc
Outdated
source/common/stats/stats_impl.cc
Outdated
There was a problem hiding this comment.
If this no longer truncates, the names of the stats that overflow into the heap allocator in the hot restart case will no longer be truncated. This means that the overflowed names will be inconsistent with those that were allocated in the block allocator. Can we make sure that we truncate in that case to avoid the inconsistency?
There was a problem hiding this comment.
This sounds good. Rather than propagate a truncation throughout ThreadLocalStore, I've put the .substr() fn inline where make_stat calls the heap_allocator in `thread_local_store.cc.
|
I'm going to hold off on this until #3606, and maybe even close-and-reopen depending on how big the differential is. |
9daa293 to
1c2620a
Compare
|
@mattklein123 This PR removes truncation from HeapRawStatDataAllocator, since it doesn't have context from the top-level options struct, and we want to get rid of deeply nested statics. Am I correct in thinking that this stat name truncation isn't necessary here? @jmarantz @mrice32 |
I haven't looked at this PR yet. Do you mean is it not necessary because there is no limit inside the heap allocator? I don't think there is any particular requirement to do truncation. |
jmarantz
left a comment
There was a problem hiding this comment.
I think the key question here: does this get a whole lot simpler given that @mattklein123 has indicated that no truncation is required except in the hot-restart allocator.
And in that case, it has the options already, so I think this PR can be a lot simpler.
source/common/stats/stats_impl.h
Outdated
There was a problem hiding this comment.
name could be specified as absl::string_view with no perf or compatibility downside here, I think.
source/common/stats/stats_impl.cc
Outdated
There was a problem hiding this comment.
maybe be more explicit here naming this function:
truncateAndInit()
|
I think this PR doesn't get that much simpler given that only hot restart needs truncation. A lot of the complexity of this PR comes from |
|
@mattklein123 This is ready for review. |
There was a problem hiding this comment.
This is kind of an interesting maze to follow for those of us that don't know this part of the code deeply. But with a bit of grepping I think you can or should be able to pull the stats options from Cluster, which knows about ClusterManager, which knows about a bunch of things. I'm not sure if you can exactly get to the OptionsImpl from any of those, but it given all the things passed into ClusterManagerImpl's ctor, I think it would not be unreasonable to also have it see the options.
source/common/upstream/cluster_manager_impl.h
WDYT?
There was a problem hiding this comment.
I think it would be reasonable for ClusterManager and ListenerManager and HttpConnectionManager (for routes) to know about some higher-level thing with access to a StatsOptions (such as ServerImpl). Right now I've embedded StatsOptions inside Scope, and Scope as a member of a Server::Configuration::FactoryContext, which is already present in a lot of these locations. I'm just not sure it would really save us that much complexity.
d3e0880 to
2de48fe
Compare
|
@mattklein123 Sorry for dropping the ball on this. I've done a merge, so this PR is now ready for a final review. As noted above, the bulk of this PR isn't actually due to the addition of a statsOptions struct. Most of the delta comes from the eradication of a static call inside Utility::checkObjNameLength(). That utility now needs access to a top-level statsOptions struct for consistency, and I've had to modify a number of interfaces to accommodate this. If this makes the PR harder to read, I could tease it out into a separate one, but let me know what you think. |
include/envoy/stats/stats.h
Outdated
There was a problem hiding this comment.
What is the object part and the suffix part of a stat? Can you potentially provide an example in the comments?
There was a problem hiding this comment.
There's more color in /source/common/stats/stats_impl.h, which I'm not sure how much of to duplicate in the include file. So the object part is something like cluster.<cluster_name>.outlier_detection, and the suffix part is one of the statistics listed in the docs here like ejections_enforced_consecutive_gateway_failure. We just want to limit the lengths of these halves separately. Do you think it's worth adding that to the include too?
There was a problem hiding this comment.
Maybe just comment about where the other comments are? In general I would expect the public includes to have the detail so I can understand what things mean, but fine to link to other places. Whatever works. It's mostly that this nomenclature caught me off guard so I think it might confuse others also.
There was a problem hiding this comment.
That sounds good to me. I did some work in 732af4f.
source/common/stats/stats_impl.cc
Outdated
There was a problem hiding this comment.
so where is initialize called now? Does this need some safety guard for cope size? If not, can you add a comment?
There was a problem hiding this comment.
So there are two initialize fns now -- one, RawStatData::initialize(key), which is used in cases where no truncation is performed (like in HeapRawStatDataAllocator::alloc(name)). The other, RawStatData::truncateAndInit(key, stats_options), gets called in BlockMemoryHashSet::insert(key), which has access to a local stats_options_ struct. So the former case doesn't guard for some max size.
There was a problem hiding this comment.
Can we somehow assert that we aren't going to overwrite memory here? It's not super clear to me how we know that this is a safe copy (or if heap allocated pass in the size of the heap allocation and assert that)?
There was a problem hiding this comment.
Instead of storing a reference to the scope here, would it better to store reference to the stat options? We are slightly splitting hairs but it potentially seems a bit clearer/safer to me.
There was a problem hiding this comment.
Makes sense -- I put statsOptions into the scope struct, and found it made more contextual sense to add scope as an argument to a lot of higher-level constructors or factories. But as those factories call lower-level utilities (parseHttpConnectionManagerFromJson, it makes more sense to give them the specific arguments they need, like the statsOptions struct. I've changed the interfaces for CdsJson::translateCluster, LdsJson::translateListener, and some others.
source/server/hot_restart_impl.cc
Outdated
There was a problem hiding this comment.
This change doesn't change the shared memory layout at all, right? Just being extra paranoid I would double check to make sure nothing changes w/ hot restart and with the output versions, as we have had issues in this area in the past. I'm not sure what our test status is on hot restart versions changing.
There was a problem hiding this comment.
It doesn't change that last argument to fmt:;format(), since that value max_stat_name_len used to be the sum of max_obj_name_len and maxStatSuffixLength(), and it still is. But I'll rerun the hot_restart_impl_test on this and upstream master and just make sure.
There was a problem hiding this comment.
I reran this test for master and this branch and the hot restart versionString is the same. 😄 In master right now, the server/mocks.cc sets maxObjNameLength() to return 150 by default. This new test does not, so the versionStrings look different -- but if we set
stats_options_.max_obj_name_length_ = 150
somewhere in TEST_F(HotRestartImplTest, versionString), then the versionStrings are identical.
There was a problem hiding this comment.
This is somewhat unrelated to your change, but WDYT about actually committing the hot restart version string into a test somewhere and verifying that it is what we expect it to be? That would warn us about a change in the hot restart version that we didn't intend? Thoughts?
There was a problem hiding this comment.
(Perhaps this goes in the hot restart test itself)?
There was a problem hiding this comment.
I like this -- I could hardcode that string into hotrestart_test.sh for comparison, merge in that PR first, and then verify that this PR doesn't need any changes to pass that test.
source/server/lds_subscription.h
Outdated
There was a problem hiding this comment.
similar comment here about scope (and options), maybe applies elsewhere also.
There was a problem hiding this comment.
nit: UNREFERENCED_PARAMATER or just delete arg variable name. Same elsewhere if I missed it.
This change allows us to deprecate the statics inside Stats::RawStatData. Some side effects of this change are: a) HeapRawStatDataAllocator no longer performs stat name truncation, b) we now construct BlockMemoryHashSet, HotRestartImpl, C/L/RdsSubscription, and ThreadLocalStoreImpl as functions of this Stats::StatsOptions struct, and c) Stats::RawStatData now looks more like a set of libraries for computing stat padding, as opposed to a source of truth for the maximum allowable name lengths. Finally, a chain of functions starting under server.cc (translateBootstrap, translateClusterManagerBootstrap, translateListener, translateCluster, translateVirtualHost) have had Stats::StatsOptions& added to their interfaces, so that Utility::checkObjNameLength() can be called with the necessary context. Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
LGTM, some small nits. Given the risk of this type of change would definitely like @mrice32 to take another pass though. Thank you!
include/envoy/stats/stats.h
Outdated
| * object name length can be overridden. The default initialization is used in IsolatedStatImpl, and | ||
| * the user-overridden struct is stored in Options. | ||
| * | ||
| * As noted in the comment above StatsOptionsImpl in source/common/stats/stats_impl.h, a stat name |
There was a problem hiding this comment.
nit: errant space, start of sentence
include/envoy/stats/stats.h
Outdated
| * As noted in the comment above StatsOptionsImpl in source/common/stats/stats_impl.h, a stat name | ||
| * often contains both a string whose length is user-defined (cluster_name in the below example), | ||
| * and a specific statistic name generated by Envoy, which are often as long as | ||
| * `upstream_cx_connect_attempts_exceeded`. To make room for growth on both fronts, we limit the max |
There was a problem hiding this comment.
nit: not sure the example here is really helpful, I would just delete it.
source/common/stats/stats_impl.cc
Outdated
There was a problem hiding this comment.
Can we somehow assert that we aren't going to overwrite memory here? It's not super clear to me how we know that this is a safe copy (or if heap allocated pass in the size of the heap allocation and assert that)?
Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
|
@mattklein123 I added an ASSERT in 2aa9d8c. |
mattklein123
left a comment
There was a problem hiding this comment.
LGTM, but would still like a final approval from @mrice32. Thank you!
|
@mattklein123 @mrice32 I want to write a golden file test for the internal representation (as hex maybe? not sure) of a stat just to be extremely sure that this didn't change. Thoughts on the maintainability of such a test? |
As long as it's well commented and someone other than you can understand the test and how to modify it, no objections from me. |
mrice32
left a comment
There was a problem hiding this comment.
This is really cool - I think it's simpler and more cohesive now! A few minor comments. I'm still looking through how the various options are set to make sure the behavior doesn't change, but it looks good so far.
| // expensive to performing a map lookup, since both require copying a truncated version of the | ||
| // string before doing the hash lookup. | ||
| uint64_t num_bytes_to_allocate = RawStatData::sizeGivenName(name); | ||
| RawStatData* data = static_cast<RawStatData*>(::calloc(num_bytes_to_allocate, 1)); |
There was a problem hiding this comment.
I know this wasn't being done before, but shouldn't we return early if calloc returns a nullptr so we don't segfault by trying to write to it? If we're not equipped to handle this failure at the callsite, then maybe we should just throw instead of returning nullptr?
| name_[xfer_size] = '\0'; | ||
| } | ||
|
|
||
| void RawStatData::truncateAndInit(absl::string_view key, const StatsOptions& stats_options) { |
There was a problem hiding this comment.
There may be a subtlety here that I'm missing here, but can we just truncate the string_view, and pass it to initialize()? It seems like the majority of initialize() and truncateAndInit() are the same.
There was a problem hiding this comment.
I actually really like this. I will refactor truncateAndInit() and checkAndInit() to use the same (private) initialize() method under the hood -- this makes the difference in logic very explicit.
source/common/stats/stats_impl.cc
Outdated
| ref_count_ = 1; | ||
|
|
||
| uint64_t xfer_size = key.size(); | ||
| ASSERT(xfer_size <= num_bytes_allocated); |
There was a problem hiding this comment.
This assert seems strange - isn't num_bytes_allocated the number of bytes allocated to store the entire stat object rather than just the portion reserved for name_?
There was a problem hiding this comment.
Oops, you are correct. This should be
ASSERT(sizeof(RawStatData) + xfer_size + 1 <= num_bytes_allocated). I'll change it.
| if (stats_table_prefix.size() > remaining_size) { | ||
| stats_table_prefix = stats_table_prefix.substr(0, remaining_size); | ||
| } | ||
|
|
mrice32
left a comment
There was a problem hiding this comment.
I've convinced myself that the hot restart layout hasn't changed. Outside of the comments above, LGTM.
|
|
||
| EXPECT_CALL(*this, alloc("stats.overflow")); | ||
| store_.reset(new ThreadLocalStoreImpl(*this)); | ||
| store_.reset(new ThreadLocalStoreImpl(options_, *this)); |
There was a problem hiding this comment.
Sorry, tossing in a new comment because replies don't seem to show up at the bottom. nit: can we use std::make_unique here?
|
@mattklein123 See #3843 for the internal memory representation test. I checked locally and the hash doesn't change, but I still want to merge that PR in first as proof. |
…ion logic Signed-off-by: James Buckland <jbuckland@google.com>
Signed-off-by: James Buckland <jbuckland@google.com>
…ntation (#3843) This PR adds a test to ensure that the internal memory representation of a RawStatData hasn't unintentionally changed. We take a hash of a struct by getting the Hex::encode() hex string representation of it, and then taking the HashUtil::xxHash64() hash of it. This will help ensure that #3629 and other PRs which deal with stats don't accidentally change RawStatData. Risk Level: Low Testing: N/A Docs Changes: None Release Notes: None Signed-off-by: James Buckland <jbuckland@google.com>
…ions Signed-off-by: James Buckland <jbuckland@google.com>
source/common/stats/stats_impl.cc
Outdated
|
|
||
| void RawStatData::checkAndInit(absl::string_view key, uint64_t num_bytes_allocated) { | ||
| uint64_t xfer_size = key.size(); | ||
| ASSERT(sizeof(RawStatData) + xfer_size + 1 <= num_bytes_allocated); |
There was a problem hiding this comment.
It seems like we compute the in-memory size of a particular RawStatData often. Do you think it's worth making some utility function that takes the length of the string (without the null terminator) and computes the size so we make sure we do it the same way every time?
There was a problem hiding this comment.
I like this note. I did some refactoring work in aeb9251.
…structSizeWithOptions() Signed-off-by: James Buckland <jbuckland@google.com>
mrice32
left a comment
There was a problem hiding this comment.
LGTM other than a little weirdness in ThreadLocalStoreImpl. Thanks for working on this - this is awesome!
| parent_.num_last_resort_stats_.inc(); | ||
| stat = | ||
| make_stat(parent_.heap_allocator_, name, std::move(tag_extracted_name), std::move(tags)); | ||
| make_stat(parent_.heap_allocator_, name.substr(0, parent_.statsOptions().maxNameLength()), |
There was a problem hiding this comment.
This seems fishy. Since we now throw if the heap allocator fails to allocate, this statement will never be called in the non-hot restart case. However, it's still an embedded assumption about the specific behavior of all allocators - fixed length if they can fail and non-fixed length if they cannot fail. Is there any problem with changing the hot restart allocator to perform the fallback rather than relying on the caller? Since this is technically correct, I think it's fine to leave it this way for now, but please toss in a TODO to get rid of this assumption.
Signed-off-by: James Buckland <jbuckland@google.com>
source/common/stats/stats_impl.cc
Outdated
| RawStatData* data = static_cast<RawStatData*>(::calloc(num_bytes_to_allocate, 1)); | ||
| data->initialize(name, num_bytes_to_allocate); | ||
| if (data == nullptr) { | ||
| throw EnvoyException("HeapRawStatDataAllocator: unable to allocate a new stat"); |
There was a problem hiding this comment.
This might be caught. Can you throw some other type of exception? (Like whatever type new() throws).
There was a problem hiding this comment.
I changed it to std::bad_alloc() in 4cad816.
Signed-off-by: James Buckland <jbuckland@google.com>
…ions Signed-off-by: James Buckland <jbuckland@google.com>
…ions Signed-off-by: James Buckland <jbuckland@google.com>
WIP: Refactor
Stats::RawStatDatainto aStats::StatsOptionsstruct.Description:
This change allows us to deprecate the statics inside
Stats::RawStatData. Some side effects of this change are:HeapRawStatDataAllocatorno longer performs stat name truncation,BlockMemoryHashSet,HotRestartImpl,C/L/RdsSubscription, andThreadLocalStoreImplas functions of thisStats::StatsOptionsstruct, andStats::RawStatDatanow looks more like a set of libraries for computing stat padding, as opposed to a source of truth for the maximum allowable name lengths.server.cc(translateBootstrap,translateClusterManagerBootstrap,translateListener,translateCluster,translateVirtualHost) have hadStats::StatsOptions&added to their interfaces, so thatUtility::checkObjNameLength()can be called with the necessary context.Risk Level: Medium
Fixes: #3508