stats: prevent unused counters from leaking across hot restart #6850
mattklein123 merged 18 commits into envoyproxy:master from
Signed-off-by: Fred Douglas <fredlas@google.com>
/assign jmarantz
… passed anyways Signed-off-by: Fred Douglas <fredlas@google.com>
mattklein123
left a comment
Awesome, thanks for tackling. A few comments to get started.
/wait
include/envoy/stats/store.h
```cpp
/**
 * @return whether any known counter exists with this name.
 */
virtual bool counterExists(const std::string& counter_name) PURE;
```
This one is test-only, so it will definitely be best to just get rid of counterExists and use those functions here instead. That will be a very easy switch; I left a TODO.
Would've called this counterExistsForTests, but probably not worth it at this point. Maybe follow-up with a rename?
Also note in the doc that this will take a lock.
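As a rough sketch of the two suggestions above (a `ForTests` suffix on the name, and a doc note that the call takes a lock), a test-only existence check might look like the following. `LockedToyStore`, `addCounter`, `counterExistsForTests`, and `lockedToyStoreDemo` are hypothetical names chosen for illustration; this is not Envoy's `Stats::Store` interface.

```cpp
#include <mutex>
#include <set>
#include <string>

// Hypothetical toy store (not Envoy's Stats::Store); it only illustrates the
// naming and documentation suggestions from the review.
class LockedToyStore {
public:
  void addCounter(const std::string& name) {
    std::lock_guard<std::mutex> lock(mutex_);
    names_.insert(name);
  }

  /**
   * @return whether any known counter exists with this name.
   * Test-only: acquires mutex_, so it should not be called on hot paths.
   */
  bool counterExistsForTests(const std::string& counter_name) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return names_.count(counter_name) > 0;
  }

private:
  mutable std::mutex mutex_;  // mutable so const readers can still lock it
  std::set<std::string> names_;
};

// Usage sketch: returns true if the lookups behave as expected.
bool lockedToyStoreDemo() {
  LockedToyStore store;
  store.addCounter("newcounter0");
  return store.counterExistsForTests("newcounter0") &&
         !store.counterExistsForTests("newcounter2");
}
```

The suffix makes accidental production use easy to spot in review, and documenting the lock warns callers that even a simple existence check is not free.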
Signed-off-by: Fred Douglas <fredlas@google.com>
Ran all tests this time, so all the CI failures should be fixed.
Signed-off-by: Fred Douglas <fredlas@google.com>
@fredlas The ASAN failure looks legit.
Signed-off-by: Fred Douglas <fredlas@google.com>
Haha sorry, changed my mind. It turns out the state transitions don't entirely make sense as an enum. Just adding a third bool is super simple, so I've now added that to this PR.
/retest
🔨 rebuilding
mattklein123
left a comment
Very nice. Will give @jmarantz a chance to take a look.
jmarantz
left a comment
Nice work!
I'm OK with merging this with my nits addressed in a follow-up, but I'll let Matt make the call?
Fine with me to wait for @jmarantz's comments to be addressed; they all seem like great things to do.
/wait
I agree, but why add it in the first place, given that we don't know when the other PR will merge? Per @jmarantz, couldn't we just have a method directly on the ThreadLocalStore that does what we need and is marked for testing?
Moreover, after reflecting on the code, it occurred to me that these new functions aren't needed, as there are already TestUtility::findCounter() and findGauge(), declared in test/test_common/utility.h.
Signed-off-by: Fred Douglas <fredlas@google.com>
Nice find, thanks! Switched to those.
Signed-off-by: Fred Douglas <fredlas@google.com>
```cpp
}
// We accessed 0 and 1 above, but not 2. Now that StatMerger has been destroyed,
// 2 should be gone.
EXPECT_TRUE(TestUtility::findCounter(store, "newcounter0"));
```
I'd probably write it as EXPECT_TRUE(TestUtility::findCounter(store, "newcounter0") != nullptr) or maybe EXPECT_NE(TestUtility::findCounter(store, "newcounter0"), nullptr) if that compiles, but meh :)
* master: (88 commits)
  upstream: Null-deref on TCP health checker if setsockopt fails (envoyproxy#6793)
  ci: switch macOS CI to azure pipelines (envoyproxy#6889)
  os syscalls lib: break apart syscalls used for hot restart (envoyproxy#6880)
  Kafka codec: precompute request size before serialization, so we do n… (envoyproxy#6862)
  upstream: move static and strict_dns clusters to dedicated files (envoyproxy#6886)
  Rollforward of api: Add total_issued_requests to Upstream Locality and Endpoint Stats. (envoyproxy#6692) (envoyproxy#6784)
  fix explicit constructor in copy-initialization (envoyproxy#6884)
  stats: use tag iterator rather than constructing the tag-array and searching that. (envoyproxy#6853)
  common: use unscoped build target in generate_version_linkstamp (envoyproxy#6877)
  Addendum to envoyproxy#6778 (envoyproxy#6882)
  ci: add minimum Linux build for Azure Pipelines (envoyproxy#6881)
  grpc: utilities for inter-converting grpc::ByteBuffer and Buffer::Instance. (envoyproxy#6732)
  upstream: allow excluding hosts from lb calculations until initial health check (envoyproxy#6794)
  stats: prevent unused counters from leaking across hot restart (envoyproxy#6850)
  network filters: add `injectDataToFilterChain(data, end_stream)` method to network filter callbacks (envoyproxy#6750)
  delete things that snuck back in (envoyproxy#6873)
  config: scoped rds (2b): support delta APIs in ConfigProvider framework (envoyproxy#6781)
  string == string! (envoyproxy#6868)
  config: add mssing imports to delta_subscription_state (envoyproxy#6869)
  protobuf: add missing default case to enum (envoyproxy#6870)
  ...

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
By "drop", do you mean values going to 0, or metrics going from existing to not existing? Either version would be different from the issue solved in this PR, so this discussion should probably move to an issue, but it does sound like something to discuss.
This is how it looks for
I am wondering if the move from shared memory to copying stats over IPC would allow such a drop and recovery to happen, given that copying stats over takes time... cc: @fishcakez
That problem is not related to this PR; we should discuss it elsewhere.
@mattklein123 Yeah, let's chat here: #6924. @fredlas thanks!

During hot restart, this PR stores counters imported from the parent in a temporary scope. When hot restart concludes, that scope is dropped. As a result, any counter the child accessed (independently of the hot restart merging) is preserved by the separate reference that access created, while any counter not accessed this way is dropped. This brings the hot restart counter behavior back to how it was before shared memory was removed in #5910.
Risk Level: low
Testing: updated test/common/stats/stat_merger_test.cc.
More fully fixes #4974.
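The ownership idea in the description can be modeled with reference counting: the store keeps only weak references, the temporary merge scope holds shared references to every counter imported from the parent, and a counter survives the scope's destruction only if something else also references it. This is a minimal sketch under that assumption; `ToyStore`, `TempMergeScope`, `mergeFromParent`, and `mergeScopeDemo` are hypothetical names, not Envoy's real `StatMerger` or `Stats::Scope` classes.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified model of the ownership scheme; not Envoy's classes.
struct Counter {
  std::string name;
  uint64_t value = 0;
};
using CounterSharedPtr = std::shared_ptr<Counter>;

// Holds only weak references, so a counter lives exactly as long as some
// scope or caller holds a shared reference to it.
class ToyStore {
public:
  // Returns the live counter with this name, creating it if none exists.
  CounterSharedPtr counter(const std::string& name) {
    auto it = counters_.find(name);
    if (it != counters_.end()) {
      if (CounterSharedPtr existing = it->second.lock()) {
        return existing;
      }
    }
    auto created = std::make_shared<Counter>();
    created->name = name;
    counters_[name] = created;
    return created;
  }

  // Returns nullptr if no live counter has this name.
  CounterSharedPtr find(const std::string& name) const {
    auto it = counters_.find(name);
    return it == counters_.end() ? nullptr : it->second.lock();
  }

private:
  std::map<std::string, std::weak_ptr<Counter>> counters_;
};

// Stand-in for the temporary scope that owns counters imported from the parent.
class TempMergeScope {
public:
  explicit TempMergeScope(ToyStore& store) : store_(store) {}

  void mergeFromParent(const std::string& name, uint64_t parent_value) {
    CounterSharedPtr c = store_.counter(name);
    c->value += parent_value;
    held_.push_back(c);  // this reference is released when the scope is destroyed
  }

private:
  ToyStore& store_;
  std::vector<CounterSharedPtr> held_;
};

// Usage: counter 0 is accessed by the "child"; counter 2 only by the merge.
bool mergeScopeDemo() {
  ToyStore store;
  CounterSharedPtr used;  // the child's own, independent reference
  {
    TempMergeScope merge(store);
    merge.mergeFromParent("newcounter0", 5);
    merge.mergeFromParent("newcounter2", 7);
    used = store.counter("newcounter0");
  }  // scope destroyed: nothing else references newcounter2
  return store.find("newcounter0") != nullptr && store.find("newcounter2") == nullptr;
}
```

In this model, `newcounter0` outlives the scope because the child's own reference keeps it alive, while `newcounter2`, referenced only by the merge scope, is freed. That mirrors the expectation in the `stat_merger_test.cc` excerpt above.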