-
Notifications
You must be signed in to change notification settings - Fork 5.5k
add deferred_creation util into stats #27899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
6b8b489
8f3119a
17cfb10
f8ba76a
68d3c6b
19d75d8
2bbace4
7d69cbf
8318935
91e47e3
cfc9859
f6bd430
97c2b86
8e9dfe5
bca128a
d1ee36c
448e474
1fa16e7
6e9a861
c1d2aa4
4e72f1e
35e7cb1
0a11cb5
8a80f36
448074d
28db9d3
8134314
c22db02
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -218,5 +218,47 @@ using SizeFn = std::function<void(std::size_t)>; | |
| */ | ||
| template <typename Stat> using StatFn = std::function<void(Stat&)>; | ||
|
|
||
| /** | ||
| * Interface for stats lazy initialization. | ||
| * To save memory and CPU consumption on blocks of stats that are never referenced throughout the | ||
| * process lifetime, they can be encapsulated in a DeferredCreationCompatibleInterface. Then the | ||
| Envoy | ||
| * bootstrap configuration can be set to defer the instantiation of those block. Note that when the | ||
| * blocks of stats are created, they carry an extra ~160 byte overhead (depending on worker thread | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sorry, change this now to 60-100 bytes to be consistent with the bootstrap comment.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
| * count) due to internal bookkeeping data structures. The overhead when deferred stats are disabled | ||
| * is just 8 bytes. | ||
| * See more context: https://github.com/envoyproxy/envoy/issues/23575 | ||
| */ | ||
| template <typename StatsStructType> class DeferredCreationCompatibleInterface { | ||
| public: | ||
| // Helper function to get-or-create and return the StatsStructType object. | ||
| virtual StatsStructType& instantiate() PURE; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As noted in the comment, I think
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
|
|
||
| virtual ~DeferredCreationCompatibleInterface() = default; | ||
| }; | ||
|
|
||
| // Template that lazily initializes a StatsStruct. | ||
| // The bootstrap config :ref:`enable_deferred_creation_stats | ||
| // <envoy_v3_api_field_config.bootstrap.v3.Bootstrap.enable_deferred_creation_stats>` decides if | ||
| // stats lazy initialzation is enabled or not. | ||
| template <typename StatsStructType> class DeferredCreation; | ||
| template <typename StatsStructType> class DirectStats; | ||
|
|
||
| // A helper class for a lazy compatible stats struct type. | ||
| template <typename StatsStructType> class DeferredCreationCompatibleStats { | ||
| public: | ||
| DeferredCreationCompatibleStats( | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit:
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
| std::unique_ptr<DeferredCreationCompatibleInterface<StatsStructType>> d) | ||
| : data_(std::move(d)) {} | ||
| // Allows move construct and assign. | ||
| DeferredCreationCompatibleStats& operator=(DeferredCreationCompatibleStats&&) = default; | ||
| DeferredCreationCompatibleStats(DeferredCreationCompatibleStats&&) = default; | ||
|
|
||
| inline StatsStructType* operator->() { return &data_->instantiate(); }; | ||
| inline StatsStructType& operator*() { return data_->instantiate(); }; | ||
|
|
||
| private: | ||
| std::unique_ptr<DeferredCreationCompatibleInterface<StatsStructType>> data_; | ||
| }; | ||
| } // namespace Stats | ||
| } // namespace Envoy | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -158,16 +158,18 @@ static inline std::string statPrefixJoin(absl::string_view prefix, absl::string_ | |
| */ | ||
| #define MAKE_STATS_STRUCT(StatsStruct, StatNamesStruct, ALL_STATS) \ | ||
| struct StatsStruct { \ | ||
| /* Also referenced in Stats::createDeferredCompatibleStats. */ \ | ||
| using StatNameType = StatNamesStruct; \ | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I understand why you need to create this alias. But can you add a comment explaining that?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done, added a line comment. |
||
| static const absl::string_view typeName() { return #StatsStruct; } \ | ||
| StatsStruct(const StatNamesStruct& stat_names, Envoy::Stats::Scope& scope, \ | ||
| Envoy::Stats::StatName prefix = Envoy::Stats::StatName()) \ | ||
| : stat_names_(stat_names) \ | ||
| ALL_STATS(MAKE_STATS_STRUCT_COUNTER_HELPER_, MAKE_STATS_STRUCT_GAUGE_HELPER_, \ | ||
| MAKE_STATS_STRUCT_HISTOGRAM_HELPER_, \ | ||
| MAKE_STATS_STRUCT_TEXT_READOUT_HELPER_, \ | ||
| MAKE_STATS_STRUCT_STATNAME_HELPER_) {} \ | ||
| const StatNamesStruct& stat_names_; \ | ||
| const StatNameType& stat_names_; \ | ||
| ALL_STATS(GENERATE_COUNTER_STRUCT, GENERATE_GAUGE_STRUCT, GENERATE_HISTOGRAM_STRUCT, \ | ||
| GENERATE_TEXT_READOUT_STRUCT, GENERATE_STATNAME_STRUCT) \ | ||
| } | ||
|
|
||
| } // namespace Envoy | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,107 @@ | ||
| #pragma once | ||
|
|
||
| #include "envoy/common/pure.h" | ||
| #include "envoy/stats/scope.h" | ||
| #include "envoy/stats/stats.h" | ||
|
|
||
| #include "source/common/common/cleanup.h" | ||
| #include "source/common/common/thread.h" | ||
| #include "source/common/stats/symbol_table.h" | ||
| #include "source/common/stats/utility.h" | ||
|
|
||
| namespace Envoy { | ||
| namespace Stats { | ||
|
|
||
| /** | ||
| * Lazy-initialization wrapper for StatsStructType, intended for deferred instantiation of a block | ||
| * of stats that might not be needed in a given Envoy process. | ||
| * | ||
| * This class is thread-safe -- instantiations can occur on multiple concurrent threads. | ||
| * This is used when | ||
| * :ref:`enable_deferred_creation_stats | ||
| * <envoy_v3_api_field_config.bootstrap.v3.Bootstrap.enable_deferred_creation_stats>` is enabled. | ||
| */ | ||
| template <typename StatsStructType> | ||
| class DeferredCreation : public DeferredCreationCompatibleInterface<StatsStructType> { | ||
| public: | ||
| // Capture the stat names object and the scope with a ctor, that can be used to instantiate a | ||
| // StatsStructType object later. | ||
| // Caller should make sure scope and stat_names outlive this object. | ||
| DeferredCreation(const typename StatsStructType::StatNameType& stat_names, | ||
| Stats::ScopeSharedPtr scope) | ||
| : initialized_([&scope]() -> Gauge& { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we need to capture scope by value here? That this works makes me suspect we don't have a test that removes the scope first and then instantiates the stats.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The constructor accepts the scope by value, so this should be fine, right?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually come to think of it, line 41 will cause the scope originally passed in to the constructor to be deleted. Shouldn't we keep the scope alive for the lifetime of
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the constructor will be out of scope before the lambda gets run, so no help there :( I think maybe this is all moot because StatsStructType does not hold a reference-count for the scope, but holds it by So what will happen in the test scenario I just thought of: I think the code as is will reference freed memory, but with my suggestion it will work, but not be useful :) It will successfully instantiate the stats in the scope, and then the Cleanup thing will remove the scope. So you'll be left with struct full of invalid stats. So if you then do: that will then crash. This is beyond the scope of this PR to resolve (sic) but I'd still vote for capturing scope by value on line 32.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The scope is used first in initialize the gauge, and then moved into the ctor_, could you help me to understand the life cycle issue there? the scope is always supposed to outlive the stats, otherwise the stats will be deleted IIUC.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I articulated the lifecycle issue above. In practice it doesn't matter but my recommendation stands -- just change
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not following where this "reference freed memory" is from. Scope is held in ctor_ until it's called. but further references to its member I am not sure, since the last copy of "scope" is deleted. I have the understanding that scope always outlive the individual StatStruct, if that not true, then the previous suggestion of do not capture scope in DeferredCreation need to be revisited.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The lambda in line 32 is not stored anywhere, so capturing scope by value there does not change anything. It's a temporary that is invoked and immediately destroyed. The lambda that is stored is the one on line 38, and the scope there is (effectively) captured by value (moved from the passed in by value object for the constructor of However storing the scope in
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. With the code as is, the call to
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. added comment as asked. |
||
| Stats::StatNamePool pool(scope->symbolTable()); | ||
| return Stats::Utility::gaugeFromElements( | ||
| *scope, {pool.add(StatsStructType::typeName()), pool.add("initialized")}, | ||
| Stats::Gauge::ImportMode::HiddenAccumulate); | ||
| }()), | ||
| ctor_([this, stats_scope = std::move(scope)]() -> StatsStructType* { | ||
| initialized_.inc(); | ||
| // Reset ctor_ to save some RAM. | ||
| Cleanup reset_ctor([this] { ctor_ = nullptr; }); | ||
| return new StatsStructType(stat_names, *stats_scope); | ||
| }) { | ||
| if (initialized_.value() > 0) { | ||
| instantiate(); | ||
| } | ||
| } | ||
| ~DeferredCreation() { | ||
| if (ctor_ == nullptr) { | ||
| initialized_.dec(); | ||
| } | ||
| } | ||
|
|
||
| private: | ||
| inline StatsStructType& instantiate() override { return *internal_stats_.get(ctor_); } | ||
|
|
||
| // In order to preserve stat value continuity across a config reload, we need to automatically | ||
| // re-instantiate lazy stats when they are constructed, if there is already a live instantiation | ||
| // to the same stats. Consider the following alternate scenarios: | ||
|
|
||
| // Scenario 1: a cluster is instantiated but receives no requests, so its traffic-related stats | ||
| // are never instantiated. When this cluster gets reloaded on a config update, a new lazy-init | ||
| // block is created, but the stats should again not be instantiated. | ||
|
|
||
| // Scenario 2: a cluster is instantiated and receives traffic, so its traffic-related stats are | ||
| // instantiated. We must ensure that a new instance for the same cluster gets its lazy-stats | ||
| // instantiated before the previous cluster of the same name is destructed. | ||
|
|
||
| // To do that we keep an "initialized" gauge in the cluster's scope, which will be associated by | ||
| // name to the previous generation's cluster's lazy-init block. We use the value in this shared | ||
| // gauge to determine whether to instantiate the lazy block on construction. | ||
| Gauge& initialized_; | ||
| // TODO(#26957): Clean up this ctor_ by moving its ownership to AtomicPtr, and drop | ||
| // the setter lambda when the nested object is created. | ||
| std::function<StatsStructType*()> ctor_; | ||
| Thread::AtomicPtr<StatsStructType, Thread::AtomicPtrAllocMode::DeleteOnDestruct> internal_stats_; | ||
| }; | ||
|
|
||
| // Non-DeferredCreation wrapper over StatsStructType. This is used when | ||
| // :ref:`enable_deferred_creation_stats | ||
| // <envoy_v3_api_field_config.bootstrap.v3.Bootstrap.enable_deferred_creation_stats>` is not | ||
| // enabled. | ||
| template <typename StatsStructType> | ||
| class DirectStats : public DeferredCreationCompatibleInterface<StatsStructType> { | ||
| public: | ||
| DirectStats(const typename StatsStructType::StatNameType& stat_names, Stats::Scope& scope) | ||
| : stats_(stat_names, scope) {} | ||
|
|
||
| private: | ||
| inline StatsStructType& instantiate() override { return stats_; } | ||
| StatsStructType stats_; | ||
| }; | ||
|
|
||
| template <typename StatsStructType> | ||
| DeferredCreationCompatibleStats<StatsStructType> | ||
| createDeferredCompatibleStats(Stats::ScopeSharedPtr scope, | ||
| const typename StatsStructType::StatNameType& stat_names, | ||
| bool deferred_creation) { | ||
| if (deferred_creation) { | ||
| return {std::make_unique<DeferredCreation<StatsStructType>>(stat_names, scope)}; | ||
| } else { | ||
| return {std::make_unique<DirectStats<StatsStructType>>(stat_names, *scope)}; | ||
| } | ||
| } | ||
|
|
||
| } // namespace Stats | ||
| } // namespace Envoy | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we ever want to free the deferred-creation stats if they aren't used for some period of time? If so, would it change how we structure this config? Or would there be any other related config for this setting?
Would it be worth having an empty message here, which when set enables this feature? Then we could put additional config for it in that message if/when we need it, such as options for selecting which stats are deferred-creation, timer for freeing them after non-use, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmarantz as well.
My thinking is that this is a all-or-none setting that controls the deferred creation characteristics for all stats that's supports deferred creation, but I'd let Josh to chime in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like Greg's suggestion: by making this a message with a bool in it, rather than just a bool, you present an API that can be extended in the future to maybe include a timeout.
I think a timeout would be useful in situations like the one you have in mind, but where a cluster might get a request, bringing in all its stats, and then not get any for many days after that. We might prefer to drop those stats then have them consume memory until process exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will change it to a submessage field.
Is there any preference on how to name the new sub-message?
one QQ: is the timeout idea also applicable to the "current " non-deferred-created stats, that are always populated, but not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something like
SparseStatAllocationfor the name?I don't think the timeout is applicable to current stats, unless we wanted to change the definition of
used()to beused within last X duration.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks ALL!
@jmarantz proposed to use DeferredStatOptions in the bootstrap config.
PR updated, PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a slight preference for DeferredStatOptions but I'm OK with SparseStatAllocation as well. I thought we had gone with Deferred in the code in lieu of Lazy, and it would be preferable to be consistent. But I don't feel too strongly about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: format failure on protobuf; I think the placement of the nested message.
https://dev.azure.com/cncf/envoy/_build/results?buildId=140116&view=logs&j=c95d62a5-6bb9-58f0-2a02-7575fd482a15&t=76c9b3e6-0d40-57e9-06f3-2219e9228a0a