14 changes: 1 addition & 13 deletions docs/root/configuration/cluster_manager/cds.rst
Original file line number Diff line number Diff line change
Expand Up @@ -17,16 +17,4 @@ clusters depending on what is required.
Statistics
----------

CDS has a statistics tree rooted at *cluster_manager.cds.* with the following statistics:

.. csv-table::
:header: Name, Type, Description
:widths: 1, 1, 2

config_reload, Counter, Total API fetches that resulted in a config reload due to a different config
update_attempt, Counter, Total API fetches attempted
update_success, Counter, Total API fetches completed successfully
update_failure, Counter, Total API fetches that failed because of network errors
update_rejected, Counter, Total API fetches that failed because of schema/validation errors
version, Gauge, Hash of the contents from the last successful API fetch
control_plane.connected_state, Gauge, A boolean (1 for connected and 0 for disconnected) that indicates the current connection state with management server
CDS has a :ref:`statistics <subscription_statistics>` tree rooted at *cluster_manager.cds.*
1 change: 1 addition & 0 deletions docs/root/configuration/configuration.rst
Expand Up @@ -20,6 +20,7 @@ Configuration reference
rate_limit
runtime
statistics
xds_subscription_stats
tools/router_check
overload_manager/overload_manager
secret
Expand Down
15 changes: 2 additions & 13 deletions docs/root/configuration/http_conn_man/rds.rst
Expand Up @@ -14,17 +14,6 @@ fetch its own route configuration via the API.
Statistics
----------

RDS has a statistics tree rooted at *http.<stat_prefix>.rds.<route_config_name>.*.
RDS has a :ref:`statistics <subscription_statistics>` tree rooted at *http.<stat_prefix>.rds.<route_config_name>.*.
Any ``:`` character in the ``route_config_name`` name gets replaced with ``_`` in the
stats tree. The stats tree contains the following statistics:

.. csv-table::
:header: Name, Type, Description
:widths: 1, 1, 2

config_reload, Counter, Total API fetches that resulted in a config reload due to a different config
update_attempt, Counter, Total API fetches attempted
update_success, Counter, Total API fetches completed successfully
update_failure, Counter, Total API fetches that failed because of network errors
update_rejected, Counter, Total API fetches that failed because of schema/validation errors
version, Gauge, Hash of the contents from the last successful API fetch
stats tree.
14 changes: 1 addition & 13 deletions docs/root/configuration/listeners/lds.rst
Expand Up @@ -36,16 +36,4 @@ Configuration
Statistics
----------

LDS has a statistics tree rooted at *listener_manager.lds.* with the following statistics:

.. csv-table::
:header: Name, Type, Description
:widths: 1, 1, 2

config_reload, Counter, Total API fetches that resulted in a config reload due to a different config
update_attempt, Counter, Total API fetches attempted
update_success, Counter, Total API fetches completed successfully
update_failure, Counter, Total API fetches that failed because of network errors
update_rejected, Counter, Total API fetches that failed because of schema/validation errors
version, Gauge, Hash of the contents from the last successful API fetch
control_plane.connected_state, Gauge, A boolean (1 for connected and 0 for disconnected) that indicates the current connection state with management server
LDS has a :ref:`statistics <subscription_statistics>` tree rooted at *listener_manager.lds.*
23 changes: 23 additions & 0 deletions docs/root/configuration/xds_subscription_stats.rst
@@ -0,0 +1,23 @@
.. _subscription_statistics:

xDS subscription statistics
===========================

Envoy discovers its various dynamic resources via discovery
services referred to as *xDS*. Resources are requested via :ref:`subscriptions <xds_protocol>`,
by specifying a filesystem path to watch, initiating gRPC streams or polling a REST-JSON URL.

The following statistics are generated for all subscriptions.

.. csv-table::
  :header: Name, Type, Description
  :widths: 1, 1, 2

  config_reload, Counter, Total API fetches that resulted in a config reload due to a different config
  init_fetch_timeout, Counter, Total :ref:`initial fetch timeouts <envoy_api_field_core.ConfigSource.initial_fetch_timeout>`
  update_attempt, Counter, Total API fetches attempted
  update_success, Counter, Total API fetches completed successfully
  update_failure, Counter, Total API fetches that failed because of network errors
  update_rejected, Counter, Total API fetches that failed because of schema/validation errors
  version, Gauge, Hash of the contents from the last successful API fetch
  control_plane.connected_state, Gauge, A boolean (1 for connected and 0 for disconnected) that indicates the current connection state with management server
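
As a rough illustration of how the counters in the table relate to each other, the sketch below models one subscription whose initial fetch times out and whose next fetch is accepted. Everything in it (``SubscriptionStatsModel``, ``FetchOutcome``, ``recordAttempt``, ``recordOutcome``, ``exampleTimeoutThenSuccess``) is a hypothetical name invented for this example and is not Envoy's API. It also assumes that a timeout increments only ``init_fetch_timeout``.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only: field names follow the table above, but this
// struct is hypothetical and is not Envoy's SubscriptionStats type.
struct SubscriptionStatsModel {
  uint64_t init_fetch_timeout{0};
  uint64_t update_attempt{0};
  uint64_t update_success{0};
  uint64_t update_failure{0};
  uint64_t update_rejected{0};
  uint64_t version{0}; // Gauge: hash of the last accepted config.
};

// Hypothetical outcomes a single fetch can have.
enum class FetchOutcome { Accepted, Rejected, NetworkFailure, InitialFetchTimeout };

// Every fetch counts as an attempt, whatever its outcome.
void recordAttempt(SubscriptionStatsModel& stats) { ++stats.update_attempt; }

// Assumption: each outcome touches exactly one counter, and only an accepted
// update moves the version gauge.
void recordOutcome(SubscriptionStatsModel& stats, FetchOutcome outcome,
                   uint64_t version_hash = 0) {
  switch (outcome) {
  case FetchOutcome::Accepted:
    ++stats.update_success;
    stats.version = version_hash;
    break;
  case FetchOutcome::Rejected:
    ++stats.update_rejected;
    break;
  case FetchOutcome::NetworkFailure:
    ++stats.update_failure;
    break;
  case FetchOutcome::InitialFetchTimeout:
    ++stats.init_fetch_timeout;
    break;
  }
}

// The first config never arrives in time; a later fetch is accepted.
SubscriptionStatsModel exampleTimeoutThenSuccess() {
  SubscriptionStatsModel stats;
  recordAttempt(stats);
  recordOutcome(stats, FetchOutcome::InitialFetchTimeout);
  assert(stats.update_success == 0); // Nothing has been accepted yet.
  recordAttempt(stats);
  recordOutcome(stats, FetchOutcome::Accepted, /*version_hash=*/42);
  return stats;
}
```

Under these assumptions, a non-zero ``init_fetch_timeout`` alongside a zero ``update_failure`` would suggest the management server was reachable but slow to deliver the first config, rather than unreachable.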
1 change: 1 addition & 0 deletions docs/root/intro/version_history.rst
Expand Up @@ -8,6 +8,7 @@ Version history
* config: added access log :ref:`extension filter<envoy_api_field_config.filter.accesslog.v2.AccessLogFilter.extension_filter>`.
* config: async data access for local and remote data source.
* config: changed the default value of :ref:`initial_fetch_timeout <envoy_api_field_core.ConfigSource.initial_fetch_timeout>` from 0s to 15s. This is a change in behaviour in the sense that Envoy will move to the next initialization phase, even if the first config is not delivered in 15s. Refer to :ref:`initialization process <arch_overview_initialization>` for more details.
* config: added stat :ref:`init_fetch_timeout <subscription_statistics>`.
* fault: added overrides for default runtime keys in :ref:`HTTPFault <envoy_api_msg_config.filter.http.fault.v2.HTTPFault>` filter.
* grpc-json: added support for :ref:`ignoring unknown query parameters<envoy_api_field_config.filter.http.transcoder.v2.GrpcJsonTranscoder.ignore_unknown_query_parameters>`.
* http: added the ability to reject HTTP/1.1 requests with invalid HTTP header values, using the runtime feature `envoy.reloadable_features.strict_header_validation`.
Expand Down
1 change: 1 addition & 0 deletions include/envoy/config/subscription.h
Expand Up @@ -98,6 +98,7 @@ using SubscriptionPtr = std::unique_ptr<Subscription>;
* Per subscription stats. @see stats_macros.h
*/
#define ALL_SUBSCRIPTION_STATS(COUNTER, GAUGE) \
COUNTER(init_fetch_timeout) \
COUNTER(update_attempt) \
COUNTER(update_failure) \
COUNTER(update_rejected) \
Expand Down
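
For readers unfamiliar with the X-macro pattern used by `ALL_SUBSCRIPTION_STATS`, here is a minimal, self-contained sketch of how one list of stat names can generate both struct fields and a list of name strings. The `Sketch*` and `SKETCH_*` names are hypothetical and are not Envoy's stats macros. The four counters match the lines visible in this hunk, while the `version` gauge is taken from the documentation table rather than from the truncated part of the macro. The trailing-underscore field naming is inferred from the `stats_.init_fetch_timeout_.inc()` calls elsewhere in this change.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for real counter and gauge types.
struct SketchCounter {
  uint64_t value{0};
  void inc() { ++value; }
};
struct SketchGauge {
  uint64_t value{0};
  void set(uint64_t v) { value = v; }
};

// One list of stats; each use site decides what COUNTER and GAUGE expand to.
#define SKETCH_SUBSCRIPTION_STATS(COUNTER, GAUGE)                                  \
  COUNTER(init_fetch_timeout)                                                      \
  COUNTER(update_attempt)                                                          \
  COUNTER(update_failure)                                                          \
  COUNTER(update_rejected)                                                         \
  GAUGE(version)

// Expansion 1: declare one field per stat, with a trailing underscore.
#define SKETCH_COUNTER_FIELD(NAME) SketchCounter NAME##_;
#define SKETCH_GAUGE_FIELD(NAME) SketchGauge NAME##_;
struct SketchSubscriptionStats {
  SKETCH_SUBSCRIPTION_STATS(SKETCH_COUNTER_FIELD, SKETCH_GAUGE_FIELD)
};

// Expansion 2: collect the stat names as strings, in declaration order.
#define SKETCH_STAT_NAME(NAME) names.push_back(#NAME);
std::vector<std::string> sketchStatNames() {
  std::vector<std::string> names;
  SKETCH_SUBSCRIPTION_STATS(SKETCH_STAT_NAME, SKETCH_STAT_NAME)
  return names;
}

// Usage: adding a line to the list above is enough to get a new field.
uint64_t sketchIncrementTimeout() {
  SketchSubscriptionStats stats;
  stats.init_fetch_timeout_.inc();
  assert(stats.update_attempt_.value == 0); // Other counters are untouched.
  return stats.init_fetch_timeout_.value;
}
```

With this pattern, the one-line `COUNTER(init_fetch_timeout)` addition is all a new stat needs in the header; the remaining work is the `inc()` calls at each timeout site.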
1 change: 1 addition & 0 deletions source/common/config/delta_subscription_state.cc
Expand Up @@ -24,6 +24,7 @@ DeltaSubscriptionState::DeltaSubscriptionState(const std::string& type_url,
void DeltaSubscriptionState::setInitFetchTimeout(Event::Dispatcher& dispatcher) {
if (init_fetch_timeout_.count() > 0 && !init_fetch_timeout_timer_) {
init_fetch_timeout_timer_ = dispatcher.createTimer([this]() -> void {
stats_.init_fetch_timeout_.inc();
ENVOY_LOG(warn, "delta config: initial fetch timed out for {}", type_url_);
callbacks_.onConfigUpdateFailed(Envoy::Config::ConfigUpdateFailureReason::FetchTimedout,
nullptr);
Expand Down
1 change: 1 addition & 0 deletions source/common/config/grpc_mux_subscription_impl.cc
Expand Up @@ -69,6 +69,7 @@ void GrpcMuxSubscriptionImpl::onConfigUpdateFailed(ConfigUpdateFailureReason rea
ENVOY_LOG(debug, "gRPC update for {} failed", type_url_);
break;
case Envoy::Config::ConfigUpdateFailureReason::FetchTimedout:
stats_.init_fetch_timeout_.inc();
disableInitFetchTimeoutTimer();
ENVOY_LOG(warn, "gRPC config: initial fetch timed out for {}", type_url_);
break;
Expand Down
1 change: 1 addition & 0 deletions source/common/config/http_subscription_impl.cc
Expand Up @@ -39,6 +39,7 @@ void HttpSubscriptionImpl::start(const std::set<std::string>& resource_names) {
if (init_fetch_timeout_.count() > 0) {
init_fetch_timeout_timer_ = dispatcher_.createTimer([this]() -> void {
ENVOY_LOG(warn, "REST config: initial fetch timed out for {}", path_);
stats_.init_fetch_timeout_.inc();
callbacks_.onConfigUpdateFailed(Envoy::Config::ConfigUpdateFailureReason::FetchTimedout,
nullptr);
});
Expand Down
8 changes: 4 additions & 4 deletions test/common/config/filesystem_subscription_impl_test.cc
Expand Up @@ -18,20 +18,20 @@ class FilesystemSubscriptionImplTest : public testing::Test,
// Validate that the client can recover from bad JSON responses.
TEST_F(FilesystemSubscriptionImplTest, BadJsonRecovery) {
startSubscription({"cluster0", "cluster1"});
EXPECT_TRUE(statsAre(1, 0, 0, 0, 0));
EXPECT_TRUE(statsAre(1, 0, 0, 0, 0, 0));
EXPECT_CALL(callbacks_,
onConfigUpdateFailed(Envoy::Config::ConfigUpdateFailureReason::ConnectionFailure, _));
updateFile(";!@#badjso n");
EXPECT_TRUE(statsAre(2, 0, 0, 1, 0));
EXPECT_TRUE(statsAre(2, 0, 0, 1, 0, 0));
deliverConfigUpdate({"cluster0", "cluster1"}, "0", true);
EXPECT_TRUE(statsAre(3, 1, 0, 1, 7148434200721666028));
EXPECT_TRUE(statsAre(3, 1, 0, 1, 0, 7148434200721666028));
}

// Validate that a file that is initially available results in a successful update.
TEST_F(FilesystemSubscriptionImplTest, InitialFile) {
updateFile("{\"versionInfo\": \"0\", \"resources\": []}", false);
startSubscription({"cluster0", "cluster1"});
EXPECT_TRUE(statsAre(1, 1, 0, 0, 7148434200721666028));
EXPECT_TRUE(statsAre(1, 1, 0, 0, 0, 7148434200721666028));
}

// Validate that if we fail to set a watch, we get a sensible warning.
Expand Down
5 changes: 3 additions & 2 deletions test/common/config/filesystem_subscription_test_harness.h
Expand Up @@ -87,10 +87,11 @@ class FilesystemSubscriptionTestHarness : public SubscriptionTestHarness {
}

AssertionResult statsAre(uint32_t attempt, uint32_t success, uint32_t rejected, uint32_t failure,
uint64_t version) override {
uint32_t init_fetch_timeout, uint64_t version) override {
// The first attempt always fails unless there was a file there to begin with.
return SubscriptionTestHarness::statsAre(attempt, success, rejected,
failure + (file_at_start_ ? 0 : 1), version);
failure + (file_at_start_ ? 0 : 1), init_fetch_timeout,
version);
}

void expectConfigUpdateFailed() override {
Expand Down
24 changes: 12 additions & 12 deletions test/common/config/grpc_subscription_impl_test.cc
Expand Up @@ -20,7 +20,7 @@ TEST_F(GrpcSubscriptionImplTest, StreamCreationFailure) {
EXPECT_CALL(random_, random());
EXPECT_CALL(*timer_, enableTimer(_));
subscription_->start({"cluster0", "cluster1"});
EXPECT_TRUE(statsAre(2, 0, 0, 1, 0));
EXPECT_TRUE(statsAre(2, 0, 0, 1, 0, 0));
// Ensure this doesn't cause an issue by sending a request, since we don't
// have a gRPC stream.
subscription_->updateResources({"cluster2"});
Expand All @@ -30,50 +30,50 @@ TEST_F(GrpcSubscriptionImplTest, StreamCreationFailure) {

expectSendMessage({"cluster2"}, "");
timer_cb_();
EXPECT_TRUE(statsAre(3, 0, 0, 1, 0));
EXPECT_TRUE(statsAre(3, 0, 0, 1, 0, 0));
verifyControlPlaneStats(1);
}

// Validate that the client can recover from a remote stream closure via retry.
TEST_F(GrpcSubscriptionImplTest, RemoteStreamClose) {
startSubscription({"cluster0", "cluster1"});
EXPECT_TRUE(statsAre(1, 0, 0, 0, 0));
EXPECT_TRUE(statsAre(1, 0, 0, 0, 0, 0));
EXPECT_CALL(callbacks_,
onConfigUpdateFailed(Envoy::Config::ConfigUpdateFailureReason::ConnectionFailure, _));
EXPECT_CALL(*timer_, enableTimer(_));
EXPECT_CALL(random_, random());
subscription_->grpcMux().grpcStreamForTest().onRemoteClose(Grpc::Status::GrpcStatus::Canceled,
"");
EXPECT_TRUE(statsAre(2, 0, 0, 1, 0));
EXPECT_TRUE(statsAre(2, 0, 0, 1, 0, 0));
verifyControlPlaneStats(0);

// Retry and succeed.
EXPECT_CALL(*async_client_, startRaw(_, _, _)).WillOnce(Return(&async_stream_));
expectSendMessage({"cluster0", "cluster1"}, "");
timer_cb_();
EXPECT_TRUE(statsAre(2, 0, 0, 1, 0));
EXPECT_TRUE(statsAre(2, 0, 0, 1, 0, 0));
}

// Validate that when the management server gets multiple requests for the same version, it can
// ignore later ones. This allows the nonce to be used.
TEST_F(GrpcSubscriptionImplTest, RepeatedNonce) {
InSequence s;
startSubscription({"cluster0", "cluster1"});
EXPECT_TRUE(statsAre(1, 0, 0, 0, 0));
EXPECT_TRUE(statsAre(1, 0, 0, 0, 0, 0));
// First with the initial, empty version update to "0".
updateResources({"cluster2"});
EXPECT_TRUE(statsAre(2, 0, 0, 0, 0));
EXPECT_TRUE(statsAre(2, 0, 0, 0, 0, 0));
deliverConfigUpdate({"cluster0", "cluster2"}, "0", false);
EXPECT_TRUE(statsAre(3, 0, 1, 0, 0));
EXPECT_TRUE(statsAre(3, 0, 1, 0, 0, 0));
deliverConfigUpdate({"cluster0", "cluster2"}, "0", true);
EXPECT_TRUE(statsAre(4, 1, 1, 0, 7148434200721666028));
EXPECT_TRUE(statsAre(4, 1, 1, 0, 0, 7148434200721666028));
// Now with version "0" update to "1".
updateResources({"cluster3"});
EXPECT_TRUE(statsAre(5, 1, 1, 0, 7148434200721666028));
EXPECT_TRUE(statsAre(5, 1, 1, 0, 0, 7148434200721666028));
deliverConfigUpdate({"cluster3"}, "1", false);
EXPECT_TRUE(statsAre(6, 1, 2, 0, 7148434200721666028));
EXPECT_TRUE(statsAre(6, 1, 2, 0, 0, 7148434200721666028));
deliverConfigUpdate({"cluster3"}, "1", true);
EXPECT_TRUE(statsAre(7, 2, 2, 0, 13237225503670494420U));
EXPECT_TRUE(statsAre(7, 2, 2, 0, 0, 13237225503670494420U));
}

} // namespace
Expand Down
20 changes: 10 additions & 10 deletions test/common/config/http_subscription_impl_test.cc
Expand Up @@ -18,11 +18,11 @@ TEST_F(HttpSubscriptionImplTest, OnRequestReset) {
EXPECT_CALL(callbacks_,
onConfigUpdateFailed(Envoy::Config::ConfigUpdateFailureReason::ConnectionFailure, _));
http_callbacks_->onFailure(Http::AsyncClient::FailureReason::Reset);
EXPECT_TRUE(statsAre(1, 0, 0, 1, 0));
EXPECT_TRUE(statsAre(1, 0, 0, 1, 0, 0));
timerTick();
EXPECT_TRUE(statsAre(2, 0, 0, 1, 0));
EXPECT_TRUE(statsAre(2, 0, 0, 1, 0, 0));
deliverConfigUpdate({"cluster0", "cluster1"}, "0", true);
EXPECT_TRUE(statsAre(3, 1, 0, 1, 7148434200721666028));
EXPECT_TRUE(statsAre(3, 1, 0, 1, 0, 7148434200721666028));
}

// Validate that the client can recover from bad JSON responses.
Expand All @@ -36,28 +36,28 @@ TEST_F(HttpSubscriptionImplTest, BadJsonRecovery) {
EXPECT_CALL(callbacks_,
onConfigUpdateFailed(Envoy::Config::ConfigUpdateFailureReason::ConnectionFailure, _));
http_callbacks_->onSuccess(std::move(message));
EXPECT_TRUE(statsAre(1, 0, 0, 1, 0));
EXPECT_TRUE(statsAre(1, 0, 0, 1, 0, 0));
request_in_progress_ = false;
timerTick();
EXPECT_TRUE(statsAre(2, 0, 0, 1, 0));
EXPECT_TRUE(statsAre(2, 0, 0, 1, 0, 0));
deliverConfigUpdate({"cluster0", "cluster1"}, "0", true);
EXPECT_TRUE(statsAre(3, 1, 0, 1, 7148434200721666028));
EXPECT_TRUE(statsAre(3, 1, 0, 1, 0, 7148434200721666028));
}

TEST_F(HttpSubscriptionImplTest, ConfigNotModified) {
startSubscription({"cluster0", "cluster1"});

EXPECT_TRUE(statsAre(1, 0, 0, 0, 0));
EXPECT_TRUE(statsAre(1, 0, 0, 0, 0, 0));
timerTick();
EXPECT_TRUE(statsAre(2, 0, 0, 0, 0));
EXPECT_TRUE(statsAre(2, 0, 0, 0, 0, 0));

// accept and modify.
deliverConfigUpdate({"cluster0", "cluster1"}, "0", true, true, "200");
EXPECT_TRUE(statsAre(3, 1, 0, 0, 7148434200721666028));
EXPECT_TRUE(statsAre(3, 1, 0, 0, 0, 7148434200721666028));

// accept and does not modify.
deliverConfigUpdate({"cluster0", "cluster1"}, "0", true, false, "304");
EXPECT_TRUE(statsAre(4, 1, 0, 0, 7148434200721666028));
EXPECT_TRUE(statsAre(4, 1, 0, 0, 0, 7148434200721666028));
}

} // namespace
Expand Down