Merged
53 commits
f143d0c
conn_pool: support max stream duration for upstream connections
esmet Feb 17, 2021
4e8e73d
Formatting
esmet Feb 17, 2021
9d7d810
Only drain non-closed connections
esmet Feb 17, 2021
0964e31
Pipe down maxConnectionDuration from CommonHttpProtocolOptions
esmet Feb 18, 2021
7909f3f
Merge remote-tracking branch 'upstream/main' into max-upstream-connec…
esmet Aug 31, 2021
301e2a0
Fix proto comments
esmet Aug 31, 2021
4a776fe
Merge remote-tracking branch 'upstream/main' into max-upstream-connec…
esmet Aug 31, 2021
d61139e
Fix ClusterInfo to contain maxConnectionDuration
esmet Aug 31, 2021
8276bb6
Add tests
esmet Aug 31, 2021
51ecc21
Fix
esmet Sep 1, 2021
3abf8cd
Add back some bits
esmet Sep 2, 2021
e65ce1d
Fix changelog
esmet Sep 2, 2021
be1493b
Update max connection duration tests to use a real dispatcher and sim…
esmet Sep 6, 2021
6934dff
Fix format
esmet Sep 6, 2021
7976234
Add back two expects
esmet Sep 6, 2021
e0b793a
Remove the addition of closed_, since the lifetime of that bit is not…
esmet Sep 7, 2021
adf76d8
Fix format
esmet Sep 7, 2021
53963b4
Merge remote-tracking branch 'upstream/main' into max-upstream-connec…
esmet Sep 7, 2021
5ce63a6
Fix test
esmet Sep 7, 2021
35e6e70
Merge remote-tracking branch 'upstream/main' into max-upstream-connec…
esmet Sep 9, 2021
477d857
Factor out common test functionality.
esmet Sep 21, 2021
34eb063
Continue to factor out test mechanics from test cases
esmet Sep 21, 2021
e87d39c
Formatting
esmet Sep 21, 2021
0c4a16c
Merge remote-tracking branch 'upstream/main' into max-upstream-connec…
esmet Sep 21, 2021
a4aecec
Merge remote-tracking branch 'upstream/main' into max-upstream-connec…
esmet Sep 23, 2021
5b155f8
Fix typo
esmet Sep 23, 2021
90391aa
Refactor upstream_impl_test.cc
esmet Sep 23, 2021
f676853
Test improvements
esmet Sep 23, 2021
374e1c6
Merge remote-tracking branch 'upstream/main' into max-upstream-connec…
esmet Sep 24, 2021
24afd92
Fix changelog ordering
esmet Sep 24, 2021
e29ead0
Style comments
esmet Sep 26, 2021
c8882f2
FOrmatting
esmet Sep 26, 2021
8605363
ASSERT -> ENVOY_BUG
esmet Sep 27, 2021
e805385
Add doc for upstream_cx_max_duration_exceeded
esmet Sep 27, 2021
05e4599
Update tests
esmet Sep 27, 2021
59fccef
Add field reflink to changelog. Clarify docs to mention that drain_ti…
esmet Sep 28, 2021
1e95ac2
Fix spelling
esmet Sep 28, 2021
c3d16f1
Fix spacing
esmet Sep 28, 2021
4ab0322
Merge remote-tracking branch 'upstream/main' into max-upstream-connec…
esmet Sep 29, 2021
54117b5
Naming fixups
esmet Sep 30, 2021
187d823
Formatting
esmet Sep 30, 2021
ce6fa32
More formatting
esmet Sep 30, 2021
8637e46
Merge remote-tracking branch 'upstream/main' into max-upstream-connec…
esmet Oct 7, 2021
097525a
Fix merge on changelog
esmet Oct 7, 2021
0c64dcb
Add a basic integration test
esmet Oct 7, 2021
c0f413d
Merge remote-tracking branch 'upstream/main' into max-upstream-connec…
esmet Oct 7, 2021
230a177
Fix format
esmet Oct 7, 2021
926989c
Remove unused boilerplate
esmet Oct 7, 2021
8b99184
Merge remote-tracking branch 'upstream/main' into max-upstream-connec…
esmet Oct 7, 2021
5b23fb5
Fix changelog merge
esmet Oct 7, 2021
6474914
Merge remote-tracking branch 'upstream/main' into max-upstream-connec…
esmet Oct 13, 2021
ca9fa76
Fix changelog
esmet Oct 13, 2021
2bbef6c
Merge remote-tracking branch 'upstream/main' into max-upstream-connec…
esmet Oct 14, 2021
7 changes: 3 additions & 4 deletions api/envoy/config/core/v3/protocol.proto
@@ -157,11 +157,10 @@ message HttpProtocolOptions {

// The maximum duration of a connection. The duration is defined as a period since a connection
// was established. If not set, there is no max duration. When max_connection_duration is reached
- // and if there are no active streams, the connection will be closed. If there are any active streams,
- // the drain sequence will kick-in, and the connection will be force-closed after the drain period.
- // See :ref:`drain_timeout
+ // and if there are no active streams, the connection will be closed. If the connection is a
+ // downstream connection and there are any active streams, the drain sequence will kick-in,
+ // and the connection will be force-closed after the drain period. See :ref:`drain_timeout
// <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.drain_timeout>`.
- // Note: This feature is not yet implemented for the upstream connections.
google.protobuf.Duration max_connection_duration = 3;

// The maximum number of headers. If unconfigured, the default
@@ -42,6 +42,7 @@ Every cluster has a statistics tree rooted at *cluster.<name>.* with the followi
upstream_cx_connect_fail, Counter, Total connection failures
upstream_cx_connect_timeout, Counter, Total connection connect timeouts
upstream_cx_idle_timeout, Counter, Total connection idle timeouts
upstream_cx_max_duration_reached, Counter, Total connections closed due to max duration reached
upstream_cx_connect_attempts_exceeded, Counter, Total consecutive connection failures exceeding configured connection attempts
upstream_cx_overflow, Counter, Total times that the cluster's connection circuit breaker overflowed
upstream_cx_connect_ms, Histogram, Connection establishment milliseconds
24 changes: 13 additions & 11 deletions docs/root/faq/configuration/timeouts.rst
@@ -32,18 +32,20 @@ Connection timeouts apply to the entire HTTP connection and all streams the conn

* The HTTP protocol :ref:`max_connection_duration <envoy_v3_api_field_config.core.v3.HttpProtocolOptions.max_connection_duration>`
is defined in a generic message used by both the HTTP connection manager as well as upstream cluster
- HTTP connections but is currently only implemented for the downstream connections. The maximum
- connection duration is the time after which a downstream connection will be drained and/or closed,
- starting from when it first got established. If there are no active streams, the connection will be
- closed. If there are any active streams, the drain sequence will kick-in, and the connection will be
- force-closed after the drain period. The default value of max connection duration is *0* or unlimited,
- which means that the connections will never be closed due to aging. It could be helpful in scenarios
- when you are running a pool of Envoy edge-proxies and would want to close a downstream connection after
- some time to prevent sticky-ness. It could also help to better load balance the overall traffic among
- this pool, especially if the size of this pool is dynamically changing. To modify the max connection
- duration for downstream connections use the
+ HTTP connections. The maximum connection duration is the time after which a downstream or upstream
+ connection will be drained and/or closed, starting from when it was first established. If there are no
+ active streams, the connection will be closed. If there are any active streams, the drain sequence will
+ kick-in, and the connection will be force-closed after the drain period. The default value of max connection
+ duration is *0* or unlimited, which means that the connections will never be closed due to aging. It could
+ be helpful in scenarios when you are running a pool of Envoy edge-proxies and would want to close a
+ downstream connection after some time to prevent stickiness. It could also help to better load balance the
+ overall traffic among this pool, especially if the size of this pool is dynamically changing. Finally, it
+ may help with upstream connections when using a DNS name whose resolved addresses may change even if the
+ upstreams stay healthy. Forcing a maximum upstream lifetime in this scenario prevents holding onto healthy
+ connections even after they would otherwise be undiscoverable. To modify the max connection duration for downstream connections use the
:ref:`common_http_protocol_options <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.common_http_protocol_options>`
- field in the HTTP connection manager configuration.
+ field in the HTTP connection manager configuration. To modify the max connection duration for upstream connections use the
+ :ref:`common_http_protocol_options <envoy_v3_api_field_config.cluster.v3.Cluster.common_http_protocol_options>` field in the cluster configuration.

See :ref:`below <faq_configuration_timeouts_transport_socket>` for other connection timeouts.

1 change: 1 addition & 0 deletions docs/root/version_history/current.rst
@@ -40,6 +40,7 @@ New Features
* ext_authz: added :ref:`query_parameters_to_set <envoy_v3_api_field_service.auth.v3.OkHttpResponse.query_parameters_to_set>` and :ref:`query_parameters_to_remove <envoy_v3_api_field_service.auth.v3.OkHttpResponse.query_parameters_to_remove>` for adding and removing query string parameters when using a gRPC authorization server.
* http: added support for :ref:`retriable health check status codes <envoy_v3_api_field_config.core.v3.HealthCheck.HttpHealthCheck.retriable_statuses>`.
* thrift_proxy: add upstream response zone metrics in the form ``cluster.cluster_name.zone.local_zone.upstream_zone.thrift.upstream_resp_success``.
* upstream: added the ability to :ref:`configure max connection duration <envoy_v3_api_field_config.core.v3.HttpProtocolOptions.max_connection_duration>` for upstream clusters.
* vcl_socket_interface: added VCL socket interface extension for fd.io VPP integration to :ref:`contrib images <install_contrib>`. This can be enabled via :ref:`VCL <envoy_v3_api_msg_extensions.vcl.v3alpha.VclSocketInterface>` configuration.

Deprecated
6 changes: 6 additions & 0 deletions envoy/upstream/upstream.h
@@ -553,6 +553,7 @@ class PrioritySet {
COUNTER(upstream_cx_http2_total) \
COUNTER(upstream_cx_http3_total) \
COUNTER(upstream_cx_idle_timeout) \
COUNTER(upstream_cx_max_duration_reached) \
COUNTER(upstream_cx_max_requests) \
COUNTER(upstream_cx_none_healthy) \
COUNTER(upstream_cx_overflow) \
@@ -744,6 +745,11 @@ class ClusterInfo {
*/
virtual const absl::optional<std::chrono::milliseconds> idleTimeout() const PURE;

/**
* @return optional maximum connection duration timeout for upstream connections.
*/
virtual const absl::optional<std::chrono::milliseconds> maxConnectionDuration() const PURE;

/**
* @return how many streams should be anticipated per each current stream.
*/
49 changes: 48 additions & 1 deletion source/common/conn_pool/conn_pool_base.cc
@@ -457,6 +457,16 @@ void ConnPoolImplBase::onConnectionEvent(ActiveClient& client, absl::string_view
// this forces part of its cleanup to happen now.
client.releaseResources();

// Again, since we know this object is going to be deferredDelete'd(), we take
// this opportunity to disable and reset the connection duration timer so that
// it doesn't trigger while on the deferred delete list. In theory it is safe
// to handle the CLOSED state in onConnectionDurationTimeout, but we handle
// it here for simplicity and safety anyway.
if (client.connection_duration_timer_) {
client.connection_duration_timer_->disableTimer();
client.connection_duration_timer_.reset();
}

dispatcher_.deferredDelete(client.removeFromList(owningList(client.state())));

checkForIdleAndCloseIdleConnsIfDraining();
@@ -473,6 +483,15 @@ void ConnPoolImplBase::onConnectionEvent(ActiveClient& client, absl::string_view
ASSERT(client.state() == ActiveClient::State::CONNECTING);
transitionActiveClientState(client, ActiveClient::State::READY);

// Now that the active client is ready, set up a timer for max connection duration.
const absl::optional<std::chrono::milliseconds> max_connection_duration =
client.parent_.host()->cluster().maxConnectionDuration();
if (max_connection_duration.has_value()) {
client.connection_duration_timer_ = client.parent_.dispatcher().createTimer(
[&client]() { client.onConnectionDurationTimeout(); });
client.connection_duration_timer_->enableTimer(max_connection_duration.value());
}

// At this point, for the mixed ALPN pool, the client may be deleted. Do not
// refer to client after this point.
onConnected(client);
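
The two hunks above arm a per-client timer once the connection becomes READY and tear it down when the connection closes, so that the callback cannot fire while the client sits on the deferred-delete list. A minimal sketch of that lifecycle follows. It uses a simulated clock rather than Envoy's dispatcher, `std::optional` stands in for `absl::optional`, and `FakeTimer`, `FakeClient`, and `advance` are hypothetical names invented for illustration only.

```cpp
#include <chrono>
#include <functional>
#include <memory>
#include <optional>
#include <utility>

using Ms = std::chrono::milliseconds;

// Hypothetical one-shot timer driven by simulated time (not Envoy's Event::Timer).
class FakeTimer {
public:
  explicit FakeTimer(std::function<void()> cb) : cb_(std::move(cb)) {}

  void enableTimer(Ms delay) { remaining_ = delay; }
  void disableTimer() { remaining_.reset(); }
  bool enabled() const { return remaining_.has_value(); }

  // Advances simulated time; the callback fires at most once per enableTimer().
  void advance(Ms elapsed) {
    if (!remaining_.has_value()) {
      return;
    }
    if (elapsed >= *remaining_) {
      remaining_.reset();
      cb_();
    } else {
      *remaining_ -= elapsed;
    }
  }

private:
  std::function<void()> cb_;
  std::optional<Ms> remaining_;
};

// Hypothetical stand-in for an active client that owns a duration timer.
struct FakeClient {
  std::unique_ptr<FakeTimer> duration_timer;
  int timeouts_fired = 0;

  // Mirrors the Connected branch: arm the timer only if a max duration is configured.
  void onConnected(std::optional<Ms> max_duration) {
    if (max_duration.has_value()) {
      duration_timer = std::make_unique<FakeTimer>([this]() { ++timeouts_fired; });
      duration_timer->enableTimer(*max_duration);
    }
  }

  // Mirrors the close branch: disable and drop the timer so it cannot fire later.
  void onClosed() {
    if (duration_timer) {
      duration_timer->disableTimer();
      duration_timer.reset();
    }
  }

  // Forwards simulated time to the timer, if one exists.
  void advance(Ms elapsed) {
    if (duration_timer) {
      duration_timer->advance(elapsed);
    }
  }
};
```

In this sketch, a client connected with a 100 ms limit fires its callback once 100 ms of simulated time has passed, while a client that is closed first never fires, which is the property the reset in the close path is meant to guarantee.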
@@ -562,7 +581,7 @@ ActiveClient::ActiveClient(ConnPoolImplBase& parent, uint32_t lifetime_stream_li
uint32_t concurrent_stream_limit)
: parent_(parent), remaining_streams_(translateZeroToUnlimited(lifetime_stream_limit)),
concurrent_stream_limit_(translateZeroToUnlimited(concurrent_stream_limit)),
- connect_timer_(parent_.dispatcher().createTimer([this]() -> void { onConnectTimeout(); })) {
+ connect_timer_(parent_.dispatcher().createTimer([this]() { onConnectTimeout(); })) {
conn_connect_ms_ = std::make_unique<Stats::HistogramCompletableTimespanImpl>(
parent_.host()->cluster().stats().upstream_cx_connect_ms_, parent_.dispatcher().timeSource());
conn_length_ = std::make_unique<Stats::HistogramCompletableTimespanImpl>(
@@ -596,6 +615,34 @@ void ActiveClient::onConnectTimeout() {
close();
}

void ActiveClient::onConnectionDurationTimeout() {
// The connection duration timer should only have started after we left the CONNECTING state.
ENVOY_BUG(state_ != ActiveClient::State::CONNECTING,
"max connection duration reached while connecting");

// The connection duration timer should have been disabled and reset in onConnectionEvent
// for closing connections.
ENVOY_BUG(state_ != ActiveClient::State::CLOSED, "max connection duration reached while closed");

// There's nothing to do if the client is connecting, closed or draining.
// Two of these cases are bugs (see above), but it is safe to no-op either way.
if (state_ == ActiveClient::State::CONNECTING || state_ == ActiveClient::State::CLOSED ||
state_ == ActiveClient::State::DRAINING) {
return;
}

ENVOY_CONN_LOG(debug, "max connection duration reached, DRAINING", *this);
parent_.host()->cluster().stats().upstream_cx_max_duration_reached_.inc();
parent_.transitionActiveClientState(*this, Envoy::ConnectionPool::ActiveClient::State::DRAINING);

// Close out the draining client if we no longer have active streams.
// We have to do this here because there won't be an onStreamClosed (because there are
// no active streams) to do it for us later.
if (numActiveStreams() == 0) {
close();
}
}

void ActiveClient::drain() {
if (currentUnusedCapacity() <= 0) {
return;
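
The decision logic in `onConnectionDurationTimeout` can be modeled as a small state machine. The sketch below is a simplified model, not Envoy code: `ClientModel` is a hypothetical type, only the four states named in the hunk are represented, the counter field stands in for `upstream_cx_max_duration_reached`, and `onStreamClosed` closing a drained client is an assumption inferred from the comment in the hunk.

```cpp
#include <cstdint>

// Only the states referenced in the hunk above; the real enum may have more.
enum class State { Connecting, Ready, Draining, Closed };

// Hypothetical simplified model of an active client.
struct ClientModel {
  State state = State::Connecting;
  uint32_t active_streams = 0;
  uint64_t max_duration_reached = 0; // stands in for upstream_cx_max_duration_reached

  void close() { state = State::Closed; }

  void onConnectionDurationTimeout() {
    // Nothing to do while connecting, already closed, or already draining.
    if (state == State::Connecting || state == State::Closed || state == State::Draining) {
      return;
    }
    ++max_duration_reached;
    state = State::Draining;
    // With no active streams, no later stream-close event will arrive to
    // finish the job, so the connection is closed immediately.
    if (active_streams == 0) {
      close();
    }
  }

  // Assumed behavior: the last stream leaving a draining client closes it.
  void onStreamClosed() {
    if (active_streams > 0) {
      --active_streams;
    }
    if (state == State::Draining && active_streams == 0) {
      close();
    }
  }
};
```

An idle READY client goes straight to closed when the timer fires, whereas a client with streams in flight stays draining until its last stream finishes. A second timeout on a draining client is a no-op, so the counter is incremented once per connection.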
4 changes: 4 additions & 0 deletions source/common/conn_pool/conn_pool_base.h
@@ -44,6 +44,9 @@ class ActiveClient : public LinkedObject<ActiveClient>,
// Called if the connection does not complete within the cluster's connectTimeout()
void onConnectTimeout();

// Called if the maximum connection duration is reached.
void onConnectionDurationTimeout();

// Returns the concurrent stream limit, accounting for if the total stream limit
// is less than the concurrent stream limit.
uint32_t effectiveConcurrentStreamLimit() const {
@@ -105,6 +108,7 @@ class ActiveClient : public LinkedObject<ActiveClient>,
Stats::TimespanPtr conn_connect_ms_;
Stats::TimespanPtr conn_length_;
Event::TimerPtr connect_timer_;
Event::TimerPtr connection_duration_timer_;
bool resources_released_{false};
bool timed_out_{false};

10 changes: 10 additions & 0 deletions source/common/upstream/upstream_impl.cc
@@ -921,6 +921,16 @@ ClusterInfoImpl::ClusterInfoImpl(
idle_timeout_ = std::chrono::hours(1);
}

if (http_protocol_options_->common_http_protocol_options_.has_max_connection_duration()) {
max_connection_duration_ = std::chrono::milliseconds(DurationUtil::durationToMilliseconds(
http_protocol_options_->common_http_protocol_options_.max_connection_duration()));
if (max_connection_duration_.value().count() == 0) {
max_connection_duration_ = absl::nullopt;
}
} else {
max_connection_duration_ = absl::nullopt;
}

if (config.has_eds_cluster_config()) {
if (config.type() != envoy::config::cluster::v3::Cluster::EDS) {
throw EnvoyException("eds_cluster_config set in a non-EDS cluster");
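
The constructor hunk above treats an unset field and an explicit zero the same way: both end up as "no limit". A minimal sketch of that normalization, assuming the configured value has already been converted to milliseconds; `normalizeMaxConnectionDuration` is a hypothetical helper name and `std::optional` stands in for `absl::optional`.

```cpp
#include <chrono>
#include <optional>

// Hypothetical helper: maps "unset" and "0" to std::nullopt (unlimited),
// and passes any other duration through unchanged.
std::optional<std::chrono::milliseconds>
normalizeMaxConnectionDuration(std::optional<std::chrono::milliseconds> configured) {
  if (configured.has_value() && configured->count() == 0) {
    return std::nullopt;
  }
  return configured;
}
```

Collapsing zero into the empty optional means callers such as the connection pool only need a single `has_value()` check before arming a timer.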
4 changes: 4 additions & 0 deletions source/common/upstream/upstream_impl.h
@@ -613,6 +613,9 @@ class ClusterInfoImpl : public ClusterInfo,
const absl::optional<std::chrono::milliseconds> idleTimeout() const override {
return idle_timeout_;
}
const absl::optional<std::chrono::milliseconds> maxConnectionDuration() const override {
return max_connection_duration_;
}
float perUpstreamPreconnectRatio() const override { return per_upstream_preconnect_ratio_; }
float peekaheadRatio() const override { return peekahead_ratio_; }
uint32_t perConnectionBufferLimitBytes() const override {
@@ -769,6 +772,7 @@ class ClusterInfoImpl : public ClusterInfo,
const uint32_t max_response_headers_count_;
const std::chrono::milliseconds connect_timeout_;
absl::optional<std::chrono::milliseconds> idle_timeout_;
absl::optional<std::chrono::milliseconds> max_connection_duration_;
const float per_upstream_preconnect_ratio_;
const float peekahead_ratio_;
const uint32_t per_connection_buffer_limit_bytes_;