Skip to content

Cleanup idle connection pools (disabled by default)#17403

Merged
ggreenway merged 16 commits intoenvoyproxy:mainfrom
ggreenway:rerevert-conn-pool-cleanup
Jul 26, 2021
Merged

Cleanup idle connection pools (disabled by default)#17403
ggreenway merged 16 commits intoenvoyproxy:mainfrom
ggreenway:rerevert-conn-pool-cleanup

Conversation

@ggreenway
Copy link
Member

Delete connection pools when they have no connections anymore. This
fixes unbounded memory use for cases where a new connection pool is
needed for each downstream connection, such as when using upstream
PROXY protocol.

Fixes #16682

This reverts commit b7bc539.
This reverts PR #17319, by re-adding #17302 and #16948.

Signed-off-by: Greg Greenway ggreenway@apple.com
Co-authored-by: Craig Radcliffe craig.radcliffe@broadcom.com

Risk Level: High! Connection pools used to live until the cluster was destroyed. If anything is caching a pool that now gets deleted sooner, this could crash, corrupt memory, etc.
Testing: Existing and new tests pass
Docs Changes: Updated
Release Notes: Added
Platform Specific Features: None
Runtime guard: envoy.reloadable_features.conn_pool_delete_when_idle
[Optional Deprecated:]
[Optional API Considerations:]

Delete connection pools when they have no connections anymore. This
fixes unbounded memory use for cases where a new connection pool is
needed for each downstream connection, such as when using upstream
PROXY protocol.

Fixes envoyproxy#16682

This reverts commit b7bc539.
This reverts PR envoyproxy#17319, by re-adding envoyproxy#17302 and envoyproxy#16948.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Co-authored-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
@ggreenway
Copy link
Member Author

@bianpengyuan were you able to figure out any more information about the crash you saw here?

@ggreenway
Copy link
Member Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17403 (comment) was created by @ggreenway.

see: more, trace.

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one high level comment first - what do you think of landing this default off, making sure it sticks, and then flipping it on after a week or so? If we've verified that the runtime guard works we can simply roll back the runtime default if there are other issues discovered.

fix test failures
add new integration test

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
avoid tsan issues.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty solid, though (I'll be the first to admit it's possible there are more bugs that I wouldn't catch with visual inspection =P)

Signed-off-by: Greg Greenway <ggreenway@apple.com>
…cleanup

Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway ggreenway changed the title Cleanup idle connection pools Cleanup idle connection pools (disable by default) Jul 22, 2021
@ggreenway ggreenway changed the title Cleanup idle connection pools (disable by default) Cleanup idle connection pools (disabled by default) Jul 22, 2021
Signed-off-by: Greg Greenway <ggreenway@apple.com>
@rgs1
Copy link
Member

rgs1 commented Jul 24, 2021

We can happily smoke test this, if that's still needed (given it's now disabled by default).

@ggreenway ggreenway merged commit 3bd1db4 into envoyproxy:main Jul 26, 2021
ggreenway added a commit to ggreenway/envoy that referenced this pull request Jul 28, 2021
The idle callback was incorrectly going through the
ThreadLocalClusterManagerImpl::ClusterEntry to reach the
ThreadLocalClusterManager. The pool is owned by the
ThreadLocalClusterManager, not by the ClusterEntry, so the reference
to the ClusterEntry could be dangling, resulting in a crash.

fixes commit envoyproxy#17403

Signed-off-by: Greg Greenway <ggreenway@apple.com>
ggreenway added a commit that referenced this pull request Jul 28, 2021
#17522)

The idle callback was incorrectly going through the
ThreadLocalClusterManagerImpl::ClusterEntry to reach the
ThreadLocalClusterManager. The pool is owned by the
ThreadLocalClusterManager, not by the ClusterEntry, so the reference
to the ClusterEntry could be dangling, resulting in a crash.

fixes commit #17403

Signed-off-by: Greg Greenway <ggreenway@apple.com>
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
Delete connection pools when they have no connections anymore. This
fixes unbounded memory use for cases where a new connection pool is
needed for each downstream connection, such as when using upstream
PROXY protocol.

This reverts commit b7bc539.
This reverts PR envoyproxy#17319, by re-adding envoyproxy#17302 and envoyproxy#16948.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Co-authored-by: Craig Radcliffe <craig.radcliffe@broadcom.com>
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021
envoyproxy#17522)

The idle callback was incorrectly going through the
ThreadLocalClusterManagerImpl::ClusterEntry to reach the
ThreadLocalClusterManager. The pool is owned by the
ThreadLocalClusterManager, not by the ClusterEntry, so the reference
to the ClusterEntry could be dangling, resulting in a crash.

fixes commit envoyproxy#17403

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Upstream PROXY protocol results in unbound number of connection pools

4 participants