Upstream TCP connection buffer and read buffer limits (#150). #571
As with fd58242, but on the upstream cluster side.
@@ -37,6 +38,10 @@ connect_timeout_ms
*(required, integer)* The timeout for new network connections to hosts in the cluster specified
in milliseconds.

per_connection_buffer_limit_bytes
*(optional, integer)* Soft limit on size of the cluster's connections read and write buffers.
If unspecified, an implementation defined default is applied (1MB).
nit: s/MB/MiB (can you fix this on the listener side also)
@@ -264,6 +264,7 @@ ConnPoolImpl::ActiveClient::ActiveClient(ConnPoolImpl& parent)
  parent_.conn_connect_ms_ =
      parent_.host_->cluster().stats().upstream_cx_connect_ms_.allocateSpan();
  Upstream::Host::CreateConnectionData data = parent_.host_->createConnection(parent_.dispatcher_);
  data.connection_->setReadBufferLimit(parent_.host_->cluster().perConnectionBufferLimitBytes());
It's pretty fragile that every caller that calls createConnection() needs to know to then set the read buffer limit. Can we have the createConnection() function do it? The host has access to its cluster and cluster info.
Sorry, my approval is premature; I clicked the wrong button.
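To illustrate the suggestion above, here is a minimal sketch of what it could look like for createConnection() to apply the limit itself, so callers no longer have to remember it. The `ClusterInfo`, `ClientConnection`, and `Host` types are simplified, hypothetical stand-ins written for this example; only the method names `createConnection`, `setReadBufferLimit`, and `perConnectionBufferLimitBytes` come from the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the cluster info object; only the accessor name
// is taken from the diff.
struct ClusterInfo {
  uint32_t limit_bytes;
  uint32_t perConnectionBufferLimitBytes() const { return limit_bytes; }
};

// Hypothetical stand-in for a client connection.
class ClientConnection {
public:
  void setReadBufferLimit(uint32_t limit) { read_buffer_limit_ = limit; }
  uint32_t readBufferLimit() const { return read_buffer_limit_; }

private:
  uint32_t read_buffer_limit_{0};
};

using ClientConnectionPtr = std::unique_ptr<ClientConnection>;

// Hypothetical host that owns a reference to its cluster info.
class Host {
public:
  explicit Host(const ClusterInfo& cluster) : cluster_(cluster) {}

  // The limit is applied in exactly one place, so every caller (connection
  // pools, health checkers, and so on) gets it without extra work.
  ClientConnectionPtr createConnection() const {
    auto connection = std::make_unique<ClientConnection>();
    connection->setReadBufferLimit(cluster_.perConnectionBufferLimitBytes());
    return connection;
  }

private:
  const ClusterInfo& cluster_;
};
```

With this shape, a call site such as the connection pool constructor would only call createConnection() and could drop its own setReadBufferLimit() line.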
  Network::ClientConnectionPtr connection =
      cluster.sslContext() ? dispatcher.createSslClientConnection(*cluster.sslContext(), address)
                           : dispatcher.createClientConnection(address);
  if (cluster_) {
I don't recall why this function is/was static, but you can use the passed in cluster, you don't need to use cluster_ (or check if nullptr, I don't think it can ever be nullptr).
That's nicer, but there's a bug (probably just in tests?) that allows cluster_ to be null:
Program received signal SIGSEGV, Segmentation fault.
0x0000000001dfbbf8 in Upstream::HostImpl::createConnection (this=0x35cc8c0, dispatcher=..., cluster=..., address=...) at /source/source/common/upstream/upstream_impl.cc:36
36 connection->setReadBufferLimit(cluster.perConnectionBufferLimitBytes());
#0 0x0000000001dfbbf8 in Upstream::HostImpl::createConnection (this=0x35cc8c0, dispatcher=..., cluster=..., address=...) at /source/source/common/upstream/upstream_impl.cc:36
#1 0x0000000001dfba6d in Upstream::HostImpl::createConnection (this=0x35cc8c0, dispatcher=...) at /source/source/common/upstream/upstream_impl.cc:27
#2 0x0000000001dd490a in Upstream::HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onInterval (this=0x33f3800) at /source/source/common/upstream/health_checker_impl.cc:227
#3 0x0000000001dd455a in Upstream::HttpHealthCheckerImpl::HttpActiveHealthCheckSession::HttpActiveHealthCheckSession (this=0x33f3800, parent=..., host=...) at /source/source/common/upstream/health_checker_impl.cc:193
#4 0x0000000001dd43d7 in Upstream::HttpHealthCheckerImpl::start (this=0x34b3400) at /source/source/common/upstream/health_checker_impl.cc:186
#5 0x000000000186d2d6 in Upstream::HttpHealthCheckerImplTest_Success_Test::TestBody (this=0x35e6c00) at /source/test/common/upstream/health_checker_impl_test.cc:169
#6 0x000000000205bcd6 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#7 0x0000000002056a18 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#8 0x000000000203d965 in testing::Test::Run() ()
#9 0x000000000203e1f1 in testing::TestInfo::Run() ()
#10 0x000000000203e882 in testing::TestCase::Run() ()
#11 0x000000000204502d in testing::internal::UnitTestImpl::RunAllTests() ()
#12 0x000000000205cdd6 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
#13 0x00000000020576da in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
#14 0x0000000002043cb7 in testing::UnitTest::Run() ()
#15 0x00000000019845c4 in RUN_ALL_TESTS () at /thirdparty_build/include/gtest/gtest.h:2233
#16 0x0000000001983709 in main (argc=1, argv=0x7fffffffec18) at /source/test/main.cc:21
Starting epoch 0
Looking into it.
OK, I'm guessing that's just a stupid test bug.
Turns out it was the connection, not the cluster, that could be null, and we should still test for that.
  Network::ClientConnectionPtr connection =
      cluster.sslContext() ? dispatcher.createSslClientConnection(*cluster.sslContext(), address)
                           : dispatcher.createClientConnection(address);
  if (connection) {
IIRC in the prod code this can't actually return nullptr, so I think this must be due to a test issue? Can we fix the test and avoid the if check here? (Please verify though).
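As a rough sketch of that direction, assuming the null connection really does come only from a test double that returns nothing: the test's dispatcher could hand back a real connection object, which would let createConnection() set the limit unconditionally. All types below (`Connection`, `Dispatcher`, `FakeDispatcher`) and the free function `createConnection` are hypothetical and hand-rolled for illustration; they are not the project's actual mocks or signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical, simplified connection type.
struct Connection {
  uint32_t read_buffer_limit{0};
  void setReadBufferLimit(uint32_t limit) { read_buffer_limit = limit; }
};
using ConnectionPtr = std::unique_ptr<Connection>;

// Hypothetical dispatcher interface.
struct Dispatcher {
  virtual ~Dispatcher() = default;
  virtual ConnectionPtr createClientConnection() = 0;
};

// Test double that always returns a usable connection instead of nullptr,
// so the code under test sees the same contract as in production.
struct FakeDispatcher : Dispatcher {
  int created{0};
  ConnectionPtr createClientConnection() override {
    ++created;
    return std::make_unique<Connection>();
  }
};

// Code under test: no `if (connection)` branch, because the dispatcher
// contract (assumed here) is that a connection is always returned.
ConnectionPtr createConnection(Dispatcher& dispatcher, uint32_t limit_bytes) {
  ConnectionPtr connection = dispatcher.createClientConnection();
  assert(connection != nullptr); // documents the assumed contract
  connection->setReadBufferLimit(limit_bytes);
  return connection;
}
```

Whether the production dispatcher can truly never return nullptr still needs to be verified, as the comment above asks.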
@htuch +1 looks good, but I think there are merge conflicts now.
Signed-off-by: John Plevyak <[email protected]>
At the moment, we leak all `EnvoyHTTPStreamImpl` and `EnvoyHTTPCallbacks` instances with every stream due to the deliberate retain cycle used to keep the stream in memory and pass callbacks through to the platform layer while the stream is active. This ends up also leaking the callback closures specified by the end consumer, along with anything captured by those closures.

To resolve this problem, we will release the strong reference cycle that the stream holds to itself when the stream completes. The state/callback for stream completion has to be stored somewhere in the `ios_context` type and retained since the `ios_context` is a struct and doesn't retain its members.

We considered having the `EnvoyHTTPCallbacks` hold a reference to the `EnvoyHTTPStreamImpl` rather than having `EnvoyHTTPStreamImpl` hold a strong reference to itself. However, this posed a few problems:

- `EnvoyHTTPCallbacks` is designed to support being shared by multiple streams, and thus cannot have only 1 reference to a stream
- `EnvoyHTTPCallbacks` is instantiated by the Swift layer, which means we'd have to leak implementation details of how the stream is kept in memory and shift logic out of the stream implementation

With these in mind, we decided to have `EnvoyHTTPStreamImpl` manage its own lifecycle based on the state of the network stream.

Signed-off-by: Michael Rebello <[email protected]>
Signed-off-by: JP Simard <[email protected]>