tcp_proxy: wait for CONNECT response before start streaming data #14317
mattklein123 merged 18 commits into envoyproxy:master
Conversation
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
Refers to #13293 (comment)
cc. @wez470 @alyssawilk |
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
The solution proposed here should be considered a basis for discussion. I don't think this is a good approach, but I'm still looking for a solution that doesn't require too much refactoring. Any ideas are welcome.
yeah, stashing the params (maybe in a wrapper struct) seems like the only thing I can think of, too.
ok, I'll proceed this way then ;-) |
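The "stash the params" idea from the exchange above can be sketched as a small wrapper that holds the pool-ready notification until the CONNECT response arrives. This is a hedged standalone sketch; `ConnectResponseGate`, `StashedPoolReady`, and their members are illustrative names, not the types the PR actually introduced:

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical sketch: when the connection pool reports ready, stash the
// callback parameters instead of notifying downstream immediately; replay
// them once a 200 CONNECT response is decoded.
struct StashedPoolReady {
  std::string host;                 // upstream host description
  std::function<void()> on_success; // deferred "pool ready" notification
};

class ConnectResponseGate {
public:
  // Called when the pool reports ready: remember the parameters for later.
  void stash(StashedPoolReady params) { stashed_ = std::move(params); }

  // Called when a successful CONNECT response is decoded: replay the
  // stashed notification downstream. Returns false if nothing was stashed.
  bool onConnectResponse() {
    if (!stashed_) {
      return false;
    }
    stashed_->on_success();
    stashed_.reset();
    return true;
  }

private:
  std::optional<StashedPoolReady> stashed_;
};
```

The point of the wrapper is that the pool-ready parameters outlive the pool callback itself, so the notification can be deferred without re-deriving them later.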
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
alyssawilk
left a comment
This looks great - I love how self-contained it ended up being.
    // Wait a bit, no data should go through.
    ASSERT_FALSE(upstream_request_->waitForData(*dispatcher_, 1, std::chrono::milliseconds(100)));
    ...
    upstream_request_->encodeHeaders(default_response_headers_, false);
I think we could have a test for the failure modes too - either a disconnect and/or non-200 headers.
Added a test for a non-200 response.
@@ -119,6 +119,7 @@ void HttpUpstream::resetEncoder(Network::ConnectionEvent event, bool inform_down
  if (inform_downstream) {
    upstream_callbacks_.onEvent(event);
Can we end up having an event callback "from the upstream connection" before a connection is established? That's a bit weird. I think it would be cleaner if, when inform_downstream is true, we do pool failure if we have a deferrer, and do the onEvent if not. WDYT?
Thx @alyssawilk, I'm trying to avoid having the cyclical ownership issue with: https://github.com/irozzo-1A/envoy/blob/d867f50fcf75e5c6efc15826e2c9e7534680e21f/source/common/tcp_proxy/upstream.h#L178-L180 After that, I'll address your remarks ;-)
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
@@ -129,6 +157,9 @@ class HttpUpstream : public GenericUpstream, protected Http::StreamCallbacks {
  void decodeHeaders(Http::ResponseHeaderMapPtr&& headers, bool end_stream) override {
    if (!parent_.isValidResponse(*headers) || end_stream) {
      parent_.resetEncoder(Network::ConnectionEvent::LocalClose);
I think we still have a weird corner case in resetEncoder: if "inform_downstream" is true, we send an upstream event downstream before the pool knows it has an upstream associated.
I think we could simplify this by dropping onGenericPoolFailure below, and instead, in resetEncoder:
    if (inform_downstream) {
      if (deferrer) {
        deferrer.onGenericPoolFailure();
      } else {
        // former logic
      }
    }
I think we always reset the encoder on failure (be it decodeHeaders failing, or a disconnect), and then resetEncoder would take care of making it look like an event, or like a pool failure. WDYT? It's definitely worth an integration test and/or unit tests since it's tricky timing.
Yep, I saw your previous comment and was still figuring out how to do it. I pushed what I have so far, not tested yet. If you have time to take a look, let me know if I'm on the right path ;-)
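The dispatch being discussed can be modeled as a tiny standalone function; the `Outcome` enum and the boolean stand-ins for the deferrer and the inform_downstream flag are illustrative, not Envoy's real interfaces:

```cpp
#include <cassert>

// Toy model of the suggested resetEncoder flow: on a failure that should be
// reported downstream, surface it as a pool failure if the pool has not yet
// handed the connection over (i.e. we still hold a "deferrer"); otherwise
// fall back to the former logic of propagating a connection event.
enum class Outcome { None, PoolFailure, DownstreamEvent };

Outcome resetEncoderSketch(bool inform_downstream, bool have_deferrer) {
  if (!inform_downstream) {
    return Outcome::None; // nothing to report downstream
  }
  if (have_deferrer) {
    // Pool never signaled ready: report onGenericPoolFailure() instead of
    // a connection event from a connection downstream never saw.
    return Outcome::PoolFailure;
  }
  // Former logic: propagate the upstream connection event downstream.
  return Outcome::DownstreamEvent;
}
```

The design point is that downstream never observes an event from an upstream connection it was never told about; failures before hand-off look like pool failures.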
alyssawilk
left a comment
Almost there, and thanks for adding the new test!
I'm on vacation starting today through the new year, so I'm going to pass this off to snow/Matt. I hopefully explained the corner case enough they can help you for the few days until the envoy project goes 'on vacation' but if not I'll pick it up when I get back!
Oh, and given the change to the data plane, we should probably runtime guard this (see CONTRIBUTING.md). Sorry, I should have called that out before.
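A runtime guard lets the new data-plane behavior be disabled without a rebuild. Envoy's real helper for this is `Runtime::runtimeFeatureEnabled()`; the map-based lookup below is only a standalone mimic of the idea, with the default-enabled behavior this PR ended up shipping:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Standalone mimic of a reloadable runtime feature guard. New behavior
// defaults to enabled; an operator override can flip it off at runtime.
class RuntimeSnapshot {
public:
  void set(const std::string& key, bool value) { overrides_[key] = value; }

  bool featureEnabled(const std::string& key) const {
    auto it = overrides_.find(key);
    return it == overrides_.end() ? true : it->second; // default: enabled
  }

private:
  std::unordered_map<std::string, bool> overrides_;
};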
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
Thx for the info @alyssawilk, enjoy your vacation ;-)
No worries, I'll take a look at this.
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
…t-for-connect-response Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
…t-for-connect-response Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
@alyssawilk I think I addressed your points. At the moment the feature flag is enabled by default; let me know if you prefer the opposite.
alyssawilk
left a comment
Looks great! Just a few more thoughts and I think you're good to go.
    CONNECT request, and a second one listening on 10001, stripping the CONNECT headers, and forwarding the
    original TCP upstream, in this case to google.com.
    ...
    When runtime flag ``envoy.reloadable_features.http_upstream_wait_connect_response`` is set to ``true``, Envoy waits for
I'd suggest removing mentions of the flag and just saying that Envoy will wait.
s/wait/now waits/
the envoy. -> the runtime guard envoy.
    ASSERT_TRUE(upstream_request_->waitForHeadersComplete());
    ...
    // Wait a bit, no data should go through.
    ASSERT_FALSE(upstream_request_->waitForData(*dispatcher_, 1, std::chrono::milliseconds(100)));
Is it possible to move this down, to ensure no data is read before or after the bad headers?
Also, I'm surprised this passes given you're running it both with and without your change. Shouldn't it fail without your change?
> is it possible to move this down, to ensure no data is read before or after the bad headers?

Sure, makes sense.

> Also I'm surprised this passes given you're running it with and without your change. Shouldn't it fail without your change?

Actually, I'm running it with my change only:
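The behavior this test exercises, no downstream bytes reaching the upstream until a valid CONNECT response arrives, can be sketched as a small gate. `TunnelGate` and its methods are illustrative names, not the classes in the PR:

```cpp
#include <cassert>
#include <string>

// Sketch of the gating under test: downstream bytes are buffered until a
// 2xx CONNECT response is decoded, then flushed upstream. A non-2xx
// response drops the buffered bytes, since the tunnel never came up.
class TunnelGate {
public:
  // Returns the bytes actually sent upstream (empty while still waiting).
  std::string write(const std::string& data) {
    if (!connected_) {
      buffer_ += data;
      return "";
    }
    return data;
  }

  // Decode the CONNECT response status; only 2xx opens the gate and
  // flushes what was buffered in the meantime.
  std::string decodeHeaders(int status) {
    if (status < 200 || status >= 300) {
      buffer_.clear();
      return "";
    }
    connected_ = true;
    std::string flushed;
    flushed.swap(buffer_);
    return flushed;
  }

private:
  bool connected_ = false;
  std::string buffer_;
};
```

Moving the `waitForData` assertion after the bad headers, as suggested above, checks both arms of this gate: nothing flows before the response, and nothing flows after a non-2xx response either.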
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
alyssawilk
left a comment
Awesome! Thanks for your patience getting this landed!
@snowp you up for second pass or should I toss it over to @mattklein123?
I can take a look since Snow is sick.
mattklein123
left a comment
From a code perspective this LGTM. I assume that this has no effect on buffering/watermarking or timeouts because the code already had to correctly handle buffering/watermarking and timeouts while waiting for the pool to be ready? Also, do we have any observability around streams that are connected but waiting for CONNECT? It's not explicitly required but something we might want in the future. cc @alyssawilk
Yeah, watermarking already connects the upstream (TCP or HTTP) watermarks to downstream, and vice versa. Timeouts are mildly interesting because the connection looks established from the connection pool perspective but the TCP proxy session doesn't know the upstream connection is assigned. Given we handle the appropriate (wait for upstream connect) timeout in the connection pool, I think the only arguable error is if you had, say, a 5s keepalive timeout, and you got a 200-OK in 3s and data in 3s: you'd time out where you shouldn't. I think this is orthogonal to this change, but if we cared we could fix it by sending an empty data frame or some other ping to tickle the keepalive when the 200 came in.
Yeah I think this is fine. I just wanted to make sure we don't somehow lose the timeout entirely.
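The keepalive corner case above can be modeled as a toy idle timer: an idle timeout tracks last data activity, so a CONNECT response that carries no data would not refresh it unless it is explicitly "tickled". All names and the plain tick-count clock are illustrative, not Envoy's timer API:

```cpp
#include <cassert>

// Toy idle/keepalive timer. Without the tickle, a data-free 200 CONNECT
// response leaves last_activity untouched and the tunnel can time out
// even though it just came up; tickling on the 200 avoids that.
struct IdleTimer {
  long last_activity = 0; // tick of last observed activity
  long timeout = 5;       // idle timeout in ticks

  void onData(long now) { last_activity = now; }

  void onConnectResponse(long now, bool tickle) {
    if (tickle) {
      last_activity = now; // treat the 200 itself as activity
    }
  }

  bool expired(long now) const { return now - last_activity >= timeout; }
};
```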
* master: (48 commits)
  Resolve 14506, avoid libidn2 for our curl dependency (envoyproxy#14601)
  fix new/free mismatch in Mainthread utility (envoyproxy#14596)
  opencensus: deprecate Zipkin configuration. (envoyproxy#14576)
  upstream: clean up code location (envoyproxy#14580)
  configuration impl: add cast for ios compilation (envoyproxy#14590)
  buffer impl: add cast for android compilation (envoyproxy#14589)
  ratelimit: add dynamic metadata to ratelimit response (envoyproxy#14508)
  tcp_proxy: wait for CONNECT response before start streaming data (envoyproxy#14317)
  stream info: cleanup address handling (envoyproxy#14432)
  [deps] update upb to latest commit (envoyproxy#14582)
  Add utility to check whether the execution is in main thread. (envoyproxy#14457)
  listener: undeprecate bind_to_port (envoyproxy#14480)
  Fix data race in overload integration test (envoyproxy#14586)
  deps: update PGV (envoyproxy#14571)
  dependencies: update cve_scan.py for some libcurl 7.74.0 false positives. (envoyproxy#14572)
  Network::Connection: Add L4 crash dumping support (envoyproxy#14509)
  ssl: remember stat names for configured ciphers. (envoyproxy#14534)
  formatter: add custom date formatting to downstream cert start and end dates (envoyproxy#14502)
  feat(lua): allow setting response body when the upstream response body is empty (envoyproxy#14486)
  Generalize the gRPC access logger base classes (envoyproxy#14469)
  ...
Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Commit Message: The TCP proxy used in tunneling mode now waits for the CONNECT response before it starts sending data. Previously, data was transmitted right after the CONNECT request.
Additional Description:
Risk Level: medium
Testing: unit test, integration, manual test
Docs Changes: https://github.com/irozzo-1A/envoy/blob/wait-for-connect-response/docs/root/intro/arch_overview/http/upgrades.rst#tunneling-tcp-over-http
Release Notes: https://github.com/irozzo-1A/envoy/blob/wait-for-connect-response/docs/root/version_history/current.rst#minor-behavior-changes
Platform Specific Features: NONE
Runtime guard: http_upstream_wait_connect_response