
tcp_proxy: wait for CONNECT response before start streaming data #14317

Merged
mattklein123 merged 18 commits into envoyproxy:master from irozzo-1A:wait-for-connect-response on Jan 7, 2021

Conversation

@irozzo-1A (Contributor) commented Dec 8, 2020


Commit Message: The TCP proxy used in tunneling mode now waits for the CONNECT response before starting to send data. Previously, data was transmitted immediately after the CONNECT request.
Additional Description:
Risk Level: medium
Testing: unit test, integration, manual test
Docs Changes: https://github.com/irozzo-1A/envoy/blob/wait-for-connect-response/docs/root/intro/arch_overview/http/upgrades.rst#tunneling-tcp-over-http
Release Notes: https://github.com/irozzo-1A/envoy/blob/wait-for-connect-response/docs/root/version_history/current.rst#minor-behavior-changes
Platform Specific Features: NONE
Runtime guard: http_upstream_wait_connect_response

Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
@irozzo-1A (Contributor, Author)

Refers to #13293 (comment)

@dio (Member) commented Dec 8, 2020

cc. @wez470 @alyssawilk

Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
@irozzo-1A (Contributor, Author)

The solution proposed here should be considered a basis for discussion. I think this is not a good approach, but I'm still looking for a solution that does not require too much refactoring. Any ideas are welcome.

@alyssawilk alyssawilk self-assigned this Dec 8, 2020
@alyssawilk (Contributor)

yeah, stashing the params (maybe in a wrapper struct) seems like the only thing I can think of either.

@irozzo-1A (Contributor, Author)

> yeah, stashing the params (maybe in a wrapper struct) seems like the only thing I can think of either.

ok, I'll proceed this way then ;-)

Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
@alyssawilk (Contributor) left a comment


This looks great - I love how self contained it ended up being.

// Wait a bit, no data should go through.
ASSERT_FALSE(upstream_request_->waitForData(*dispatcher_, 1, std::chrono::milliseconds(100)));

upstream_request_->encodeHeaders(default_response_headers_, false);
Contributor


I think we could have a test for failure modes too - either a disconnect and/or not 200-ok headers.

Contributor Author


Added a test for the non-200 response.

@@ -119,6 +119,7 @@ void HttpUpstream::resetEncoder(Network::ConnectionEvent event, bool inform_down
if (inform_downstream) {
upstream_callbacks_.onEvent(event);
Contributor


Can we end up getting an event callback "from the upstream connection" before a connection is established? That's a bit weird. I think it would be cleaner if, when inform_downstream is true, we report a pool failure when we have a deferrer, and do the onEvent otherwise. WDYT?

@irozzo-1A (Contributor, Author)

Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
@@ -129,6 +157,9 @@ class HttpUpstream : public GenericUpstream, protected Http::StreamCallbacks {
void decodeHeaders(Http::ResponseHeaderMapPtr&& headers, bool end_stream) override {
if (!parent_.isValidResponse(*headers) || end_stream) {
parent_.resetEncoder(Network::ConnectionEvent::LocalClose);
Contributor


I think we still have a weird corner case in resetEncoder: if inform_downstream is true, we send an upstream event downstream before the pool knows it has an upstream associated.

I think we could simplify this by dropping onGenericPoolFailure below, and instead doing the following in resetEncoder:

    if (inform_downstream) {
      if (deferrer) {
        deferrer.onGenericPoolFailure();
      } else {
        // former logic
      }
    }

I think we always reset the encoder on failure (be it decodeHeaders failing, or a disconnect), and then resetEncoder would take care of making it look like either an event or a pool failure. WDYT? It's definitely worth an integration test and/or unit tests since the timing is tricky.

Contributor Author


Yep, I saw your previous comment and I was still figuring out how to do it. I pushed what I have so far, not tested yet. If you have time to take a look, let me know if I'm on the right path ;-)

@alyssawilk (Contributor) left a comment


Almost there, and thanks for adding the new test!

I'm on vacation starting today through the new year, so I'm going to pass this off to snow/Matt. I hopefully explained the corner case enough they can help you for the few days until the envoy project goes 'on vacation' but if not I'll pick it up when I get back!

@alyssawilk (Contributor)

oh, and given the change to the data plane, we should probably runtime guard this (see CONTRIBUTING.md). Sorry, should have called that out before.

Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
@irozzo-1A (Contributor, Author)

> I'm on vacation starting today through the new year, so I'm going to pass this off to snow/Matt. I hopefully explained the corner case enough they can help you for the few days until the envoy project goes 'on vacation' but if not I'll pick it up when I get back!

Thx for the info @alyssawilk, enjoy your vacation ;-)

> oh, and given the change to the data plane, we should probably runtime guard this (see CONTRIBUTING.md). Sorry, should have called that out before.

No worries, I'll take a look at this.

Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
…t-for-connect-response

Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
@irozzo-1A irozzo-1A changed the title [WIP] Wait for CONNECT response before start streaming data tcp_proxy: wait for CONNECT response before start streaming data Jan 5, 2021
Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
…t-for-connect-response

Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
@irozzo-1A (Contributor, Author)

@alyssawilk I think I addressed your points. At the moment the feature flag is enabled by default, let me know if you prefer the opposite.

@alyssawilk (Contributor) left a comment


Looks great! Just a few more thoughts and I think you're good to go.

CONNECT request, and a second one listening on 10001, stripping the CONNECT headers, and forwarding the
original TCP upstream, in this case to google.com.

When runtime flag ``envoy.reloadable_features.http_upstream_wait_connect_response`` is set to ``true``, Envoy waits for
Contributor


I'd suggest removing mentions of the flag and just saying that Envoy will wait.

Contributor


s/wait/now waits/
the envoy. -> the runtime guard envoy.

ASSERT_TRUE(upstream_request_->waitForHeadersComplete());

// Wait a bit, no data should go through.
ASSERT_FALSE(upstream_request_->waitForData(*dispatcher_, 1, std::chrono::milliseconds(100)));
Contributor


Is it possible to move this down, to ensure no data is read before or after the bad headers?

Also I'm surprised this passes given you're running it with and without your change. Shouldn't it fail without your change?

Contributor Author


> is it possible to move this down, to ensure no data is read before or after the bad headers?

Sure, makes sense.

> Also I'm surprised this passes given you're running it with and without your change. Shouldn't it fail without your change?

Actually, I'm running it with my change only:

https://github.com/irozzo-1A/envoy/blob/ae297ffbf644a4ceeea3705787d9d379134754a1/test/integration/tcp_tunneling_integration_test.cc#L858-L860

Signed-off-by: Iacopo Rozzo <iacopo@kubermatic.com>
@alyssawilk (Contributor) left a comment


Awesome! Thanks for your patience getting this landed!

@snowp you up for second pass or should I toss it over to @mattklein123?

@mattklein123 mattklein123 assigned mattklein123 and unassigned snowp Jan 6, 2021
@mattklein123 (Member)

I can take a look since Snow is sick.

@mattklein123 (Member) left a comment


From a code perspective this LGTM. I assume that this has no effect on buffering/watermarking or timeouts because the code already had to correctly handle buffering/watermarking and timeouts while waiting for the pool to be ready? Also, do we have any observability around streams that are connected but waiting for CONNECT? It's not explicitly required but something we might want in the future. cc @alyssawilk

@alyssawilk (Contributor)

Yeah, watermarking already connects the upstream (TCP or HTTP) watermarks to downstream, and vice versa.

Timeouts are mildly interesting because the connection looks established from the connection pool perspective, but the TCP proxy session doesn't know the upstream connection is assigned. Given that we handle the appropriate (wait-for-upstream-connect) timeout in the connection pool, I think the only arguable error is that if you had, say, a 5s keepalive timeout, and you got a 200-OK in 3s and data in 3s, you'd time out where you shouldn't. I think this is orthogonal to this change, but if we cared we could fix it by sending an empty data frame or some other ping to tickle the keepalive when the 200 came in.

@mattklein123 (Member)

> Timeouts are mildly interesting because the connection looks established from the connection pool perspective, but the TCP proxy session doesn't know the upstream connection is assigned. Given that we handle the appropriate (wait-for-upstream-connect) timeout in the connection pool, I think the only arguable error is that if you had, say, a 5s keepalive timeout, and you got a 200-OK in 3s and data in 3s, you'd time out where you shouldn't. I think this is orthogonal to this change, but if we cared we could fix it by sending an empty data frame or some other ping to tickle the keepalive when the 200 came in.

Yeah I think this is fine. I just wanted to make sure we don't somehow lose the timeout entirely.

@mattklein123 mattklein123 merged commit 395ca99 into envoyproxy:master Jan 7, 2021
mpuncel added a commit to mpuncel/envoy that referenced this pull request Jan 8, 2021
* master: (48 commits)
  Resolve 14506, avoid libidn2 for our curl dependency (envoyproxy#14601)
  fix new/free mismatch in Mainthread utility (envoyproxy#14596)
  opencensus: deprecate Zipkin configuration. (envoyproxy#14576)
  upstream: clean up code location (envoyproxy#14580)
  configuration impl: add cast for ios compilation (envoyproxy#14590)
  buffer impl: add cast for android compilation (envoyproxy#14589)
  ratelimit: add dynamic metadata to ratelimit response (envoyproxy#14508)
  tcp_proxy: wait for CONNECT response before start streaming data (envoyproxy#14317)
  stream info: cleanup address handling (envoyproxy#14432)
  [deps] update upb to latest commit (envoyproxy#14582)
  Add utility to check whether the execution is in main thread. (envoyproxy#14457)
  listener: undeprecate bind_to_port (envoyproxy#14480)
  Fix data race in overload integration test (envoyproxy#14586)
  deps: update PGV (envoyproxy#14571)
  dependencies: update cve_scan.py for some libcurl 7.74.0 false positives. (envoyproxy#14572)
  Network::Connection: Add L4 crash dumping support (envoyproxy#14509)
  ssl: remember stat names for configured ciphers. (envoyproxy#14534)
  formatter: add custom date formatting to downstream cert start and end dates (envoyproxy#14502)
  feat(lua): allow setting response body when the upstream response body is empty (envoyproxy#14486)
  Generalize the gRPC access logger base classes (envoyproxy#14469)
  ...

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
