Skip to content

thrift-proxy: fix crash#9089

Merged
zuercher merged 10 commits intoenvoyproxy:masterfrom
rgs1:fix-issue-9037
Nov 22, 2019
Merged

thrift-proxy: fix crash#9089
zuercher merged 10 commits intoenvoyproxy:masterfrom
rgs1:fix-issue-9037

Conversation

@rgs1
Copy link
Member

@rgs1 rgs1 commented Nov 20, 2019

After Router::onEvent() handles a local or remote close, it should also
reset the upstream request to ensure another close() isn't issued
when Router::onDestroy() is called.

Added a unit test and also validated with @fishcakez's repro that this works
properly.

Fixes: #9037

Signed-off-by: Raul Gutierrez Segales rgs@pinterest.com

After Router::onEvent() handles a local or remote close, it should also
reset the upstream request to ensure another close() isn't issued
when Router::onDestroy() is called.

No tests added, but validated with @fishcakez's repro that this works
properly.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
@rgs1 rgs1 requested a review from zuercher as a code owner November 20, 2019 21:46
Raul Gutierrez Segales added 5 commits November 20, 2019 18:56
Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
* add unit test to track regression
* properly release resources, avoid ASAN/TSAN issues

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
When calling onDestroy -> resetStream, a call to onEvent()
might be generated by the close() call. This in turn will
free upstream_request_ which is a problem given that
resetStream is running and it's a member of upstream_request_.

This probably needs more work -- once ASAN/TSAN is happy --
to make it less convoluted.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Leftover.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
@zuercher zuercher self-assigned this Nov 21, 2019
Raul Gutierrez Segales added 2 commits November 21, 2019 19:18
Also adds Stephan's integration test. Validated this with our repro,
it's good to go.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
@rgs1
Copy link
Member Author

rgs1 commented Nov 21, 2019

@zuercher failures are unrelated:

//test/integration:stats_integration_test                                FAILED in 1 out of 2 in 25.5s
  Stats over 2 runs: max = 25.5s, min = 20.7s, avg = 23.1s, dev = 2.4s
  /build/tmp/_bazel_bazel/b570b5ccd0454dc9af9f65ab1833764d/execroot/envoy/bazel-out/k8-opt/testlogs/test/integration/stats_integration_test/shard_1_of_2/test.log

Mind re-kicking the azure jobs?

@zuercher
Copy link
Member

/azp run envoy-linux

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@rgs1
Copy link
Member Author

rgs1 commented Nov 22, 2019

The macos failure is also unrelated:

//test/common/network:listener_impl_test                                TIMEOUT in 301.2s
  /private/var/tmp/_bazel_runner/94a1ff02ff90c163128b3290c28c4cd9/execroot/envoy/bazel-out/darwin-fastbuild/testlogs/test/common/network/listener_impl_test/test.log

Copy link
Member

@zuercher zuercher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@zuercher zuercher merged commit 40970bd into envoyproxy:master Nov 22, 2019
sthagen added a commit to sthagen/envoyproxy-envoy that referenced this pull request Nov 22, 2019
rgs1 pushed a commit to rgs1/envoy that referenced this pull request Jan 31, 2020
We've been using this in prod for a couple of months and for
a few different services.

A few critical bugs have been fixed (envoyproxy#9089 and envoyproxy#6549) and some
necessary stats/features have been added (envoyproxy#9203 and envoyproxy#8994), so
it's probably a good time to graduate this filter.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
rgs1 pushed a commit to rgs1/envoy that referenced this pull request Feb 1, 2020
We've been using this in prod for a couple of months and for
a few different services.

A few critical bugs have been fixed (envoyproxy#9089 and envoyproxy#6549) and some
necessary stats/features have been added (envoyproxy#9203 and envoyproxy#8994), so
it's probably a good time to graduate at least the proxy
and the router filter.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
mattklein123 pushed a commit that referenced this pull request Feb 1, 2020
We've been using this in prod for a couple of months and for
a few different services.

A few critical bugs have been fixed (#9089 and #6549) and some
necessary stats/features have been added (#9203 and #8994), so
it's probably a good time to graduate at least the proxy
and the router filter.

Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Thrift connection manager crash on remote close during active request

2 participants