refactor router filter to store upstream requests in a list. by mpuncel · Pull Request #6540 · envoyproxy/envoy

mpuncel · 2019-04-10T15:59:12Z

This is in preparation for implementing #5841 which will introduce
request racing. As of this commit there is no situation where there will
be more than one upstream request in flight, however it organizes the
code in such a way that doing so will cause less code churn.

Signed-off-by: Michael Puncel <mpuncel@squareup.com>

Description: Change upstream request storage to list from pointer in router
Risk Level: Medium
Testing: Existing unit tests
Docs Changes: N/A
Release Notes: N/A
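
Here is a minimal sketch of the storage change described above. It assumes a heavily simplified filter: `RouterFilterSketch`, `startUpstreamRequest`, `activeRequest`, and the id-only `UpstreamRequest` are hypothetical names for illustration and do not come from the PR's diff. The single owning pointer becomes a `std::list` of owning pointers. The old null check becomes an emptiness check, guarded by the same at-most-one assumption the PR describes.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>

// Hypothetical stand-in for the router's UpstreamRequest; only an id is kept.
struct UpstreamRequest {
  explicit UpstreamRequest(int request_id) : id(request_id) {}
  int id;
};

// Hypothetical, heavily simplified router filter. Where a single
// std::unique_ptr<UpstreamRequest> used to be stored, a list of them is kept
// so that later work can hold several in-flight requests.
class RouterFilterSketch {
public:
  void startUpstreamRequest(int id) {
    // No request racing yet: at most one upstream request may be in flight.
    assert(upstream_requests_.empty());
    upstream_requests_.push_back(std::make_unique<UpstreamRequest>(id));
  }

  // Replaces the old `if (upstream_request_)` null check.
  bool hasUpstreamRequest() const { return !upstream_requests_.empty(); }

  // Access to the single in-flight request; callers must check first.
  UpstreamRequest& activeRequest() {
    assert(upstream_requests_.size() == 1);
    return *upstream_requests_.front();
  }

  // Replaces resetting the single pointer: destroys every in-flight request.
  void resetUpstreamRequests() { upstream_requests_.clear(); }

  std::size_t inFlight() const { return upstream_requests_.size(); }

private:
  std::list<std::unique_ptr<UpstreamRequest>> upstream_requests_;
};
```

Usage: after `startUpstreamRequest(7)`, `activeRequest().id` is 7 until `resetUpstreamRequests()` empties the list. A `std::list` also allows constant-time removal of any element through an iterator, which would suit tearing down individual losing requests later. The PR itself does not state why a list was chosen.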

This is a subset of the changes in https://github.com/envoyproxy/envoy/pull/6228/files which is implementing #5841

mpuncel · 2019-04-10T15:59:30Z

I broke this out from #6228 at Alyssa's suggestion

Signed-off-by: Michael Puncel <mpuncel@squareup.com>

mpuncel · 2019-04-10T19:06:05Z

/retest

repokitteh-read-only · 2019-04-10T19:06:19Z

🔨 rebuilding ci/circleci: build_image (failed build)
🔨 rebuilding ci/circleci: ipv6_tests (failed build)

🐱

Caused by: a #6540 (comment) was created by @mpuncel.

mpuncel · 2019-04-11T12:12:09Z

added the watermark callbacks change as well

source/common/router/router.cc

snowp

This LGTM, seems like a good call to split this out from the other PR

alyssawilk

Thanks for breaking this out - it really made it easier to reason about.

alyssawilk · 2019-04-11T15:59:02Z

source/common/http/conn_manager_impl.cc

 }

 void ConnectionManagerImpl::ActiveStream::callLowWatermarkCallbacks() {
  ASSERT(high_watermark_count_ > 0);


I'm looking at this and I think there may be a pre-existing bug which worked OK before because there was one back-up cause, but will not work with two.

If the upstream connection calls the high watermark callbacks and increments high_watermark_count_, and then the hedge connection hits its watermark and also increments high_watermark_count_, I don't think we want to resume by calling the low watermark callbacks until the count is back to 0.

If I'm correct here, we may have been overenthusiastic about resuming, but fixing it will be a fairly high-risk change.

Is the reason the fix is high risk that the count might not reach 0 if there is a counting bug somewhere?

Yep. I mean you can land this and do the other separately but I don't think you can land your hedge fixes without both, and the fix is high risk because it may be masking other bugs.
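
To make the concern concrete, here is a hypothetical sketch. The `CountedWatermark` class and its methods are invented for illustration and are not Envoy's `ActiveStream` code. The stream pauses only when the count goes from 0 to 1 and resumes only when it goes from 1 to 0. As a result, one connection draining cannot resume the stream while a hedged connection is still backed up.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical sketch, not Envoy's ConnectionManagerImpl::ActiveStream.
// Several independent sources (e.g. a primary and a hedged upstream
// connection) can each report being above their high watermark.
class CountedWatermark {
public:
  CountedWatermark(std::function<void()> on_pause, std::function<void()> on_resume)
      : on_pause_(std::move(on_pause)), on_resume_(std::move(on_resume)) {}

  void onSourceAboveHighWatermark() {
    // Only the 0 -> 1 transition pauses; later sources just bump the count.
    if (high_watermark_count_++ == 0) {
      on_pause_();
    }
  }

  void onSourceBelowLowWatermark() {
    assert(high_watermark_count_ > 0);
    // Only the 1 -> 0 transition resumes: every source must have drained.
    if (--high_watermark_count_ == 0) {
      on_resume_();
    }
  }

  unsigned count() const { return high_watermark_count_; }

private:
  std::function<void()> on_pause_;
  std::function<void()> on_resume_;
  unsigned high_watermark_count_{0};
};
```

The same trade-off discussed in the replies applies here. If any source misses its decrement, the count never returns to 0 and the stream stays paused.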

alyssawilk · 2019-04-11T15:59:25Z

source/common/http/conn_manager_impl.cc


 void ConnectionManagerImpl::ActiveStream::callHighWatermarkCallbacks() {
  ++high_watermark_count_;
-  if (watermark_callbacks_) {


I think we should enhance the existing unit tests to have two subscribers, to regression-test that both get the callback.
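
A two-subscriber regression test of the kind suggested needs the stream to keep its subscribers in a list and notify each one. In the sketch below, `StreamSketch` and `CountingSubscriber` are hypothetical. `DownstreamWatermarkCallbacks` is a minimal stand-in that borrows `onAboveWriteBufferHighWatermark` from the diff and adds an assumed `onBelowWriteBufferLowWatermark` counterpart; it is not Envoy's actual header.

```cpp
#include <algorithm>
#include <cassert>
#include <list>

// Minimal stand-in for a downstream watermark callbacks interface (assumed).
class DownstreamWatermarkCallbacks {
public:
  virtual ~DownstreamWatermarkCallbacks() = default;
  virtual void onAboveWriteBufferHighWatermark() = 0;
  virtual void onBelowWriteBufferLowWatermark() = 0;
};

// Hypothetical stream that keeps every subscriber in a list and notifies all.
class StreamSketch {
public:
  void addDownstreamWatermarkCallbacks(DownstreamWatermarkCallbacks& cb) {
    // Registering the same subscriber twice would double-deliver callbacks.
    assert(std::find(callbacks_.begin(), callbacks_.end(), &cb) == callbacks_.end());
    callbacks_.push_back(&cb);
  }
  void removeDownstreamWatermarkCallbacks(DownstreamWatermarkCallbacks& cb) {
    callbacks_.remove(&cb);
  }
  void callHighWatermarkCallbacks() {
    for (DownstreamWatermarkCallbacks* cb : callbacks_) {
      cb->onAboveWriteBufferHighWatermark();
    }
  }
  void callLowWatermarkCallbacks() {
    for (DownstreamWatermarkCallbacks* cb : callbacks_) {
      cb->onBelowWriteBufferLowWatermark();
    }
  }

private:
  std::list<DownstreamWatermarkCallbacks*> callbacks_;
};

// Test double that counts how often each callback fires.
struct CountingSubscriber : DownstreamWatermarkCallbacks {
  int above = 0;
  int below = 0;
  void onAboveWriteBufferHighWatermark() override { ++above; }
  void onBelowWriteBufferLowWatermark() override { ++below; }
};
```

With two `CountingSubscriber`s registered, one `callHighWatermarkCallbacks()` should raise both `above` counters to 1. After the first subscriber is removed, only the second should see the low watermark callback.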

alyssawilk · 2019-04-11T16:02:24Z

source/common/http/conn_manager_impl.cc

    watermark_callbacks.onAboveWriteBufferHighWatermark();
  }
 }
 void ConnectionManagerImpl::ActiveStreamDecoderFilter::removeDownstreamWatermarkCallbacks(


Can you poke through the code and make sure that if upstream connection 1 is above the high watermark (and causes the state to transition to high watermark) and upstream connection 2 ends up paused, then when upstream connection 1 goes away it clears the state so that connection 2 resumes? We want to make sure we don't get wedged here.
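
Here is one way the requested property could hold, sketched under assumptions. `BackpressureTracker`, integer connection ids, and the single `set_paused` hook are invented for illustration. The tracker records each connection's backed-up state separately and clears it when that connection is destroyed. If connection 1 goes away while above its high watermark, the stream therefore resumes and connection 2 is not left wedged.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>

// Hypothetical sketch: remembers which upstream connections are currently
// above their high watermark, so a connection that is torn down drops
// exactly its own contribution.
class BackpressureTracker {
public:
  explicit BackpressureTracker(std::function<void(bool)> set_paused)
      : set_paused_(std::move(set_paused)) {}

  void onAboveHighWatermark(int conn_id) {
    const bool was_paused = !backed_up_.empty();
    backed_up_.insert(conn_id);
    if (!was_paused) {
      set_paused_(true);
    }
  }

  void onBelowLowWatermark(int conn_id) { clear(conn_id); }

  // Must be called whenever an upstream connection goes away (for example a
  // reset or losing a hedge race), whether or not it drained first.
  void onConnectionDestroyed(int conn_id) { clear(conn_id); }

  bool paused() const { return !backed_up_.empty(); }

private:
  void clear(int conn_id) {
    // Resume only if this connection was backed up and was the last one.
    if (backed_up_.erase(conn_id) == 1 && backed_up_.empty()) {
      set_paused_(false);
    }
  }

  std::function<void(bool)> set_paused_;
  std::set<int> backed_up_;
};
```

Keying by connection rather than keeping a bare counter also makes two things harmless: a duplicate high watermark report, and the teardown of a connection that never backed up.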

mattklein123 · 2019-04-12T17:49:26Z

@mpuncel sorry I haven't fully tracked the conversation between you and @alyssawilk. Is there anything you need from me on this right now? Or are you working through her comments?

mpuncel · 2019-04-12T19:48:33Z

@mattklein123 mostly I've been catching up my understanding of the problem. After doing that for a bit, I think for this PR in particular (hedging not implemented yet) I should be fine to assert in the conn manager that at most 1 callback is registered at a given time. Callbacks are registered/deregistered at upstream request construction/destruction. Since there can only be one upstream request at a time, there should never be more than one callback registered at a time.
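
The registration pattern and the assert described here might look like the following hypothetical sketch; `WatermarkRegistry` and `UpstreamRequestSketch` are invented names. The request subscribes in its constructor and unsubscribes in its destructor. The registry asserts that no second subscriber appears while request racing is unimplemented.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical registry standing in for the connection manager's list of
// downstream watermark subscribers.
class WatermarkRegistry {
public:
  void add(const void* subscriber) {
    // Encodes the current assumption: without request racing, only one
    // upstream request (hence one subscriber) exists at any moment.
    assert(subscribers_.empty());
    subscribers_.push_back(subscriber);
  }
  void remove(const void* subscriber) { subscribers_.remove(subscriber); }
  std::size_t size() const { return subscribers_.size(); }

private:
  std::list<const void*> subscribers_;
};

// Hypothetical upstream request: subscribes on construction and unsubscribes
// on destruction, so its registration can never outlive it.
class UpstreamRequestSketch {
public:
  explicit UpstreamRequestSketch(WatermarkRegistry& registry) : registry_(registry) {
    registry_.add(this);
  }
  ~UpstreamRequestSketch() { registry_.remove(this); }
  UpstreamRequestSketch(const UpstreamRequestSketch&) = delete;
  UpstreamRequestSketch& operator=(const UpstreamRequestSketch&) = delete;

private:
  WatermarkRegistry& registry_;
};
```

Because registration is tied to object lifetime, a scoped request leaves the registry empty once it is destroyed. A second request can then register without tripping the assert.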

For the full PR, I think I should never expect the callback to be invoked on more than one UpstreamRequest, because I only ever write data from one upstream request back downstream. Nothing is written to the downstream until all but the winning upstream request are reset. I think I could put an assert in the callback handler that blows up if the corresponding request isn't the "winning" one.
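
The proposed invariant for the full hedging PR could be encoded roughly as below. This is a hypothetical sketch: `RaceState`, `HedgedRequestSketch`, and `declareWinner` are invented. The callbacks simply toggle a flag that stands in for disabling reads from the upstream.

```cpp
#include <cassert>

// Hypothetical shared state for one set of racing requests.
struct RaceState {
  const void* winner = nullptr;
};

// Hypothetical racing request. Only the winner's response is written
// downstream, and the losers are reset first, so only the winner should ever
// be asked to react to downstream backpressure.
class HedgedRequestSketch {
public:
  explicit HedgedRequestSketch(RaceState& race) : race_(race) {}

  void declareWinner() { race_.winner = this; }

  // Downstream is backed up: stop reading from this upstream.
  void onAboveWriteBufferHighWatermark() {
    assert(race_.winner == this && "watermark callback on a non-winning request");
    upstream_read_disabled_ = true;
  }

  // Downstream drained: resume reading from this upstream.
  void onBelowWriteBufferLowWatermark() {
    assert(race_.winner == this && "watermark callback on a non-winning request");
    upstream_read_disabled_ = false;
  }

  bool upstreamReadDisabled() const { return upstream_read_disabled_; }

private:
  RaceState& race_;
  bool upstream_read_disabled_ = false;
};
```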

The other direction (request too big) is more difficult, because I think it is possible to hit a per try timeout before having written the full request upstream, so we might stop reading from downstream and wedge the hedged retry. I don't know the implicated code well enough yet to know how to fix.

mattklein123 · 2019-04-12T21:30:55Z

@mpuncel OK at a high level that makes sense to me and I think gives me the context I need to help review. So is this PR finished given that or do you need to make further changes?

mpuncel · 2019-04-15T14:17:59Z

I think I can add a few unit tests around the subscribe/unsubscribe as Alyssa suggested and possibly a few asserts to cover the assumptions that there shouldn't actually be multiple requests in flight

mpuncel · 2019-04-15T14:51:10Z

okay @mattklein123 I believe this one is ready to go (assuming build passes)

Signed-off-by: Michael Puncel <mpuncel@squareup.com>

mattklein123

Thanks for splitting this out. Makes sense with one small nit.

/wait

mattklein123 · 2019-04-15T22:27:42Z

source/common/router/router.cc

-      upstream_request_->upstream_host_->stats().rq_timeout_.inc();
+  ASSERT(upstream_requests_.size() <= 1);
+  if (upstream_requests_.size() == 1) {
+    UpstreamRequest* upstream_request = upstream_requests_.front().get();


nit: why do we need to grab the raw pointer here? Is it just to avoid calling front() a bunch? If so, I would either just call front() or grab a reference rather than a pointer, for non-null clarity.

was just to avoid calling front(), will change
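
The nit amounts to the difference below, shown in a hypothetical standalone form. `onTimeoutSketch` is invented, and so is the `rq_timeout_count` field, which stands in for the host's `rq_timeout_` stat. Binding a reference to `*upstream_requests.front()` avoids repeating `front()`. Unlike a raw pointer, it also tells the reader the value cannot be null. The follow-up commit is titled "prefer avoiding a raw pointer", but its exact code is not reproduced here.

```cpp
#include <cassert>
#include <list>
#include <memory>

// Hypothetical stand-in; the counter mimics the host's rq_timeout_ stat.
struct UpstreamRequest {
  int rq_timeout_count = 0;
};

// Hypothetical timeout handler shaped like the snippet under review.
void onTimeoutSketch(std::list<std::unique_ptr<UpstreamRequest>>& upstream_requests) {
  assert(upstream_requests.size() <= 1);
  if (upstream_requests.size() == 1) {
    // A reference documents that the request cannot be null here, which a raw
    // pointer does not; it also avoids repeating front() on every use.
    UpstreamRequest& upstream_request = *upstream_requests.front();
    ++upstream_request.rq_timeout_count;
  }
}
```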

Signed-off-by: Michael Puncel <mpuncel@squareup.com>

mpuncel · 2019-04-16T00:59:03Z

is there a flake in that lua test?

mpuncel · 2019-04-16T02:45:55Z

/retest

repokitteh-read-only · 2019-04-16T02:46:01Z

🔨 rebuilding ci/circleci: tsan (failed build)

🐱

Caused by: a #6540 (comment) was created by @mpuncel.

mattklein123

Thanks!

mpuncel added 2 commits April 10, 2019 13:13

fix heap use after free

6be4b93

Signed-off-by: Michael Puncel <mpuncel@squareup.com>

fix variable redeclaration

8a9e93e

Signed-off-by: Michael Puncel <mpuncel@squareup.com>

use list to store watermark callbacks

ed85803

This is in preparation for there being multiple simultaneous requests in the router filter.

Signed-off-by: Michael Puncel <mpuncel@squareup.com>

snowp reviewed Apr 11, 2019

source/common/router/router.cc

snowp previously approved these changes Apr 11, 2019

snowp assigned alyssawilk Apr 11, 2019

alyssawilk reviewed Apr 11, 2019

mattklein123 self-assigned this Apr 11, 2019

mattklein123 added the waiting label Apr 11, 2019

enhance existing watermark unit tests to exercise multiple subscribers, encode assumptions into asserts

5c66169

Signed-off-by: Michael Puncel <mpuncel@squareup.com>

mpuncel dismissed snowp’s stale review via 5c66169 April 15, 2019 14:50

repokitteh-read-only bot removed the waiting label Apr 15, 2019

fix assert in router

f86b421

Signed-off-by: Michael Puncel <mpuncel@squareup.com>

mattklein123 requested changes Apr 15, 2019

repokitteh-read-only bot added the waiting label Apr 15, 2019

prefer avoiding a raw pointer

679f4d9

Signed-off-by: Michael Puncel <mpuncel@squareup.com>

repokitteh-read-only bot removed the waiting label Apr 16, 2019

mattklein123 approved these changes Apr 16, 2019

mattklein123 merged commit 21fd119 into envoyproxy:master Apr 16, 2019

4 participants
