Skip to content

buffer: add "overflow" watermark#7619

Closed
mergeconflict wants to merge 14 commits intoenvoyproxy:masterfrom
mergeconflict:overflow_watermark
Closed

buffer: add "overflow" watermark#7619
mergeconflict wants to merge 14 commits intoenvoyproxy:masterfrom
mergeconflict:overflow_watermark

Conversation

@mergeconflict
Copy link
Copy Markdown

@mergeconflict mergeconflict commented Jul 17, 2019

WIP: Add a new "overflow" watermark to watermark buffers. Intended usage: if a buffer continues to be written to after exceeding its high watermark, and exceeds its overflow watermark (by default 2x the high watermark), Envoy should protect itself by closing the associated stream.

Risk Level: Medium
Testing: Only unit tests so far; needs integration tests.
Docs Changes: TBD
Release Notes: TBD

Signed-off-by: Dan Rosen mergeconflict@google.com

Signed-off-by: Dan Rosen <mergeconflict@google.com>
@mergeconflict
Copy link
Copy Markdown
Author

Looking for some initial feedback as I start working on integration tests. Is this roughly what we had in mind?
/assign @alyssawilk @htuch @yanavlasov

Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has the right overall shape.
I think we can simplify, by basically dealing with events fairly locally. If we go over the limit on a connection or codec buffer, just close the connection (no need to inform every stream as we do for overflow)
I think that'll simplify this quite a bit, then I'll do another pass and you can start in on tests.

// below_low_watermark_ has been called.
bool above_high_watermark_called_{false};
// Set to true after above_overflow_watermark_ has been called. Never reset, because we assume
// the stream will be forcibly closed in response.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

object will be destroyed? I don't think we want this stream-centric given it could be a connection

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, which object will be destroyed?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that in a buffer which is used by http streams, raw tcp connections, and arbitrary other objects, we shouldn't be talking about streams :-)
For L7 the stream will be closed, for L4 the connection will be closed, so let's find some neutral way of wording that the owning object will take care of the buffer going away.

@mattklein123
Copy link
Copy Markdown
Member

QQ: What's the config plan for this? Correctly configuring all of the flow control settings is already extremely complicated. Can we potentially start with this just being a multiple of the high watermark? Or is the plan to have independent config?

Also, is this a defense in depth measure, or are there specific cases we are targeting? One case I know this will fix is the local reply case which is not well covered today.

@mergeconflict
Copy link
Copy Markdown
Author

QQ: What's the config plan for this? Correctly configuring all of the flow control settings is already extremely complicated. Can we potentially start with this just being a multiple of the high watermark? Or is the plan to have independent config?

I'm not currently planning to add any new configuration for this. The only existing configuration I found so far are initial_stream_window_size and initial_connection_window_size in Http2ProtocolOptions, but I've quite possibly missed something. My guess is that setting the overflow watermark at 2x the high watermark (both for the connection and per H2 stream) is probably fine, at least as a starting point, but I'll defer to @alyssawilk.

Also, is this a defense in depth measure, or are there specific cases we are targeting? One case I know this will fix is the local reply case which is not well covered today.

I'm not aware of specific cases we're trying to cover, I think this is just defense in depth.

@mergeconflict
Copy link
Copy Markdown
Author

/wait while I work on addressing Alyssa's feedback.

@mattklein123
Copy link
Copy Markdown
Member

My guess is that setting the overflow watermark at 2x the high watermark (both for the connection and per H2 stream) is probably fine, at least as a starting point, but I'll defer to @alyssawilk.

Yes agreed, this makes sense to me. Ping me when this is more fleshed out and I'm happy to take a look also.

@mattklein123 mattklein123 self-assigned this Jul 20, 2019
wip
Signed-off-by: Dan Rosen <mergeconflict@google.com>
@mergeconflict
Copy link
Copy Markdown
Author

@alyssawilk - Some updates here, still a lot of work to do, but wondering if you could take another quick look? I'm now just closing the stream or connection immediately whenever I get an overflow, rather than passing the event through callbacks, and I've started fixing some tests.

Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry for the delay on comments but I think this is much cleaner and will hopefully be correspondingly safer.

Couple of drive by comments while I'm in the file, and I'll do a full pass once you're ready.

// below_low_watermark_ has been called.
bool above_high_watermark_called_{false};
// Set to true after above_overflow_watermark_ has been called. Never reset, because we assume
// the stream will be forcibly closed in response.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that in a buffer which is used by http streams, raw tcp connections, and arbitrary other objects, we shouldn't be talking about streams :-)
For L7 the stream will be closed, for L4 the connection will be closed, so let's find some neutral way of wording that the owning object will take care of the buffer going away.

Signed-off-by: Dan Rosen <mergeconflict@google.com>
Dan Rosen added 4 commits July 29, 2019 10:53
Signed-off-by: Dan Rosen <mergeconflict@google.com>
Signed-off-by: Dan Rosen <mergeconflict@google.com>
Signed-off-by: Dan Rosen <mergeconflict@google.com>
Signed-off-by: Dan Rosen <mergeconflict@google.com>
@mergeconflict
Copy link
Copy Markdown
Author

/wait

istream
istringstream
iteratively
janky
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my most important contribution to Envoy.

Dan Rosen added 2 commits August 6, 2019 15:45
Signed-off-by: Dan Rosen <mergeconflict@google.com>
Signed-off-by: Dan Rosen <mergeconflict@google.com>
Signed-off-by: Dan Rosen <mergeconflict@google.com>
@mergeconflict
Copy link
Copy Markdown
Author

Status update on this PR:

  • The high watermark overflow multiplier is now controlled by a runtime variable, buffer.overflow.high_watermark_multiplier. Its default value is 2, and it can be disabled by setting it to 0. I haven't documented this yet.
  • There are a bunch of tests that indirectly create watermarked buffers and therefore require the runtime to be present. I've created a Runtime::ScopedMockLoaderSingleton class to deal with these easily.
  • There are a lot existing tests that set a relatively small buffer size (e.g. 1 KB) and then flood it with data (e.g. 1 MB), to exercise the various backpressure mechanisms. These tests break with high_watermark_multiplier=2, because the new behavior is to immediately close the connection or stream. So I've disabled the feature and left a TODO in each of those specific tests, to write another test that covers the new behavior. There are about 35 of these.

So now is, I think, the hard part: writing new tests. @alyssawilk and @mattklein123, if you get a chance, please let me know if my methodology seems reasonable to you, and have another look at the code. You can mainly limit your attention to the changes under /source, since the /test changes are all basically mechanical so far. Thanks again!

@mergeconflict
Copy link
Copy Markdown
Author

/unassign @yanavlasov @htuch

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mergeconflict thanks agreed this is the right approach. I left some comments on the source and didn't look at tests as you requested.

/wait

void checkHighAndOverflowWatermarks();
void checkLowWatermark();

Runtime::Loader& runtime_;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we snap the multiplier, I think we won't need to store the reference and can just use the free variant to grab the multiplier at construction?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated; wasn't sure what you meant about the "free variant" - lmk if this isn't what you had in mind.

[this]() -> void { this->requestDataTooLarge(); });
auto buffer = std::make_unique<Buffer::WatermarkBuffer>(
[this]() -> void { this->requestDataDrained(); },
[this]() -> void { this->requestDataTooLarge(); }, [this]() -> void { this->resetStream(); });
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need to refresh my memory on this code a bit, but some thought will need to be put into how we reset the stream here. Should it look like a remote reset? Local reset? What reset code do we use? Etc. Mainly just a heads up to think about this a bit. Same below. (You might consider moving buffer creation to a shared function with more comments.) Will also need a stat here.

[this]() -> void { this->enableDataFromDownstream(); },
[this]() -> void { this->disableDataFromDownstream(); });
[this]() -> void { this->disableDataFromDownstream(); },
[this]() -> void { this->resetStream(); });
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about thinking about reset type, etc.


response_limiter_ = std::make_unique<StreamRateLimiter>(
rate_kbps.value(), encoder_callbacks_->encoderBufferLimit(),
[this] { encoder_callbacks_->resetStream(); },
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need a stat of some type.

…flow behavior, and update tests

Signed-off-by: Dan Rosen <mergeconflict@google.com>
Signed-off-by: Dan Rosen <mergeconflict@google.com>
Copy link
Copy Markdown
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source seems overall solid modulo the H2 concern. Hopefully for tests you can mostly crib the existing watermark unit and integration tests? If you're having trouble forcing network back-up for integration tests you could make a fake network listener which backs up. I'd been playing around with randomized backup over in
master...alyssawilk:backup_test_fuzzing
which might have some code useful to crib off of.

low_watermark_ = low_watermark;
high_watermark_ = high_watermark;
checkHighWatermark();
overflow_watermark_ = high_watermark * overflow_multiplier_;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check that we don't overflow our overflow due to bad config?

const uint32_t overflow_multiplier_{0};
// Used for enforcing buffer limits (off by default). If these are set to non-zero by a call to
// setWatermarks() the watermark callbacks will be called as described above.
uint32_t overflow_watermark_{0};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no longer a way to set this on a per-buffer basis?

Again my concern is that if you have many streams outputting to one downstream H2 connection, and the network::Connection goes over-watermark, that the streams can each dump roughly one watermark worth of data into the network::connection without being malicious. I think when we talk about memory limits per stream this is accounted for, and we have to make sure that the H2 HCM can set this for downstream H2 and the connection pool can set a higher multiplier for H2 upstream.

@mergeconflict
Copy link
Copy Markdown
Author

Heads up for all concerned: I have a different project I need to prioritize right now, so I'm stepping away from this PR for a while. I'll try to keep it in sync with master in the meantime.

@stale
Copy link
Copy Markdown

stale bot commented Aug 19, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Aug 19, 2019
Signed-off-by: Dan Rosen <mergeconflict@google.com>
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Aug 21, 2019
@mergeconflict
Copy link
Copy Markdown
Author

Still not really working on this. Just merged master to prevent excessive bitrot and appease stalebot.
/wait

@stale
Copy link
Copy Markdown

stale bot commented Aug 28, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Aug 28, 2019
Signed-off-by: Dan Rosen <mergeconflict@google.com>
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Aug 28, 2019
@mergeconflict
Copy link
Copy Markdown
Author

On second thought, @alyssawilk had a great suggestion: just close this until I actually have a chance to get back to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants