Conversation
Signed-off-by: Dan Rosen <mergeconflict@google.com>
…flow behavior, and update tests Signed-off-by: Dan Rosen <mergeconflict@google.com>
…onnection_impl Signed-off-by: Adi Suissa-Peleg <adip@google.com>
There are many changes in this PR, mostly due to merging ~6 months of upstream work. Note that in this PR I only address the downstream buffer overflow (upstream will be handled after we decide on the right approach).
I'll add more discussion points shortly.
/cc @htuch @yanavlasov
Discussion points:
    : below_low_watermark_(below_low_watermark), above_high_watermark_(above_high_watermark),
      above_overflow_watermark_(above_overflow_watermark),
      overflow_multiplier_(Runtime::LoaderSingleton::get().threadsafeSnapshot()->getInteger(
          "buffer.overflow.high_watermark_multiplier", 2)) {}
Is a factor of 2 safe based on the maximum expansion of a request that can happen during proxying due to encoding related issues?
Some cases that come to mind:
- H2 compressed headers expanding to H1 headers. I think this is limited by max header size bytes, which is not related to watermark parameters.
- H2 connection flow control windows vs buffer watermarks.
- H2 control frames which are not limited by flow control.
- H1 to H2 chunk body conversions. I think a 1-byte H1 chunk can be encoded in either 4 or 6 bytes depending on which line termination the sender uses. The H2 data frame header is 9 bytes, so the H2 encoding of a 1-byte chunk is 10 bytes (up to 2.5x larger).
- Asymmetric read/write buffer configs. If the read buffer size is 10x the write buffer, you are very likely to hit the overflow condition after a large read.
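The size arithmetic in the chunk-conversion bullet above can be sketched as a couple of hypothetical helpers (these are illustrative only, not Envoy code; the 9-byte H2 DATA frame header overhead is from the HTTP/2 spec, and the H1 numbers assume standard chunked framing):

```cpp
#include <cassert>
#include <cstdint>

// H1 chunked encoding of an n-byte chunk: hex length line + data + line ending.
// With CRLF a 1-byte chunk is "1\r\nX\r\n" = 6 bytes; with a bare LF it is 4.
uint64_t h1ChunkSize(uint64_t payload, bool crlf) {
  uint64_t hex_digits = 1; // payload sizes < 16 need one hex digit
  for (uint64_t v = payload; v >= 16; v /= 16) {
    ++hex_digits;
  }
  const uint64_t eol = crlf ? 2 : 1; // "\r\n" vs "\n"
  return hex_digits + eol + payload + eol;
}

// H2 DATA frame: fixed 9-byte frame header plus the payload.
uint64_t h2DataFrameSize(uint64_t payload) { return 9 + payload; }
```

So a 1-byte H1 chunk (4 bytes with bare LF) becomes a 10-byte H2 frame, the 2.5x worst case cited above.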
Good points...
To add to the first bullet, should we verify that the max header size in bytes is less than or equal to the overflow watermark value?
Regarding point 5 (asymmetric read/write buffers), IIUC it also covers the case where the producer is much faster than the consumer. I think that comes down to the configuration parameters, but if there's a way to warn that a configuration might be wrong, this is an example where we should.
The max header limit excludes framing overhead. Also, you can end up writing headers and body in the same stack, so you need space for both.
I think your changes effectively make it unsafe to set the buffer high watermark to anything below around 32kb, possibly more, yet this is not enforced anywhere in the config plane. Note that there's widespread use of high watermarks of exactly 32kb.
I think the goal of high watermark is to deal with highly unusual situations where normal watermarks have failed us. Examples are where a number of connection buffers are close to their watermark, but not over, and a large read occurs, adding a bunch of data before watermarking or overload manager can respond. Or when we forget to do watermark flow control push back due to a missing feedback path (e.g. for proxy generated direct responses).
Thus, I'm not super worried about the user having to explicitly size them; they should be sizing the normal watermarks (or the knobs implying them) based on an understanding of traffic. Whenever compression is involved, e.g. zlib or HPACK, this might be tricky. Headers are a special case; in general, though, compression could be 100x. Presumably the basic watermark limit is set to reflect this.
Setting some fixed constant as the fudge factor then makes sense to provide a zero-tune capability here IMHO; it probably doesn't matter as long as it's some reasonable distance from the normal operational limit.
void WatermarkBuffer::checkLowWatermark() {
-  if (!above_high_watermark_called_ ||
+  if (above_overflow_watermark_called_ || !above_high_watermark_called_ ||
Why do we check for above_overflow_watermark_called_?
At the moment, although an overflow occurs, the buffer can still be used. Once the buffer is read from, the low watermark callback will be called. I'm not sure that, in cases where the connection is terminated, readDisable should be called again.
If I'm missing a scenario, please let me know.
That said, a different buffer design that should be considered is to disable all operations once an overflow has been detected.
Here's the problem: if the overflow callback is called, the buffer stops functioning properly, since low watermark callbacks are no longer called when they should be.
This PR feels high risk. I think we may be jumping the gun by going straight to closing the connection/resetting the stream on overflow. It would be good to gather some data about what can cause the overflow cb to trigger. I know there's a runtime flag to control this behavior, but having it default on with a 2x overflow ratio could cause some problems.
Agreed that runtime control over the high watermark constant seems a useful capability.
@antoniovicente I'm unsure we have valid causes for the overflow to trigger at the moment. This feature to me is a safety valve to account for unknown bugs that may cause the buffer to grow way beyond the high watermark. As such, I would think outright socket closure is appropriate.
I agree, though, that the multiplier is way too low and perhaps not quite the right approach; see my comment above.
@yanavlasov triggering the 2x multiplier overflow watermark is very easy, especially when you factor in that the H2 connection flow control window is not coordinated with high watermarks. Going straight to closing the connection is very risky.
I'm just arguing for keeping the high/low watermarks in working order even if the overflow one is hit, so we can add overflow implementations that just increment counters.
if (!above_overflow_watermark_called_ && overflow_watermark_ != 0 &&
    OwnedImpl::length() > overflow_watermark_) {
  above_overflow_watermark_called_ = true;
  above_overflow_watermark_();
This early return combined with above_overflow_watermark_called_ means that the second check can end up calling the high watermark cb. Should these checks happen in the opposite order: high watermark first, then overflow?
It really depends on whether the normal flow control should be preserved or not, and whether the buffer should remain usable.
To the best of my understanding, we should first check whether the overflow watermark cb was called, and if so, return early.
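To make the ordering question concrete, here is a hypothetical, heavily simplified sketch of the check sequence being debated (not the actual WatermarkBuffer code; names and structure are assumptions). With overflow tested first and an early return, the high-watermark callback is skipped on the growth event that crosses the overflow line, but a later check can still fire it:

```cpp
#include <cstdint>
#include <functional>

// Simplified stand-in for the watermark check logic under discussion.
struct Checker {
  uint64_t high_watermark_ = 0;
  uint64_t overflow_watermark_ = 0;
  bool above_high_called_ = false;
  bool above_overflow_called_ = false;
  std::function<void()> on_high_;
  std::function<void()> on_overflow_;

  void checkHighAndOverflowWatermarks(uint64_t length) {
    if (overflow_watermark_ != 0 && !above_overflow_called_ && length > overflow_watermark_) {
      above_overflow_called_ = true;
      on_overflow_();
      return; // early return: the high-watermark callback is skipped this time
    }
    // On a later call, the first branch is skipped (already called), so this
    // can invoke the high-watermark callback *after* the overflow one fired.
    if (high_watermark_ != 0 && !above_high_called_ && length > high_watermark_) {
      above_high_called_ = true;
      on_high_();
    }
  }
};
```

This illustrates the reviewer's point: with this ordering, the high-watermark callback can end up being invoked after the overflow callback, which only matters if the buffer is meant to stay usable after overflow.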
Is there any behavior we want to support other than connection shutdown and resource teardown? Presumably initiated within the space of a single epoll event call stack (we would do a hard connection shutdown, with no drain).
}
StreamRateLimiter::StreamRateLimiter(uint64_t max_kbps, uint64_t max_buffered_data,
                                     std::function<void()> overflow_cb,
overflow cb seems to be the last argument most of the time. Why is it the first in this case?
Thanks for pointing it out... Actually it seems that it is reversed compared to the others. Usually the order is low_cb, high_cb, overflow_cb.
Not sure why; I'll try to find out.
overflow_watermark_ = high_watermark * overflow_multiplier_;
// TODO(adip): What should be done if there's an overflow (overflow_watermark_ <
// high_watermark)? Should this be a release_assert?
ASSERT((high_watermark <= overflow_watermark_) || (overflow_multiplier_ == 0));
|| overflow_watermark_ == 0
I'll add a comment here. This checks that there's no overflow due to multiplication:
overflow_watermark_ = high_watermark * overflow_multiplier_;
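An explicit pre-check of the multiplication, rather than an after-the-fact ASSERT, could look like this hypothetical helper (illustrative only; the name and placement are assumptions, not Envoy code):

```cpp
#include <cstdint>
#include <limits>

// Returns true if high_watermark * multiplier would wrap around uint64_t.
// A wrapped product ends up smaller than high_watermark, which is the
// condition the ASSERT in the diff above is trying to catch.
bool multiplyOverflows(uint64_t high_watermark, uint64_t multiplier) {
  if (multiplier == 0) {
    return false; // a zero multiplier disables the overflow watermark entirely
  }
  return high_watermark > std::numeric_limits<uint64_t>::max() / multiplier;
}
```

Checking before multiplying avoids relying on the wrapped value at all, which may be clearer if this ever becomes a release assert.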
// Overflow should happen, and the response should be reset.
// Wait for the response to be read by the codec client.
codec_client_->waitForDisconnect();
Do we fail due to overflow limit being hit or because headers are too large?
The failure is due to the buffer limit. I'll add a stats counter and an assert that verifies this.
file_event_->activate(Event::FileReadyType::Write);
// If the writing to the buffer caused an overflow-watermark breach, the connection should be
// closed.
// TODO(adip): what should be done if the state is Closing?
I think for connection-level resets, since this is highly unusual, it should just do a hard connection shutdown.
htuch left a comment
I think this fits the high watermark notion that @alyssawilk has been championing, would be great to get her take.
I'll send a preliminary design doc that describes the goal, and we can start the discussion from there.
low_watermark_ = low_watermark;
high_watermark_ = high_watermark;
checkHighWatermark();
overflow_watermark_ = high_watermark * overflow_multiplier_;
I wonder if a multiplier is the right approach. I think the worst-case formula for figuring out how much data can end up in the downstream outgoing buffer is (for an H/2 upstream):
min(max_concurrent_streams_downstream, max_concurrent_streams_upstream) * initial_stream_window_size_upstream
So if the client makes a lot of simultaneous requests which land on distinct upstream servers, all responses may end up in the outbound buffer before flow control can do anything about it. In this case a low multiplier may cause spurious disconnects.
I wonder if we should use an absolute number instead of a multiplier, or at the very least check whether it makes sense given the H2 settings for downstream and upstream?
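The worst-case formula above can be written out as a small helper (parameter names are assumptions for illustration; this is not Envoy code):

```cpp
#include <algorithm>
#include <cstdint>

// Worst-case bytes that can land in the downstream outbound buffer before
// flow control reacts, per the formula in the comment above: the number of
// streams that can be concurrently active end-to-end, each filling its
// upstream initial stream window.
uint64_t worstCaseBufferedBytes(uint64_t max_concurrent_streams_downstream,
                                uint64_t max_concurrent_streams_upstream,
                                uint64_t initial_stream_window_upstream) {
  return std::min(max_concurrent_streams_downstream, max_concurrent_streams_upstream) *
         initial_stream_window_upstream;
}
```

For example, with 100 concurrent streams and the default 65535-byte H2 initial window, this is already ~6.5 MB, far beyond a 32kb high watermark times any small multiplier.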
I think these are separate issues. We should have better safety guards to ensure the overflow watermark is high enough from an HTTP/2 perspective (taking into account flow control windows, streams, etc.). In terms of overflow watermark shutdown behavior, I think we need to keep the config as simple as possible while providing the maximum chance of runtime recovery. What if we have a runtime knob that globally decides whether to do a hard shutdown or a stats bump on overflow? That requires regular HW/LW functions. I think this would be default enabled on every edge Envoy; the entire goal of OW is to deal with the real risk of OOM from unknown watermark snafus.
Given the complexity of this PR, do you think it would be reasonable to split it into at least 2 PRs, the first being the actual overflow implementation in the watermark buffer, and the second actually using it by providing non-nullptr callbacks in various places? (The latter could possibly be split further, but that might not be needed.) WDYT?
@mattklein123 I agree that this is a complex PR and that it should probably be divided into multiple PRs.
Awesome, a design proposal sounds great, thank you.
This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
Continues the PR of mergeconflict.
Description: Continuation of the overflow watermark work, which closes the connection/stream when there is too much data in the buffer.
Risk Level: Medium - modifies the workflow when an overflow occurs.
Testing: Added unit tests and an integration test that exercises downstream buffer overflow. Need to add more.
Docs Changes: Need to update docs.