http1: Allocate encoder on heap to survive move#10561
http1: Allocate encoder on heap to survive move#10561mattklein123 merged 6 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
|
Hmm. This is scary. What is the crashing call stack exactly? I don't see encoder would be used after this call. I would like to understand the underlying issue better before we decide how to fix as it would be optimal to avoid the extra heap allocation if we don't need it. /wait-any |
|
Sure, here is a traceback of the segmentation fault at e9248e2: |
|
What happens under the hood is that If heap allocation is costly in this scenario we can move |
|
The comment makes me wonder if simply moving reset to the end of block does not cause any other side effects: envoy/source/common/http/http1/codec_impl.cc Lines 964 to 966 in 986e941 @mattklein123 any thoughts? |
|
OK this makes sense thanks. I'm really surprised/scared that this works in clang and especially doesn't get caught by ASAN. I think clang probably does not actually clear the memory when it "moves" (copies) the value into the local before resetting the optional (or the clang does not clear the memory in optional reset and gcc does).
I don't think this the issue. The issue is here: envoy/source/common/http/codec_client.cc Lines 107 to 110 in 446c7fd We are using the encoder again after it was moved/reset. This is definitely broken though again it's scary that this seems to work OK in clang.
I think it's probably not safe to do this, since the intention of this code (IIRC) is to prevent a recursive reset during the completion callback. I think the simplest way of fixing this for now would be to:
I think this should be functionaly equivalent to what we have today and then we can potentially improve this in further follow ups. Can you potentially try that? If things continue to break / you are having trouble I would be fine to go back to the heap allocation and I can look at fixing this in a follow up. |
mattklein123
left a comment
There was a problem hiding this comment.
Thanks this LGTM for a fix with one small ASSERT comment. I would also like either @alyssawilk or @antoniovicente to take a quick look. Thank you!
/wait
| PendingResponse& response = pending_response_.value(); | ||
| // Encoder is used as part of decode* calls later in this function so pending_response_ can not | ||
| // be reset just yet. Preserve the state in pending_response_done_ instead. | ||
| pending_response_done_ = true; |
There was a problem hiding this comment.
This doesn't thrill me due to extra logic complexity, but I think this is an OK solution for now. Can you do me a favor and put an additional ASSERT(!pending_response_done_); every time we load successfully check that pending_response has a value in the other functions above? I think there shouldn't be too many.
Signed-off-by: Ruslan Nigmatullin <elessar@dropbox.com>
|
The failures are known flakes so will go ahead and merge. |
|
PR description was not updated when the implementation moved away from heap allocation to extra ASSERTs. |
Description:
ClientConnectionImpl::onMessageComplete()movesPendingResponse(and thereforeEncoder) and tries to use it after that. As move invalidates the object - accessing it by pointer is not valid anymore.It looks like clang keeps objects roughly intact so it's uncaught by existing envoy tests. GCC on the other hand does invalidate the memory region, which leads to crash.
Risk Level: low
Testing: n/a (was only able to repro in gcc)
Docs Changes: n/a
Release Notes: n/a