-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Envoy Segfaults #6744
Comments
Update: Increasing the per_connection_buffer_limit_bytes fixed the issue. I still think envoy shouldn't segfault in this case |
cc @lizan. |
Looks like a flow control issue? cc @alyssawilk |
At first glance we're getting data from upstream, we go over the limit, send en error response, call doDeferredStreamDestroy, which triggers router cleanup and deletes the upstream request while we're still under the stack of the upstream request. That seems pretty sketchy from a lifetime perspective. I'll have to poke around tomorrow and see what changed and/or if I can reproduce. I know we have tests where we go over the limit and fail and I'm not sure what's different about this set up and the tests that they're passing and this isn't |
Oh, I think I know what happens, and it happens when we sendLocalReply after receiving response headers for a gRPC response. When the router decodes headers, it latches a pointer to the upstream headers, then hands off ownership to the HCM. We don't have a response-data-too-large test which has both access logs (accessing the headers after they were destroyed) and gRPC (doing the header overwrite) but I'll get started on one. @mattklein123 we could fix either by updating the headers in place for the gRPC response path, or by making response headers accessible via callbacks and not having the router latch. I feel like the latter is more future-proof. Thoughts? |
Yeah I've wanted to get rid of all headers latching for quite some time and have headers/trailers only be accessible via callbacks like body. More work but definitely the much better change IMO. |
@alyssawilk / @mattklein123 : For now we have per_connection_buffer_limit_bytes set to 10MB (in both the listeners and clusters section). Is this the right workaround? Is this a bad idea (performance/memory)? Do we need to set it in both places? Additionally, is this setting the buffer size or just an upper bound on the buffer size? Our payload size will depend on the page size that is requested by the user. So we don't know how much buffer size is required. |
Having a large per connection buffer limit basically makes it easier for your server to OOM, either accidentally by sending many large responses or intentionally if malicious clients try to DoS you. If you're hitting the response too large callback it means you're using buffering filters (where headers will not be sent to the users until Envoy has received and buffered the whole response) and some of your responses are too large, so if the bug were fixed you'd be failing to proxy your own responses. So figuring out how large your responses actually are and capping the buffers there may be optimal. |
The gRPC JSON transcoded is a buffering filter, if the transcoded gRPC call
is an unary call, it will buffer until upstream sends the whole response
and trailers.
…On Thu, May 2, 2019 at 10:12 AM alyssawilk ***@***.***> wrote:
Having a large per connection buffer limit basically makes it easier for
your server to OOM, either accidentally by sending many large responses or
intentionally if malicious clients try to DoS you.
If you're hitting the response too large callback it means you're using
buffering filters (where headers will not be sent to the users until Envoy
has received and buffered the whole response) and some of your responses
are too large, so if the bug were fixed you'd be failing to proxy your own
responses. So figuring out how large your responses actually are and
capping the buffers there may be optimal.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#6744 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAHYB3ZACYOKXESJ3YETUADPTMOGXANCNFSM4HJGKRHA>
.
|
… to upstream access logs (#6785) Risk Level: Medium (will have a perf effect for folks using access logs) Testing: regression integration test Docs Changes: n/a Release Notes: n/a Fixes #6744 Signed-off-by: Alyssa Wilk <[email protected]>
… to upstream access logs (envoyproxy#6785) Risk Level: Medium (will have a perf effect for folks using access logs) Testing: regression integration test Docs Changes: n/a Release Notes: n/a Fixes envoyproxy#6744 Signed-off-by: Alyssa Wilk <[email protected]> Signed-off-by: Jeff Piazza <[email protected]>
Title: Envoy is segfaulting
Description:
Envoy is segfaulting. I am sending http1 GET requests. Envoy has grpc transcoding setup. Backend service is a grpc service.
I am running envoyproxy/envoy-alpine-debug:v1.10.0 within a docker container on a VM.
Repro steps:
Envoy config here
The text was updated successfully, but these errors were encountered: