Skip to content

Fix crash in request hedging handling.#7530

Merged
mattklein123 merged 2 commits intoenvoyproxy:masterfrom
mpuncel:mpuncel/fix-hedging-crash
Jul 11, 2019
Merged

Fix crash in request hedging handling.#7530
mattklein123 merged 2 commits intoenvoyproxy:masterfrom
mpuncel:mpuncel/fix-hedging-crash

Conversation

@mpuncel
Copy link
Copy Markdown
Contributor

@mpuncel mpuncel commented Jul 11, 2019

This commit fixes a crash in which an upstream request would not be
properly reset after the router was done with it and the callbacks
destroyed which would cause a segfault when data was received on the
upstream request. This happened in particular when: 1) "bad" headers are
received for an upstream request 2) that request is not retried, and 3) there are still requests in flight.
When hedging is disabled (i.e. default behavior) this is not an issue
because the filter will be destroyed synchronously which will reset the
stream, however with hedging it needs to be done more proactively.

Signed-off-by: Michael Puncel mpuncel@squareup.com

Description: Fix possible crash when request hedging is enabled
Risk Level: low
Testing: unit test, deployed in production and saw crashes stop
Docs Changes: n/a
Release Notes: n/a

This commit fixes a crash in which an upstream request would not be
properly reset after the router was done with it and the callbacks
destroyed which would cause a segfault when data was received on the
upstream request. This happened in particular when "bad" headers are
received for an upstream request but there are still requests in flight.
When hedging is disabled (i.e. default behavior) this is not an issue
because the filter will be destroyed synchronously which will reset the
stream, however with hedging it needs to be done more proactively.

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
@alyssawilk
Copy link
Copy Markdown
Contributor

alyssawilk commented Jul 11, 2019

Throwing to Matt, less because I can't do the review myself and more because I'm unsure if we want to land this before or after release (#7538) :-)

Given it sounds like it works for the non-hedging case, I'm inclined to land after the release, even if it means hedging isn't as usable in 1.11. Thoughts?

@mattklein123
Copy link
Copy Markdown
Member

Yeah let's hold for 1.12. I will take a look.

@mpuncel
Copy link
Copy Markdown
Contributor Author

mpuncel commented Jul 11, 2019

Confirming there's no behavior change here if you don't have hedging enabled.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM with small question.

/wait-any

if (could_not_retry && (numRequestsAwaitingHeaders() > 0 || pending_retries_ > 0)) {
upstream_request.upstream_host_->stats().rq_error_.inc();
upstream_request.removeFromList(upstream_requests_);
upstream_request.removeFromList(upstream_requests_)->resetStream();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry QQ in the non-hedging case, how did this work previously? Where were things reset, and how do we avoid resetting things twice now?

Copy link
Copy Markdown
Contributor Author

@mpuncel mpuncel Jul 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the non-hedging case you'd continue to read from this request until end_stream and return any body/trailers downstream. When hedging is happening and there are other requests to wait around for, we want to cancel.

It won't be reset twice because we're removing it from the upstream request list on this line, so it won't be seen in resetAll() in Filter::onDestroy().

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK can you maybe just add as mall comment about this? Makes sense.

/wait

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@mattklein123 mattklein123 merged commit feaada3 into envoyproxy:master Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants