upstream: handle health check fail after removal #6765
Merged
mattklein123 merged 8 commits into master on May 1, 2019
When using active health checking, hosts are not removed from dynamic clusters if they are still passing health checks. This creates a situation in which hosts might not be removed for a very long time if the sequence is reversed: removal followed by health check failure. This change handles that reversed ordering, so that any time a host is both removed AND failing active health checking, in either order, it will be removed.
This has been an issue "forever" but is more obvious when using streaming EDS or very long polling DNS.
Fixes #6625
Signed-off-by: Matt Klein <mklein@lyft.com>
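The rule described above can be modelled in a few lines. The following is a minimal sketch in Python, not Envoy's actual C++ implementation: the `Host` and `Cluster` classes, their field names, and the choice to cancel a pending removal when a host reappears in discovery are all assumptions made for illustration. The only behavior taken from the description is that a host is dropped once it is both removed from discovery and failing active health checking, whichever happens first.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    address: str
    failing_active_hc: bool = False  # last active health check failed
    pending_removal: bool = False    # absent from the latest discovery update


@dataclass
class Cluster:
    hosts: dict = field(default_factory=dict)

    def on_discovery_update(self, addresses):
        """Apply a new host set from service discovery (hypothetical hook)."""
        current = set(addresses)
        for address in current:
            if address in self.hosts:
                # Assumption: a host that reappears is no longer pending removal.
                self.hosts[address].pending_removal = False
            else:
                self.hosts[address] = Host(address)
        # Iterate over a copy because _maybe_remove may delete entries.
        for address, host in list(self.hosts.items()):
            if address not in current:
                host.pending_removal = True
                self._maybe_remove(host)

    def on_health_check_result(self, address, healthy):
        """Record an active health check result (hypothetical hook)."""
        host = self.hosts.get(address)
        if host is None:
            return
        host.failing_active_hc = not healthy
        self._maybe_remove(host)

    def _maybe_remove(self, host):
        # Both conditions must hold; the order in which they became true
        # does not matter because both event paths call this check.
        if host.pending_removal and host.failing_active_hc:
            del self.hosts[host.address]
```

In this model, a host that is removed from discovery while still healthy stays in `hosts` until a later health check fails, and a host that is already failing is dropped as soon as discovery removes it. Running the same check from both event handlers is what makes the outcome independent of ordering.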
Member
Author
|
@snowp I'm going to take a fresh pass on this tomorrow and add some more tests and see if I can figure out a better solution for the |
Member
Author
|
cc @lita |
Signed-off-by: Matt Klein <mklein@lyft.com>
snowp (Contributor) suggested changes on May 1, 2019 and left a comment:
Seems good to me modulo the strict DNS issue
Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
Member
Author
|
@snowp updated to support only EDS and add better tests. I think this is a better solution for now. PTAL. |
snowp (Contributor) suggested changes on May 1, 2019 and left a comment:
LGTM, just one minor comment
Member
Author
|
@snowp thanks updated |
jeffpiazza-google pushed a commit to jeffpiazza-google/envoy that referenced this pull request on May 3, 2019
When using active health checking, hosts are not removed from dynamic clusters if they are still passing health checks. This creates a situation in which hosts might not be removed for a very long time if the sequence is reversed; removal followed by health check failure. This change handles the second case so that any time a host is both removed AND failing active health check, in any order, it will be removed. This has been an issue "forever" but is more obvious when using streaming EDS or very long polling DNS. Fixes envoyproxy#6625 Signed-off-by: Matt Klein <mklein@lyft.com> Signed-off-by: Jeff Piazza <jeffpiazza@google.com>
mattklein123 added a commit that referenced this pull request on May 6, 2019
mattklein123 added a commit that referenced this pull request on May 14, 2019
If we inline delete a host during a failure callback we need to account for the connection being cleaned up prior to handling 'connection: close' headers. Signed-off-by: Matt Klein <mklein@lyft.com>
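The hazard in this follow-up commit message can be illustrated with a small model. This is a hedged sketch in Python, not the Envoy code: `HealthCheckSession`, `on_failure`, `close`, and the `client` attribute are hypothetical names. It only shows the general pattern of re-checking state after a callback that may have torn down the connection, before acting on a `connection: close` header.

```python
class HealthCheckSession:
    """Toy health check session that owns one client connection."""

    def __init__(self, on_failure):
        self.on_failure = on_failure   # callback run on a failed check
        self.client = object()         # stands in for an open connection
        self.closed_by_header = False

    def close(self):
        """Tear down the session, e.g. because its host was deleted."""
        self.client = None

    def on_response(self, healthy, connection_close):
        if not healthy:
            # The callback may delete the host inline, which closes
            # this session and its connection.
            self.on_failure(self)
        if self.client is None:
            # The connection was already cleaned up during the callback,
            # so there is nothing left to close for the header.
            return
        if connection_close:
            self.client = None
            self.closed_by_header = True
```

With a failure callback that closes the session, `on_response(healthy=False, connection_close=True)` returns early instead of touching the connection a second time.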
mattklein123 added a commit that referenced this pull request on May 15, 2019
Risk Level: Medium/High. Scary stuff.
Testing: New unit tests.
Docs Changes: N/A
Release Notes: N/A