@original-brownbear
Contributor

Hotfix to prevent snapshots from getting stuck when the master's circuit breaker rejects shard snapshot status update requests.
These requests are very small, and much of the memory associated with them is already allocated by the time the circuit breaker kicks in,
so the risk that this change makes the master more likely to run out of memory should be very small.
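
To illustrate the idea (not the actual diff), here is a minimal, self-contained sketch of exempting one handler from the breaker. All class names, method names, and action names below are hypothetical and only loosely modeled on how a transport layer might account for in-flight request bytes; this is not Elasticsearch's real code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical, simplified, single-threaded model of per-handler breaker exemption.
// None of these classes are Elasticsearch's real implementation.
final class BreakerExemptionSketch {

    /** Signals that a request was rejected to protect the node's memory. */
    static final class CircuitBreakingException extends RuntimeException {
        CircuitBreakingException(String message) {
            super(message);
        }
    }

    /** Tracks estimated in-flight request bytes against a fixed limit. */
    static final class InFlightBreaker {
        private final long limitBytes;
        private long usedBytes;

        InFlightBreaker(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        /** Accounts for the bytes, or throws if the limit would be exceeded. */
        void addAndMaybeBreak(long bytes, String action) {
            if (usedBytes + bytes > limitBytes) {
                throw new CircuitBreakingException("rejected [" + action + "]");
            }
            usedBytes += bytes;
        }

        /** Accounts for the bytes but never rejects the request. */
        void addWithoutBreaking(long bytes) {
            usedBytes += bytes;
        }

        void release(long bytes) {
            usedBytes -= bytes;
        }

        long usedBytes() {
            return usedBytes;
        }
    }

    /** A handler plus the flag that decides whether the breaker may reject it. */
    static final class Handler {
        final boolean canTripCircuitBreaker;
        final Consumer<String> body;

        Handler(boolean canTripCircuitBreaker, Consumer<String> body) {
            this.canTripCircuitBreaker = canTripCircuitBreaker;
            this.body = body;
        }
    }

    /** Dispatches requests, consulting the breaker only for non-exempt handlers. */
    static final class Transport {
        private final InFlightBreaker breaker;
        private final Map<String, Handler> handlers = new HashMap<>();

        Transport(InFlightBreaker breaker) {
            this.breaker = breaker;
        }

        void register(String action, boolean canTripCircuitBreaker, Consumer<String> body) {
            handlers.put(action, new Handler(canTripCircuitBreaker, body));
        }

        void dispatch(String action, String payload, long sizeBytes) {
            Handler handler = handlers.get(action);
            if (handler == null) {
                throw new IllegalArgumentException("no handler for [" + action + "]");
            }
            if (handler.canTripCircuitBreaker) {
                breaker.addAndMaybeBreak(sizeBytes, action); // may reject under memory pressure
            } else {
                breaker.addWithoutBreaking(sizeBytes); // still counted, never rejected
            }
            try {
                handler.body.accept(payload);
            } finally {
                breaker.release(sizeBytes);
            }
        }
    }

    public static void main(String[] args) {
        InFlightBreaker breaker = new InFlightBreaker(100);
        breaker.addWithoutBreaking(100); // simulate a master that is already at its limit
        Transport transport = new Transport(breaker);
        // Action names are made up for this sketch.
        transport.register("hypothetical/search", true, p -> System.out.println("handled " + p));
        transport.register("hypothetical/snapshot/update_status", false, p -> System.out.println("handled " + p));

        try {
            transport.dispatch("hypothetical/search", "query", 10);
        } catch (CircuitBreakingException e) {
            System.out.println(e.getMessage());
        }
        // The exempt handler still runs even though the breaker is at its limit.
        transport.dispatch("hypothetical/snapshot/update_status", "shard status", 10);
    }
}
```

In this sketch the exempt request is still accounted for while it is in flight, so the breaker's bookkeeping stays accurate; the only difference is that it can no longer be dropped under memory pressure.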

Closes #54714

Note: I didn't add the 7.7.x label here yet since that's still up for discussion. Just opening this to show the simplicity of the fix.

@original-brownbear added the >bug, :Distributed Coordination/Snapshot/Restore, v8.0.0, and v7.8.0 labels Apr 17, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@original-brownbear
Contributor Author

Jenkins run elasticsearch-ci/1

Contributor

@ywelsch left a comment

LGTM. I've added the 7.7.1 label as this should definitely go to the 7.7 branch (and can be merged as soon as CI is green). If we end up respinning the build candidate for 7.7.0, we can relabel appropriately here (that's the official process).

@ywelsch added the v7.7.1 label Apr 17, 2020
@original-brownbear
Contributor Author

Thanks Yannick!

@original-brownbear merged commit 26fd944 into elastic:master Apr 17, 2020
@original-brownbear deleted the exclude-snapshot-shard-status-from-breaker branch April 17, 2020 10:57
original-brownbear added a commit that referenced this pull request Apr 17, 2020
…55376) (#55384)

original-brownbear added a commit that referenced this pull request Apr 17, 2020
…55376) (#55383)

@bpintea added v7.7.0 and removed v7.7.1 labels Apr 21, 2020
@original-brownbear restored the exclude-snapshot-shard-status-from-breaker branch August 6, 2020 18:34

Successfully merging this pull request may close these issues.

Snapshot Can Become Stuck if Master Circuit-Breaks on Shard Snapshot Update Message (TransportMasterNodeAction does not retry)
