Refining the logic for passing from half-open state to closed state #239

g7ed6e · 2017-03-23T08:18:28Z

Hi and thanks for the great work.
I need to apply a policy while the circuit is in the half-open state, to prevent it from closing too quickly.
I don't think this feature is on the roadmap today. Would it be possible to add it?
Please see below the part that should be modified.

reisenberger · 2017-03-25T14:52:07Z

Hi @railarmenien, thanks for this. It would be good to understand your thinking / need more clearly. What factors are you thinking could be useful to govern the transition back from HalfOpen to Closed state? And: what is the real-world scenario driving this? (It all helps us think about how to tackle it.) Thanks!

g7ed6e · 2017-03-27T07:18:05Z

Hi @reisenberger.
I would need to apply either a consecutive success count check or a sampling check before transitioning back to the Closed state.
The real-world scenario is a service hosted in a farm behind a load balancer. Once one node is down, its load is processed by the remaining nodes, which increases response times and sometimes leads to timeouts. In this scenario I need fine-grained control over the behavior of the CircuitBreaker in HalfOpen, rather than necessarily applying the strategy that governs the Closed to Open transition, which is the behavior I get today if the first call in HalfOpen results in a timeout.
Regards

reisenberger · 2017-04-01T09:54:25Z

@railarmenien I understand the logic that if fewer nodes are available behind a load-balancer, the response times of calls could increase (assuming the system is in some sense close to capacity?), but (with apologies) I am not sure I have understood how you are seeking the circuit to behave differently in light of the greater potential for timeouts.

Are you thinking that the circuit should be more 'lenient' (to make allowances for timeouts being higher, and not break again so easily), or more 'strict' (eg to return to closed less readily, and/or break more easily)?

g7ed6e · 2017-04-02T07:00:35Z

@reisenberger let's assume the circuit broke when a node went down, indeed due to a load close to maximum capacity. Then, before opening the gates to consumers (i.e. passing from HalfOpen to Closed state), I would like to ensure the fallen node has had enough time to recover, so that the circuit does not break again immediately. To measure whether "the fallen node has had enough time to recover", I would like to be able to apply a policy/strategy here (more or less strict, that's to be defined).
My thinking is that we should be able to set up a primary threshold leading to the transition from Closed to Open state, and also a secondary threshold leading to the transition from HalfOpen to Closed. This pattern is described here http://blog.octo.com/circuit-breaker-un-pattern-pour-fiabiliser-vos-systemes-distribues-ou-microservices-partie-2/ (in French, I apologize) and brings to light the idea of allowing a few requests to try to reach the backend, instead of only one, when in HalfOpen state.

reisenberger · 2017-04-08T11:05:29Z

Perfect @railarmenien! Glad we are talking about the same thing! I had just wanted to clarify following the timeout comments, because a consecutive success metric at HalfOpen can only make the circuit stricter (more likely to break again) than the current implementation. With you on the concept (but thanks also for the article: no problem reading it in French!)

@railarmenien We would be very happy for a PR on this if you want to work on one!

Here are some points I thought about when first considering this in January, and again now:

Consecutive count circuit-breaker seems relatively straightforward:

Consecutive success count (as you say) is the appropriate (and widely-seen) metric to refine HalfOpen to Closed, for a count-based breaker.
HalfOpen state must limit the calls placed as well as measuring incoming results. Otherwise the HalfOpen state is vulnerable to request stampedes: see "AdvancedCircuitBreaker allows additional Excute() in HalfOpen state when waiting for first Excute() to finish" #216 (an inherited bug which we squashed very quickly!). If the consecutive success count metric required to close again is N, we should also limit calls placed to N.
That limit on calls placed in the HalfOpen state should be N calls per breakDuration. As @kharos rightly pointed out, a HalfOpen state may want to try to place more calls periodically, in case the earlier calls have hung.

So the specification (as we might document it in the wiki) would become:

When the circuit is half-open and the consecutiveSuccessRecoveryThreshold is specified (call it N):

the circuit will permit N further calls to be placed per breakDuration, as trial calls to determine the circuit's health

if the circuit receives N consecutive successes, the circuit will transition back to Closed.

if any failure occurs before N successes are reached, the circuit will transition immediately back to Open again for the configured timespan.
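The three rules above can be sketched as a small state machine. This is only an illustration under stated assumptions, not Polly's implementation: the class name `ConsecutiveSuccessBreaker`, the method names, and the injectable `clock` are hypothetical, `recovery_threshold` stands in for the proposed consecutiveSuccessRecoveryThreshold (N), and the choice to keep the success count when a new breakDuration period refreshes the trial permits is an assumption the specification does not settle.

```python
import enum
import time
from typing import Callable


class State(enum.Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class ConsecutiveSuccessBreaker:
    """Hypothetical sketch of the proposed half-open rules (not Polly code)."""

    def __init__(self, failures_to_break: int, recovery_threshold: int,
                 break_duration: float,
                 clock: Callable[[], float] = time.monotonic):
        self.failures_to_break = failures_to_break
        self.recovery_threshold = recovery_threshold  # N in the specification
        self.break_duration = break_duration
        self.clock = clock
        self.state = State.CLOSED
        self.consecutive_failures = 0
        self.consecutive_successes = 0
        self.window_start = 0.0  # start of the current break or trial period
        self.permits_left = 0    # trial calls still allowed in this period

    def _open(self) -> None:
        self.state = State.OPEN
        self.window_start = self.clock()
        self.consecutive_failures = 0
        self.consecutive_successes = 0

    def try_acquire(self) -> bool:
        """Return True if a call may be placed through the circuit now."""
        now = self.clock()
        if self.state is State.CLOSED:
            return True
        if self.state is State.OPEN:
            if now - self.window_start < self.break_duration:
                return False
            # Break has elapsed: enter half-open with N trial permits.
            self.state = State.HALF_OPEN
            self.window_start = now
            self.permits_left = self.recovery_threshold
        elif now - self.window_start >= self.break_duration:
            # Still half-open after a full period (earlier trials may have
            # hung): allow another N trial calls.
            self.window_start = now
            self.permits_left = self.recovery_threshold
        if self.permits_left > 0:
            self.permits_left -= 1
            return True
        return False

    def on_success(self) -> None:
        if self.state is State.CLOSED:
            self.consecutive_failures = 0
        elif self.state is State.HALF_OPEN:
            self.consecutive_successes += 1
            if self.consecutive_successes >= self.recovery_threshold:
                self.state = State.CLOSED
                self.consecutive_failures = 0
                self.consecutive_successes = 0

    def on_failure(self) -> None:
        if self.state is State.HALF_OPEN:
            self._open()  # any failure before N successes re-opens
        elif self.state is State.CLOSED:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failures_to_break:
                self._open()
```

A caller would check `try_acquire()` before each call and report the outcome with `on_success()` or `on_failure()`. Results that arrive while the circuit is Open are ignored in this sketch.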

AdvancedCircuitBreaker case seems more nuanced:

For the new metric controlling HalfOpen to Closed, my thought is that users would specify only one additional parameter, double recoveryThreshold, rather than additional values for all three parameters which govern the Closed state (keep things as simple as we can!)
double recoveryThreshold would again be a percentage of calls which fail. E.g. if a user specified failureThreshold: 0.4, recoveryThreshold: 0.2, they would be specifying that they want the failure rate through the circuit to fall from 40% (when the circuit breaks) to 20%, before the circuit would transition from HalfOpen to Closed again.
The HalfOpen state still exists to place only limited calls: to remain protective of the called system while placing enough calls to determine if the underlying operation has recovered. It should limit calls to the minimum necessary to determine whether the failure rate falls below recoveryThreshold: in this case, N calls, where N is the minimumThroughput which the user defined to make circuit metrics statistically significant. That minimumThroughput becomes the maximum calls which the breaker will place while in HalfOpen state, in any period of breakDuration.
For same reasons as for consecutive-count breaker, that upper limit on calls placed in the HalfOpen state should be per some period, say N calls per breakDuration.
(EDIT) The influence of the samplingDuration parameter could be retained or could become irrelevant to the HalfOpen state.
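As a rough illustration of the recoveryThreshold idea, the half-open decision could look like the sketch below. The function name `half_open_decision` and its return convention are hypothetical. Two points are assumptions rather than anything stated above: the decision is made only once minimumThroughput trial results are in, and the circuit closes only if the failure rate is strictly below recoveryThreshold (mirroring a breaker that opens at or above failureThreshold).

```python
from typing import Optional, Sequence


def half_open_decision(results: Sequence[bool],
                       recovery_threshold: float,
                       minimum_throughput: int) -> Optional[bool]:
    """Decide the half-open outcome from trial results (True = success).

    Returns True to close the circuit, False to re-open it, and None while
    fewer than minimum_throughput trial results have been recorded.
    """
    if not 0.0 < recovery_threshold <= 1.0:
        raise ValueError("recovery_threshold must be in (0, 1]")
    if minimum_throughput < 1:
        raise ValueError("minimum_throughput must be at least 1")
    if len(results) < minimum_throughput:
        return None  # not yet statistically significant
    # Only the first N trial results count: the breaker places at most
    # N = minimum_throughput calls per breakDuration while half-open.
    sample = results[:minimum_throughput]
    failure_rate = sample.count(False) / len(sample)
    return failure_rate < recovery_threshold
```

With failureThreshold: 0.4 and recoveryThreshold: 0.2 as in the example above, and a minimumThroughput of 10, one failure in ten trial calls would close the circuit, while two or more would re-open it under the strict comparison assumed here.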

What do you think? Does this work for your Use Case? Could we do anything differently? Community views?

pvmraghunandan · 2017-07-17T05:21:44Z

We are also planning to use this for high-throughput telemetry ingestion in IoT scenarios, which involves business processing and communication with different external systems that are out of our control. We want to use a circuit breaker there, but at the same time we would like to limit the number of calls during Half-Open (i.e. only allow X calls even if there are more requests), and to have a separate configuration for Half-Open (i.e. number of calls, threshold, duration, and minimum throughput in the case of the Advanced breaker) for fine-grained control.

reisenberger · 2018-01-06T10:43:39Z

(triage of open issues)

This (refining the metrics governing exiting half-open) remains a good possible addition to the circuit-breaker!

Because of the core Polly team's desire to deliver events/metrics as the next major feature, the core team hasn't time to work on this (#239) at the moment. Community contributions remain welcome. @reisenberger provided a possible specification/starting point for discussion above. Post on this issue if you're interested in taking this on.

reisenberger · 2018-09-08T09:29:16Z

Removing the 'up-for-grabs' label from this. I would prefer to next develop the circuit-breaker by more cleanly factoring out an ICircuitStateController and ICircuitStateStore, to facilitate high-priority features like #287 (Distributed breaker for serverless/actors scenario). The feature in this issue (refining exit from half-open state) would then become an option (also easier to implement) of a custom ICircuitStateController (rather than adding more features to the existing pre-defined circuit-breakers).

reisenberger · 2019-12-06T08:55:29Z

Closing due to no further community request/comment over 24 months.

djkarlos · 2021-06-11T14:27:25Z

Just wondering whether anything came of this since it was closed? I too am looking for a way to transition from HalfOpen back to closed based on a threshold of successful responses rather than just on a single success.

reisenberger added the enhancement label Mar 25, 2017

g7ed6e changed the title ~~Introducing appliance of a policy when passing from half-open state to closed state~~ Applying a policy when passing from half-open state to closed state Apr 2, 2017

reisenberger changed the title ~~Applying a policy when passing from half-open state to closed state~~ Applying a policy (refined metric) when passing from half-open state to closed state Apr 8, 2017

reisenberger added the up-for-grabs label Apr 8, 2017

reisenberger mentioned this issue Apr 8, 2017

Future Polly Roadmap - **community feedback sought** #90

Closed

reisenberger mentioned this issue Jun 8, 2017

Half Open State Handling #254

Closed

reisenberger changed the title ~~Applying a policy (refined metric) when passing from half-open state to closed state~~ Refining the logic for passing from half-open state to closed state Jun 15, 2017

reisenberger added the advanced label Jul 6, 2017

reisenberger removed the up-for-grabs label Sep 8, 2018

reisenberger closed this as completed Dec 6, 2019
