Refining the logic for passing from half-open state to closed state #239

Closed
g7ed6e opened this issue Mar 23, 2017 · 10 comments

Comments

@g7ed6e

g7ed6e commented Mar 23, 2017

Hi and thanks for the great work.
I need to apply a policy while the circuit is in the half-open state, to prevent it from closing too quickly.
I don't think this feature is on the roadmap today. Would it be possible to add it?
Please see below the part that should be modified.

@reisenberger
Member

Hi @railarmenien, thanks for this. It would be good to understand your thinking and your need more clearly. What factors do you think could usefully govern the transition back from HalfOpen to Closed state? And what is the real-world scenario driving this? (It all helps us think about how to tackle it.) Thanks!

@g7ed6e
Author

g7ed6e commented Mar 27, 2017

Hi @reisenberger.
I would need to apply either a consecutive success count check or a sampling check before transitioning back to Closed state.
The real-world scenario is a service hosted in a farm behind a load balancer. Once one node is down, the load is processed by the remaining nodes, which increases response times and sometimes leads to timeouts. In this scenario I would need fine-grained control over the behavior of the CircuitBreaker, without necessarily applying the strategy that governs the Closed to Open transition, which is the behavior I get today if the first call in HalfOpen results in a timeout.
Regards

@reisenberger
Member

reisenberger commented Apr 1, 2017

@railarmenien I understand the logic that if fewer nodes are available behind a load-balancer, the response times of calls could increase (assuming the system is in some sense close to capacity?), but (with apologies) I am not sure I have understood how you are seeking the circuit to behave differently in light of the greater potential for timeouts.

Are you thinking that the circuit should be more 'lenient' (to make allowances for timeouts being higher, and not break again so easily), or more 'strict' (eg to return to closed less readily, and/or break more easily)?

@g7ed6e g7ed6e changed the title Introducing appliance of a policy when passing from half-open state to closed state Applying a policy when passing from half-open state to closed state Apr 2, 2017
@g7ed6e
Author

g7ed6e commented Apr 2, 2017

@reisenberger let's assume the circuit broke because a node did indeed fall, due to a load close to maximum capacity. Then, before opening the gates to consumers (i.e. passing from HalfOpen to Closed state), I would like to ensure the fallen node has had enough time to recover, so that the circuit does not break again immediately. To measure whether "the fallen node had enough time to recover", I would like to be able to apply a policy/strategy here (more or less strict, that is to be defined).
My thinking is that we should be able to set up a primary threshold governing the transition from Closed to Open state, and also a secondary threshold governing the transition from HalfOpen to Closed. This pattern is described here http://blog.octo.com/circuit-breaker-un-pattern-pour-fiabiliser-vos-systemes-distribues-ou-microservices-partie-2/ (in French, I apologize) and brings to light the idea of allowing a few requests, instead of only one, to try to reach the backend when in HalfOpen state.

@reisenberger
Member

reisenberger commented Apr 8, 2017

Perfect @railarmenien! Glad we are talking about the same thing! I had just wanted to clarify following the timeout comments, because a consecutive success metric at HalfOpen can only make the circuit stricter (more likely to break again) than the current implementation. With you on the concept (and thanks also for the article: no problem reading it in French!)

@railarmenien We would be very happy for a PR on this if you want to work on one!

Here are some points I thought about when first considering this in January, and again now:

Consecutive count circuit-breaker seems relatively straightforward:

  • Consecutive success count (as you say) is the appropriate (and widely-seen) metric to refine HalfOpen to Closed, for a count-based breaker.
  • HalfOpen state must limit the calls placed as well as measuring incoming results. Otherwise the HalfOpen state is vulnerable to request stampedes: see #216, "AdvancedCircuitBreaker allows additional Excute() in HalfOpen state when waiting for first Excute() to finish" (an inherited bug which we squashed very quickly!). If the consecutive success count metric required to close again is N, we should also limit calls placed to N.
  • That limit on calls placed in the HalfOpen state should be N calls per breakDuration. As @kharos rightly pointed out, a HalfOpen state may want to try to place more calls periodically, in case the earlier calls have hung.

So the specification (as we might document it in the wiki) would become:

When the circuit is half-open and the consecutiveSuccessRecoveryThreshold is specified (call it N):

  • the circuit will permit N further calls to be placed per breakDuration, as trial calls to determine the circuit's health
  • if the circuit receives N consecutive successes, the circuit will transition back to Closed.
  • if any failure occurs before N successes are reached, the circuit will transition immediately back to Open again for the configured timespan.
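
The three rules above can be sketched as a small state machine. This is only an illustrative sketch in Python, not Polly's implementation or API: the class `HalfOpenRecoveryBreaker`, its methods, and the injectable `clock` are hypothetical names, and the Closed to Open rule is deliberately reduced to a manual `trip()`.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class HalfOpenRecoveryBreaker:
    """Hypothetical sketch of the half-open rules above (not Polly code)."""

    def __init__(self, recovery_threshold, break_duration, clock=time.monotonic):
        self.n = recovery_threshold           # N: consecutive successes needed to close
        self.break_duration = break_duration  # seconds
        self.clock = clock                    # injectable so the sketch is testable
        self.state = State.CLOSED
        self.opened_at = None
        self.window_start = None
        self.permits_left = 0
        self.successes = 0

    def trip(self):
        """Open the circuit; the Closed -> Open metric is out of scope here."""
        self.state = State.OPEN
        self.opened_at = self.clock()

    def allow_call(self):
        now = self.clock()
        if self.state is State.CLOSED:
            return True
        if self.state is State.OPEN:
            if now - self.opened_at < self.break_duration:
                return False
            # Break duration elapsed: enter half-open with N trial permits.
            self.state = State.HALF_OPEN
            self.window_start = now
            self.permits_left = self.n
            self.successes = 0
        elif now - self.window_start >= self.break_duration:
            # Earlier trial calls may have hung: grant a fresh batch of N permits.
            self.window_start = now
            self.permits_left = self.n
        if self.permits_left > 0:
            self.permits_left -= 1
            return True
        return False

    def on_success(self):
        if self.state is State.HALF_OPEN:
            self.successes += 1
            if self.successes >= self.n:
                self.state = State.CLOSED

    def on_failure(self):
        # Any failure during half-open re-opens the circuit immediately.
        if self.state is State.HALF_OPEN:
            self.trip()
```

A caller would check `allow_call()` before each attempt and report the outcome through `on_success()` or `on_failure()`. The sketch is not thread-safe; a real breaker would need locking around these transitions.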

AdvancedCircuitBreaker case seems more nuanced:

  • For the new metric controlling HalfOpen to Closed, my thought is that users would specify only one additional parameter double recoveryThreshold (not additional values for all three parameters which govern Closed state) (keep things as simple as we can!)
  • double recoveryThreshold would again be a percentage of calls which fail. E.g. if a user specified failureThreshold: 0.4, recoveryThreshold: 0.2, they would be specifying that they want the failure rate through the circuit to fall from 40% (when the circuit breaks) to 20%, before the circuit would transition from HalfOpen to Closed again.
  • The HalfOpen state still exists to place only limited calls: to remain protective of the called system while placing enough calls to determine if the underlying operation has recovered. It should limit calls to the minimum necessary to determine whether the failure rate falls below recoveryThreshold: in this case, N calls, where N is the minimumThroughput which the user defined to make circuit metrics statistically significant. That minimumThroughput becomes the maximum calls which the breaker will place while in HalfOpen state, in any period of breakDuration.
  • For the same reasons as for the consecutive-count breaker, that upper limit on calls placed in the HalfOpen state should be per some period, say N calls per breakDuration.
  • (EDIT) The influence of the samplingDuration parameter could be retained or could become irrelevant to the HalfOpen state.
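
As a rough illustration of the recoveryThreshold idea, the decision after the trial calls could look like the sketch below. The function `half_open_decision` is a hypothetical name, and two points are assumptions not settled in the proposal above: that the circuit re-opens when the failure rate has not fallen below the threshold, and that samplingDuration is ignored.

```python
def half_open_decision(outcomes, recovery_threshold, minimum_throughput):
    """Decide the next state from half-open trial results (hypothetical sketch).

    outcomes: list of bools in arrival order, True meaning success.
    Returns "half-open" while fewer than minimum_throughput results are in,
    "closed" if the failure rate fell below recovery_threshold,
    and "open" otherwise (an assumption, see the lead-in).
    """
    if len(outcomes) < minimum_throughput:
        return "half-open"
    # Only the first minimum_throughput trial calls count: that is the most
    # the breaker would place per breakDuration under the proposal.
    sample = outcomes[:minimum_throughput]
    failure_rate = sample.count(False) / minimum_throughput
    return "closed" if failure_rate < recovery_threshold else "open"
```

With failureThreshold: 0.4 and recoveryThreshold: 0.2 as in the example above, and a minimumThroughput of 10, one failure in the ten trial calls (10%) would close the circuit, while five failures (50%) would not.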

What do you think? Does this work for your Use Case? Could we do anything differently? Community views?

@reisenberger reisenberger changed the title Applying a policy when passing from half-open state to closed state Applying a policy (refined metric) when passing from half-open state to closed state Apr 8, 2017
@reisenberger reisenberger changed the title Applying a policy (refined metric) when passing from half-open state to closed state Refining the logic for passing from half-open state to closed state Jun 15, 2017
@pvmraghunandan

We are also planning to use this for high-throughput telemetry ingestion in IoT scenarios, which involve business processing and communication with different external systems that are out of our control. We want to use a circuit breaker there, but at the same time we would like to limit the number of calls during Half-Open (i.e. only allow X calls even if there are more requests), and to have separate configuration for Half-Open (number of calls, threshold, duration, and, in the case of Advanced, minimum throughput) for fine-grained control.
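
For illustration only, the separate Half-Open configuration described here might be grouped as in the sketch below. `HalfOpenSettings` and all of its field names are hypothetical; nothing like it exists in Polly, and the validation rules are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HalfOpenSettings:
    """Hypothetical settings that apply only while the circuit is half-open."""

    max_trial_calls: int         # X: calls allowed through per trial window
    recovery_threshold: float    # failure rate that must be undercut to close
    trial_window_seconds: float  # period over which max_trial_calls applies
    minimum_throughput: int      # results needed before a decision is made

    def __post_init__(self):
        if self.max_trial_calls < 1:
            raise ValueError("max_trial_calls must be at least 1")
        if not 0.0 < self.recovery_threshold <= 1.0:
            raise ValueError("recovery_threshold must be in (0, 1]")
        if self.trial_window_seconds <= 0:
            raise ValueError("trial_window_seconds must be positive")
        if not 1 <= self.minimum_throughput <= self.max_trial_calls:
            # A decision can never be reached if more results are required
            # than calls are allowed through.
            raise ValueError("minimum_throughput must be between 1 and max_trial_calls")
```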

@reisenberger
Member

reisenberger commented Jan 6, 2018

(triage of open issues)

This (refining the metrics governing exiting half-open) remains a good possible addition to the circuit-breaker!

Because of the core Polly team's desire to deliver events/metrics as the next major feature, the core team hasn't time to work on this (#239) at the moment. Community contributions remain welcome. @reisenberger provided a possible specification/starting point for discussion above. Post on this issue if you're interested in taking this on.

@reisenberger
Member

Removing the 'up-for-grabs' label from this. I would prefer to next develop the circuit-breaker by more cleanly factoring out an ICircuitStateController and ICircuitStateStore, to facilitate high-priority features like #287 (distributed breaker for serverless/actors scenarios). The feature in this issue (refining exit from half-open state) would then become an option of a custom ICircuitStateController, and easier to implement, rather than one more feature added to the existing pre-defined circuit-breakers.

@reisenberger
Member

Closing due to no further community request/comment over 24 months.

@djkarlos

Just wondering whether anything came of this since it was closed? I too am looking for a way to transition from HalfOpen back to closed based on a threshold of successful responses rather than just on a single success.
