Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for external storage in Circuit Breaker #215

Closed
rahulrai-in opened this issue Jan 20, 2017 · 11 comments
Closed

Support for external storage in Circuit Breaker #215

rahulrai-in opened this issue Jan 20, 2017 · 11 comments

Comments

@rahulrai-in
Copy link

I want to implement Circuit Breaker in a distributed system. I would like to share circuit breaker state among all instances of the application. How can I support this scenario?

@reisenberger
Copy link
Member

reisenberger commented Jan 21, 2017

@rahulrai-in Polly does not provide an in-built way to share circuit state between different processes / instances of an application.

Previous discussion on Polly highlighted possible risks with this pattern intrinsic to a distributed system. Essentially, if node A1 is experiencing problems in calls to M, is it safe to assume that other nodes (A2, A3, A4 say) also would? What if the problem is specific only to the network path between A1 and M, but not A2/3/4 to M? Circuit-state co-ordination could then be dangerous and unhelpful: it risks telling A2/3/4 they can't talk to M, when in fact they still can. Likewise if the problem is something internal only to A1 (eg burst of requests or other malfunction causing local CPU/thread/memory starvation). What if the issue is a general network problem between A1 and anywhere? Most methods to co-ordinate circuit state (shared db; Redis cache; messaging) would depend on the same network ...

Per these risks, some commentators suggest it may be simpler, and safer, just to treat A1, A2, A3, A4 as independent actors, and let them pop their own circuits on the only thing they can know to be true ... that they aren't getting answers from calls to M.

@reisenberger
Copy link
Member

reisenberger commented Jan 21, 2017

@rahulrai-in If otoh making a judgment the benefits outweigh the risks for a particular installation (effectively, you take the risk of assuming the problem is with M, not anywhere else), you could support this scenario with no changes to current Polly thus:

  • Make the onBreak() and onReset() delegates of the CircuitBreaker, broadcast the break and reset to other instances of the application, for example via a messaging system.
  • All instances listen for breaks and resets.
  • On hearing of a break made by another instance, they would manually break their instance using Isolate(). [or recode Polly to allow manually break only for a set period]
  • On hearing of the reset made by the original instance, they would manually reset using Reset().

(a possibility, not necessarily a recommendation)
.

To implement within Polly, alternatively, a genuine shared external storage of the circuit state updated on each and every call, would require coding a new ICircuitController implementation. OnActionPreExecute() would read the circuit state from shared storage. Other methods such as OnActionSuccess(), OnActionFailure() etc would update it.

We would not want the core Polly package to take a dependency on a particular external storage system, so an implementation would have to go in an external package such as Polly.SharedStateCircuitBreaker, rather than in the core Polly package.

Both approaches imply extra network overhead, and are vulnerable to failure if the network isn't available.

@rahulrai-in
Copy link
Author

Thank you for the insights @reisenberger. In my opinion, there are still benefits to making Polly share state among the various services. I am working on building a stateless Microservices application in which the client requests may arrive at any instance of Microservice. I want the requests to behave in a consistent manner and not fail individually on every instance. This feature would also be beneficial in actor model as without this feature several instances of the same actor will try to communicate with external dependency before failing. This scenario is also supported by Akka.

I agree to the fact that this should be an opt-in rather than a compulsory feature and I would love to submit a PR to include this feature in Polly. We can design this feature to trip the circuit if failure is reported from at least N instances so that communication failure between a single node and dependency does not trip the circuit.

This is a really good pattern for use in cloud applications esp. Microservices.

@reisenberger
Copy link
Member

reisenberger commented Jan 21, 2017

@moonytheloony The notion of a quorum of services which must independently report their circuit broken, before forcing others to break, seems a reasonable way of alleviating some of the risks around a single faulting service incorrectly influencing others which might otherwise succeed.

Thinking quickly, we would possibly need look at what needs adding to ICircuitController<TResult>, and what of abstract class CircuitStateController<TResult> and its existing implementations might need to be marked virtual, in order that an implementation for this feature could intercept circuit operations, to share and manipulate state. I will aim to look at this, or: please feel free to look yourself!

What would you envisage needing sharing? The underlying statistical state of the circuit (numbers of successes/failures etc metrics), or just the major events/states? (breaks/resets; CircuitState). Or? EDIT: @reisenberger come to own conclusion that only the major events/states need sharing (as tarup also says later)

Note that we are already simultaneously engaged on other large-scale features, including caching and metrics, so there are a few competing priorities in the short-term.

Feel free to join the Polly slack channel for on-going discussion on features.

Ideally, any solution which would eventually be integrated into the Polly stable would be generalised: ie work for both existing kinds of CircuitBreaker; and connect to external storage with an interface, in order that alternative external storage implementations could easily be coded by interested parties.

What do you think?

@rahulrai-in
Copy link
Author

I think to support the quorum policy, we need to add another Policy class or update CircuitBreakerPolicy class to add support for Quorum policy. By doing this we can add custom policies to the Circuit Breaker. Next, CircuitBreakerState property should have the support to be loaded from a data store. Currently, this property returns what is stored in a local variable.

@rahulrai-in
Copy link
Author

Adding @tarunp who is a .net architect specializing in distributed systems to add his thoughts and contribute to this feature.

@reisenberger
Copy link
Member

@bunceg, you were interested in this pattern too? (circuitBreakers breaking in common across distributed systems.) Please join the conversation as feature evolves, if you have angles to share.

@tarunp
Copy link

tarunp commented Feb 2, 2017

Sure @moonytheloony . I also faced the same issue in one of my application consisting of background services. If we can have quorum concept within the Polly itself then it will be really useful for multi instance scenarios. I think major events or state of the circuit should be good enough to determine the state of each node.

@reisenberger
Copy link
Member

Another request for this feature at #259

@reisenberger
Copy link
Member

Proposal for DistributedCircuitBreaker now posted as #287

@reisenberger
Copy link
Member

Closing as duplicate, main discussion now on #287

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants