
Conversation

@DrClick commented Sep 12, 2017

This addresses the issue of a waiting state and cleans up the tests. Sorry for the last PR; IntelliJ moved code around automagically for me.

@diego-alvarez-hs (Contributor):

Hi @DrClick, sorry for the delay. I'm a little confused regarding this PR and #12.

Is this PR a new version of the previous one? If yes, wouldn't it be better to rebase these changes onto the previous PR so we can continue the conversation there?

@diego-alvarez-hs (Contributor):

@DrClick are the tests passing on your local machine?

I'm getting this consistent failure:

[info] - should be tripped when custom failures are detected *** FAILED *** (194 milliseconds)
[info]   A timeout occurred waiting for a future to complete. Queried 10 times, sleeping 10 milliseconds between each query. (CircuitBreakerTest.scala:666)
[info]   org.scalatest.concurrent.Futures$FutureConcept$$anon$1:

Thread.sleep(retryDelay)
}.onComplete(_ => {
cb.attemptResetBrokenState(this, 0)
})
Contributor:

This looks suspicious. What were you trying to do here, @DrClick? Blocking the thread and using the global implicit execution context are not good ideas (they might be acceptable just for testing purposes).


Contributor:

Oh, I thought you were trying to make the CB auto-reopen itself, but it seems this is just for the first attempt. Why is it needed?

Author:

Do you have time for a chat on this? Is the objection just the use of the global execution context (easily addressable), or is it sleeping the thread the Future is on (effectively a timer)?

val failLimit: Int,
val retryDelay: FiniteDuration,
val isExponentialBackoff: Boolean = false,
val exponentialRetryCap: Option[Int],
Contributor:

We can have just one variable, exponentialRetryCap: when it is defined we know the caller wants exponential backoff, which makes isExponentialBackoff unnecessary.
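As an illustration only (the class name and default value are assumptions; only failLimit, retryDelay, and exponentialRetryCap appear in the diff above), the collapsed signature could look like:

import scala.concurrent.duration.FiniteDuration

// Sketch: exponentialRetryCap doubles as the on/off switch, so the separate
// isExponentialBackoff flag is no longer needed.
class CircuitBreaker(
  val failLimit: Int,
  val retryDelay: FiniteDuration,
  val exponentialRetryCap: Option[Int] = None // Some(cap) => exponential backoff enabled
)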

Author:

OK, I can defer to your call on this; I found it a bit cryptic. I actually think it might make more sense to have an exponential circuit breaker class of its own, to prevent this argument sprawl.
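A rough sketch of that alternative shape (hypothetical names, building on the constructor sketched above):

// Sketch: the exponential settings live on a dedicated subclass instead of
// optional arguments on the base circuit breaker, avoiding argument sprawl.
class ExponentialCircuitBreaker(
  failLimit: Int,
  baseRetryDelay: FiniteDuration,
  retryCap: Int
) extends CircuitBreaker(failLimit, baseRetryDelay, Some(retryCap))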

val failLimit: Int,
val retryDelay: FiniteDuration,
val isExponentialBackoff: Boolean = false,
val exponentialRetryCap: Option[Int],
Contributor:

It'd be good to have some strategy involved here that we can pass as a parameter: one strategy could involve max attempts, another could be max delay, Fibonacci delay, etc., and another strategy could be a combination of different strategies. But we can improve this in following PRs.
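As a rough illustration of the strategy idea (none of these names exist in this repo; this only sketches the shape such a parameter could take):

import scala.concurrent.duration._

// Sketch: the backoff/stop policy becomes a value the caller passes in,
// and strategies can be combined by wrapping one another.
trait RetryStrategy {
  def nextDelay(attempt: Int): Option[FiniteDuration] // None means stop retrying
}

case class ExponentialBackoff(base: FiniteDuration, maxAttempts: Int) extends RetryStrategy {
  def nextDelay(attempt: Int): Option[FiniteDuration] =
    if (attempt >= maxAttempts) None else Some(base * (1L << attempt))
}

case class MaxDelayCap(underlying: RetryStrategy, cap: FiniteDuration) extends RetryStrategy {
  def nextDelay(attempt: Int): Option[FiniteDuration] =
    underlying.nextDelay(attempt).map(d => if (d > cap) cap else d)
}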

@diego-alvarez-hs (Contributor):

It seems the version in #12 is simpler with regard to adding an exponential retry backoff.

@DrClick (Author) commented Sep 15, 2017

Regarding the two PRs, this was meant to be an extension of the first one, adding the cap. I can merge them together. Let me explain what we are trying to do, so the context makes more sense: 1) exponential backoff; 2) after a certain number of retries, we want to shut down the Docker container and allow the Mesosphere infrastructure to do its thing.

It was my understanding of the code base that the broken state was really two states in one, broken/immediately-fail and broken/retry, so I am splitting it into two states. The idea with the timer was to actively transition the state from broken/fail to broken/retry, or more cleanly (flow, broken, retry).
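(A rough sketch of that split, with hypothetical names that are not from the PR:)

// Sketch: the single broken state is split so the transitions become
// flow -> broken (fail fast) -> retry (allow a probe call).
sealed trait CircuitBreakerState
case object FlowState extends CircuitBreakerState                   // calls pass through normally
case class BrokenState(brokenAt: Long) extends CircuitBreakerState  // fail immediately, no calls allowed
case class RetryState(attempt: Int) extends CircuitBreakerState     // allow a single probe call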

Re the tests failing, let me double check; I recall they were passing, but maybe something slipped through.

I don't think including Akka to do this timer function is the right idea. If you don't think the timer is a good idea, I can be convinced. I may be missing a basic understanding of the thread-blocking issue, and if so, I look forward to understanding it better. I thought it would only block the Future thread that it is on. (Is that a problem?)


@diego-alvarez-hs (Contributor):

Thanks for the additional context; it helped us understand your case better.

Let's try to find an alternative to the scheduled task for the first attempt of cb.attemptResetBrokenState(this, 0), so we can avoid adding more dependencies like Akka or Twitter futures. Since it's used just once, let's see what the alternatives are. The version in #12 didn't need this.
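One dependency-free possibility, purely as a sketch (ResetScheduler and scheduleOnce are made-up names; only retryDelay and cb.attemptResetBrokenState come from the PR), would be a single JDK scheduler thread instead of Thread.sleep on the global execution context:

import java.util.concurrent.{Executors, ThreadFactory, TimeUnit}
import scala.concurrent.duration.FiniteDuration

// Sketch: one daemon scheduler thread replaces Future { Thread.sleep(retryDelay) }
// without adding Akka or Twitter futures.
object ResetScheduler {
  private val scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "cb-reset-scheduler")
      t.setDaemon(true)
      t
    }
  })

  // Run `task` once after `delay` without blocking any Future thread.
  def scheduleOnce(delay: FiniteDuration)(task: => Unit): Unit = {
    scheduler.schedule(new Runnable { def run(): Unit = task }, delay.toMillis, TimeUnit.MILLISECONDS)
    ()
  }
}

// Possible call site (identifiers taken from the snippet under review):
// ResetScheduler.scheduleOnce(retryDelay) { cb.attemptResetBrokenState(this, 0) }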

Additionally, please let me check with the team about splitting broken and retry into two states; it sounds like a good idea. I just want to get some feedback from the people who initially created the code or might have more context about the original design.

It looks like your scenario might have some similarities to something we're doing internally (CB + retry backoff): if the CB is short-circuiting calls, it's because something bad is happening and we haven't been able to recover from it, so the health checks will start alerting and the infrastructure can act according to the defined rules (restart instances, add/remove, etc.).

The main difference is that we've separated the CB from the retry-backoff utility: we have a Retrier utility that just keeps retrying an operation given some strategy, and on top of that we wrap the code with the CB. This gives us for free that the retry logic is applied to the single operation the CB lets pass, to check whether it can reset/recover from the bad state.

This might be one of the reasons why we haven't added the retry feature to the CB.

Our code looks like:

sealed trait RetrierWithCircuitBreaker {

  def retrier: Retrier
  def circuitBreaker: CircuitBreaker

  def apply[T: ClassTag](
    operationIdentifier: String
  )(operation: => Future[T])(implicit ec: ExecutionContext, scheduler: Scheduler): Future[T] =
    circuitBreaker.async() { // here we combine the CB + the retryBackoff
      retrier(operationIdentifier) { operation }
    }
...
}

This composition allows us to use it like this:

retrierWithCircuitBreaker: RetrierWithCircuitBreaker // injected into the class: a preconfigured CB + retry backoff

  override def getSomeData(
    profileId: Long
  )(implicit ec: ExecutionContext): Future[OurAdtErrorOr[SomeData]] = {
    val operationName = s"xxx.yyyy.getSomeData: $profileId"
    retrierWithCircuitBreaker(operationName) {
      unsafeTaskThatCanThrowOrReturnNonExpectedResults(profileId)
    }.recover {
      // represent failure as a value of the expected ADT
      case e: Throwable => Left(OurAdtErrorOr.UnexpectedException(e))
    }

  }

The Retrier and the CB are configured with something like:

service {
  xxxx {
    retry-backoff {
      backoff-strategy {
        type = "fibonaccibackoffstrategy"
        delay = 50 milliseconds
        max-delay = 5 seconds
      }
      stop-strategy {
        type = "maxretriesstrategy"
        retries = 12
        max-accumulated-delay = 2 minutes
      }
    }
    circuit-breaker {
      name = "cb-example"
      fail-limit = 10
      retry-delay = 500 milliseconds
    }
  }
}

There is a tradeoff: with this approach the CB might not detect permanent failures as quickly as before, because once the CB lets an operation pass, that operation will keep retrying until its attempts are exhausted. But it allows us to recover more quickly from transient states.

I'm not sure if this last bit of information helps you design your case, but it's good to share our experiences and think out loud.

@diego-alvarez-hs (Contributor):

Hi @DrClick,
It seems the newly added state AttemptResetState might be unnecessary, as it does the same thing as the "old" BrokenState; in this PR the BrokenState just moves directly to AttemptResetState.

What do you think about closing this PR and moving the effort to #12?

#12 just needs to add some kind of limit (time, retries, or both) and integrate the new arguments into an options argument (a case class?).
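For example, the options argument could be something as small as this (hypothetical names, just to illustrate the shape):

import scala.concurrent.duration.FiniteDuration

// Sketch of the suggested options case class: either limit can be left unset,
// and both can be combined.
case class RetryLimits(
  maxRetries: Option[Int] = None,
  maxAccumulatedDelay: Option[FiniteDuration] = None
)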

@CLAassistant commented Aug 25, 2020

CLA assistant check
All committers have signed the CLA.
