Closed
Summary
cf restart-app-instance <app> <index>
ends up calling RetireActualLRP in bbs/controllers/actual_lrp_lifecycle_controller.go. If the LRP is Claimed or Running, no events are emitted.
As a consequence:
- gorouter does not remove the route to the instance being restarted, so the instance can still receive requests even though it is in graceful shutdown mode
- the new instance is started only after the first one has exited (or been killed)
The expected outcome of cf restart-app-instance is that:
- The route to this instance is immediately removed from gorouter
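To make the difference between the current and the expected behaviour concrete, here is a minimal, self-contained toy model. It is only a sketch under assumptions: the names State, Event, RouteTable, retire and emitOnStop are hypothetical and are not the BBS or gorouter API, and the real event path (BBS events reaching the routing tier) is collapsed into a single map update. It is not the change proposed in the PR.

```go
package main

import "fmt"

// State mirrors the ActualLRP states named in this issue (toy type, not models.ActualLRP).
type State string

const (
	Unclaimed State = "UNCLAIMED"
	Claimed   State = "CLAIMED"
	Running   State = "RUNNING"
	Crashed   State = "CRASHED"
)

// Event is a hypothetical stand-in for an ActualLRP change event.
type Event struct {
	Kind  string
	Index int
}

// RouteTable is a toy stand-in for the router's view of routable instances.
type RouteTable map[int]bool

// Apply drops the route for an instance when a removal event arrives.
func (r RouteTable) Apply(e Event) {
	if e.Kind == "removed" {
		delete(r, e.Index)
	}
}

// retire models the behaviour described above: a removal event is produced
// only for Unclaimed/Crashed LRPs. For Claimed/Running LRPs no event is
// produced unless emitOnStop is set. emitOnStop is a hypothetical switch
// used to contrast current and expected behaviour, not a real BBS option.
func retire(state State, index int, emitOnStop bool) []Event {
	switch state {
	case Unclaimed, Crashed:
		return []Event{{Kind: "removed", Index: index}}
	case Claimed, Running:
		// The real code asks the cell to stop the instance here; the toy model skips that.
		if emitOnStop {
			return []Event{{Kind: "removed", Index: index}}
		}
	}
	return nil
}

// routableAfterRetire reports whether instance 0 still has a route after retire.
func routableAfterRetire(state State, emitOnStop bool) bool {
	routes := RouteTable{0: true}
	for _, e := range retire(state, 0, emitOnStop) {
		routes.Apply(e)
	}
	return routes[0]
}

func main() {
	fmt.Println("current behaviour, Running LRP still routable:", routableAfterRetire(Running, false))
	fmt.Println("expected behaviour, Running LRP still routable:", routableAfterRetire(Running, true))
}
```

In this model a Running instance keeps its route after being retired unless a removal event is emitted, which is the gap described in this issue.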
Steps to Reproduce
Described here https://github.com/vlast3k/dontdie/tree/main
Diego repo
https://github.com/cloudfoundry/bbs/tree/main
Environment Details
Possible Causes or Fixes (optional)
The removeLRP closure is only called if the LRP is Unclaimed or Crashed, or in case of errors:
    removeLRP := func() error {
        err = h.db.RemoveActualLRP(ctx, logger, lrp.ProcessGuid, lrp.Index, &lrp.ActualLRPInstanceKey)
        if err == nil {
            newLRPs = eventCalculator.RecordChange(lrp, nil, lrps)
        }
        return err
    }

    for retryCount := 0; retryCount < models.RetireActualLRPRetryAttempts; retryCount++ {
        switch lrp.State {
        case models.ActualLRPStateUnclaimed, models.ActualLRPStateCrashed:
            err = removeLRP()
        case models.ActualLRPStateClaimed, models.ActualLRPStateRunning:
            cell, err = h.serviceClient.CellById(logger, lrp.CellId)
The change in this draft PR fixes the issue without breaking the existing tests:
cloudfoundry/bbs#72