RetireActualLRP is not emitting events #820

@vlast3k

Summary

cf restart-app-instance <app> <index> ends up calling RetireActualLRP in bbs/controllers/actual_lrp_lifecycle_controller.go. If the LRP is Claimed or Running, no events are emitted.

As a consequence

  • gorouter does not remove the route to the instance being restarted, so the instance can still receive requests while it is in graceful shutdown mode
  • the new instance is started only after the old one has exited (or been killed)

The expected outcome of cf restart-app-instance is that:

  • The route to this instance is immediately removed from gorouter

Steps to Reproduce

Described here: https://github.com/vlast3k/dontdie/tree/main

Diego repo

https://github.com/cloudfoundry/bbs/tree/main

Environment Details

diego-release - 2.80.0

Possible Causes or Fixes (optional)

The removeLRP method is called only if the LRP is Unclaimed or Crashed, or in case of errors:

	removeLRP := func() error {
		err = h.db.RemoveActualLRP(ctx, logger, lrp.ProcessGuid, lrp.Index, &lrp.ActualLRPInstanceKey)
		if err == nil {
			newLRPs = eventCalculator.RecordChange(lrp, nil, lrps)
		}
		return err
	}

	for retryCount := 0; retryCount < models.RetireActualLRPRetryAttempts; retryCount++ {
		switch lrp.State {
		case models.ActualLRPStateUnclaimed, models.ActualLRPStateCrashed:
			err = removeLRP()
		case models.ActualLRPStateClaimed, models.ActualLRPStateRunning:
			cell, err = h.serviceClient.CellById(logger, lrp.CellId)

The change in this draft PR fixes the issue (without breaking existing tests):
cloudfoundry/bbs#72
