
Actual LRPs that are not desired on diego 2.25.0 #424

@scottillogical

Summary

Containers that are not desired are still running; Diego is failing to remove the corresponding actual LRPs.

Expected Result

If an LRP is not desired, it should not be running.

Actual Result

The LRP is still running. This is the output from cfdot actual-lrps. Process guid 4002 is the newly deployed container that should be running. Process guid 3952 should not be running.

{
  "process_guid": "4002",
  "index": 1,
  "domain": "REDACTED_DOMAIN",
  "instance_guid": "cc061ce2-9a96-4cd6-6b49-3bf5",
  "cell_id": "229fbf0e-f289-4014-94ff-4a9459199639",
  "address": "172.16.83.3",
  "ports": null,
  "instance_address": "172.31.11.90",
  "crash_count": 0,
  "state": "RUNNING",
  "since": 1554820047032546600,
  "modification_tag": {
    "epoch": "979c0c70-e493-41a9-5729-e67bb619e63f",
    "index": 2
  },
  "presence": "ORDINARY"
}
{
  "process_guid": "3952",
  "index": 3,
  "domain": "REDACTED_DOMAIN",
  "instance_guid": "bbc7e0de-c4e0-4625-43f5-1d20",
  "cell_id": "7eec1496-7803-4cc0-9035-f029867c3994",
  "address": "172.16.91.3",
  "ports": null,
  "instance_address": "172.31.11.14",
  "crash_count": 0,
  "state": "RUNNING",
  "since": 1554403876774640000,
  "modification_tag": {
    "epoch": "40ea69a5-55d8-4345-7319-a4fd28de943b",
    "index": 2
  },
  "presence": "ORDINARY"
}
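As a way to spot such instances systematically, the following is a minimal Python sketch that compares saved output of cfdot actual-lrps against saved output of cfdot desired-lrps and lists actual LRPs whose process_guid has no desired counterpart. It assumes both commands emit a stream of JSON objects with a process_guid field, as in the output above. The function names and the desired-lrps sample are hypothetical and for illustration only.

```python
import json


def parse_json_stream(text):
    """Parse back-to-back JSON objects, whether compact or pretty-printed."""
    decoder = json.JSONDecoder()
    objects, pos = [], 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return objects
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)


def undesired_actual_lrps(actual_text, desired_text):
    """Return actual LRPs whose process_guid has no desired LRP."""
    desired = {d["process_guid"] for d in parse_json_stream(desired_text)}
    return [a for a in parse_json_stream(actual_text)
            if a["process_guid"] not in desired]


# Trimmed copies of the two actual LRPs shown above.
actual_sample = """
{"process_guid": "4002", "index": 1, "cell_id": "229fbf0e-f289-4014-94ff-4a9459199639", "state": "RUNNING"}
{"process_guid": "3952", "index": 3, "cell_id": "7eec1496-7803-4cc0-9035-f029867c3994", "state": "RUNNING"}
"""
# Hypothetical desired-lrps output (not shown in this issue): only 4002 is desired.
desired_sample = '{"process_guid": "4002"}'

zombies = undesired_actual_lrps(actual_sample, desired_sample)
for lrp in zombies:
    print(lrp["process_guid"], lrp["index"], lrp["cell_id"])
```

With the sample data, only the 3952 instance at index 3 is reported. In practice the two strings would be read from files captured with cfdot at roughly the same time, since an app that is mid-deploy could otherwise show up as a false positive.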

cfdot retire-actual-lrp returns 404 not found:

$ cfdot retire-actual-lrp 3952 4
Error: BBS error
Type 13: ResourceNotFound
Message: the requested resource could not be found

Context

We are running on Postgres, and I'm not sure whether there might be a data issue specific to Postgres. We also had Garden's containerd mode disabled (mostly by accident; we were looking at turning it on since it is the cf-deployment default). Edit: I have since switched to the default, which is on, and it doesn't appear to be related to my issue.

Steps to Reproduce

While this is appearing somewhat frequently in a couple of our environments, I don't have reproduction steps. We have seen it multiple times in the past few weeks, since upgrading from Diego 2.0.0 to 2.25.0. In the case I investigated yesterday, an application had a "zombie" instance, and after I recreated the cell it was gone. When a few applications were redeployed on this deployment later, the zombies reappeared for the same app I am working on.

Possible Causes or Fixes (optional)

Recreating the cell appears to remove the "zombies". However, we have observed them reappearing after subsequent application deployments.

Additional Text Output or Screenshots (optional)

These BBS logs are still appearing. The BBS seems to think it removes the running LRP during convergence, but in reality it does not.

{"timestamp":"2019-04-09T15:33:38.405740495Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.starting","data":{"domain":"REDACTED_DOMAIN","index":3,"instance-key":{"instance_guid":"bbc7e0de-c4e0-4625-43f5-1d20","cell_id":"7eec1496-7803-4cc0-9035-f029867c3994"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.2.2"}}
{"timestamp":"2019-04-09T15:33:38.407071547Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.completed","data":{"domain":"REDACTED_DOMAIN","duration":1337428,"index":3,"instance-key":{"instance_guid":"bbc7e0de-c4e0-4625-43f5-1d20","cell_id":"7eec1496-7803-4cc0-9035-f029867c3994"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.2.2"}}
{"timestamp":"2019-04-09T15:33:38.407302386Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.starting","data":{"domain":"REDACTED_DOMAIN","index":4,"instance-key":{"instance_guid":"b443f7f3-87ba-415a-5aaf-5cc1","cell_id":"588d6934-7845-44c9-bbeb-6f053299d0b5"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.3.2"}}
{"timestamp":"2019-04-09T15:33:38.407447724Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.starting","data":{"domain":"REDACTED_DOMAIN2","index":0,"instance-key":{"instance_guid":"8017228a-bf1f-499b-561d-f7bc","cell_id":"a98e36ce-0bce-42b2-852a-63b82d0e4cf8"},"process-guid":"3963","process_guid":"3963","retiring_lrp_count":5,"session":"14316101.4.2"}}
{"timestamp":"2019-04-09T15:33:38.407666153Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.starting","data":{"domain":"REDACTED_DOMAIN","index":2,"instance-key":{"instance_guid":"3641ecf0-31fc-44c2-72e9-e233","cell_id":"bf022a89-d410-4ff9-9bdd-5811644c3810"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.5.2"}}
{"timestamp":"2019-04-09T15:33:38.408581342Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.completed","data":{"domain":"REDACTED_DOMAIN1","duration":1313703,"index":4,"instance-key":{"instance_guid":"b443f7f3-87ba-415a-5aaf-5cc1","cell_id":"588d6934-7845-44c9-bbeb-6f053299d0b5"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.3.2"}}
{"timestamp":"2019-04-09T15:33:38.408698106Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.completed","data":{"domain":"REDACTED_DOMAIN","duration":1032599,"index":2,"instance-key":{"instance_guid":"3641ecf0-31fc-44c2-72e9-e233","cell_id":"bf022a89-d410-4ff9-9bdd-5811644c3810"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.5.2"}}
{"timestamp":"2019-04-09T15:33:38.408858842Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.starting","data":{"domain":"REDACTED_DOMAIN","index":0,"instance-key":{"instance_guid":"4bae4dc0-0361-4ec0-462d-01c4","cell_id":"a98e36ce-0bce-42b2-852a-63b82d0e4cf8"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.1.2"}}
{"timestamp":"2019-04-09T15:33:38.409101410Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.completed","data":{"domain":"REDACTED_DOMAIN2","duration":1658553,"index":0,"instance-key":{"instance_guid":"8017228a-bf1f-499b-561d-f7bc","cell_id":"a98e36ce-0bce-42b2-852a-63b82d0e4cf8"},"process-guid":"3963","process_guid":"3963","retiring_lrp_count":5,"session":"14316101.4.2"}}
{"timestamp":"2019-04-09T15:33:38.410228109Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.completed","data":{"domain":"REDACTED_DOMAIN","duration":1377834,"index":0,"instance-key":{"instance_guid":"4bae4dc0-0361-4ec0-462d-01c4","cell_id":"a98e36ce-0bce-42b2-852a-63b82d0e4cf8"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.1.2"}}
{"timestamp":"2019-04-09T15:33:38.413241629Z","level":"info","source":"bbs","message":"bbs.executing-convergence.converge-lrps-done","data":{"session":"14316100"}}
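To make logs like these easier to cross-check against cfdot actual-lrps, here is a rough Python sketch that counts stop-lrp.starting and stop-lrp.completed events per LRP instance. It relies only on the fields visible in the log lines above (message, data.process_guid, data.index, data.instance-key.cell_id); the function name is hypothetical, and the sample lines are abbreviated copies of the log entries for process guid 3952, index 3.

```python
import json
from collections import defaultdict

PREFIX = "bbs.converge-lrps.retire-actual-lrp.stop-lrp."


def summarize_stop_attempts(log_text):
    """Map (process_guid, index, cell_id) -> counts of starting/completed events."""
    summary = defaultdict(lambda: {"starting": 0, "completed": 0})
    for line in log_text.splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # blank or malformed line
        if not isinstance(entry, dict):
            continue
        message = entry.get("message", "")
        if not message.startswith(PREFIX):
            continue  # not a stop-lrp event
        phase = message[len(PREFIX):]
        if phase not in ("starting", "completed"):
            continue
        data = entry["data"]
        key = (data["process_guid"], data["index"],
               data["instance-key"]["cell_id"])
        summary[key][phase] += 1
    return dict(summary)


# Abbreviated copies of three of the log lines above.
sample_log = """
{"timestamp":"2019-04-09T15:33:38.405740495Z","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.starting","data":{"index":3,"instance-key":{"cell_id":"7eec1496-7803-4cc0-9035-f029867c3994"},"process_guid":"3952"}}
{"timestamp":"2019-04-09T15:33:38.407071547Z","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.completed","data":{"duration":1337428,"index":3,"instance-key":{"cell_id":"7eec1496-7803-4cc0-9035-f029867c3994"},"process_guid":"3952"}}
{"timestamp":"2019-04-09T15:33:38.413241629Z","message":"bbs.executing-convergence.converge-lrps-done","data":{"session":"14316100"}}
"""

attempts = summarize_stop_attempts(sample_log)
for (guid, index, cell), counts in sorted(attempts.items()):
    print(guid, index, cell, counts)
```

An instance that shows a completed stop in every convergence run but is still listed as RUNNING by cfdot actual-lrps would be a candidate "zombie" under this reading of the logs.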
