Description
Summary
Containers that are no longer desired are still running; Diego is failing to remove the corresponding actual LRPs.
Expected Result
If an LRP is not desired, it should not be running.
Actual Result
The LRP is still running. This is the output from cfdot actual-lrps. Process guid 4002 is the newly deployed container that should be running. Process guid 3952 should not be running.
{
"process_guid": "4002",
"index": 1,
"domain": "REDACTED_DOMAIN",
"instance_guid": "cc061ce2-9a96-4cd6-6b49-3bf5",
"cell_id": "229fbf0e-f289-4014-94ff-4a9459199639",
"address": "172.16.83.3",
"ports": null,
"instance_address": "172.31.11.90",
"crash_count": 0,
"state": "RUNNING",
"since": 1554820047032546600,
"modification_tag": {
"epoch": "979c0c70-e493-41a9-5729-e67bb619e63f",
"index": 2
},
"presence": "ORDINARY"
}
{
"process_guid": "3952",
"index": 3,
"domain": "REDACTED_DOMAIN",
"instance_guid": "bbc7e0de-c4e0-4625-43f5-1d20",
"cell_id": "7eec1496-7803-4cc0-9035-f029867c3994",
"address": "172.16.91.3",
"ports": null,
"instance_address": "172.31.11.14",
"crash_count": 0,
"state": "RUNNING",
"since": 1554403876774640000,
"modification_tag": {
"epoch": "40ea69a5-55d8-4345-7319-a4fd28de943b",
"index": 2
},
"presence": "ORDINARY"
}
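A minimal sketch of how such leftovers could be picked out of the output above, assuming the actual LRPs arrive as a stream of concatenated JSON objects with top-level `process_guid`, `index`, `cell_id` and `state` fields as shown, and that the set of desired process guids has been collected separately (for example from `cfdot desired-lrps`). The helper names `iter_json_objects` and `find_undesired` are made up for illustration and are not part of cfdot.

```python
import json


def iter_json_objects(text):
    """Yield each object from a string of concatenated JSON objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def find_undesired(actual_lrps_text, desired_guids):
    """Return (process_guid, index, cell_id) for RUNNING LRPs that are not desired."""
    return [
        (lrp["process_guid"], lrp["index"], lrp["cell_id"])
        for lrp in iter_json_objects(actual_lrps_text)
        if lrp.get("state") == "RUNNING"
        and lrp["process_guid"] not in desired_guids
    ]


if __name__ == "__main__":
    # Trimmed-down copies of the two records above; only the fields used here.
    sample = """
    {"process_guid": "4002", "index": 1,
     "cell_id": "229fbf0e-f289-4014-94ff-4a9459199639", "state": "RUNNING"}
    {"process_guid": "3952", "index": 3,
     "cell_id": "7eec1496-7803-4cc0-9035-f029867c3994", "state": "RUNNING"}
    """
    # Assumption: 4002 is the only desired process guid.
    for guid, index, cell in find_undesired(sample, {"4002"}):
        print(f"undesired: process_guid={guid} index={index} cell_id={cell}")
```

The cell id in each result is what I would use to decide which cell to inspect or recreate.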
retire-actual-lrp returns 404 not found:
$ cfdot retire-actual-lrp 3952 4
Error: BBS error
Type 13: ResourceNotFound
Message: the requested resource could not be found
Context
We are running on Postgres, and I'm not sure whether there might be a data issue specific to Postgres. We also had Garden's containerd mode disabled (mostly by accident; I was looking at turning it on since it is the cf-deployment default). Edit: I have since switched to the default, which is on, and it doesn't appear to be related to my issue.
Steps to Reproduce
While this is appearing somewhat frequently in a couple of our environments, I don't have reproduction steps. We have seen it appear multiple times in the past few weeks since upgrading from diego 2.0.0 to 2.25.0. In the case I investigated yesterday, an application had a "zombie" and after I recreated the cell, it was gone. When a few applications were redeployed on this deployment later, the zombies re-appeared for the same app I am working on.
Possible Causes or Fixes (optional)
Recreating the cell appears to cause the "zombies" to be removed. However we have observed them re-appearing afterwards following application deployments.
Additional Text Output or Screenshots (optional)
These BBS logs are still appearing. The BBS seems to think it removes the running LRP during convergence, but in reality it does not.
{"timestamp":"2019-04-09T15:33:38.405740495Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.starting","data":{"domain":"REDACTED_DOMAIN","index":3,"instance-key":{"instance_guid":"bbc7e0de-c4e0-4625-43f5-1d20","cell_id":"7eec1496-7803-4cc0-9035-f029867c3994"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.2.2"}}
{"timestamp":"2019-04-09T15:33:38.407071547Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.completed","data":{"domain":"REDACTED_DOMAIN","duration":1337428,"index":3,"instance-key":{"instance_guid":"bbc7e0de-c4e0-4625-43f5-1d20","cell_id":"7eec1496-7803-4cc0-9035-f029867c3994"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.2.2"}}
{"timestamp":"2019-04-09T15:33:38.407302386Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.starting","data":{"domain":"REDACTED_DOMAIN","index":4,"instance-key":{"instance_guid":"b443f7f3-87ba-415a-5aaf-5cc1","cell_id":"588d6934-7845-44c9-bbeb-6f053299d0b5"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.3.2"}}
{"timestamp":"2019-04-09T15:33:38.407447724Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.starting","data":{"domain":"REDACTED_DOMAIN2","index":0,"instance-key":{"instance_guid":"8017228a-bf1f-499b-561d-f7bc","cell_id":"a98e36ce-0bce-42b2-852a-63b82d0e4cf8"},"process-guid":"3963","process_guid":"3963","retiring_lrp_count":5,"session":"14316101.4.2"}}
{"timestamp":"2019-04-09T15:33:38.407666153Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.starting","data":{"domain":"REDACTED_DOMAIN","index":2,"instance-key":{"instance_guid":"3641ecf0-31fc-44c2-72e9-e233","cell_id":"bf022a89-d410-4ff9-9bdd-5811644c3810"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.5.2"}}
{"timestamp":"2019-04-09T15:33:38.408581342Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.completed","data":{"domain":"REDACTED_DOMAIN1","duration":1313703,"index":4,"instance-key":{"instance_guid":"b443f7f3-87ba-415a-5aaf-5cc1","cell_id":"588d6934-7845-44c9-bbeb-6f053299d0b5"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.3.2"}}
{"timestamp":"2019-04-09T15:33:38.408698106Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.completed","data":{"domain":"REDACTED_DOMAIN","duration":1032599,"index":2,"instance-key":{"instance_guid":"3641ecf0-31fc-44c2-72e9-e233","cell_id":"bf022a89-d410-4ff9-9bdd-5811644c3810"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.5.2"}}
{"timestamp":"2019-04-09T15:33:38.408858842Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.starting","data":{"domain":"REDACTED_DOMAIN","index":0,"instance-key":{"instance_guid":"4bae4dc0-0361-4ec0-462d-01c4","cell_id":"a98e36ce-0bce-42b2-852a-63b82d0e4cf8"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.1.2"}}
{"timestamp":"2019-04-09T15:33:38.409101410Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.completed","data":{"domain":"REDACTED_DOMAIN2","duration":1658553,"index":0,"instance-key":{"instance_guid":"8017228a-bf1f-499b-561d-f7bc","cell_id":"a98e36ce-0bce-42b2-852a-63b82d0e4cf8"},"process-guid":"3963","process_guid":"3963","retiring_lrp_count":5,"session":"14316101.4.2"}}
{"timestamp":"2019-04-09T15:33:38.410228109Z","level":"info","source":"bbs","message":"bbs.converge-lrps.retire-actual-lrp.stop-lrp.completed","data":{"domain":"REDACTED_DOMAIN","duration":1377834,"index":0,"instance-key":{"instance_guid":"4bae4dc0-0361-4ec0-462d-01c4","cell_id":"a98e36ce-0bce-42b2-852a-63b82d0e4cf8"},"process-guid":"3952","process_guid":"3952","retiring_lrp_count":5,"session":"14316101.1.2"}}
{"timestamp":"2019-04-09T15:33:38.413241629Z","level":"info","source":"bbs","message":"bbs.executing-convergence.converge-lrps-done","data":{"session":"14316100"}}
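To cross-check log lines like these against the actual LRPs that are still running, something along the following lines could group the stop-lrp entries by instance. It is only a sketch, assuming one JSON object per log line with the `message` and `data` fields seen above. The function name `summarize_stop_attempts` is hypothetical, and lines that do not parse as JSON are skipped rather than repaired.

```python
import json

# Common prefix of the two messages seen in the logs above.
PREFIX = "bbs.converge-lrps.retire-actual-lrp.stop-lrp."


def summarize_stop_attempts(lines):
    """Map (process_guid, index, instance_guid) to the set of phases logged for it."""
    attempts = {}
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        message = entry.get("message", "")
        if not message.startswith(PREFIX):
            continue
        data = entry.get("data", {})
        key = (
            data.get("process_guid"),
            data.get("index"),
            data.get("instance-key", {}).get("instance_guid"),
        )
        # The phase is whatever follows the prefix: "starting" or "completed".
        attempts.setdefault(key, set()).add(message[len(PREFIX):])
    return attempts


if __name__ == "__main__":
    import sys

    # Usage (assumed): pipe BBS log lines in on stdin.
    for key, phases in sorted(summarize_stop_attempts(sys.stdin).items(), key=str):
        print(key, sorted(phases))
```

An instance that shows both `starting` and `completed` here but still appears as RUNNING in cfdot actual-lrps is the situation described in this issue.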