Description
Thank you for submitting an issue to the diego-release repository. We appreciate the feedback. To help us address your issue, please fill out the sections in the following template to the best of your ability:
Summary
We are running three instances of locket on the diego-api VMs.
We crashed one diego-api VM (not the BBS master node).
tps_watcher lost its lock after the VM crash.
Cell presence information was also lost from the locket database.
This caused downtime for the applications.
Expected Result
tps_watcher should not have lost its lock, as the other two locket instances were still running.
Cell presence information for all cells should have remained available during this phase.
Actual Result
tps_watcher reacquires the lock only after some delay, roughly 2-3 minutes.
Cell presence information for a few cells was lost after the crash of one diego-api VM.
Context
We are using diego-release v1.29.2.
Steps to Reproduce
Deploy cf-deployment v0.37.0, which includes diego-release v1.29.2.
With cf-deployment v0.37.0 deployed, check the output of cfdot locks and cfdot presences.
Crash one diego-api VM that runs locket, making sure this VM is not the BBS master node.
After the crash, check the locks and presences again.
Also check whether cf push still works.
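The before/after comparison in these steps can be scripted. The following is a minimal sketch, assuming the output of cfdot locks has been captured as text before and after the crash; the function names `parse_records` and `diff_snapshots` are hypothetical helpers, and the sample data is the lock output from this report.

```python
import json


def parse_records(text):
    """Parse cfdot-style output (one JSON object per line) into {key: owner}."""
    records = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip shell prompts and blank lines
        record = json.loads(line)
        records[record["key"]] = record["owner"]
    return records


def diff_snapshots(before, after):
    """Return (keys that disappeared, keys whose owner changed)."""
    missing = sorted(k for k in before if k not in after)
    changed = sorted(k for k in before if k in after and before[k] != after[k])
    return missing, changed


# Lock records copied from the cfdot locks output in this report.
BEFORE = """
{"key":"tps_watcher","owner":"c75b2782-2151-447c-64e0-233edc77c6a1","type":"lock","type_code":1}
{"key":"auctioneer","owner":"850efde7-b42f-4485-975d-39c4cef0f7f3","type":"lock","type_code":1}
{"key":"bbs","owner":"6e915057-a027-4050-a8de-5f9c07306600","type":"lock","type_code":1}
"""

AFTER = """
{"key":"auctioneer","owner":"1fbcbedb-5f96-444d-8dc4-2e41706bd74c","type":"lock","type_code":1}
{"key":"bbs","owner":"6e915057-a027-4050-a8de-5f9c07306600","type":"lock","type_code":1}
"""

missing, changed = diff_snapshots(parse_records(BEFORE), parse_records(AFTER))
print("missing locks:", missing)
print("locks with a new owner:", changed)
```

On this data the tps_watcher lock is reported as missing and the auctioneer lock as having changed owner, while the bbs lock is unchanged.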
Additional Text Output or Screenshots (optional)
cfdot locks and cfdot presences output before the crash of the diego-api VM:
$ cfdot locks
{"key":"tps_watcher","owner":"c75b2782-2151-447c-64e0-233edc77c6a1","type":"lock","type_code":1}
{"key":"auctioneer","owner":"850efde7-b42f-4485-975d-39c4cef0f7f3","type":"lock","type_code":1}
{"key":"bbs","owner":"6e915057-a027-4050-a8de-5f9c07306600","type":"lock","type_code":1}
[lab-openstack watchtower/0 100.73.57.23] rpaas-admin:~
$ cfdot presences
{"key":"a0035d63-9916-4665-a9c3-3e84868f822c","owner":"e37658a6-6f40-40bf-75d1-90748255f212","value":"{\"cell_id\":\"a0035d63-9916-4665-a9c3-3e84868f822c\",\"rep_address\":\"http://100.73.62.130:1800\",\"zone\":\"z2\",\"capacity\":{\"memory_mb\":16000,\"disk_mb\":95394,\"containers\":249},\"rootfs_provider_list\":[{\"name\":\"preloaded\",\"properties\":[\"cflinuxfs2\",\"stg\"]},{\"name\":\"preloaded+layer\",\"properties\":[\"cflinuxfs2\",\"stg\"]},{\"name\":\"docker\"}],\"placement_tags\":[\"stg\"],\"rep_url\":\"https://a0035d63-9916-4665-a9c3-3e84868f822c.cell.service.cf.internal:1801\"}","type":"presence","type_code":2}
{"key":"86f75848-dd18-4513-b409-25a285560a68","owner":"0d44864a-4d32-4df6-7c2f-4bbb3652994c","value":"{\"cell_id\":\"86f75848-dd18-4513-b409-25a285560a68\",\"rep_address\":\"http://100.73.62.129:1800\",\"zone\":\"z1\",\"capacity\":{\"memory_mb\":16000,\"disk_mb\":95394,\"containers\":249},\"rootfs_provider_list\":[{\"name\":\"preloaded\",\"properties\":[\"cflinuxfs2\",\"stg\"]},{\"name\":\"preloaded+layer\",\"properties\":[\"cflinuxfs2\",\"stg\"]},{\"name\":\"docker\"}],\"placement_tags\":[\"stg\"],\"rep_url\":\"https://86f75848-dd18-4513-b409-25a285560a68.cell.service.cf.internal:1801\"}","type":"presence","type_code":2}
{"key":"edf37143-0a6b-422b-92ae-ea1369226c90","owner":"4bad7d78-2fb5-497f-65b6-2b34e45c2cf8","value":"{\"cell_id\":\"edf37143-0a6b-422b-92ae-ea1369226c90\",\"rep_address\":\"http://100.73.63.131:1800\",\"zone\":\"z3\",\"capacity\":{\"memory_mb\":16000,\"disk_mb\":95394,\"containers\":249},\"rootfs_provider_list\":[{\"name\":\"preloaded\",\"properties\":[\"cflinuxfs2\",\"pro\"]},{\"name\":\"preloaded+layer\",\"properties\":[\"cflinuxfs2\",\"pro\"]},{\"name\":\"docker\"}],\"placement_tags\":[\"pro\"],\"rep_url\":\"https://edf37143-0a6b-422b-92ae-ea1369226c90.cell.service.cf.internal:1801\"}","type":"presence","type_code":2}
{"key":"28a4d1ff-6d1c-4640-bd9e-63745e94bdba","owner":"f0965cc0-e876-4733-58bc-a797bf78cfb4","value":"{\"cell_id\":\"28a4d1ff-6d1c-4640-bd9e-63745e94bdba\",\"rep_address\":\"http://100.73.63.130:1800\",\"zone\":\"z2\",\"capacity\":{\"memory_mb\":16000,\"disk_mb\":95394,\"containers\":249},\"rootfs_provider_list\":[{\"name\":\"preloaded\",\"properties\":[\"cflinuxfs2\",\"pro\"]},{\"name\":\"preloaded+layer\",\"properties\":[\"cflinuxfs2\",\"pro\"]},{\"name\":\"docker\"}],\"placement_tags\":[\"pro\"],\"rep_url\":\"https://28a4d1ff-6d1c-4640-bd9e-63745e94bdba.cell.service.cf.internal:1801\"}","type":"presence","type_code":2}
{"key":"70f94008-23cf-4968-a41a-13a395557fbe","owner":"800857b2-e591-4d42-5dd5-bebe5c06ba4f","value":"{\"cell_id\":\"70f94008-23cf-4968-a41a-13a395557fbe\",\"rep_address\":\"http://100.73.62.131:1800\",\"zone\":\"z3\",\"capacity\":{\"memory_mb\":16000,\"disk_mb\":95394,\"containers\":249},\"rootfs_provider_list\":[{\"name\":\"preloaded\",\"properties\":[\"cflinuxfs2\",\"stg\"]},{\"name\":\"preloaded+layer\",\"properties\":[\"cflinuxfs2\",\"stg\"]},{\"name\":\"docker\"}],\"placement_tags\":[\"stg\"],\"rep_url\":\"https://70f94008-23cf-4968-a41a-13a395557fbe.cell.service.cf.internal:1801\"}","type":"presence","type_code":2}
{"key":"23acb41b-6979-4a03-ba66-5e0ff08908f7","owner":"54ed1669-0a0e-4916-4439-873623042d4b","value":"{\"cell_id\":\"23acb41b-6979-4a03-ba66-5e0ff08908f7\",\"rep_address\":\"http://100.73.61.130:1800\",\"zone\":\"z2\",\"capacity\":{\"memory_mb\":16000,\"disk_mb\":95394,\"containers\":249},\"rootfs_provider_list\":[{\"name\":\"preloaded\",\"properties\":[\"cflinuxfs2\",\"dev\"]},{\"name\":\"preloaded+layer\",\"properties\":[\"cflinuxfs2\",\"dev\"]},{\"name\":\"docker\"}],\"placement_tags\":[\"dev\"],\"rep_url\":\"https://23acb41b-6979-4a03-ba66-5e0ff08908f7.cell.service.cf.internal:1801\"}","type":"presence","type_code":2}
{"key":"5ff7abf0-4c1f-4302-b15f-438f698250d5","owner":"ea1e45bb-69a1-4a8c-6e91-65a54c7e6860","value":"{\"cell_id\":\"5ff7abf0-4c1f-4302-b15f-438f698250d5\",\"rep_address\":\"http://100.73.63.129:1800\",\"zone\":\"z1\",\"capacity\":{\"memory_mb\":16000,\"disk_mb\":95394,\"containers\":249},\"rootfs_provider_list\":[{\"name\":\"preloaded\",\"properties\":[\"cflinuxfs2\",\"pro\"]},{\"name\":\"preloaded+layer\",\"properties\":[\"cflinuxfs2\",\"pro\"]},{\"name\":\"docker\"}],\"placement_tags\":[\"pro\"],\"rep_url\":\"https://5ff7abf0-4c1f-4302-b15f-438f698250d5.cell.service.cf.internal:1801\"}","type":"presence","type_code":2}
{"key":"96bd58e1-ecff-4c61-b882-b8156afa1bde","owner":"07b9527e-048b-4e75-51e3-bd9cb181a262","value":"{\"cell_id\":\"96bd58e1-ecff-4c61-b882-b8156afa1bde\",\"rep_address\":\"http://100.73.61.129:1800\",\"zone\":\"z1\",\"capacity\":{\"memory_mb\":16000,\"disk_mb\":95394,\"containers\":249},\"rootfs_provider_list\":[{\"name\":\"preloaded\",\"properties\":[\"cflinuxfs2\",\"dev\"]},{\"name\":\"preloaded+layer\",\"properties\":[\"cflinuxfs2\",\"dev\"]},{\"name\":\"docker\"}],\"placement_tags\":[\"dev\"],\"rep_url\":\"https://96bd58e1-ecff-4c61-b882-b8156afa1bde.cell.service.cf.internal:1801\"}","type":"presence","type_code":2}
{"key":"08a679bc-9115-4e2e-8be5-8d017ba06c34","owner":"6351549e-0a17-4223-54a0-2888a188836f","value":"{\"cell_id\":\"08a679bc-9115-4e2e-8be5-8d017ba06c34\",\"rep_address\":\"http://100.73.61.131:1800\",\"zone\":\"z3\",\"capacity\":{\"memory_mb\":16000,\"disk_mb\":95394,\"containers\":249},\"rootfs_provider_list\":[{\"name\":\"preloaded\",\"properties\":[\"cflinuxfs2\",\"dev\"]},{\"name\":\"preloaded+layer\",\"properties\":[\"cflinuxfs2\",\"dev\"]},{\"name\":\"docker\"}],\"placement_tags\":[\"dev\"],\"rep_url\":\"https://08a679bc-9115-4e2e-8be5-8d017ba06c34.cell.service.cf.internal:1801\"}","type":"presence","type_code":2}
cfdot locks and cfdot presences output after the crash of the diego-api VM:
$ cfdot locks
{"key":"auctioneer","owner":"1fbcbedb-5f96-444d-8dc4-2e41706bd74c","type":"lock","type_code":1}
{"key":"bbs","owner":"6e915057-a027-4050-a8de-5f9c07306600","type":"lock","type_code":1}
[lab-openstack watchtower/0 100.73.57.23] rpaas-admin:~
$ cfdot presences
{"key":"a0035d63-9916-4665-a9c3-3e84868f822c","owner":"e37658a6-6f40-40bf-75d1-90748255f212","value":"{\"cell_id\":\"a0035d63-9916-4665-a9c3-3e84868f822c\",\"rep_address\":\"http://100.73.62.130:1800\",\"zone\":\"z2\",\"capacity\":{\"memory_mb\":16000,\"disk_mb\":95394,\"containers\":249},\"rootfs_provider_list\":[{\"name\":\"preloaded\",\"properties\":[\"cflinuxfs2\",\"stg\"]},{\"name\":\"preloaded+layer\",\"properties\":[\"cflinuxfs2\",\"stg\"]},{\"name\":\"docker\"}],\"placement_tags\":[\"stg\"],\"rep_url\":\"https://a0035d63-9916-4665-a9c3-3e84868f822c.cell.service.cf.internal:1801\"}","type":"presence","type_code":2}
{"key":"edf37143-0a6b-422b-92ae-ea1369226c90","owner":"4bad7d78-2fb5-497f-65b6-2b34e45c2cf8","value":"{\"cell_id\":\"edf37143-0a6b-422b-92ae-ea1369226c90\",\"rep_address\":\"http://100.73.63.131:1800\",\"zone\":\"z3\",\"capacity\":{\"memory_mb\":16000,\"disk_mb\":95394,\"containers\":249},\"rootfs_provider_list\":[{\"name\":\"preloaded\",\"properties\":[\"cflinuxfs2\",\"pro\"]},{\"name\":\"preloaded+layer\",\"properties\":[\"cflinuxfs2\",\"pro\"]},{\"name\":\"docker\"}],\"placement_tags\":[\"pro\"],\"rep_url\":\"https://edf37143-0a6b-422b-92ae-ea1369226c90.cell.service.cf.internal:1801\"}","type":"presence","type_code":2}
{"key":"23acb41b-6979-4a03-ba66-5e0ff08908f7","owner":"54ed1669-0a0e-4916-4439-873623042d4b","value":"{\"cell_id\":\"23acb41b-6979-4a03-ba66-5e0ff08908f7\",\"rep_address\":\"http://100.73.61.130:1800\",\"zone\":\"z2\",\"capacity\":{\"memory_mb\":16000,\"disk_mb\":95394,\"containers\":249},\"rootfs_provider_list\":[{\"name\":\"preloaded\",\"properties\":[\"cflinuxfs2\",\"dev\"]},{\"name\":\"preloaded+layer\",\"properties\":[\"cflinuxfs2\",\"dev\"]},{\"name\":\"docker\"}],\"placement_tags\":[\"dev\"],\"rep_url\":\"https://23acb41b-6979-4a03-ba66-5e0ff08908f7.cell.service.cf.internal:1801\"}","type":"presence","type_code":2}
{"key":"5ff7abf0-4c1f-4302-b15f-438f698250d5","owner":"ea1e45bb-69a1-4a8c-6e91-65a54c7e6860","value":"{\"cell_id\":\"5ff7abf0-4c1f-4302-b15f-438f698250d5\",\"rep_address\":\"http://100.73.63.129:1800\",\"zone\":\"z1\",\"capacity\":{\"memory_mb\":16000,\"disk_mb\":95394,\"containers\":249},\"rootfs_provider_list\":[{\"name\":\"preloaded\",\"properties\":[\"cflinuxfs2\",\"pro\"]},{\"name\":\"preloaded+layer\",\"properties\":[\"cflinuxfs2\",\"pro\"]},{\"name\":\"docker\"}],\"placement_tags\":[\"pro\"],\"rep_url\":\"https://5ff7abf0-4c1f-4302-b15f-438f698250d5.cell.service.cf.internal:1801\"}","type":"presence","type_code":2}
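Each presence record carries its cell details in a `value` field that is itself a JSON-encoded string, so it has to be decoded twice. The sketch below shows that double decode and lists which cells disappeared, with their zone and placement tag. The records are rebuilt from the cell IDs, zones, and tags shown above and trimmed to those fields; `make_presence` and `cells_by_id` are hypothetical helper names.

```python
import json

# (cell_id, zone, placement_tag) for the nine cells present before the crash.
BEFORE_CELLS = [
    ("a0035d63-9916-4665-a9c3-3e84868f822c", "z2", "stg"),
    ("86f75848-dd18-4513-b409-25a285560a68", "z1", "stg"),
    ("edf37143-0a6b-422b-92ae-ea1369226c90", "z3", "pro"),
    ("28a4d1ff-6d1c-4640-bd9e-63745e94bdba", "z2", "pro"),
    ("70f94008-23cf-4968-a41a-13a395557fbe", "z3", "stg"),
    ("23acb41b-6979-4a03-ba66-5e0ff08908f7", "z2", "dev"),
    ("5ff7abf0-4c1f-4302-b15f-438f698250d5", "z1", "pro"),
    ("96bd58e1-ecff-4c61-b882-b8156afa1bde", "z1", "dev"),
    ("08a679bc-9115-4e2e-8be5-8d017ba06c34", "z3", "dev"),
]

# Keys still listed by cfdot presences after the crash.
AFTER_IDS = {
    "a0035d63-9916-4665-a9c3-3e84868f822c",
    "edf37143-0a6b-422b-92ae-ea1369226c90",
    "23acb41b-6979-4a03-ba66-5e0ff08908f7",
    "5ff7abf0-4c1f-4302-b15f-438f698250d5",
}


def make_presence(cell_id, zone, tag):
    """Build one trimmed presence line; `value` is JSON text inside JSON."""
    value = json.dumps({"cell_id": cell_id, "zone": zone, "placement_tags": [tag]})
    return json.dumps({"key": cell_id, "value": value, "type": "presence"})


def cells_by_id(lines):
    """Decode each line, then decode its nested `value` string."""
    cells = {}
    for line in lines:
        record = json.loads(line)
        details = json.loads(record["value"])  # second decode
        cells[record["key"]] = (details["zone"], details["placement_tags"][0])
    return cells


before = cells_by_id(make_presence(*cell) for cell in BEFORE_CELLS)
lost = {cid: info for cid, info in before.items() if cid not in AFTER_IDS}

for cell_id, (zone, tag) in sorted(lost.items()):
    print(cell_id, zone, tag)
lost_zones = sorted({zone for zone, _ in lost.values()})
```

On this data, five of the nine cells are missing after the crash, and they are spread across zones z1, z2, and z3 rather than confined to one zone.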
tps_watcher.stdout.log before and after the crash of the diego-api VM:
$ tailf /var/vcap/sys/log/tps/tps_watcher.stdout.log
{"timestamp":"1533547843.285956860","source":"tps-watcher","message":"tps-watcher.locket-lock.lost-lock","log_level":2,"data":{"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded","lock":{"key":"tps_watcher","owner":"c75b2782-2151-447c-64e0-233edc77c6a1","type":"lock","type_code":1},"session":"2","ttl_in_seconds":15}}
{"timestamp":"1533547843.286203861","source":"tps-watcher","message":"tps-watcher.locket-lock.completed","log_level":1,"data":{"lock":{"key":"tps_watcher","owner":"c75b2782-2151-447c-64e0-233edc77c6a1","type":"lock","type_code":1},"session":"2","ttl_in_seconds":15}}
{"timestamp":"1533547843.286386490","source":"tps-watcher","message":"tps-watcher.watcher.stopping","log_level":1,"data":{"session":"3"}}
{"timestamp":"1533547843.286486387","source":"tps-watcher","message":"tps-watcher.watcher.finished","log_level":1,"data":{"session":"3"}}
{"timestamp":"1533547843.286759377","source":"tps-watcher","message":"tps-watcher.watcher.failed-getting-next-event","log_level":2,"data":{"error":"source closed","session":"3"}}
{"timestamp":"1533547843.286840677","source":"tps-watcher","message":"tps-watcher.consul-lock.shutting-down","log_level":1,"data":{"key":"v1/locks/tps_watcher_lock","received-signal":2,"session":"1","value":"7627bc63-b22c-442e-47e8-18eebbd2f7e6"}}
{"timestamp":"1533547843.336131096","source":"tps-watcher","message":"tps-watcher.consul-lock.done","log_level":1,"data":{"key":"v1/locks/tps_watcher_lock","session":"1","value":"7627bc63-b22c-442e-47e8-18eebbd2f7e6"}}
{"timestamp":"1533547843.336535692","source":"tps-watcher","message":"tps-watcher.exited-with-failure","log_level":2,"data":{"error":"Exit trace for group:\nsql-lock exited with error: rpc error: code = DeadlineExceeded desc = context deadline exceeded\nwatcher exited with nil\nconsul-lock exited with nil\ndebug-server exited with nil\n"}}
{"timestamp":"1533547848.103848696","source":"tps-watcher","message":"tps-watcher.consul-lock.starting","log_level":1,"data":{"key":"v1/locks/tps_watcher_lock","session":"1","value":"77bd3068-3dc7-45b4-6aab-6f646e8321fa"}}
{"timestamp":"1533547848.103962421","source":"tps-watcher","message":"tps-watcher.consul-lock.acquiring-lock","log_level":1,"data":{"key":"v1/locks/tps_watcher_lock","session":"1","value":"77bd3068-3dc7-45b4-6aab-6f646e8321fa"}}
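The timestamps in this log are Unix epoch seconds with nanosecond precision, so the time between events can be computed directly. The sketch below measures the gap between the lost-lock event and the restarted process beginning its consul-lock sequence. The two log lines are trimmed to the fields used, and `event_times` is a hypothetical helper. `Decimal` is used because a float cannot hold all nine fractional digits at this magnitude.

```python
import json
from decimal import Decimal


def event_times(log_text):
    """Map each log message name to the timestamp of its first occurrence."""
    times = {}
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip the tailf command line and blanks
        entry = json.loads(line)
        times.setdefault(entry["message"], Decimal(entry["timestamp"]))
    return times


# Two entries from the log above, trimmed to timestamp, source, and message.
LOG = """
{"timestamp":"1533547843.285956860","source":"tps-watcher","message":"tps-watcher.locket-lock.lost-lock"}
{"timestamp":"1533547848.103848696","source":"tps-watcher","message":"tps-watcher.consul-lock.starting"}
"""

times = event_times(LOG)
gap = (times["tps-watcher.consul-lock.starting"]
       - times["tps-watcher.locket-lock.lost-lock"])
print(f"restart began {gap} s after the lock was lost")
```

This only measures how quickly the process restarted (about 4.8 seconds here). The excerpt ends before any locket lock is reacquired, so the 2-3 minute delay described under Actual Result cannot be derived from these lines.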