
Destroy cluster doesn't fail gracefully. #146

Open
PC-Admin opened this issue Jun 4, 2024 · 0 comments
Labels
bug Something isn't working

PC-Admin commented Jun 4, 2024

Hello,

First off, awesome Ansible collection, thank you for making it available!

I'm having a small issue when using this collection with -e "cephadm_recreate=true". Because the previous run failed to add 2/3 hosts to my cluster, the 'Destroy cluster' task fails like so:

TASK [stackhpc.cephadm.cephadm : Destroy cluster] ***********************************************************************************************************************************************
fatal: [index-16-09078]: FAILED! => {"changed": true, "cmd": ["cephadm", "rm-cluster", "--fsid", "53d7c6cc-2229-11ef-a94c-b1f216e39593", "--force"], "delta": "0:00:00.499584", "end": "2024-06-04 05:06:01.772498", "msg": "non-zero return code", "rc": 1, "start": "2024-06-04 05:06:01.272914", "stderr": "Traceback (most recent call last):\n  File \"/usr/lib/python3.10/runpy.py\", line 196, in _run_module_as_main\n    return _run_code(code, main_globals, None,\n  File \"/usr/lib/python3.10/runpy.py\", line 86, in _run_code\n    exec(code, run_globals)\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10700, in <module>\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10688, in main\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7989, in command_rm_cluster\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 8047, in _rm_cluster\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7979, in get_ceph_cluster_count\nFileNotFoundError: [Errno 2] No such file or directory: '/var/lib/ceph'", "stderr_lines": ["Traceback (most recent call last):", "  File \"/usr/lib/python3.10/runpy.py\", line 196, in _run_module_as_main", "    return _run_code(code, main_globals, None,", "  File \"/usr/lib/python3.10/runpy.py\", line 86, in _run_code", "    exec(code, run_globals)", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10700, in <module>", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10688, in main", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7989, in command_rm_cluster", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 8047, in _rm_cluster", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7979, in get_ceph_cluster_count", "FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/ceph'"], "stdout": "Deleting cluster with fsid: 53d7c6cc-2229-11ef-a94c-b1f216e39593", "stdout_lines": ["Deleting cluster with fsid: 53d7c6cc-2229-11ef-a94c-b1f216e39593"]}
fatal: [storage-16-09074]: FAILED! => {"changed": true, "cmd": ["cephadm", "rm-cluster", "--fsid", "53d7c6cc-2229-11ef-a94c-b1f216e39593", "--force"], "delta": "0:00:00.513107", "end": "2024-06-04 05:06:01.810504", "msg": "non-zero return code", "rc": 1, "start": "2024-06-04 05:06:01.297397", "stderr": "Traceback (most recent call last):\n  File \"/usr/lib/python3.10/runpy.py\", line 196, in _run_module_as_main\n    return _run_code(code, main_globals, None,\n  File \"/usr/lib/python3.10/runpy.py\", line 86, in _run_code\n    exec(code, run_globals)\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10700, in <module>\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10688, in main\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7989, in command_rm_cluster\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 8047, in _rm_cluster\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7979, in get_ceph_cluster_count\nFileNotFoundError: [Errno 2] No such file or directory: '/var/lib/ceph'", "stderr_lines": ["Traceback (most recent call last):", "  File \"/usr/lib/python3.10/runpy.py\", line 196, in _run_module_as_main", "    return _run_code(code, main_globals, None,", "  File \"/usr/lib/python3.10/runpy.py\", line 86, in _run_code", "    exec(code, run_globals)", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10700, in <module>", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10688, in main", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7989, in command_rm_cluster", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 8047, in _rm_cluster", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7979, in get_ceph_cluster_count", "FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/ceph'"], "stdout": "Deleting cluster with fsid: 53d7c6cc-2229-11ef-a94c-b1f216e39593", "stdout_lines": ["Deleting cluster with fsid: 53d7c6cc-2229-11ef-a94c-b1f216e39593"]}
changed: [storage-14-09034] => {"changed": true, "cmd": ["cephadm", "rm-cluster", "--fsid", "53d7c6cc-2229-11ef-a94c-b1f216e39593", "--force"], "delta": "0:00:07.164754", "end": "2024-06-04 05:06:08.185614", "rc": 0, "start": "2024-06-04 05:06:01.020860", "stderr": "", "stderr_lines": [], "stdout": "Deleting cluster with fsid: 53d7c6cc-2229-11ef-a94c-b1f216e39593", "stdout_lines": ["Deleting cluster with fsid: 53d7c6cc-2229-11ef-a94c-b1f216e39593"]}

TASK [stackhpc.cephadm.cephadm : Remove ssh keys] ***********************************************************************************************************************************************
changed: [storage-14-09034] => (item=/etc/ceph/cephadm.id) => {"ansible_loop_var": "item", "changed": true, "item": "/etc/ceph/cephadm.id", "path": "/etc/ceph/cephadm.id", "state": "absent"}
changed: [storage-14-09034] => (item=/etc/ceph/cephadm.pub) => {"ansible_loop_var": "item", "changed": true, "item": "/etc/ceph/cephadm.pub", "path": "/etc/ceph/cephadm.pub", "state": "absent"}

TASK [stackhpc.cephadm.cephadm : Run prechecks] *************************************************************************************************************************************************
included: /home/mcollins1/.ansible/collections/ansible_collections/stackhpc/cephadm/roles/cephadm/tasks/prechecks.yml for storage-14-09034

This causes the subsequent tasks to be skipped on the failed hosts.

Perhaps an ignore_errors: true on this task would be appropriate, along the lines of the sketch below.
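
A minimal sketch of what that could look like, assuming the role's 'Destroy cluster' task wraps the cephadm rm-cluster command shown in the log above (the variable name cephadm_fsid and the exact module options are assumptions, not necessarily the collection's actual code):

- name: Destroy cluster
  # Hypothetical variable name; the collection may obtain the fsid differently.
  ansible.builtin.command: "cephadm rm-cluster --fsid {{ cephadm_fsid }} --force"
  become: true
  register: destroy_result
  # Don't abort the play on hosts that were never fully bootstrapped:
  # cephadm raises FileNotFoundError when /var/lib/ceph is absent.
  ignore_errors: true

A slightly stricter alternative would be a failed_when that only tolerates the missing-directory case, e.g. failed_when: destroy_result.rc != 0 and '/var/lib/ceph' not in destroy_result.stderr, so genuine removal failures still stop the play.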

cityofships added the bug label Jul 30, 2024