doc: Add failure state documentation (canonical#4578)
Followup to document commit 70acb7f.
holmanb committed Nov 5, 2023
1 parent cf9db1e commit 7328700
Showing 2 changed files with 139 additions and 0 deletions.
138 changes: 138 additions & 0 deletions doc/rtd/explanation/failure_states.rst
@@ -0,0 +1,138 @@
.. _failure_states:

Failure states: recoverable errors vs non-recoverable errors
============================================================

Critical failure
----------------
If cloud-init is unable to complete, error messages are also visible in
the output of ``cloud-init status --format json`` within the 'errors' key
nested under the stage-level keys: 'init-local', 'init', 'modules-config',
'modules-final'.

Recoverable failure
-------------------
If cloud-init is able to complete yet something goes awry, the service
returns exit code 2, and error messages are visible in the output of
``cloud-init status --format json`` under the top-level
'recoverable_errors' key as well as within the 'errors' key nested
under the stage-level keys: 'init-local', 'init', 'modules-config',
'modules-final'.

Implementation
==============

Cloud-init error codes
----------------------
- ``0``: success
- ``1``: unrecoverable error
- ``2``: recoverable error

If cloud-init exits with exit code 1, it experienced a critical failure
and was unable to recover. In this case, something is likely seriously
wrong with the system, or cloud-init has hit a serious bug.

If cloud-init exits with exit code 2, it was able to complete gracefully,
but something went wrong and the user should investigate.
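
As an illustration, a provisioning script could translate the exit code
into the meanings listed above. This is a minimal sketch rather than part
of cloud-init: the names ``describe_exit_code`` and ``run_status`` are
hypothetical, and ``run_status`` assumes that the ``cloud-init`` binary is
on ``PATH`` and that ``cloud-init status`` itself exits with these codes.

```python
import subprocess

# Meanings copied from the exit-code list in this document.
EXIT_CODE_MEANINGS = {
    0: "success",
    1: "unrecoverable error",
    2: "recoverable error",
}


def describe_exit_code(code: int) -> str:
    """Return the documented meaning of a cloud-init exit code."""
    return EXIT_CODE_MEANINGS.get(code, f"undocumented exit code {code}")


def run_status() -> int:
    """Run ``cloud-init status`` and return its exit code.

    Hypothetical helper: assumes cloud-init is installed and on PATH.
    It is defined here but never called automatically.
    """
    # check=False: a non-zero exit code is data here, not an exception.
    result = subprocess.run(["cloud-init", "status"], check=False)
    return result.returncode
```

On a machine with cloud-init installed, ``describe_exit_code(run_status())``
would give a one-line summary. A caller would typically treat code 2 as
"finished, but read the status output" rather than as a hard failure.
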


Reported state
--------------
Cloud-init can report its internal state via the ``status --format json``
subcommand under the ``extended_status`` key.

.. code-block:: shell-session

    $ cloud-init status --format json | jq .extended_status
    "degraded done"

The complete list of possible states is:

.. code-block:: shell-session

    "not running"
    "running"
    "done"
    "error"
    "degraded done"
    "degraded running"
    "disabled"

Exported errors: Aggregated errors
----------------------------------
When a recoverable error occurs, the internal cloud-init state
information is made visible under a top level aggregate key
'recoverable_errors' with errors sorted by error level:

.. code-block:: shell-session

    $ cloud-init status --format json | jq .recoverable_errors
    {
      "WARNING": [
        "Failed at merging in cloud config part from p-01: empty cloud config",
        "No template found in /etc/cloud/templates for template source.deb822",
        "No template found in /etc/cloud/templates for template sources.list",
        "No template found, not rendering /etc/apt/soures.list.d/ubuntu.source"
      ]
    }

See :ref:`Appendix A<states_appendix_a>` for the list of possible error levels.
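
The aggregate key maps each error level to a list of messages, so it can
be flattened into ``(level, message)`` pairs for display or logging. The
following is a minimal sketch that works on already-parsed JSON. The
function name ``flatten_recoverable_errors`` is hypothetical, and the
sample is a shortened copy of the output shown above.

```python
import json
from typing import List, Tuple

# Shortened copy of the example output above, wrapped in its parent key.
SAMPLE_STATUS = json.loads("""
{
  "recoverable_errors": {
    "WARNING": [
      "Failed at merging in cloud config part from p-01: empty cloud config",
      "No template found in /etc/cloud/templates for template sources.list"
    ]
  }
}
""")


def flatten_recoverable_errors(status: dict) -> List[Tuple[str, str]]:
    """Return (level, message) pairs from the top-level aggregate key."""
    pairs = []
    # A missing key is treated as "no recoverable errors".
    for level, messages in status.get("recoverable_errors", {}).items():
        for message in messages:
            pairs.append((level, message))
    return pairs
```

Calling ``flatten_recoverable_errors(SAMPLE_STATUS)`` returns two pairs,
both at the ``WARNING`` level.
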

Exported errors: Per-stage errors
---------------------------------
The keys 'errors' and 'recoverable_errors' are also exported for each
stage to allow attribution of recoverable and non-recoverable errors
to their source.

.. code-block:: shell-session

    $ cloud-init status --format json | jq .init.recoverable_errors
    {
      "WARNING": [
        "Failed at merging in cloud config part from p-001: empty cloud config"
      ]
    }

Note: Only cloud-init stages which have completed are listed in the
output of ``cloud-init status --format json``.

See :ref:`Appendix B<states_appendix_b>` for list of cloud-init stages.
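
Because only completed stages appear in the output, code that attributes
errors to stages should tolerate missing stage keys. The sketch below
shows one way to collect both keys per stage from parsed JSON. The name
``errors_by_stage`` is hypothetical, and the shape of the sample (in
particular that 'errors' holds a list) is an assumption for illustration.

```python
from typing import Dict

# Stage keys in run order, as listed in Appendix B.
STAGES = ("init-local", "init", "modules-config", "modules-final")


def errors_by_stage(status: dict) -> Dict[str, dict]:
    """Collect 'errors' and 'recoverable_errors' for each reported stage."""
    report = {}
    for stage in STAGES:
        stage_data = status.get(stage)
        if stage_data is None:
            # Stages that have not completed are absent from the output.
            continue
        report[stage] = {
            "errors": stage_data.get("errors", []),
            "recoverable_errors": stage_data.get("recoverable_errors", {}),
        }
    return report


# Assumed sample: only the 'init' stage has completed.
SAMPLE_STATUS = {
    "init": {
        "errors": [],
        "recoverable_errors": {
            "WARNING": [
                "Failed at merging in cloud config part from p-001: "
                "empty cloud config"
            ]
        },
    }
}
```

For the sample, ``errors_by_stage(SAMPLE_STATUS)`` contains a single
``"init"`` entry, and the three stages that are absent are skipped.
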

Limitations of internal errors
==============================
- Exported recoverable errors represent logged messages, which are not
  guaranteed to be stable between releases. The contents of the
  'errors' and 'recoverable_errors' keys are not guaranteed to have
  stable output!
- Exported errors and recoverable errors may occur at different stages
  since users may reorder configuration modules to run at different
  stages via ``cloud.cfg``.
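
Given these limitations, automation is probably safer when it keys on the
error level and the number of messages than on the message text or the
stage. A minimal sketch, with hypothetical helper names ``count_by_level``
and ``has_level``:

```python
from typing import Dict, Iterable, List


def count_by_level(recoverable_errors: Dict[str, List[str]]) -> Dict[str, int]:
    """Count messages per level without inspecting the message text."""
    return {
        level: len(messages)
        for level, messages in recoverable_errors.items()
    }


def has_level(
    recoverable_errors: Dict[str, List[str]], levels: Iterable[str]
) -> bool:
    """Return True if any of the given levels has at least one message."""
    return any(recoverable_errors.get(level) for level in levels)
```

For example, ``has_level(errors, ["ERROR", "CRITICAL"])`` lets a caller
react to more severe levels without matching on strings that may change
between releases. Which levels deserve attention is a policy choice left
to the caller.
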

Appendices
==========

.. _states_appendix_a:

Appendix A: Error levels
------------------------
Reported recoverable error messages are grouped by the level at which
they are logged. Complete list of levels:

.. code-block:: shell-session

    WARNING
    DEPRECATED
    ERROR
    CRITICAL

.. _states_appendix_b:

Appendix B: Stages of cloud-init
--------------------------------
The JSON representation of cloud-init stages (in run order) is:

.. code-block:: shell-session

    "init-local"
    "init"
    "modules-config"
    "modules-final"

1 change: 1 addition & 0 deletions doc/rtd/explanation/index.rst
@@ -20,3 +20,4 @@ knowledge and become better at using and configuring ``cloud-init``.
security.rst
analyze.rst
kernel-cmdline.rst
failure_states.rst
