Add support for cloud-init "degraded" state (#4500)
Summary
=======

This commit extends `cloud-init status` to include:

1. A new exit code (2)
2. Additional running states, exported under a new key "extended_status"
3. External representation of all internal errors:
   - aggregate recoverable errors
   - per-stage recoverable errors
   - per-stage non-recoverable errors (aggregate key already exists)

Current state: recoverable errors vs non-recoverable errors
===========================================================

critical failure
----------------

If cloud-init is unable to complete, the service returns with exit code 1,
and error messages are visible in the log files and in the output of
`cloud-init status --format json` under the top-level 'errors' key.

recoverable failure
-------------------

In the case that cloud-init is able to complete yet something goes awry,
the service returns with exit code 0 and messages are visible only in the
log files.

Future state: recoverable errors vs non-recoverable errors
==========================================================

critical failure
----------------

If cloud-init is unable to complete, error messages will now additionally
be visible in the output of `cloud-init status --format json` within the
'errors' key nested under the stage-level keys: 'init-local', 'init',
'modules-config', 'modules-final'.

recoverable failure
-------------------

In the case that cloud-init is able to complete yet something goes awry,
the service will now return with exit code 2, and error messages will be
visible in the output of `cloud-init status --format json` under the
top-level 'recoverable_errors' key as well as within the
'recoverable_errors' key nested under the stage-level keys: 'init-local',
'init', 'modules-config', 'modules-final'.

Implementation
==============

Cloud-init error codes
----------------------

    0 - success
    1 - unrecoverable error
    2 - recoverable error (new)

This new exit code indicates recoverable errors.
If cloud-init exits with exit code 2, it was able to complete gracefully;
however, something went wrong and the user should investigate.

Additional states
-----------------

For backwards compatibility, the existing 'status' key in the output of
`cloud-init status` remains unchanged. A new key 'extended_status' is
included in the output:

    $ cloud-init status --format json | jq .status
    "done"

    $ cloud-init status --format json | jq .extended_status
    "degraded done"

See Appendix A for the list of possible states.

Exported errors: Aggregated errors
----------------------------------

When a recoverable error occurs, the internal cloud-init state information
is made visible under a top-level aggregate key 'recoverable_errors', with
errors grouped by error level:

    $ cloud-init status --format json | jq .recoverable_errors
    {
      "WARNING": [
        "Failed at merging in cloud config part from part-001: empty cloud config",
        "No template found in /etc/cloud/templates for template named sources.list.ubuntu.deb822",
        "No template found in /etc/cloud/templates for template named sources.list",
        "No template found, not rendering /etc/apt/sources.list.d/ubuntu.sources"
      ]
    }

See Appendix B for the list of possible error levels.

Exported errors: Per-stage errors
---------------------------------

The keys 'errors' and 'recoverable_errors' are also exported for each
stage to allow attribution of recoverable and non-recoverable errors to
their source.

    $ cloud-init status --format json | jq .init.recoverable_errors
    {
      "WARNING": [
        "Failed at merging in cloud config part from part-001: empty cloud config"
      ]
    }

Note: Only cloud-init stages which have completed are listed in the output
of `cloud-init status --format json`.

See Appendix C for the list of possible cloud-init stages.

Limitations of internal errors
==============================

- Exported recoverable errors represent logged messages, which are not
  guaranteed to be stable between releases.
  The contents of the 'errors' and 'recoverable_errors' keys are not
  guaranteed to have stable output!

- Exported errors and recoverable errors may occur at different stages
  since users may reorder configuration modules to run at different
  stages via cloud.cfg.

Appendices
==========

Appendix A: Extended states
---------------------------

    "not running"
    "running"
    "done"
    "error"
    "degraded done"
    "degraded running"
    "disabled"

Appendix B: Error levels
------------------------

Reported recoverable error messages are grouped by the level at which
they are logged. Complete list of levels:

    WARNING
    DEPRECATED
    ERROR
    CRITICAL

Appendix C: Stages of cloud-init
--------------------------------

The json representation of cloud-init stages (in run order) is:

    "init-local"
    "init"
    "modules-config"
    "modules-final"

This commit implements design specification US057[1].

[1] https://discourse.ubuntu.com/t/spec-improve-error-and-warning-visibility/39765
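
Example: consuming the new exit code and keys
---------------------------------------------

As an illustration of how a caller might consume the behaviour described in this message, here is a minimal Python sketch. It is not part of the commit. The helper names `summarize_status` and `query_cloud_init` are hypothetical, and the JSON layout (stage keys at the top level, each optionally holding 'recoverable_errors') is assumed from the `jq` examples in this message.

```python
import json
import subprocess

# Exit codes as listed under "Cloud-init error codes" in this message.
EXIT_CODE_MEANINGS = {
    0: "success",
    1: "unrecoverable error",
    2: "recoverable error",
}

# Stage keys in run order, as listed in Appendix C.
STAGES = ("init-local", "init", "modules-config", "modules-final")


def summarize_status(exit_code, status_json):
    """Hypothetical helper: condense an exit code and JSON output into a dict."""
    data = json.loads(status_json)
    per_stage = {}
    for stage in STAGES:
        stage_data = data.get(stage)
        # Only completed stages are listed, so a stage key may be absent.
        if isinstance(stage_data, dict) and stage_data.get("recoverable_errors"):
            per_stage[stage] = stage_data["recoverable_errors"]
    return {
        "meaning": EXIT_CODE_MEANINGS.get(exit_code, "unknown"),
        "status": data.get("status"),
        "extended_status": data.get("extended_status"),
        "recoverable_errors": data.get("recoverable_errors", {}),
        "per_stage_recoverable_errors": per_stage,
    }


def query_cloud_init():
    """Hypothetical wrapper: run `cloud-init status --format json` and summarize it."""
    proc = subprocess.run(
        ["cloud-init", "status", "--format", "json"],
        capture_output=True,
        text=True,
        check=False,  # exit codes 1 and 2 are reported outcomes, not crashes
    )
    return summarize_status(proc.returncode, proc.stdout)
```

On a machine with cloud-init installed, `query_cloud_init()["meaning"]` would distinguish a clean boot from a degraded one. Because the message texts are not stable between releases, a caller should probably branch on the exit code or on 'extended_status' rather than match on individual error strings.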