Boot assessment must wait for network #1263
Comments
This is a tricky topic. Currently the boot assessment relies on systemd being able to reach a certain stage at boot and rebooting automatically in case of failure. We should explore how the health check from SLE Micro works and also check whether we need to define a more elaborate concept to verify that the system booted as expected. I'd say a sane check could be that the system managed to register itself once the default target is reached and before a certain timeout. I'd like to see the check configurable, for example by providing a check script that is executed a certain number of times with a predefined cadence before the boot is considered failed. On failure we could simply reboot. This would give us the chance to also reboot in case systemd booted but in a degraded state (e.g. no network), and it would also give us the chance to configure some explicit constraints to consider for an upgrade.
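To make the idea of a configurable check concrete, here is a minimal Python sketch of the retry loop, not an existing elemental-toolkit interface. The names `run_check` and `assess_boot`, the default attempt count, and the default interval are all assumptions made for illustration.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_check(command: Sequence[str]) -> bool:
    """Run one check command; a zero exit status counts as healthy."""
    try:
        return subprocess.run(command, check=False).returncode == 0
    except OSError:
        # A missing or non-executable script is treated as a failed check.
        return False


def assess_boot(
    check: Callable[[], bool],
    attempts: int = 5,        # hypothetical default
    interval: float = 10.0,   # hypothetical cadence in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True as soon as `check` passes, False after `attempts` failures."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            # Wait before the next try; no wait after the final failure.
            sleep(interval)
    return False
```

A caller would wrap the configured script, for example `assess_boot(lambda: run_check(["/path/to/check-script"]))` with a placeholder path, and trigger the reboot only when the result is `False`. Passing `sleep` as a parameter keeps the loop testable without real delays.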
Just did a quick check of the Micro health check and I do believe we should migrate to such a service. What I am wondering is whether we could easily generalize this concept so we could provide a health check system for elemental-toolkit that does not depend on Micro.
After a further look at the health checker from Micro, I believe there is little chance for us to adopt it right now (it is tightly coupled to btrfs and several Micro specifics such as grub, etc.). We need a deeper integration and further perspective to make use of it. However, what we can actually do is build our own checker script and logic in a compatible way. We could build something that is really close to a health checker plugin, so we can easily adapt to or adopt the Micro system when there is a chance.
We have a user case where the system comes up, passes boot assessment, but fails to start the workload.
Turns out that there's a filesystem error, preventing even NetworkManager.service from starting.
Boot assessment should capture this case and reboot into fallback if the network fails to start up.
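As a rough illustration of the kind of check requested here, the following Python sketch asks systemd whether the network service is active using `systemctl is-active --quiet`. The function names and the default unit list are assumptions, and an active unit only shows that the service started, not that the machine has connectivity.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _systemctl(args: Sequence[str]) -> int:
    """Run systemctl with the given arguments and return its exit status."""
    try:
        return subprocess.run(["systemctl", *args], check=False).returncode
    except OSError:
        # systemctl not available: report a non-zero status.
        return 1


def unit_is_active(unit: str, run: Runner = _systemctl) -> bool:
    """`systemctl is-active --quiet UNIT` exits with 0 only when the unit is active."""
    return run(["is-active", "--quiet", unit]) == 0


def network_is_up(
    units: Sequence[str] = ("NetworkManager.service",),  # assumed unit list
    run: Runner = _systemctl,
) -> bool:
    """Treat the network as up only if every listed unit is active."""
    return all(unit_is_active(unit, run) for unit in units)
```

A boot assessment step could call `network_is_up()` and fall back when it returns `False`. The `run` parameter exists so the logic can be exercised without a running systemd.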