You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Image I'm using:
Bottlerocket v1.1.2
Seen this in both aarch64 aws-k8s-1.17 and aarch64 aws-k8s-1.20 so far
What I expected to happen:
Host to start-up successfully after a reboot
What actually happened:
2/400ish hosts failed to reboot due to the following error:
[ OK ] Finished Datastore creator.
Starting Bottlerocket API server...
[ OK ] Started Bottlerocket API server.
Starting User-specified setting generators...
[ 4.676152] settings-committer[366]: Error sending POST to /tx/commit?tx=bottlerocket-launch: Failed to send request: error trying to connect: No such file or directory (os error 2)
[FAILED] Failed to start User-specified setting generators.
See 'systemctl status sundog.service' for details.
[DEPEND] Dependency failed for Applies settings to create config files.
[DEPEND] Dependency failed for Bottlerocket initial configuration complete.
[DEPEND] Dependency failed for Isolates configured.target.
sundog fails to commit settings to the apiserver due to the socket somehow missing.
apiserver logs:
Started Bottlerocket API server.
06:32:50 [INFO] Starting 1 workers
06:32:50 [INFO] Starting "actix-web-service-"/run/api.sock"" service on "/run/api.sock" (pathname)
06:33:18 [INFO] thar-be-settings started
06:33:18 [INFO] Parsing stdin for updated settings
06:33:18 [INFO] Requesting affected services for settings: {"settings.updates.metadata-base-url", "settings.updates.targets-base-
06:33:18 [INFO] Requesting configuration file data for affected services
06:33:18 [INFO] Rendering config files...
06:33:18 [INFO] Writing config files to disk...
06:33:18 [INFO] Restarting affected services...
06:33:18 [INFO] restart commands []
06:33:21 [INFO] thar-be-settings started
06:33:21 [INFO] Parsing stdin for updated settings
06:33:21 [INFO] Requesting affected services for settings: {"settings.updates.ignore-waves", "settings.updates.version-lock"}
06:33:21 [INFO] Requesting configuration file data for affected services
06:33:21 [INFO] Rendering config files...
06:33:21 [INFO] Writing config files to disk...
06:33:21 [INFO] Restarting affected services...
06:33:21 [INFO] restart commands []
Stopping Bottlerocket API server...
06:33:43 [INFO] SIGTERM received, stopping
06:33:43 [INFO] Paused accepting connections on "/run/api.sock" (pathname)
06:33:43 [INFO] Shutting down worker, 0 connections
apiserver.service: Succeeded.
Stopped Bottlerocket API server.
--reboot--
Starting Bottlerocket API server...
06:34:04 [INFO] Starting server at /run/api.sock with 1 thread and datastore at /var/lib/bottlerocket/datastore/current
Started Bottlerocket API server.
06:34:04 [INFO] Starting 1 workers
06:34:04 [INFO] Starting "actix-web-service-"/run/api.sock"" service on "/run/api.sock" (pathname)
06:34:04 [INFO] Paused accepting connections on "/run/api.sock" (pathname)
06:34:04 [INFO] Resume accepting connections on "/run/api.sock" (pathname)
It seems like at some point apiserver pauses connections on /run/api.sock. The timing coincides with sundog trying to commit settings to the apiserver.
More detailed logs:
[root@ip-172-31-47-164 debug]# journalctl -D ./var/log/journal/ -u sundog -u apiserver -u migrator -u storewolf
-- Logs begin at Thu 2021-07-08 06:32:46 UTC, end at Thu 2021-07-08 17:04:06 UTC. --
Jul 08 06:32:48 localhost systemd[1]: Starting Bottlerocket data store migrator...
Jul 08 06:32:48 localhost migrator[344]: Data store does not exist at given path, exiting (/var/lib/bottlerocket/datastore/current)
Jul 08 06:32:48 localhost systemd[1]: Finished Bottlerocket data store migrator.
Jul 08 06:32:48 localhost systemd[1]: Starting Datastore creator...
Jul 08 06:32:48 localhost storewolf[353]: 06:32:48 [INFO] Storewolf started
Jul 08 06:32:48 localhost storewolf[353]: 06:32:48 [INFO] Deleting pending transactions
Jul 08 06:32:48 localhost storewolf[353]: 06:32:48 [INFO] Populating datastore at: /var/lib/bottlerocket/datastore
Jul 08 06:32:48 localhost storewolf[353]: 06:32:48 [INFO] Creating datastore at: /var/lib/bottlerocket/datastore/current/live
Jul 08 06:32:50 ip-192-168-144-159.us-west-2.compute.internal storewolf[353]: 06:32:50 [INFO] Datastore populated
Jul 08 06:32:50 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Finished Datastore creator.
Jul 08 06:32:50 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Starting Bottlerocket API server...
Jul 08 06:32:50 ip-192-168-144-159.us-west-2.compute.internal apiserver[362]: 06:32:50 [INFO] Starting server at /run/api.sock with 1 thread and datastore at /var/lib/bottlerocket/datastore/current
Jul 08 06:32:50 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Started Bottlerocket API server.
Jul 08 06:32:50 ip-192-168-144-159.us-west-2.compute.internal apiserver[362]: 06:32:50 [INFO] Starting 1 workers
Jul 08 06:32:50 ip-192-168-144-159.us-west-2.compute.internal apiserver[362]: 06:32:50 [INFO] Starting "actix-web-service-"/run/api.sock"" service on "/run/api.sock" (pathname)
Jul 08 06:32:50 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Starting User-specified setting generators...
Jul 08 06:32:50 ip-192-168-144-159.us-west-2.compute.internal settings-committer[370]: 06:32:50 [INFO] Checking pending settings.
Jul 08 06:32:50 ip-192-168-144-159.us-west-2.compute.internal settings-committer[370]: 06:32:50 [INFO] Committing settings.
Jul 08 06:32:51 ip-192-168-144-159.us-west-2.compute.internal sundog[373]: 06:32:51 [INFO] Sundog started
Jul 08 06:32:51 ip-192-168-144-159.us-west-2.compute.internal sundog[373]: 06:32:51 [INFO] Retrieving setting generators
Jul 08 06:32:51 ip-192-168-144-159.us-west-2.compute.internal sundog[373]: 06:32:51 [INFO] Retrieving settings values
Jul 08 06:32:51 ip-192-168-144-159.us-west-2.compute.internal sundog[373]: 06:32:51 [INFO] Sending settings values to the API
Jul 08 06:32:51 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Finished User-specified setting generators.
Jul 08 06:33:18 ip-192-168-144-159.us-west-2.compute.internal apiserver[1040]: 06:33:18 [INFO] thar-be-settings started
Jul 08 06:33:18 ip-192-168-144-159.us-west-2.compute.internal apiserver[1040]: 06:33:18 [INFO] Parsing stdin for updated settings
Jul 08 06:33:18 ip-192-168-144-159.us-west-2.compute.internal apiserver[1040]: 06:33:18 [INFO] Requesting affected services for settings: {"settings.updates.metadata-base-url", "settings.updates.tar
Jul 08 06:33:18 ip-192-168-144-159.us-west-2.compute.internal apiserver[1040]: 06:33:18 [INFO] Requesting configuration file data for affected services
Jul 08 06:33:18 ip-192-168-144-159.us-west-2.compute.internal apiserver[1040]: 06:33:18 [INFO] Rendering config files...
Jul 08 06:33:18 ip-192-168-144-159.us-west-2.compute.internal apiserver[1040]: 06:33:18 [INFO] Writing config files to disk...
Jul 08 06:33:18 ip-192-168-144-159.us-west-2.compute.internal apiserver[1040]: 06:33:18 [INFO] Restarting affected services...
Jul 08 06:33:18 ip-192-168-144-159.us-west-2.compute.internal apiserver[1040]: 06:33:18 [INFO] restart commands []
Jul 08 06:33:21 ip-192-168-144-159.us-west-2.compute.internal apiserver[1081]: 06:33:21 [INFO] thar-be-settings started
Jul 08 06:33:21 ip-192-168-144-159.us-west-2.compute.internal apiserver[1081]: 06:33:21 [INFO] Parsing stdin for updated settings
Jul 08 06:33:21 ip-192-168-144-159.us-west-2.compute.internal apiserver[1081]: 06:33:21 [INFO] Requesting affected services for settings: {"settings.updates.ignore-waves", "settings.updates.version-
Jul 08 06:33:21 ip-192-168-144-159.us-west-2.compute.internal apiserver[1081]: 06:33:21 [INFO] Requesting configuration file data for affected services
Jul 08 06:33:21 ip-192-168-144-159.us-west-2.compute.internal apiserver[1081]: 06:33:21 [INFO] Rendering config files...
Jul 08 06:33:21 ip-192-168-144-159.us-west-2.compute.internal apiserver[1081]: 06:33:21 [INFO] Writing config files to disk...
Jul 08 06:33:21 ip-192-168-144-159.us-west-2.compute.internal apiserver[1081]: 06:33:21 [INFO] Restarting affected services...
Jul 08 06:33:21 ip-192-168-144-159.us-west-2.compute.internal apiserver[1081]: 06:33:21 [INFO] restart commands []
Jul 08 06:33:43 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: sundog.service: Succeeded.
Jul 08 06:33:43 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Stopped User-specified setting generators.
Jul 08 06:33:43 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Stopping Bottlerocket API server...
Jul 08 06:33:43 ip-192-168-144-159.us-west-2.compute.internal apiserver[362]: 06:33:43 [INFO] SIGTERM received, stopping
Jul 08 06:33:43 ip-192-168-144-159.us-west-2.compute.internal apiserver[362]: 06:33:43 [INFO] Paused accepting connections on "/run/api.sock" (pathname)
Jul 08 06:33:43 ip-192-168-144-159.us-west-2.compute.internal apiserver[362]: 06:33:43 [INFO] Shutting down worker, 0 connections
Jul 08 06:33:43 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: apiserver.service: Succeeded.
Jul 08 06:33:43 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Stopped Bottlerocket API server.
Jul 08 06:33:43 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: storewolf.service: Succeeded.
Jul 08 06:33:43 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Stopped Datastore creator.
Jul 08 06:33:43 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: migrator.service: Succeeded.
Jul 08 06:33:43 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Stopped Bottlerocket data store migrator.
-- Reboot --
Jul 08 06:34:02 localhost systemd[1]: Starting Bottlerocket data store migrator...
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal migrator[343]: 06:34:04 [INFO] New data store is being built at work location /var/lib/bottlerocket/datastore/v20.0.0_aDrOR2T3GwWYrcVP
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal migrator[343]: 06:34:04 [INFO] Running migration command: "/proc/self/fd/4" "--forward" "--source-datastore" "/var/lib/bottlerocket/data
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal migrator[343]: 06:34:04 [INFO] New data store is being built at work location /var/lib/bottlerocket/datastore/v20.0.0_2VvDttWQprRLYDo4
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal migrator[343]: 06:34:04 [INFO] Running migration command: "/proc/self/fd/4" "--forward" "--source-datastore" "/var/lib/bottlerocket/data
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal migrator[343]: 06:34:04 [INFO] Flipping /var/lib/bottlerocket/datastore/v20.0.0 to point to v20.0.0_2VvDttWQprRLYDo4
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal migrator[343]: 06:34:04 [INFO] Flipping /var/lib/bottlerocket/datastore/v20.0 to point to v20.0.0
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal migrator[343]: 06:34:04 [INFO] Flipping /var/lib/bottlerocket/datastore/v20 to point to v20.0
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal migrator[343]: 06:34:04 [INFO] Flipping /var/lib/bottlerocket/datastore/current to point to v20
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Finished Bottlerocket data store migrator.
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Starting Datastore creator...
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal storewolf[363]: 06:34:04 [INFO] Storewolf started
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal storewolf[363]: 06:34:04 [INFO] Deleting pending transactions
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal storewolf[363]: 06:34:04 [INFO] Populating datastore at: /var/lib/bottlerocket/datastore
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal storewolf[363]: 06:34:04 [INFO] Datastore populated
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Finished Datastore creator.
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Starting Bottlerocket API server...
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal apiserver[364]: 06:34:04 [INFO] Starting server at /run/api.sock with 1 thread and datastore at /var/lib/bottlerocket/datastore/current
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Started Bottlerocket API server.
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal apiserver[364]: 06:34:04 [INFO] Starting 1 workers
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Starting User-specified setting generators...
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal apiserver[364]: 06:34:04 [INFO] Starting "actix-web-service-"/run/api.sock"" service on "/run/api.sock" (pathname)
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal settings-committer[366]: 06:34:04 [INFO] Checking pending settings.
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal apiserver[364]: 06:34:04 [INFO] Paused accepting connections on "/run/api.sock" (pathname)
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal apiserver[364]: 06:34:04 [INFO] Resume accepting connections on "/run/api.sock" (pathname)
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal settings-committer[366]: 06:34:04 [INFO] Committing settings.
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal settings-committer[366]: Error sending POST to /tx/commit?tx=bottlerocket-launch: Failed to send request: error trying to connect: No su
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: sundog.service: Control process exited, code=exited, status=1/FAILURE
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: sundog.service: Failed with result 'exit-code'.
Jul 08 06:34:04 ip-192-168-144-159.us-west-2.compute.internal systemd[1]: Failed to start User-specified setting generators.
How to reproduce the problem:
Launch a bunch of bottlerocket hosts and restart them.
The text was updated successfully, but these errors were encountered:
I'm having trouble reproducing this issue. It might be rarer than expected or maybe there are some other factors I haven't considered. In any case, I'm gonna put this aside for now and come back to it once it happens again or resolve this if it hasn't happened again after a while.
This hasn't happened for a while now (two weeks+). Not sure what fixed it, maybe the rust package upgrades. I'll go ahead and close the issue for now and re-open it if it ever resurfaces again.
Image I'm using:
Bottlerocket v1.1.2
Seen this in both aarch64 aws-k8s-1.17 and aarch64 aws-k8s-1.20 so far
What I expected to happen:
Host to start-up successfully after a reboot
What actually happened:
2/400ish hosts failed to reboot due to the following error:
sundog fails to commit settings to the apiserver due to the socket somehow missing.
apiserver logs:
It seems like at some point apiserver pauses connections on
/run/api.sock
. The timing coincides with sundog trying to commit settings to the apiserver.More detailed logs:
How to reproduce the problem:
Launch a bunch of bottlerocket hosts and restart them.
The text was updated successfully, but these errors were encountered: