
@jcajka
Collaborator

@jcajka jcajka commented May 27, 2020

I would like to propose switching to virtio-blk as the default Ignition injection method on non-fw_cfg platforms. It should be more performant than injecting the Ignition config directly into the VM image, and IMHO it will be more resilient going forward. I'm not currently aware of any significant drawbacks compared to the current solution. I have tested this on x86_64, ppc64le, aarch64 and s390x with FCOS builds/tests and have not observed any regressions.

CC'ing @Prashanth684

To add, I have recently started to see random kola test failures with FCOS (usually ~5 out of all kola tests) on non-fw_cfg platforms, caused by Ignition failures. IIRC there was a reason why CoreOS nowadays prefers "non-disk" based Ignition injection, right? This change seems to prevent these failures.

A log snippet of the issue follows.

*snip*
[    3.738589] systemd[1]: Started dracut initqueue hook.^M
[    3.739890] systemd[1]: Reached target Remote File Systems (Pre).^M
[    3.741170] systemd[1]: Reached target Remote File Systems.^M
[    3.742480] systemd[1]: Starting dracut pre-mount hook...^M
[    3.743695] systemd[1]: Started dracut pre-mount hook.^M
[    3.921515] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready^M
[    4.286028]  vda: vda1 vda2 vda4^M
[    4.292738] coreos-gpt-setup[444]: The operation has completed successfully.^M
[    4.342945] systemd[1]: Started Generate new UUID for disk GPT dev/disk/by-label/root.^M
[^[[0;32m  OK  ^[[0m] Started ^[[0;1;39mGenerate new UUID …sk GPT dev/disk/by-label/root^[[0m.^M
[    4.345840] systemd[1]: Starting Ignition (setup base config)...^M
         Starting ^[[0;1;39mIgnition (setup base config)^[[0m...^M
[    4.348311] systemd[1]: Starting Ignition (setup user config)...^M
         Starting ^[[0;1;39mIgnition (setup user config)^[[0m...^M
[    4.358114] ignition-setup-base[508]: File /usr/lib/ignition/platform/qemu/base.ign does not exist.. Skipping copy^M
[^[[0;32m  OK  ^[[0m] Started ^[[0;1;39mIgnition (setup base config)^[[0m.^M
[    4.364058] systemd[1]: Started Ignition (setup base config).^M
[    4.371245] EXT4-fs (vda2): mounted filesystem with ordered data mode. Opts: (null)^M
[    4.373669] ignition-setup-user[509]: Copying /mnt/boot_partition/ignition/config.ign to /usr/lib/ignition/user.ign^M
[^[[0;32m  OK  ^[[0m] Started ^[[0;1;39mIgnition (setup user config)^[[0m.^M
[^[[0;32m  OK  ^[[0m] Reached target ^[[0;1;39mLocal File Systems (Pre)^[[0m.^M
[^[[0;32m  OK  ^[[0m] Reached target ^[[0;1;39mLocal File Systems^[[0m.^M
         Starting ^[[0;1;39mCreate Volatile Files and Directories^[[0m...^M
[    4.392909] systemd[1]: Started Ignition (setup user config).^M
[    4.394103] systemd[1]: Reached target Local File Systems (Pre).^M
[    4.395187] systemd[1]: Reached target Local File Systems.^M
[    4.396212] systemd[1]: Starting Create Volatile Files and Directories...^M
[^[[0;32m  OK  ^[[0m] Started ^[[0;1;39mCreate Volatile Files and Directories^[[0m.^M
[^[[0;32m  OK  ^[[0m] Reached target ^[[0;1;39mSystem Initialization^[[0m.^M
[^[[0;32m  OK  ^[[0m] Reached target ^[[0;1;39mBasic System^[[0m.^M
         Starting ^[[0;1;39mIgnition (fetch)^[[0m...^M
[    4.404388] systemd[1]: Started Create Volatile Files and Directories.^M
[    4.405621] ignition[519]: Ignition 2.3.0^M
[    4.406591] systemd[1]: Reached target System Initialization.^M
[    4.409744] ignition[519]: Stage: fetch^M
[    4.410698] systemd[1]: Reached target Basic System.^M
[    4.413214] ignition[519]: reading system config file "/usr/lib/ignition/base.ign"^M
[    4.414201] systemd[1]: Starting Ignition (fetch)...^M
[    4.415214] ignition[519]: fetched base config from "system"^M
[    4.416172] ignition[519]: Fetching the Ignition config via the Virtio block driver is currently experimental and subject to change.^M
[    4.418153] ignition[519]: op(1): [started]  loading Virtio block driver module^M
[    4.419157] ignition[519]: op(1): [finished] loading Virtio block driver module^M
[^[[0m^[[0;31m*     ^[[0m] A start job is running for Ignition (fetch) (8s / no limit)^M
^[M^M^[[K[^[[0;1;31m*^[[0m^[[0;31m*    ^[[0m] A start job is running for Ignition (fetch) (8s / no limit)^M
^[M^M^[[K[^[[0;31m*^[[0;1;31m*^[[0m^[[0;31m*   ^[[0m] A start job is running for Ignition (fetch) (9s / no limit)^M
^[M^M^[[K[ ^[[0;31m*^[[0;1;31m*^[[0m^[[0;31m*  ^[[0m] A start job is running for Ignition (fetch) (10s / no limit)^M
.
.
.
^[M^M^[[K[^[[0;31m*^[[0;1;31m*^[[0m^[[0;31m*   ^[[0m] A start job is running for Ignition (fetch) (5min 1s / no limit)^M
^[M^M^[[K[ ^[[0;31m*^[[0;1;31m*^[[0m^[[0;31m*  ^[[0m] A start job is running for Ignition (fetch) (5min 2s / no limit)^M
^[M^M^[[K[  ^[[0;31m*^[[0;1;31m*^[[0m^[[0;31m* ^[[0m] A start job is running for Ignition (fetch) (5min 2s / no limit)^M
^[M^M^[[K[   ^[[0;31m*^[[0;1;31m*^[[0m^[[0;31m*^[[0m] A start job is running for Ignition (fetch) (5min 3s / no limit)^M
[  304.413856] ignition[519]: failed to fetch config: timed out after 5m0s waiting for block device "/dev/disk/by-id/virtio-ignition" to appear^M
^[M^M^[[K[^[[0;1;31mFAILED^[[0m] Failed to start ^[[0;1;39mIgnition (fetch)^[[0m.^M
^[[KSee 'systemctl status ignition-fetch.service' for details.^M
[^[[0;1;38;5;185mDEPEND^[[0m] Dependency failed for ^[[0;1;39mIgnition Complete^[[0m.^M
[^[[0;1;38;5;185mDEPEND^[[0m] Dependency failed for ^[[0;1;39mInitrd Default Target^[[0m.^M
[  304.432013] kauditd_printk_skb: 7 callbacks suppressed^M
[  304.432014] audit: type=1130 audit(1590571211.830:18): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=ignition-fetch comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'^M
[^[[0;32m  OK  ^[[0m] Stopped target ^[[0;1;39mBasic System^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped target ^[[0;1;39mPaths^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped target ^[[0;1;39mRemote File Systems^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped target ^[[0;1;39mRemote File Systems (Pre)^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped target ^[[0;1;39mSlices^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped target ^[[0;1;39mSockets^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped target ^[[0;1;39mSystem Initialization^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped target ^[[0;1;39mSwap^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped target ^[[0;1;39mTimers^[[0m.^M
         Starting ^[[0;1;39mIgnition (virtio dump)^[[0m...^M
[^[[0;32m  OK  ^[[0m] Stopped ^[[0;1;39mdracut pre-mount hook^[[0m.^M
[  304.442323] systemd[1]: ignition-fetch.service: Main process exited, code=exited, status=1/FAILURE^M
[  304.444807] ignition[519]: failed to acquire config: timed out after 5m0s waiting for block device "/dev/disk/by-id/virtio-ignition" to appear^M
[  304.447387] systemd[1]: ignition-fetch.service: Failed with result 'exit-code'.^M
[  304.457827] audit: type=1131 audit(1590571211.850:19): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-pre-mount comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'^M
[^[[0;32m  OK  ^[[0m] Stopped target ^[[0;1;39mLocal Encrypted Volumes^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped ^[[0;1;39mDispatch Password …ts to Console Directory Watch^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped ^[[0;1;39mdracut initqueue hook^[[0m.^M
[  304.462436] audit: type=1131 audit(1590571211.860:20): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-initqueue comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'^M
[^[[0;32m  OK  ^[[0m] Stopped ^[[0;1;39mCopy CoreOS Firstboot Networking Config^[[0m.^M
[  304.470600] audit: type=1131 audit(1590571211.860:21): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=coreos-copy-firstboot-network comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'^M
[^[[0;32m  OK  ^[[0m] Stopped ^[[0;1;39mApply Kernel Variables^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped ^[[0;1;39mLoad Kernel Modules^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped ^[[0;1;39mCreate Volatile Files and Directories^[[0m.^M
[  304.475790] audit: type=1131 audit(1590571211.870:22): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-sysctl comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'^M
[  304.479844] audit: type=1131 audit(1590571211.870:23): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-modules-load comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'^M
[^[[0;32m  OK  ^[[0m] Stopped target ^[[0;1;39mLocal File Systems^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped target ^[[0;1;39mLocal File Systems (Pre)^[[0m.^M
[^[[0;32m  OK  ^[[0m] Stopped ^[[0;1;39mIgnition (setup base config)^[[0m.^M
*snip*

Member

@jlebon jlebon left a comment

Cool, thanks! Yes, this is a nice cleanup now that there's a better way to pass the Ignition config.

Could you include the first paragraph of your comment as the commit message body?

@jlebon
Member

jlebon commented May 27, 2020

> To add, I have recently started to see random kola test failures with FCOS (usually ~5 out of all kola tests) on non-fw_cfg platforms, caused by Ignition failures. IIRC there was a reason why CoreOS nowadays prefers "non-disk" based Ignition injection, right? This change seems to prevent these failures.

Hmm, this is with using -device virtio-blk,serial=ignition,drive=ignition in a s390x/ppc64le VM? If that's the case, then can you file an issue in the tracker?
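
For reference, a rough sketch of how a launcher might choose between the two injection paths when building the QEMU command line. The `-device` string is the one quoted above. The `ignitionArgs` helper, the `hasFwCfg` flag and the exact `-drive` options are assumptions for illustration, not the actual mantle code.

```go
package main

import (
	"fmt"
	"strings"
)

// ignitionArgs returns QEMU arguments that hand an Ignition config to a guest.
// Hypothetical helper: hasFwCfg stands in for whatever per-architecture check
// the real launcher performs.
func ignitionArgs(configPath string, hasFwCfg bool) []string {
	if hasFwCfg {
		// fw_cfg path: the config is exposed to the guest under a firmware key.
		return []string{"-fw_cfg", "name=opt/com.coreos/config,file=" + configPath}
	}
	return []string{
		// Assumed -drive options: a read-only raw backing drive named "ignition".
		// Note that a configPath containing commas would need escaping.
		"-drive", "file=" + configPath + ",if=none,format=raw,readonly=on,id=ignition",
		// Device string as quoted in this thread; serial=ignition is what makes
		// the guest see /dev/disk/by-id/virtio-ignition.
		"-device", "virtio-blk,serial=ignition,drive=ignition",
	}
}

func main() {
	fmt.Println(strings.Join(ignitionArgs("config.ign", false), " "))
	fmt.Println(strings.Join(ignitionArgs("config.ign", true), " "))
}
```

The point of the sketch is only that the config file itself is never written into the VM image in either branch.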

@jcajka
Collaborator Author

jcajka commented May 27, 2020

> > To add, I have recently started to see random kola test failures with FCOS (usually ~5 out of all kola tests) on non-fw_cfg platforms, caused by Ignition failures. IIRC there was a reason why CoreOS nowadays prefers "non-disk" based Ignition injection, right? This change seems to prevent these failures.
>
> Hmm, this is with using -device virtio-blk,serial=ignition,drive=ignition in a s390x/ppc64le VM? If that's the case, then can you file an issue in the tracker?

Without.

…fault

Move from injecting Ignition directly into the VM image on non-fw_cfg architectures
to passing it via a virtio-blk device. It should be more performant and more
robust going forward.
See: coreos/ignition#905
@jlebon
Member

jlebon commented May 27, 2020

> > Hmm, this is with using -device virtio-blk,serial=ignition,drive=ignition in a s390x/ppc64le VM? If that's the case, then can you file an issue in the tracker?
>
> Without.

Ahh OK. Yeah, with coreos/ignition#905, Ignition will just keep waiting until the device shows up (or eventually fail). This is by design.

@jcajka
Collaborator Author

jcajka commented May 27, 2020

> > > Hmm, this is with using -device virtio-blk,serial=ignition,drive=ignition in a s390x/ppc64le VM? If that's the case, then can you file an issue in the tracker?
> >
> > Without.
>
> Ahh OK. Yeah, with coreos/ignition#905, Ignition will just keep waiting until the device shows up (or eventually fail). This is by design.

IIRC there have been mentions of "disk"-based Ignition injection being inherently prone to race conditions, which is why it is discouraged nowadays. Is that right? Is that possibly what the log above shows?

panic("Ignition specified but no primary disk")
} else {
// Alternative to fw_cfg, should be generally usable on all arches,
// especially those without fw_cfg support.
Member

Note it's technically not usable yet on arches that do support fw_cfg (see coreos/ignition#928 (comment)). But meh... not worth a respin.

Collaborator Author

Oh... sorry, bad wording on my side; it should have said "possibly". I will keep it in the back of my mind and fix it in a future PR.

@jlebon
Member

jlebon commented May 27, 2020

/lgtm

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcajka, jlebon


@jlebon
Member

jlebon commented May 27, 2020

> IIRC there have been mentions of "disk"-based Ignition injection being inherently prone to race conditions, which is why it is discouraged nowadays. Is that right? Is that possibly what the log above shows?

The ambiguity/race arises when the medium is optional. coreos/ignition#905 works around that by making it required, which is why that VM is failing. :) See coreos/ignition#928 for lots more discussions around this.
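
To illustrate the difference, here is a toy model, not Ignition code. All names (`probe`, `appearsAfter`, `fetchOptional`, `fetchRequired`) are made up for the example, and a call counter stands in for time passing during boot.

```go
package main

import (
	"errors"
	"fmt"
)

// probe reports whether the config device is visible right now.
type probe func() bool

// appearsAfter returns a probe that starts reporting true on its n-th call,
// modelling a device that shows up late during boot.
func appearsAfter(n int) probe {
	calls := 0
	return func() bool {
		calls++
		return calls >= n
	}
}

// fetchOptional models an optional medium: one look, then move on.
// "Not present yet" and "never going to be present" are indistinguishable.
func fetchOptional(present probe) string {
	if present() {
		return "user config"
	}
	return "no config"
}

// fetchRequired models a required medium: keep probing and fail loudly
// if the device never appears.
func fetchRequired(present probe, maxProbes int) (string, error) {
	for i := 0; i < maxProbes; i++ {
		if present() {
			return "user config", nil
		}
	}
	return "", errors.New("timed out waiting for config device")
}

func main() {
	// A late device is missed by the optional path but found by the required one.
	fmt.Println(fetchOptional(appearsAfter(3)))
	fmt.Println(fetchRequired(appearsAfter(3), 5))
	// With no device attached at all, the required path ends in an error.
	fmt.Println(fetchRequired(func() bool { return false }, 5))
}
```

In this model the optional path silently boots without the user config when the device is slow, while the required path trades that ambiguity for a hard failure when the device is absent.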

@openshift-merge-robot openshift-merge-robot merged commit 272529a into coreos:master May 27, 2020