Skip to content

Conversation

@cgwalters
Copy link
Member

And this final commit gets rid of all of the qemu hardcoding in
cmdlib, so e.g. architecture specific changes only need to happen
in one place.

(Well, of course arch changes also end up in libvirt which we
use indirectly too, but at least we only have one place)

@cgwalters
Copy link
Member Author

cgwalters commented Mar 25, 2020

@cgwalters cgwalters force-pushed the cosa-always-qemuexec branch 2 times, most recently from 47f640e to dd53484 Compare March 25, 2020 21:51
@cgwalters
Copy link
Member Author

Right, forgot this depends on #1285 too since the 9p bits reference devtype. Added that too.

@cgwalters
Copy link
Member Author

For example this PR would have been unnecessary because we already had the right thing in qemu.go.

@cgwalters cgwalters force-pushed the cosa-always-qemuexec branch 2 times, most recently from 3eb4939 to 82a4148 Compare March 25, 2020 23:16
@ashcrow
Copy link
Member

ashcrow commented Mar 26, 2020

Do we need an update to CI or an update to qemuexec?

Running: rpm-ostree compose tree --repo=/srv/tmp/repo --cachedir=/srv/cache --touch-if-changed /srv/tmp/build/tmp/treecompose.changed --unified-core /srv/src/config/manifest.yaml --download-only --ex-lockfile=/srv/src/config/manifest-lock.x86_64.json --ex-lockfile=/srv/src/config/manifest-lock.overrides.x86_64.yaml
supermin: warning: /usr/libexec/utempter/utempter: Permission denied (ignored)
Some distro files are not public readable, so supermin cannot copy them
into the appliance.  This is a problem with your Linux distro.  Please ask
your distro to stop doing pointless security by obscurity.
You can ignore these warnings.  You *do not* need to use sudo.
supermin: warning: /usr/sbin/unix_update: Permission denied (ignored)
supermin: warning: /var/lib/systemd/random-seed: Permission denied (ignored)
Error: open : no such file or directory
2020-03-25T23:22:14Z cli: open : no such file or directory
script returned exit code 1

@cgwalters
Copy link
Member Author

Do we need an update to CI or an update to qemuexec?

I don't yet know why that's failing, but I plan to debug it more once the dependencies (all currently passing CI) have landed.

@cgwalters
Copy link
Member Author

Still working on this, split out #1342

cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 13, 2020
This is a followup to:
coreos#1342
which is part of:
coreos#1289

Our builds right now run qemu in a way that's highly sensitive
to changes in device ordering.

SCSI hardware supports a [world wide name](https://en.wikipedia.org/wiki/World_Wide_Name);
change our qemu invocation to use `42` (in decimal) for that.

Unfortunately because we're not using udev in our supermin
VM right now we don't get the nice `/dev/disk/by-id` symlink
to it.  Instead just walk `/sys/block` manually.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 13, 2020
This is a followup to:
coreos#1342
which is part of:
coreos#1289

Our builds right now run qemu in a way that's highly sensitive
to changes in device ordering.

SCSI hardware supports a [world wide name](https://en.wikipedia.org/wiki/World_Wide_Name);
change our qemu invocation to use `42` (in decimal) for that.

Unfortunately because we're not using udev in our supermin
VM right now we don't get the nice `/dev/disk/by-id` symlink
to it.  Instead just walk `/sys/block` manually.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 13, 2020
This is a followup to:
coreos#1342
which is part of:
coreos#1289

Our builds right now run qemu in a way that's highly sensitive
to changes in device ordering.

SCSI hardware supports a [world wide name](https://en.wikipedia.org/wiki/World_Wide_Name);
change our qemu invocation to use `42` (in decimal) for that.

Unfortunately because we're not using udev in our supermin
VM right now we don't get the nice `/dev/disk/by-id` symlink
to it.  Instead just walk `/sys/block` manually.
@cgwalters cgwalters force-pushed the cosa-always-qemuexec branch from 82a4148 to 0024ac0 Compare April 13, 2020 22:28
@cgwalters cgwalters force-pushed the cosa-always-qemuexec branch from 0024ac0 to b207d09 Compare April 14, 2020 14:50
@cgwalters
Copy link
Member Author

OK cool, so this one is working now! Just depends on #1343 merging first.

@Prashanth684
Copy link
Contributor

tested this on s390x , did a kola test run ..looks good..i will test on ppc64le too but i don't foresee any issues

@cgwalters
Copy link
Member Author

OK since we're pretty confident in this I'd say we just merge this one directly!

@mike-nguyen
Copy link
Member

Tested on x86_64, ran buildextend-metal, then went through the coreos-installer with the raw image. Looks good!

@cgwalters
Copy link
Member Author

/hold cancel

@jlebon
Copy link
Member

jlebon commented Apr 16, 2020

Just to be safe, could we hold this a bit more until we build the next release?

cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 18, 2020
This is part of speeding up `cosa build-fast`; here the `dhclient`
run takes about a second out of 11 (after other speedups) so
that's a 9% win.  After rebasing on
coreos#1289
I'll also drop out the virtio-net NIC entirely which will help
us ensure tasks that shouldn't pull external content don't.
openshift-merge-robot pushed a commit that referenced this pull request Apr 20, 2020
This is part of speeding up `cosa build-fast`; here the `dhclient`
run takes about a second out of 11 (after other speedups) so
that's a 9% win.  After rebasing on
#1289
I'll also drop out the virtio-net NIC entirely which will help
us ensure tasks that shouldn't pull external content don't.
@cgwalters
Copy link
Member Author

Just to be safe, could we hold this a bit more until we build the next release?

That's done now? https://builds.coreos.fedoraproject.org/browser?stream=next

@ashcrow ashcrow requested review from bgilbert and jlebon and removed request for ashcrow April 22, 2020 14:11
Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice cleanup! Just minor comments, LGTM overall.

fi
mkdir -p "${workdir}"/cache /host/container-work
cachedev=$(blkid -lt LABEL=cosa-cache -o device)
cachedev=$(blkid -lt LABEL=cosa-cache -o device || true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is already in the tree (#1345), but ahh... it's the exact same commit, so that's why GitHub and Jenkins aren't complaining. OK, that should work then; I'm a bit interested to see GitHub handle it.

if [ -z "${disk:-}" ]; then
# hex 0x2a = 42 in decimal; this is set in cmd-buildextend-metal.
# We use the WWN as an unambiguous way to identify our target disk,
# independent of other devices attached to the VM (caches, etc.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note virtio-blk does support something similar as well with serial=... (that's what coreos/ignition#905 uses). This is fine too though! (I would think virtio-scsi is slower than virtio-blk for our use case, but since it's not a long-running VM or anything, it's probably negligible overall. Not to mention our cache qcow2 has been using SCSI for a while too now.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh right...hmm, yes, I think if I just steal these bits from your osmet PR we could switch to serial and /dev/disk/by-id. Will try. What I ran into is that blkid didn't know how to parse the serials or something, and I thought running udev outside of systemd would be harder than that.

# we want /dev/disk symlinks for coreos-installer
/usr/lib/systemd/systemd-udevd --daemon
/usr/sbin/udevadm trigger --settle

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I fought this in cgwalters@75aeaba and couldn't get it to work; I think somehow the target drive is ending up as /dev/vda. See also coreos/ignition#905 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied in coreos/ignition#905 (comment). But yeah, let's just always enable udev to make all this cleaner.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so...I tried to use this for the root device but that's in the supermin code which has its own custom init that only handles /dev/vda or UUIDs (and for some reason it uses random UUIDs still for the rootfs...)

The flip side of this is...using WWNs is also natural for people doing installs to real devices (e.g. with coreos-installer), so it's somewhat useful to also do that for our builds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my vote hence is to land this, since it's tested and an improvement, we we can do something else as followup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, sure. WFM!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up to this in #1398!

cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 22, 2020
And this final commit gets rid of almost all the core qemu logic in
cmdlib, so e.g. architecture specific changes only need to happen
in one place.  The only special casing left is `$devtype` and
the terminal.

First, change how we identify devices.
This is a followup to: coreos#1342
which is part of:
coreos#1289

Our builds right now run qemu in a way that's highly sensitive
to changes in device ordering.

SCSI hardware supports a [world wide name](https://en.wikipedia.org/wiki/World_Wide_Name);
change our qemu invocation to use `42` (in decimal) for that.

Unfortunately because we're not using udev in our supermin
VM right now we don't get the nice `/dev/disk/by-id` symlink
to it.  Instead just walk `/sys/block` manually.

Next, use `kola qemuexec`.  We inherit the virtio-rand setup, CPU calculations etc.

And now we can just use virtio for the root disk
and the cache disk, which drops all the eyeball-glazing
scsi incantations.

(Well, of course arch changes also end up in libvirt which we
 use indirectly too, but at least *we* only have one place)
This is a followup to:
coreos#1342
which is part of:
coreos#1289

Our builds right now run qemu in a way that's highly sensitive
to changes in device ordering.

SCSI hardware supports a [world wide name](https://en.wikipedia.org/wiki/World_Wide_Name);
change our qemu invocation to use `42` (in decimal) for that.

Unfortunately because we're not using udev in our supermin
VM right now we don't get the nice `/dev/disk/by-id` symlink
to it.  Instead just walk `/sys/block` manually.
And this final commit gets rid of almost all the core qemu logic in
cmdlib, so e.g. architecture specific changes only need to happen
in one place.  The only special casing left is `$devtype` and
the terminal.

We inherit the virtio-rand setup, CPU calculations etc.

And now we can just use virtio for the root disk
and the cache disk, which drops all the eyeball-glazing
scsi incantations.

(Well, of course arch changes also end up in libvirt which we
 use indirectly too, but at least *we* only have one place)
@cgwalters cgwalters force-pushed the cosa-always-qemuexec branch from b207d09 to 6a78bde Compare April 22, 2020 19:53
@jlebon
Copy link
Member

jlebon commented Apr 22, 2020

/lgtm

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, jlebon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 8774891 into coreos:master Apr 22, 2020
openshift-merge-robot pushed a commit that referenced this pull request Apr 22, 2020
This is a followup to:
#1342
which is part of:
#1289

Our builds right now run qemu in a way that's highly sensitive
to changes in device ordering.

SCSI hardware supports a [world wide name](https://en.wikipedia.org/wiki/World_Wide_Name);
change our qemu invocation to use `42` (in decimal) for that.

Unfortunately because we're not using udev in our supermin
VM right now we don't get the nice `/dev/disk/by-id` symlink
to it.  Instead just walk `/sys/block` manually.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 23, 2020
Regression from coreos#1289
that probably explains other mysterious problems I was seeing too.

Basically, if a qemu image already existed, we would auto-load it
with `kola qemuexec`.  In the case of builds, we absolutely do not
want to do that.

I probably didn't see this *initially* when working on this because
my build didn't have a qemu image yet.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 23, 2020
Regression from coreos#1289
that probably explains other mysterious problems I was seeing too.

Basically, if a qemu image already existed, we would auto-load it
with `kola qemuexec`.  In the case of builds, we absolutely do not
want to do that.

I probably didn't see this *initially* when working on this because
my build didn't have a qemu image yet.
openshift-merge-robot pushed a commit that referenced this pull request Apr 23, 2020
Regression from #1289
that probably explains other mysterious problems I was seeing too.

Basically, if a qemu image already existed, we would auto-load it
with `kola qemuexec`.  In the case of builds, we absolutely do not
want to do that.

I probably didn't see this *initially* when working on this because
my build didn't have a qemu image yet.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants