Skip to content

Conversation

@Adam0Brien
Copy link
Contributor

New PR as the last one was pushing to main

@Adam0Brien Adam0Brien requested a review from travier February 20, 2023 10:33
@travier
Copy link
Member

travier commented Feb 20, 2023

If I run that on an FCOS system right now I get:

$ systemctl show --property=DefaultTimeoutStartUSec
DefaultTimeoutStartUSec=1min 30s

@travier
Copy link
Member

travier commented Feb 20, 2023

You should also check all values, not just one.

@travier
Copy link
Member

travier commented Feb 20, 2023

Can you also add a link to the issue you're fixing to the commit message and update the first comment here?

Something like:

Fixes: <URL>

@Adam0Brien
Copy link
Contributor Author

when you say check all values do you mean these three :
DefaultTimeoutStartUSec=1min 30s
DefaultTimeoutStopUSec=1min 30s
DefaultTimeoutAbortUSec=1min 30s

@Adam0Brien Adam0Brien force-pushed the adams-test branch 2 times, most recently from 2e3a936 to c3a662b Compare February 20, 2023 11:56
@travier
Copy link
Member

travier commented Feb 20, 2023

All the values mentioned in the configuration that we are changing in https://src.fedoraproject.org/rpms/fedora-release/pull-request/249 from coreos/fedora-coreos-tracker#1404

@travier
Copy link
Member

travier commented Feb 20, 2023

So that two times 3 values as we are changing user & system values.

@Adam0Brien Adam0Brien force-pushed the adams-test branch 2 times, most recently from 9b8cc73 to 8510088 Compare February 22, 2023 17:02
@Adam0Brien Adam0Brien requested a review from travier February 23, 2023 12:35
@travier
Copy link
Member

travier commented Feb 23, 2023

Alright, this is now failing on:

[2023-02-23T10:57:20.211Z] Feb 23 10:57:18 qemu0 kola-runext-test.sh[1725]: + [[ '' == \1\m\i\n\ \3\0\s ]]
[2023-02-23T10:57:20.211Z] Feb 23 10:57:18 qemu0 kola-runext-test.sh[1725]: + fatal DefaultTimeoutStartUsec=1min 30s DefaultTimeoutStopUSec=1min 30s DefaultTimeoutAbortUSec=1min 30s DefaultDeviceTimeoutUSec=
[2023-02-23T10:57:20.211Z] Feb 23 10:57:18 qemu0 kola-runext-test.sh[1725]: + echo DefaultTimeoutStartUsec=1min 30s DefaultTimeoutStopUSec=1min 30s DefaultTimeoutAbortUSec=1min 30s DefaultDeviceTimeoutUSec=
[2023-02-23T10:57:20.211Z] Feb 23 10:57:18 qemu0 kola-runext-test.sh[1725]: DefaultTimeoutStartUsec=1min 30s DefaultTimeoutStopUSec=1min 30s DefaultTimeoutAbortUSec=1min 30s DefaultDeviceTimeoutUSec=

which is expected given that this is only in the systemd version in rawhide now.

What we need to do now is to check the systemd version using the rpm command (look for that logic in other tests here) and check this value only if systemd is recent enough.

travier
travier previously requested changes Mar 6, 2023
Copy link
Member

@travier travier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two nits but LGTM otherwise

@Adam0Brien Adam0Brien force-pushed the adams-test branch 2 times, most recently from 716e287 to 8510088 Compare March 6, 2023 16:07
fatal $error
fi

if [[ "${systemd_version}" == "${required_version}" || "${systemd_version}" > "${required_version}" ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the comparison will work with the . in the middle. Does it work?

Copy link
Contributor Author

@Adam0Brien Adam0Brien Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've run the test on rawhide and fcos and they both seem to be passing at the moment. Here are the results i got :

Output 1:

`adamobrien@fedora:~/OS-Builds/fcos » cosa kola run ext.config.systemd
+cosa:16> podman run --rm -ti --security-opt 'label=disable' --privileged '--uidmap=1000:0:1' '--uidmap=0:1:1000' --uidmap 1001:1001:64536 -v /home/adamobrien/OS-Builds/fcos:/srv/ --device /dev/kvm --device /dev/fuse --tmpfs /tmp -v /var/tmp:/var/tmp quay.io/coreos-assembler/coreos-assembler:latest kola run ext.config.systemd
kola -p qemu-unpriv --output-dir tmp/kola run ext.config.systemd
⚠️  Skipping kola test pattern "fcos.internet":
  👉 https://github.com/coreos/coreos-assembler/pull/1478
⚠️  Skipping kola test pattern "podman.workflow":
  👉 https://github.com/coreos/coreos-assembler/pull/1478
🕒 Snoozing kola test pattern "ext.config.platforms.aws.nvme" until Mar 10 2023:
  👉 https://github.com/coreos/fedora-coreos-tracker/issues/1306#issuecomment-1426106534
=== RUN   non-exclusive-test-bucket-0
=== RUN   non-exclusive-test-bucket-0/ext.config.systemd
--- PASS: non-exclusive-test-bucket-0 (27.15s)
    --- PASS: non-exclusive-test-bucket-0/ext.config.systemd (1.52s)
PASS, output in tmp/kola
+cosa:24> rc=0 
+cosa:24> set +x`

Output 2:

`adamobrien@fedora:~/OS-Builds/fcos-rawhide » cosa kola run ext.config.systemd          
+cosa:16> podman run --rm -ti --security-opt 'label=disable' --privileged '--uidmap=1000:0:1' '--uidmap=0:1:1000' --uidmap 1001:1001:64536 -v /home/adamobrien/OS-Builds/fcos-rawhide:/srv/ --device /dev/kvm --device /dev/fuse --tmpfs /tmp -v /var/tmp:/var/tmp quay.io/coreos-assembler/coreos-assembler:latest kola run ext.config.systemd
kola -p qemu-unpriv --output-dir tmp/kola run ext.config.systemd
⚠️  Skipping kola test pattern "fcos.internet":
  👉 https://github.com/coreos/coreos-assembler/pull/1478
⚠️  Skipping kola test pattern "podman.workflow":
  👉 https://github.com/coreos/coreos-assembler/pull/1478
🕒 Snoozing kola test pattern "ext.config.platforms.aws.nvme" until Mar 10 2023:
  👉 https://github.com/coreos/fedora-coreos-tracker/issues/1306#issuecomment-1426106534
=== RUN   non-exclusive-test-bucket-0
=== RUN   non-exclusive-test-bucket-0/ext.config.systemd
--- PASS: non-exclusive-test-bucket-0 (27.33s)
    --- PASS: non-exclusive-test-bucket-0/ext.config.systemd (1.58s)
PASS, output in tmp/kola
+cosa:24> rc=0 
+cosa:24> set +x`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you do some more testing you'll find that while it may work in this limited set of use cases it would be better if we made something a little more comprehensive here. I propose we add some new functions to the common library and then use them here. See what you think about #2280

@dustymabe
Copy link
Member

eventually we'll need to squash all the commits down into a single commit.

@dustymabe
Copy link
Member

mind again squashing down into a single commit?

@Adam0Brien
Copy link
Contributor Author

mind again squashing down into a single commit?

will do!

Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I'll address #2247 (comment) in #2280

@dustymabe dustymabe enabled auto-merge (rebase) March 7, 2023 16:21
@dustymabe dustymabe disabled auto-merge March 7, 2023 16:30
@dustymabe dustymabe enabled auto-merge (rebase) March 7, 2023 16:31
@dustymabe dustymabe merged commit d74ac7e into coreos:testing-devel Mar 7, 2023
@cgwalters
Copy link
Member

It's a problem for us to write tests like this that explicitly hardcode something about the operating system. Today, when tests fail we stop shipping operating system updates by default. In my opinion, this test should be at best informative. (We don't have a good way to express that today)

However, I think there's a simpler fix: Instead of hardcoding expecting to find 1m 30s, let's hardcode that we expect a timeout that is at least that value. Then we don't need to do anything with respect to the systemd or operating sytsem version.

@cgwalters
Copy link
Member

(The above comment is motivated by this test failing in the midst of us trying to ship RHEL CoreOS 9.2)

@Adam0Brien Adam0Brien deleted the adams-test branch March 16, 2023 14:02
@dustymabe
Copy link
Member

dustymabe commented Mar 16, 2023

It's a problem for us to write tests like this that explicitly hardcode something about the operating system. Today, when tests fail we stop shipping operating system updates by default. In my opinion, this test should be at best informative. (We don't have a good way to express that today)

The config bumps are gated by CI. When this config bump landed it should have failed in CI so IMO "no, this shouldn't have stopped us shipping operating system updates".

However, I think there's a simpler fix: Instead of hardcoding expecting to find 1m 30s, let's hardcode that we expect a timeout that is at least that value.

That's a bit hard here since we're dealing with strings.

Then we don't need to do anything with respect to the systemd or operating system version.

The only reason we are checking systemd version is because DefaultDeviceTimeoutUSec doesn't exist in systemd prior to 252.4, which isn't going to be solved by checking anything greater than or equal to 1m 30s.

@cgwalters
Copy link
Member

The config bumps are gated by CI. When this config bump landed it should have failed in CI so IMO "no, this shouldn't have stopped us shipping operating system updates".

Yes, but this is not true for RHCOS today.

That's a bit hard here since we're dealing with strings.

Yeah...external tests are nice, bash is not.

The only reason we are checking systemd version is because DefaultDeviceTimeoutUSec doesn't exist in systemd prior to 252.4, which isn't going to be solved by checking anything greater than or equal to 1m 30s.

I think the test can skip checking the property if it is empty.

@dustymabe
Copy link
Member

The config bumps are gated by CI. When this config bump landed it should have failed in CI so IMO "no, this shouldn't have stopped us shipping operating system updates".

Yes, but this is not true for RHCOS today.

If that's not true then this is the thing we should be complaining about.

The only reason we are checking systemd version is because DefaultDeviceTimeoutUSec doesn't exist in systemd prior to 252.4, which isn't going to be solved by checking anything greater than or equal to 1m 30s.

I think the test can skip checking the property if it is empty.

Looking at the test failure:

Mar 16 13:06:30 qemu0 kola-runext-default-unit-timeouts[2114]: ++ rpm -q --queryformat '%{VERSION}' systemd
Mar 16 13:06:30 qemu0 kola-runext-default-unit-timeouts[2108]: + systemd_version=239
Mar 16 13:06:30 qemu0 kola-runext-default-unit-timeouts[2108]: + required_version=252.4
Mar 16 13:06:30 qemu0 kola-runext-default-unit-timeouts[2108]: + [[ 1min 30s == \1\m\i\n\ \3\0\s ]]
Mar 16 13:06:30 qemu0 kola-runext-default-unit-timeouts[2108]: + ok 'The default start timeout is 1m 30 seconds.'
Mar 16 13:06:30 qemu0 kola-runext-default-unit-timeouts[2108]: + echo ok 'The default start timeout is 1m 30 seconds.'
Mar 16 13:06:30 qemu0 kola-runext-default-unit-timeouts[2108]: ok The default start timeout is 1m 30 seconds.
Mar 16 13:06:30 qemu0 kola-runext-default-unit-timeouts[2108]: + [[ 1min 30s == \1\m\i\n\ \3\0\s ]]
Mar 16 13:06:30 qemu0 kola-runext-default-unit-timeouts[2108]: + ok 'The default stop timeout is 1m 30 seconds.'
Mar 16 13:06:30 qemu0 kola-runext-default-unit-timeouts[2108]: + echo ok 'The default stop timeout is 1m 30 seconds.'
Mar 16 13:06:30 qemu0 kola-runext-default-unit-timeouts[2108]: ok The default stop timeout is 1m 30 seconds.
Mar 16 13:06:30 qemu0 kola-runext-default-unit-timeouts[2108]: + [[ '' == \1\m\i\n\ \3\0\s ]]
Mar 16 13:06:30 qemu0 kola-runext-default-unit-timeouts[2108]: + fatal DefaultTimeoutStartUsec=1min 30s DefaultTimeoutStopUSec=1min 30s DefaultTimeoutAbortUSec= DefaultDeviceTimeoutUSec=
Mar 16 13:06:30 qemu0 kola-runext-default-unit-timeouts[2108]: + echo DefaultTimeoutStartUsec=1min 30s DefaultTimeoutStopUSec=1min 30s DefaultTimeoutAbortUSec= DefaultDeviceTimeoutUSec=

It's apparent that DefaultTimeoutAbortUSec (in addition to DefaultDeviceTimeoutUSec) doesn't exist in the systemd in RHEL/RHCOS.

@dustymabe
Copy link
Member

I think the test can skip checking the property if it is empty.

That seems like a good idea, until systemd upstream changes the property name and then we have a test that's not checking for the right value.

@cgwalters
Copy link
Member

If that's not true then this is the thing we should be complaining about.

Complaining to...? We own this whole thing and need to keep in mind both c9s and rhel9 too.

until systemd upstream changes the property name and then we have a test that's not checking for the right value.

That seems very unlikely to me. Changing the property name would break anyone who was trying to configure it.

@travier
Copy link
Member

travier commented Mar 16, 2023

fedora-coreos-config bumps are gated in Prow CI in openshift/os.

This very specific one got through because I disabled 8.6 CI too early in openshift/release#35789 as we decided to keep the 8.6 branch alive a bit longer.

This is only impacting RHEL 8.6 builds so it should not block 9.2 builds.

@travier
Copy link
Member

travier commented Mar 16, 2023

We'll skip it in openshift/os#1205

@cgwalters
Copy link
Member

fedora-coreos-config bumps are gated in Prow CI in openshift/os.

Yeah, but crucially not on package changes.

This is only impacting RHEL 8.6 builds so it should not block 9.2 builds.

But yes, I think I got build logs crossed here, so it didn't impact 9.2 work, so indeed not too important.

(But the larger point still stands I think)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants