[rhcos-4.11] support being initiated from a `cosa remote-session` #3108

dustymabe · 2022-09-16T14:02:17Z

We recently did some golang migration [1] and happened to add the
cosa remote-session pieces [2] after that migration. This makes
it not easy to backport to other branches. For a period of time
let's use the latest COSA for our COSA pod for multi-arch builds
instead of the versioned COSA. We will still use the versioned COSA
on the remote so the actual building will happened with the correctly
versioned COSA (substituted below and passed as an argument to
cosa remote-session create --image=). However we still need to support
some of the new command line flags that were added to aid in the transition
to cosa remote-session. This PR backports various pieces needed.

[1] #2919
[2] #2979

The command previously wouldn't allow us to get/set fedora-coreos.parent-commit and fedora-coreos.parent-version because it would try to traverse the `fedora-coreos.` path. Add an exception here similar to `coreos-assembler.`. Now we can do something like this in the pipeline: ``` cosa meta \ --set fedora-coreos.parent-commit=${parent_commit} \ --set fedora-coreos.parent-version=${parent_version} ``` (cherry picked from commit 433a68d)

This will allow us to specify where the AWS_CONFIG_FILE is from the CLI without having to set it in the environment. (cherry picked from commit db00529)

This will allow us to specify the AWS_CONFIG_FILE from the command line without having to set it in the environment. (cherry picked from commit 9214b9b)

This will allow us to specify the AWS_CONFIG_FILE from the command line without having to set it in the environment. This required reworking the cosalib/s3 library into an S3() class so that we could call boto3.client('s3') after setting the AWS_CONFIG_FILE environment variable in cmd-buildfetch. Otherwise it would get called as soon as the library was imported and setting the environment variable would have no effect (too late). (cherry picked from commit 9d55a20)

This will allow us to specify the AWS_CONFIG_FILE from the command line without having to set it in the environment. (cherry picked from commit 2b0b36d)

Right now cmd-build will just pick up whatever the latest symlink points to and try to get previous build information from that, but what if the absolute latest build doesn't have a build for the architecture we are targetting? Currently cmd-build will fail with: ``` /usr/lib/coreos-assembler/cmd-build: line 163: /srv/builds/36.20220720.20.5/aarch64/meta.json: No such file or directory error: failed to execute cmd-build: exit status 1 ``` What we really need is to get the latest build for the given architecture and use that in cmd-build instead. (cherry picked from commit 2ca610c)

If we are starting out a brand new stream and the x86_64 build has completed, but there was no build yet for $arch then the detected previous buildid: ``` previous_buildid = parent_build or self.get_latest() ``` Will just be the build ID of the x86_64 build from `get_latest()`, but this code tries to open a meta.json for it from $arch, which won't exist. Here's the error we see: ``` Traceback (most recent call last): File "<string>", line 5, in <module> File "/usr/lib/coreos-assembler/cosalib/builds.py", line 134, in init_build_meta_json with open(metapath) as f: FileNotFoundError: [Errno 2] No such file or directory: '/srv/builds/36.20220714.dev.0/aarch64/meta.json' error: failed to execute cmd-build: exit status 1 ``` This should fix that error by verifying the detected buildid actually has a build for that architecture before continuing. (cherry picked from commit 786e87b)

CI hit a race where parallel `buildextend-*` would race to generate `tmp/image.json`. We need to lock this for efficiency (avoid duplicate work) and correctness. (cherry picked from commit 6222e3f)

Linux `tmpfs` still doesn't support `user.` xattrs, and so for toolbox-like containers that use a (proper!) `tmpfs` for `/tmp` instead of having it be `overlayfs`, we need to ensure that we create our tempdir in `$workdir/tmp` - the workdir must have `user.` xattr support for the `cache/repo-build` anyways. Closes: coreos#2904 (cherry picked from commit e024805)

This was originally needed for `cosa sign` to force the re-import from the tarball to ensure that the detached metadata was present in the tmp repo after signing (3b192cc). But now with the switch to `ostree container`, we operate directly on the tmp repo and re-export the signed commit, so this is no longer needed. This also fixes `cosa generate-hashlist` also setting `force`, which I don't think it actually needed. (cherry picked from commit ccb4b73)

Prep for next patch. (cherry picked from commit 68f31d5)

`flufl` requires specifying lock lifetimes, i.e. how long the lock should be considered valid before others can also lock it. The default is 15s. This surprisingly means that a critical section could in fact be run concurrently if one of the locks takes more than 15s. I don't think we want to use lock lifetimes here, so let's make sure we always pass in a large number. The intended use case is that your program doesn't deadlock, but I'd prefer we deadlock and have pipelines fail instead after timing out at the Jenkins level. This should fix the `NotLockedError: Already unlocked` error messages we've had in the FCOS pipeline after parallelizing `cosa generate-hashlist` and `cosa buildextend-qemu` (related: coreos/fedora-coreos-pipeline#554). What would be interesting would be lock timeouts instead: e.g. have `with Lock(...)` automatically throw if it still hasn't acquired the lock after X duration. That'd allow us to error out earlier than the Jenkins timeout in case of something taking much longer than it really should. We could do this now but it'd require creating the lock separately from using it in a context. Latest upstream `flufl` supports specifying this as part of the constructor so let's wait for that instead: https://gitlab.com/warsaw/flufl.lock/-/merge_requests/47 (cherry picked from commit 1151356)

openshift-merge-robot · 2022-09-16T14:02:29Z

@dustymabe: PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci · 2022-09-16T14:22:24Z

@dustymabe: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/rhcos	`dd4f704`	link	true	`/test rhcos`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

travier

LGTM from a quick review

dustymabe · 2022-09-16T16:16:12Z

CI is known broken for rhcos* PRs. Merging.

dustymabe and others added 12 commits September 15, 2022 14:47

cosalib/aws: allow passing through --credentials-file to ore

f5a1db1

This will allow us to specify where the AWS_CONFIG_FILE is from the CLI without having to set it in the environment. (cherry picked from commit db00529)

cmd-buildupload: add s3 --aws-config-file option

d0ca1ef

This will allow us to specify the AWS_CONFIG_FILE from the command line without having to set it in the environment. (cherry picked from commit 9214b9b)

cmd-sign: add s3 --aws-config-file option

856b169

This will allow us to specify the AWS_CONFIG_FILE from the command line without having to set it in the environment. (cherry picked from commit 2b0b36d)

build: Use lockfiles to guard ostree repo and image.json extraction

cf0d048

CI hit a race where parallel `buildextend-*` would race to generate `tmp/image.json`. We need to lock this for efficiency (avoid duplicate work) and correctness. (cherry picked from commit 6222e3f)

cmdlib: import datetime module itself

a2b45fd

Prep for next patch. (cherry picked from commit 68f31d5)

openshift-merge-robot added the needs-rebase label Sep 16, 2022

dustymabe changed the base branch from main to rhcos-4.11 September 16, 2022 14:44

travier approved these changes Sep 16, 2022

View reviewed changes

dustymabe merged commit b71d282 into coreos:rhcos-4.11 Sep 16, 2022

dustymabe deleted the dusty-cosa-remote-support-4.11 branch September 16, 2022 16:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[rhcos-4.11] support being initiated from a `cosa remote-session` #3108

[rhcos-4.11] support being initiated from a `cosa remote-session` #3108

Uh oh!

dustymabe commented Sep 16, 2022

Uh oh!

openshift-merge-robot commented Sep 16, 2022

Uh oh!

openshift-ci bot commented Sep 16, 2022 •

edited

Loading

Uh oh!

travier left a comment

Uh oh!

dustymabe commented Sep 16, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[rhcos-4.11] support being initiated from a cosa remote-session #3108

[rhcos-4.11] support being initiated from a cosa remote-session #3108

Uh oh!

Conversation

dustymabe commented Sep 16, 2022

Uh oh!

openshift-merge-robot commented Sep 16, 2022

Uh oh!

openshift-ci bot commented Sep 16, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

travier left a comment

Choose a reason for hiding this comment

Uh oh!

dustymabe commented Sep 16, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[rhcos-4.11] support being initiated from a `cosa remote-session` #3108

[rhcos-4.11] support being initiated from a `cosa remote-session` #3108

openshift-ci bot commented Sep 16, 2022 •

edited

Loading