-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests/e2e: test changes to the pre-install-payload image #179
tests/e2e: test changes to the pre-install-payload image #179
Conversation
I tested it locally on ubuntu and centos8 VMs. Let me see if it works fine on SEV and TDX: /test |
/test-kata-qemu-sev |
Hi @arronwy ! Could you please help me understand why it failed in TDX jobs? I added a handling for CentOS in ea873b9#diff-4f68cb05ed27370659c120227cdc91b0db7a0d38cdcbde60582cfc94ae9b8f71R35 that I suspect caused the failures. In a VM locally using CentOS Stream8 it worked out though. |
Hi @wainersm the tests seems failed at the operator uninstall case:
expected log:
This PR: #180 also have similar issue, the failed cases seems not related with this PR. |
hmm... so it got stuck on the uninstall test until it hit the job timeout ("Build timed out (after 20 minutes)"). I will open an RFE issue: it would be nice to have a timeout (<20min) on the test itself so it does have the opportunity to clearly fail and print debugging messages. The way it is today we only see the job abort message and nothing more. Thanks for the help @arronwy ! Ah, mind to review the code? :D |
The SEV job failed because it hit the docker.io pull limit:
|
ea873b9
to
446d2a2
Compare
446d2a2
to
84fc4dd
Compare
Rebased to main. Let's see if the tests pass now: /test |
@mythi hi Mikko, is there a known issue with enaclave-cc CI? In any case, those changes doesn't touch enclave-cc...can I go ahead and merge once I get the approvals? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM; i remember having to do similar things when testing the operator (build and pushing pre-install-payload to local registry, the --insecure
flag, etc.)
Thanks Jeremi! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! @wainersm
I restarted the failing job to see if it was just a temporary error. FWIW, the failure was not in enclave-cc but when pushing the operator image to a "local" registry needed by the tests. |
TDX failures:
|
/test-tdx |
Might not be relevant but I'll post it just in case: when I was debugging the uninstall hook a couple weeks back I had trouble updating multi arch manifests in the local registry. Modifying and rebuilding the container image, but keeping the same tag name for multi arch manifest somehow did not result in updated images being used when pulled. So let's make sure we're cleaning up the local registry between runs. |
enclave-cc is OK |
This error is related :( Failing to push the image's manifest to the local registry. I'm looking at how to debug it without access to the test env.... |
@wainersm can we revive this? i would like to make changes to the operator+pre-install-payload and it would be great to have a way to test it. |
I only rebased this PR and found not conflict. I checked the code and it seems updated, let's see if it still works: /test |
Failed to add Google's kubernetes repository on build containerd image:
Usually that kind of problem is intermittent....run again, it works. Let's try: /test-kata-qemu |
@wainersm: |
Switched the k8s's repository gpg key as suggested by @jepio . Let's try again: /test-kata-qemu |
/test-kata-qemu |
Well, no lucky on getting that working on a re-run though. :-/ |
539f465
to
2d2ccfa
Compare
/test |
Looks like we are not seeing Port 6443 in use error anymore. |
2d2ccfa
to
55c19e9
Compare
/test |
Nops, just kubernetes not being properly uninstalled. I don't remember changing anything related to that. |
@UnmeshDeodhar, SNP: http://jenkins.katacontainers.io/job/confidential-containers-operator-main-ubuntu-20.04_snp-x86_64-containerd_kata-qemu-snp-PR/80/console
|
As the tests had taken an orthogonal approach from the tests running on other platforms, I have no idea where to even start debugging it. @UnmeshDeodhar, @ryansavino, mind to take a look at the failures and double check that everything on the cluster side is okay? This PR is only adding tests to the pre-install image, which makes me quite certain it is not the reason of the failures on the SEV machine. |
enclave-cc test is failing due to:
|
55c19e9
to
d3a364f
Compare
That's a HostPath mounted, and cannot be removed from within the container. This may cause issues like: ``` Removing the /opt/confidential-containers directory rmdir: failed to remove '/opt/confidential-containers': Device or resource busy ``` Signed-off-by: Fabiano Fidêncio <[email protected]>
When building the pre-install-payload image for CI it needs to pull/push the image from a local registry that is not protected. The `docker manifest` commands (e.g. create) refuses to connect in an unsecure registry by default, therefore the pre-install-payload build fail. That can be solved by passing the --insecure flag to `docker manifest` thus this change allow to pass extra flags to that command. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Currently changes on install/pre-install-payload directory aren't tested because the scripts aren't re-building the pre-install-payload image. With this change the image will always be built and used. It was added more two dependencies: - kustomize: used to edit the kustomization file so to update the pre-install-payload image - qemu-user-static: used by docker buildx to build the pre-install-payload image for multiple architectures. It also needs to pass the `--insecure` to `docker manifest` commands because the image is pushed/pulled to a local insecure registry, otherwise `docker manifest` fails Fixes confidential-containers#177 Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
We've seen issues like the one shown below as part of the baremetal machines: ``` 09:44:31 failed to put manifest localhost:5000/container-engine-for-cc-payload:latest: errors: 09:44:31 manifest blob unknown: blob unknown to registry 09:44:31 manifest blob unknown: blob unknown to registry 09:44:31 manifest blob unknown: blob unknown to registry 09:44:31 manifest blob unknown: blob unknown to registry ``` Those can be avoided by removing previously created ${HOME}/.docker/manifests/$manifest Signed-off-by: Fabiano Fidêncio <[email protected]>
Instead of doing `[ ... ] && ...`, let's just expand the if as we could simply fail the first condition, making the whole script fail, leading then to a pod error. Signed-off-by: Fabiano Fidêncio <[email protected]>
Let's make sure that we also test the pre-install / post-uninstall images as part of the enclave-cc tests, so we make sure that any changes we do with Kata Containers in mind won't break enclave-cc. Signed-off-by: Fabiano Fidêncio <[email protected]>
d3a364f
to
a648dbc
Compare
This was basically a bad copy and paste from my side, which led to the HW version of the enclave-cc to be used instead of the SIM one. It should be fixed now, thanks @mythi for the help! |
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 from enclave-cc side
@fidencio thanks for taking over this and making it done! |
/test-sev |
/test-snp |
/test-kata-qemu-sev |
/test-kata-qemu-snp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Thanks @fidencio
SEV test is not passing, but we got a green light to proceed from @ryansavino on the following thread: https://cloud-native.slack.com/archives/C039JSH0807/p1693291312001459?thread_ts=1692956110.436209&cid=C039JSH0807 |
Currently changes on install/pre-install-payload directory aren't tested because the scripts aren't re-building the pre-install-payload image, and with this change the image will always be built and used.
Fixes #177