-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pre-install-payload: Handle official and VFIO / GPU containerd installations #256
pre-install-payload: Handle official and VFIO / GPU containerd installations #256
Conversation
…sion This ensures a clear separation between using / installing the CoCo specific fork of containerd, and using an official containerd release (which will come later in this series). Signed-off-by: Fabiano Fidêncio <[email protected]>
b30b6d9
to
54ccfca
Compare
We're renaming a few variables here to make sure we use COCO stuff only for the projects we've forked, such as containerd. This will help immensely when having to deal with the official and the CoCo version forked of a specific component. In the long run, of course, we want to get rid of all the forks we have, but this is needed at least for this release. Signed-off-by: Fabiano Fidêncio <[email protected]>
3ad7619
to
771523f
Compare
/test |
771523f
to
816b357
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
Regarding the test failure for s390x, it is caused by the instability of the infra. We can proceed to get this PR merged. 😉 |
Let's change the reqs-deploy.sh script so it's clear that we're dealing with a specific kind of containerd installation, such as the coco forked one. Later on we'll introduce an option to install an official contained binary, just in case folks are running on an cluster that doesn't have the minimum pre-requirements in place. Signed-off-by: Fabiano Fidêncio <[email protected]>
88dda3e removed those from one part of the config, but forgot to remove those from the pre-install / post-uninstall parts. Signed-off-by: Fabiano Fidêncio <[email protected]>
We rely on the node having containerd v1.7.0 or higher for the work we're doing. With this in mind, let's also ship the minimum containerd required by us, so users can decide whether or not to use it instead of upgrading their clusters. Signed-off-by: Fabiano Fidêncio <[email protected]>
Let's just make it matching the one used in the Makefile. Signed-off-by: Fabiano Fidêncio <[email protected]>
975785f
to
280815b
Compare
/test |
# If set to true, this will install the CoCo fork of the containerd, | ||
# the one that has patches for handling GPU / VFIO, on the node | ||
# default: false | ||
- name: "INSTALL_VFIO_GPU_CONTAINERD" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need three different options? Or just having INSTALL_COCO_CONTAINERD for custom containerd and INSTALL_OFFICIAL_CONTAINERD for officially released containerd version is good enough?
My apologies if this has been already discussed and I'm missing the full context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Pradipta, the extra context is that Zvonko & Nvidia need some extra features in containerd that are in our custom fork at the moment, but not related to the image pull pieces, so I believe Fabiano has separated them out as they are not related and will be resolved at different times. https://cloud-native.slack.com/archives/C039JSH0807/p1693476820263639 contains some of the info. Fabiano can correct, if I'm wrong here, but I think he's about 12months behind on github notifications.
I hope that helps!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very good point.
We have 3 different scenarios to cover, Pradipta.
-
We need to use the forked containerd, which allows us to pull the image on the guest. This version is based on 1.6.x, and I really expect this to die as soon as possible, but that won't happen for v0.8.0 release as Enclave CC still depends on that. Without much thinking, we also added to that containerd the VFIO / GPU related patches, as that was already a forked version of containerd.
-
We do not use the forked containerd, as we do rely on nydus-snapshotter to pull the image on the guest.
First of all, this brings in a requirement to use containerd v1.7.x, as that simplifies a lot the snapshotter setup, which can be done per runtime handler now, and that was not possible with 1.6.x.
Then within 2, we have the follow scenarios:
2.1. I just want to use it, I don't care about VFIO / GPUs: Then the upstream containerd is what we want, and we can install it. I hope we will never do that, to be honest, but we're giving the user the option to do so in order for them to easily test CoCo.
2.2. I do need to use VFIO / GPU stuff: Now we have to deal with non-upstreamed patches, and that's yet another containerd fork. Could it be the same one as 1? Maybe, but I'd prefer making it as clear as possible the differentiation of why we're using a forked version of containerd. I'm afraid people will try to use VFIO / GPU and will end up just using the non-maintained / non-supported way of pulling image into the guest by mistake / lack of understanding on what has to be changed.
That's the reason we end up with those 3 different env vars. :-/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @stevenhorsman @fidencio .. This helps ..:-)
tar xvzpf containerd-${OFFICIAL_CONTAINERD_VERSION}-linux-${ARCH}.tar.gz -C ${NODE_DESTINATION} && \ | ||
rm containerd-${OFFICIAL_CONTAINERD_VERSION}-linux-${ARCH}.tar.gz | ||
|
||
#### Confidential Containerd forked containerd for VFIO / GPU stuff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Containerd -> Containers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the typo, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Thanks @fidencio
As we realy on the node having containerd v1.7.0 or higher for the work we're doing, and as some of the consumers are strictly requiring a fork that contains a fix for dealing with VFIO / GPU related workloads, we need to introduce this new flavour and allow them to use it, for the specific cases mentioned above. Signed-off-by: Fabiano Fidêncio <[email protected]>
280815b
to
7b094ab
Compare
/test |
SNP is failing with:
This is not related to this PR at all. /cc @ryansavino |
TDX failed with:
This is unrelated, but I retriggered the CI. |
This one seems interesting. The only thing I could think of here is that it snagged the wrong key associated with ssh for the image. If we see this again on another PR, tag me and I'll take a look. Thanks. |
This work is required as we'll move out from the forked version of containerd, but not for all the projects, and not without requiring a minimum version of containerd to do so.
As enclave-cc will still be depending on the containerd fork for v0.8.0, and as we may need to still install the minimum version of containerd on clusters using, for a reason or another, containerd v1.6.x, we need to handle those two different installations in a graceful manner.