Enhancement proposal for Confidential Clusters #1878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Hi @uril. Thanks for your PR. I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
Force-pushed from c213033 to 677a330.
/retest
cgwalters left a comment:
Awesome work overall! There's of course a huge amount of detail in some of this, but I think the outline looks good.
> instance
>
> * RHEL CoreOS
>   * Support verifying the integrity of the disk content during re-provisioning
We need to boot in a pure stateless mode here, where we're not accessing any persistent storage for /etc and /var, right?
To address the larger concern that we cannot trust the filesystem itself on the disk on first boot, we need some form of integrity verification that covers the entire partition. It could be implemented in a similar fashion to what is done with Secure Execution on s390x.
If we don't take this concern into account, then we could indeed only read the fs-verity-verified content from the composefs repo and re-generate the /etc and /var content from it.
> * Measure Ignition config in a PCR value, before parsing it
>
> * Machine Config Operator
>   * Ensure that MachineConfigs are only served to attested nodes
I still really hope that we get away from having an MCS at all by shrinking the role of Ignition such that everything needed to join fits into the bootstrap config, which really, really should be able to fit completely in e.g. the AWS instance user-data store and the like.
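As a rough sketch of that direction (not something the enhancement text specifies), a join config small enough for instance user-data could embed the bootstrap kubeconfig inline instead of pointing at an MCS; the path and the base64 placeholder below are illustrative assumptions, and `mode` 384 is decimal for 0600:

```json
{
  "ignition": { "version": "3.4.0" },
  "storage": {
    "files": [
      {
        "path": "/etc/kubernetes/kubeconfig",
        "mode": 384,
        "contents": { "source": "data:;base64,<base64-encoded bootstrap kubeconfig>" }
      }
    ]
  }
}
```

For scale, EC2 user-data tops out around 16 KB, so anything much beyond a kubeconfig and a couple of small unit files would still have to be fetched from somewhere else.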
> node, which is considered trusted and it is used to bootstrap the trust for the
> rest of the cluster.
>
> In phase 2, the bootstrap node itself must be attested to establish trust. It is
Yeah, but won't most people who want to do this actually want HCP anyways? I would definitely put HCP support far in front of this as a priority.
That could be an option. HCP deployments place the trust in the cluster hosting the control plane, so for it to make sense for Confidential Clusters, it would be a configuration with a Hosted Control Plane in a trusted environment (likely a Bare Metal cluster) and HCP workers in a cloud.
If we want everything in the cloud, then we are back to the standalone cluster case for the control plane part, as you cannot claim that your workers are confidential if the control plane is hosted on the same cloud on non-confidential VMs.
> have been pre-computed and stored, or pull the container image itself and
> directly compute the values.
I'd lean towards pull, but of course have a cache of container-sha ➡️ PCRs.
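Purely as an illustration of such a cache (no format for it is defined in the enhancement), an entry keyed by the bootc image digest could pre-record the expected measurements; the PCR indices and field names here are assumptions:

```json
{
  "<rhcos-bootc-image>@sha256:<digest>": {
    "pcr4": "<expected measurement of the UKI as the EFI boot application>",
    "pcr7": "<expected Secure Boot policy measurement>",
    "pcr11": "<expected measurement of the UKI sections extended by systemd-stub>"
  }
}
```

A verifier could consult the cache first and only pull the image to compute the values on a miss.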
Force-pushed from 677a330 to ff97116.
yuqi-zhang left a comment:
Some general comments inline
> ## Proposal
>
> Run all OpenShift nodes on Confidential VMs (CVMs). Use remote attestation to
Basic question: the CVM is the entire node, right? You can't, say, run 2 CVMs on one machine, or run things outside of the CVM on that machine?
Right, in our case, OpenShift node == CVM == OpenShift machine.
The "host machine" (the cloud server) can run many CVMs (and other things).
Yes, the entire node runs as a Confidential VM provided by the cloud provider. You don't control which host your VM runs on (it's a cloud), and you cannot run things outside of it.
> components:
>
> * OpenShift API
>   * Allow nodes to be marked as confidential. This is specific per cloud
As a clarification here, this will be a cluster level setting? Or are you proposing that in one cluster, some nodes can be CVMs and others not?
All the nodes of a confidential cluster are CVMs.
The specific configuration/API for requesting that a cloud provider create a CVM is platform dependent and is not kept at the cluster level.
A mixed cluster of confidential and non-confidential nodes is technically possible, but it is not safe.
It will be cluster wide. A cluster will either be all confidential nodes or none at all. Technically you can mix things, but it does not make sense for a cluster running in a cloud to be mixed.
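To make the platform-specific part concrete, the GCP machine-api provider spec already exposes a confidential-compute knob, roughly as sketched below (rendered as JSON to match the other snippets in this thread; the exact field names and allowed values should be checked against the current MAPI/CAPI types):

```json
{
  "providerSpec": {
    "value": {
      "apiVersion": "machine.openshift.io/v1beta1",
      "kind": "GCPMachineProviderSpec",
      "machineType": "n2d-standard-4",
      "confidentialCompute": "Enabled",
      "onHostMaintenance": "Terminate"
    }
  }
}
```

Other platforms expose different knobs (e.g. SEV-SNP/TDX options on Azure and AWS instance types), which is why the per-node request lives in the provider spec while "all nodes must be confidential" is a cluster-level expectation.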
> reference-values (expected "correct" values) in Trustee.
>
> * RHEL CoreOS
>   * Add support for composefs (native), UKI, and systemd-boot to bootc (Bootable
Since the MCO and on-cluster RHCOS operations don't currently use bootc at all, would that integration be needed here?
We will indeed need "direct" bootc support in the MCO (i.e. not use rpm-ostree at all anymore).
> (cloud provider specific).
> * Deploy the Confidential Cluster Operator on the bootstrap node
>
> * Confidential Cluster Operator
Will this be running as a core payload operator that's always present, or only deployed conditionally?
It will be an operator that is part of the core payload, but it will only run when needed.
> installer, passing in the URL of the external Trustee instance chosen above.
> 1. The OpenShift installer generates a set of configuration files for the
>    external Trustee instance.
> 1. If the cluster creator adds/removes/modifies MachineConfigs, the
Could you clarify on this point? The admin shouldn't be able to "modify configs on the fly" during installation. The MCO has a singular render generation phase, if that's what you're trying to fetch here.
The idea is that the config that will be used for the external Trustee server will include the full config passed to the bootstrap node. If anything modifies this config, then the Trustee config will have to be re-generated. We don't expect the manifests to be modified live during the installation.
> This enhancement introduces some new API extensions:
>
> * **Running nodes on cloud CVMs**:
>   For each supported cloud provider, confidential computing types and code need to
Could you provide some examples for this? Just curious what that would look like in practice.
Also curious if this affects the ongoing MAPI->CAPI transition at all
> In phase 2, the initial configuration will be modified to tell Ignition to fetch
> the new configuration from a remotely attested resource endpoint. The MCS will
> not serve Ignition configs directly for nodes anymore but will store those as
> resources in a Trustee instance. To access those configurations, the node will
Is the Trustee instance responsible for asking the MCS for the contents through some in-cluster proxy, or would we have to have the MCS initiate that?
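As a hedged sketch of the phase-2 flow described in the quoted text, the node's pointer config could merge from Trustee's resource endpoint instead of the MCS. The repository/type/tag path below follows the upstream KBS resource API and is an assumption here, and in practice an attestation client on the node would have to perform the attestation handshake rather than a plain Ignition HTTP fetch:

```json
{
  "ignition": {
    "version": "3.4.0",
    "config": {
      "merge": [
        {
          "source": "https://<trustee>/kbs/v0/resource/<repository>/ignition/worker"
        }
      ]
    }
  }
}
```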
> is created, which hosts a temporary control plane used to create the final
> control plane and worker nodes of the cluster.
>
> In phase 1, the Confidential Cluster Operator is deployed on this bootstrap
Would we GA on phase 1, or would we techpreview phase 1, implement phase 2, and GA the feature after we complete phase 2?
We will dev/tech-preview on phase 1 and phase 2.
I'm not sure about GA after phase 1.
Force-pushed from ff97116 to 766b9a1.
@uril: The following test failed, say /retest to rerun all failed tests:

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
> As part of the cluster installation process in cloud platforms, a bootstrap node
> is created, which hosts a temporary control plane used to create the final
> control plane and worker nodes of the cluster.
There is another part of the installation process we should consider, where the installer:
- generates the ignition file for the bootstrap node
- uploads bootstrap ignition to a cloud storage bucket
- puts pointer ignition in the bootstrap node userdata, redirecting the bootstrap node to pull ignition from the storage bucket (using a pre-signed URL, although for Azure I think pre-signed URL support is WIP and Azure uses storage account keys)
It's unclear whether this model will continue to work with the remote attestation service. If the First Boot configuration from the attestation service can be merged alongside the bootstrap ignition bucket, then it would require fewer changes to the installer. For example (pseudo), the bootstrap pointer ignition would be injected with the additional attestation server source:
```json
{
  "ignition": {
    "config": {
      "merge": [
        {
          "source": "http://<registration-service>/ignition"
        },
        {
          "source": "http://<cloud-bucket>/ignition"
        }
      ]
    }
  }
}
```

This would utilize Ignition's merge functionality to grab the configs from both the attestation server and the cloud bucket.
But if it's a requirement that the remote attestation server is contacted first (or only) then the installer would presumably need to be updated to upload bootstrap ignition so it could be served by the attestation service.
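If the attestation service did end up being the only source, the pointer config would collapse to a single merge entry served by it (illustrative only, with the same caveats as the snippet above), which is what would force the installer to upload the bootstrap Ignition to the attestation side:

```json
{
  "ignition": {
    "version": "3.4.0",
    "config": {
      "merge": [
        {
          "source": "http://<registration-service>/ignition"
        }
      ]
    }
  }
}
```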
This enhancement proposes the integration of confidential computing capabilities into OpenShift clusters, enabling the deployment of Confidential Clusters.