Replies: 15 comments
-
Thanks for the proposal! Some questions below:
-
Thanks @kailun-qin for the detailed questions; some of these are still being worked out. I wouldn't say this is a generic solution for fork performance; rather, it is meant to enable cloud frameworks to use Gramine. @ying2liu and I have been working closely with our customers to collect requirements and inputs for this design. Based on our interactions with customers, they are fine with the high-level design, but many things are still open-ended. We are currently working on a PoC and will have a better picture once we complete it. Please let us know if you find any other requirements/use cases that may help fine-tune this design to support cloud frameworks.
-
Thank you for the great questions. Please excuse that I cannot answer all of them. This is an early proposal, but gathering feedback and questions like yours will help us to consider various aspects.
No, it only helps in certain scenarios where reusing enclaves is possible both functionally and from a security point of view.
The fork pool should be able to reuse enclaves instead of forking a new enclave every time it would like to start a new one. The fork pool, though, has to be aware that it can reuse enclaves it started earlier.
Initially we target the cloud frameworks. It is not a generic solution to forking as Nginx does it, and the solution does not suggest sending fork checkpoints and restoring them for reuse. This solution does not allow emulating forking multiple times; that would require extensive additions that are currently not planned.
We're not expecting this. The payload receiver is designed to be stateless as well. Between two executions of the payload receiver (e.g., two cycles within the figure), the payload receiver will forget all its memory contents. If one needs state, some external party, e.g., the scheduler, needs to keep track of said state.
We've not fully defined the behavior. At this point, we're mainly concerned about an application that runs a single enclave only.
I do not understand the question.
The payload receiver should perform security checks to ensure it is an allowed workload. We're not requiring a particular form of attestation and leave this part to the payload receiver and how strict it would like to be. If you do not attest the workload in any way (e.g., by attesting and validating signatures), the enclave runs arbitrary code at which point TEE guarantees are nullified and no security can be provided.
For now, we're planning to always restore the payload receiver regardless of the exit code. The highest priority is to keep the enclave alive as long as possible and start a workload.
We have not thought about the use of either of these features in isolation and will start to consider whether it makes sense to have them independently as well. Do you have a particular use case in mind that would require either of these features independently? Thanks!
-
@vahldiek We met some customers this week and collected new detailed requirements. We will add some implementation in the PoC code and sync up with you when you come back.
-
On a general note, this sounds super-hard to implement, resetting the whole internal Gramine state correctly is quite challenging.
The idea here is just to recycle enclaves left by workloads which terminated (hence the new logic at
I'm afraid that this will end up as a heavily abused feature - there's no way from our side to ensure that the Payload Receiver allows only the interior workloads which come from the same source, and at the same time, the easiest way to implement Payload Receiver is just to allow workloads from anyone.
This is pointless; there are like 20 other trivial ways to survive such a cleanup/"protection" if your workload is malicious. This will only provide a false sense of security and nothing more.
-
Actually, isn't it what our
The only way seems to be to put an annoying warning. This is also in line with our current tactic -- we simply print "Hey, this option is insecure" for easily-abusable options like
But we're not talking about malicious workloads. Anjo pointed out that a benign app may forget to clean up some resources, so Gramine will do it on the app's behalf. Maybe "strengthen the security" is the wrong phrase, but zeroing out all VMAs after app exit is definitely reasonable (let's call it "cleanup for stability").
-
I think it's more than this, because you also need to e.g. reset all FDs. But maybe you are right and it isn't actually that complex? I forgot that we already have a part of this logic implemented.
Yeah, but he explicitly said that this is for security and I think this was the intended meaning - look at the part about making Gramine code read-only.
Not really, doesn't
-
Maybe we could only allow running these apps from Protected Files (and disallow any key manipulation in the window between apps - if that's even possible). This way the new app could only be provided by the same entity as the old app (because the keys must have been provisioned while the old app was running).
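For illustration only, a minimal sketch of what this could look like, assuming Gramine's encrypted-files mounts are used for the payloads (the path and key name below are made up):

```toml
# Hypothetical sketch: the payload binary is only reachable through an
# encrypted ("protected") mount, so a new payload can only come from whoever
# provisioned the corresponding key into the already-running enclave.
fs.mounts = [
  { path = "/payload", uri = "file:payload_dir", type = "encrypted", key_name = "payload_key" },
]
```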
-
(I might side-track the conversation on reusing enclaves but I wanted to add more context around "cloud frameworks")
this section gives me the impression that the main problem is just the startup time due to enclave creation, but it does not talk about things like application packaging. For instance, existing containers need to be customized using tools like
Furthermore, these "static" graminized containers lead to another challenge with "cloud frameworks" such as Kubernetes: the workload owners can add additional mounts (like storage, configs, secrets) that Gramine (the manifest) would have to be able to take into account. Similarly, the workload owners will specify resource requests/limits, including how much EPC they request. The latter is moot today, but maybe one day we'll have per-app/container EPC usage limits enforced by cgroups...
Container apps are mortal and guarded by their namespaces/cgroups, which are container specific. I agree with @kailun-qin that this design with the long-lived payload receiver will have implications for surrounding components.
-
I don't understand this part. EPC size/limit is not specified anywhere in the manifest (which you mention in that paragraph) and can't really be controlled from usermode (it's just a transparent cache).
-
That's what my comment tried to say too. The limits are not controlled today, but AFAIK cgroups support is in the plans. Maybe one day each container will get its own fair share of EPC based on the size request it has specified. Does this proposal support that use case, and how is the quota of this payload receiver enclave determined?
-
This would still be untrusted, so we could make it controllable from the Gramine cmdline instead of the manifest (as it doesn't have to be measured).
If you want different payloads to have different host caching configs, that would be doable but would complicate things, so I don't think we'll add this (especially since you can't control this from cgroups yet). We'd need to add an ocall to request reconfiguration of the current policy. Not too hard, but probably something for later.
-
Do we have any plans to work on this? This proposal seems complicated and quite risky in terms of security, and at the same time, with the introduction of EDMM, the enclave loading time isn't as big an issue as it was in the past.
-
I don't think any of us has plans to work on this in the near future. (Also, I'd like to mention #430, which is relevant, since it also has the idea of reusing enclaves.) We could move this to
-
Sounds good, will do.
-
Description of the problem
Modern cloud frameworks elastically deploy their workloads and frequently scale instances up and down. Efficient support is necessary in Gramine to enable the use of TEEs in these environments. In this regard, the startup times and the cost to acquire memory are prohibitively high.
The goal of this issue is to describe and discuss a potential path towards increased adoption for these modern workloads. We build the solution for adoption in projects such as Teaclave, Inclavare, Confidential Containers, and Marblerun.
Existing (insufficient) solutions
Solutions so far build on a fork-server model. The fork server follows a simple workflow:
Naive fork-server implementations suffer from high startup latencies due to building a new enclave in step 3. While the use of EDMM can reduce this time substantially, acquiring memory is delayed until the point at which the workload needs the additional memory. In this solution, the startup cost and memory acquisition cannot be amortized over several workload invocations. This high cost makes the solution hard to accept.
Pooling enclave creation is a common optimization that moves the cost into the machine's idle time. The cost itself still occurs, and the need to keep a pool around commits substantial amounts of memory that cannot be used otherwise.
In general, cloud deployments demand a way to amortize enclave creation and memory acquisition over time, without keeping large pools of enclaves around. In our solution we seek a common implementation to help alleviate this cost.
Making fork-servers more efficient with enclave reuse
Our main objective is to be able to reuse enclaves after one workload finishes. This amortizes the enclave creation and memory acquisition cost over multiple workloads. Our intention is to use Gramine's ability to intercept the `exit()` system call and translate it into an `exec()` system call that restarts the application.

The following figure describes the state diagram of Gramine, a payload receiver (PR) and the application. Initially, Gramine creates the enclave (step 1) and starts the payload receiver (step 2). Based on the workload inputs from the scheduler of the cloud framework (step 3), the payload receiver prepares, measures, and attests/verifies the workload and execs into the actual workload application (step 4). The application starts to run and answers requests. Once the scheduler decides to terminate the application, it calls `exit()` (step 5), which Gramine intercepts and translates into an `exec()` system call restarting the payload receiver (step 6). At this point, steps 3 to 6 can repeat for as long as the enclave is kept alive. Once the execution environment is no longer needed (e.g., when the scheduler decides to reduce the number of available enclaves), the payload receiver terminates the enclave.

We suggest implementing the `exit() -> exec()` translation by enabling it in the manifest via a new option `sys.restart_on_exit = true`. When enabled, any regular `exit()` system call will restart the application specified in the manifest (`loader.entrypoint`) with its arguments. Since the option is part of the manifest, its existence can be attested via the Intel SGX attestation in MRENCLAVE.
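A minimal sketch of how this might look in the payload receiver's manifest (the option name follows this proposal and may change; the entrypoint value is a placeholder):

```toml
# Hypothetical payload-receiver manifest snippet (sketch only).
loader.entrypoint = "file:payload_receiver"   # application restarted on every exit()

# Proposed option: translate a payload's exit() into an exec() of the
# entrypoint above, keeping the enclave alive for reuse.
sys.restart_on_exit = true
```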
Security implications of reusing enclaves
HW attestation no longer reflects what executes: Similar to how Gramine extends the HW attestation via a manifest, this solution extends the attestation further to a trusted component controlling which application payloads to accept. For remote attestation, the payload receiver needs to measure, attest and verify any payloads. Any remote party then needs to trust the payload receiver to perform its software-based certificate generation. The trust can be extended, e.g., by using the enclave's MRSigner for any payload signature verification (so that both the enclave and the application payloads come from the same source). Alternatively, a restricted set of signers or valid payload measurements can be provided via trusted or protected files in the manifest of the payload receiver.
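As a sketch of the latter option (the file name below is made up for illustration), the allowed signer keys or payload measurements could be shipped as a trusted file of the payload receiver, so that their hash is covered by the SGX measurement:

```toml
# Hypothetical sketch: a list of allowed payload signers/measurements shipped
# as a trusted file; the payload receiver verifies payload signatures against
# it before exec()-ing into the workload.
sgx.trusted_files = [
  "file:payload_receiver",
  "file:allowed_payload_signers.pem",   # made-up file name for illustration
]
```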
How to protect Gramine from application tampering: Multiple applications will use the same enclave and could potentially compromise Gramine's memory or function pointers, or completely take over the enclave. As a result, without further HW mechanisms, we suggest reusing enclaves only when workloads come from a single tenant. Trust has to be placed in the workloads not to maliciously attack Gramine and the reusable enclave. We further suggest strengthening the security by protecting Gramine's code with RX-only pages and resetting as many data structures as possible during the reuse of an enclave. This is in addition to a regular cleanup during `exec()`, which writes zeros to allocated memory and closes file descriptors.
Specializing the manifest and security data structures in Gramine for each application
The manifest will be provided by the payload receiver and only include its own needed resources (like trusted or protected files) and mount points. To enable the application workload to run successfully, each application needs to temporarily extend the manifest/data structures inside Gramine. We suggest enabling manifest extensions during the `exec()` system call, controlled by a new option in the manifest of the payload receiver (`sys.enable_manifest_extension_on_exec`); the actual contents of the additional manifest will be sent via a pseudo-system call as a file in `/dev/gramine/exec_manifest_extension`.

Some of the manifest options, like the memory size, will only be checked against the available memory, whereas others (trusted files and mount points) will update the internal data structures within Gramine. The extension of the data structures occurs during the `exec()` system call and not during the write of `/dev/gramine/exec_manifest_extension`.
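To make this concrete, the extension written to `/dev/gramine/exec_manifest_extension` could be a small manifest fragment along these lines (a sketch only; the exact format of the extension is not yet defined and all values below are placeholders):

```toml
# Hypothetical manifest extension passed by the payload receiver before exec()
# (sketch only; the accepted set of options and their format are still open).
sgx.enclave_size = "1G"    # per this proposal, only checked against the memory already available
sgx.trusted_files = [
  "file:workload_app",     # the payload about to be exec()'d into
]
fs.mounts = [
  { path = "/data", uri = "file:workload_data", type = "chroot" },
]
```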
Todos
- Handle `exit()` by restarting the application as described in the initial manifest
- Extend the manifest between `exit()` and `exec()` via a pseudo system call
This work is the result of a collaboration with @vijaydhanraj @ying2liu @dimakuv @bigdata-memory and extensive discussions with customers.