Replies: 15 comments
-
Thanks for the proposal! Some questions below:
-
Thanks @kailun-qin for the detailed questions; some of these are still being worked out. I wouldn't say this is a generic solution for fork performance; rather, it is meant to enable cloud frameworks to use Gramine. @ying2liu and I have been working closely with our customers to collect requirements and inputs for this design. Based on our interactions with customers, they are fine with the high-level design, but many things are still open-ended. We are currently working on a PoC and will have a better picture once we complete it. Please let us know if you find any other requirements/use cases that may help fine-tune this design to support cloud frameworks.
-
Thank you for the great questions. Please excuse that I cannot answer all of them. This is an early proposal, but gathering feedback and questions like yours will help us to consider various aspects.
No, it only helps in certain scenarios where reusing enclaves is possible both functionally and from a security point of view.
The fork pool should be able to reuse enclaves instead of forking a new enclave every time it would like to start a new one. The fork pool, though, has to be aware that it can reuse enclaves it started earlier.
Initially we target the cloud frameworks. It is not a generic solution to forking as Nginx does it, and the solution does not suggest sending fork checkpoints and restoring them for reuse. This solution does not allow emulating forking multiple times; that would require extensive additions that are currently not planned.
We're not expecting this. The payload receiver is designed to be stateless as well. Between two executions of the payload receiver (e.g., two cycles within the figure), the payload receiver will forget all its memory contents. If one needs state, some external party, e.g., the scheduler, needs to keep track of said state.
We've not fully defined the behavior. At this point, we're mainly concerned about an application that runs a single enclave only.
I do not understand the question.
The payload receiver should perform security checks to ensure it is an allowed workload. We're not requiring a particular form of attestation and leave this part to the payload receiver and how strict it would like to be. If you do not attest the workload in any way (e.g., by attesting and validating signatures), the enclave runs arbitrary code at which point TEE guarantees are nullified and no security can be provided.
For now, we're planning to always restore the payload receiver regardless of the exit code. The highest priority is to keep the enclave alive as long as possible and start a workload.
We have not thought about the use of either of these features in isolation and will start to consider whether it makes sense to have them independently as well. Do you have a particular use case in mind that would require either of these features independently? Thanks!
-
@vahldiek We met some customers this week and collected new detailed requirements. We will add some implementation in the PoC code and sync up with you when you come back.
-
On a general note, this sounds super-hard to implement, resetting the whole internal Gramine state correctly is quite challenging.
The idea here is just to recycle enclaves left by workloads which terminated (hence the new logic at
I'm afraid that this will end up as a heavily abused feature - there's no way from our side to ensure that the Payload Receiver allows only the interior workloads which come from the same source, and at the same time, the easiest way to implement Payload Receiver is just to allow workloads from anyone.
This is pointless; there are like 20 other trivial ways to survive such a cleanup/"protection" if your workload is malicious. This will only provide a false sense of security and nothing more.
-
Actually, isn't it what our
The only way seems to be to put an annoying warning. This is also in line with our current tactic -- we simply print "Hey, this option is insecure" for easily-abusable options like
But we're not talking about malicious workloads. Anjo pointed out that a benign app may forget to clean up some resources, so Gramine will do it on the app's behalf. Maybe "strengthen the security" is the wrong phrase, but zeroing out all VMAs after app exit is definitely reasonable (let's call it "cleanup for stability").
-
I think it's more than this, because you also need to e.g. reset all FDs. But maybe you are right and it isn't actually that complex? I forgot that we already have a part of this logic implemented.
Yeah, but he explicitly said that this is for security and I think this was the intended meaning - look at the part about making Gramine code read-only.
Not really, doesn't
-
Maybe we could only allow running these apps from Protected Files (and disallow any key manipulation in the window between apps - if that's even possible). This way the new app could only be provided by the same entity as the old app (because the keys must have been provisioned while the old app was running).
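For illustration only, a minimal sketch of what this could look like, assuming Gramine's encrypted-files mounts are used for the payloads (the path and key name below are made up):

```toml
# Hypothetical sketch: the payload binary is only reachable through an
# encrypted ("protected") mount, so a new payload can only come from whoever
# provisioned the corresponding key into the already-running enclave.
fs.mounts = [
  { path = "/payload", uri = "file:payload_dir", type = "encrypted", key_name = "payload_key" },
]
```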
-
(I might side-track the conversation on reusing enclaves but I wanted to add more context around "cloud frameworks")
this section gives me the impression that the main problem is just the startup time due to enclave creation, but it does not talk about things like application packaging. For instance, existing containers need to be customized using tools like
Furthermore, these "static" graminized containers lead to another challenge with "cloud frameworks" such as Kubernetes: the workload owners can add additional mounts (like storage, configs, secrets) that Gramine (the manifest) would have to be able to take into account. Similarly, the workload owners will specify resource requests/limits, including how much EPC they request. The latter is moot today, but maybe one day we'll have per-app/container EPC usage limits enforced by cgroups...
Container apps are mortal and guarded by their namespaces/cgroups, which are container specific. I agree with @kailun-qin that this design with the long-lived payload receiver will have implications for surrounding components.
-
I don't understand this part. EPC size/limit is not specified anywhere in the manifest (which you mention in that paragraph) and can't really be controlled from usermode (it's just a transparent cache).
-
That's what my comment tried to say too. The limits are not controlled today, but AFAIK cgroups support is in the plans. Maybe one day each container will get its own fair share of EPC based on the size request it has specified. Does this proposal support that use case, and how is the quota of this payload receiver enclave determined?
-
This would still be untrusted, so we could make it controllable from the Gramine cmdline instead of the manifest (as it doesn't have to be measured).
If you want different payloads to have different host caching configs, that would be doable but would complicate things, so I don't think we'll add this (especially since you can't control this from cgroups yet). We'd need to add an ocall to request reconfiguration of the current policy. Not too hard, but probably something for later.
-
Do we have any plans to work on this? This proposal seems complicated and quite risky in terms of security, and at the same time, with the introduction of EDMM, the enclave loading time isn't as big an issue as it was in the past.
-
I don't think any of us has plans to work on this in the near future. (Also, I'd like to mention #430, which is relevant, since it also has the idea of reusing enclaves.) We could move this to
-
Sounds good, will do.
-
Description of the problem
Modern cloud frameworks elastically deploy their workloads and frequently scale instances up and down. Efficient support is necessary in Gramine to enable the use of TEEs in these environments. In this regard, the startup times and the cost to acquire memory are prohibitively high.
The goal of this issue is to describe and discuss a potential path towards increased adoption for these modern workloads. We build the solution for adoption in projects such as Teaclave, Inclavare, Confidential Containers, and Marblerun.
Existing (insufficient) solutions
Solutions so far build on a fork-server model. The fork server follows a simple workflow:
Naive fork-server implementations suffer from high startup latencies due to building a new enclave in step 3. While the use of EDMM can reduce this time substantially, acquiring memory is delayed until the point at which the workload needs the additional memory. In this solution, the startup cost and memory acquisition cannot be amortized over several workload invocations. This high cost makes the solution hard to accept.
Pooling enclave creation is a common optimization that moves the cost into the machine's idle time. The cost itself still occurs, and the need to keep a pool around commits substantial amounts of memory that cannot be used otherwise.
In general, cloud deployments demand a way to amortize enclave creation and memory acquisition over time, without keeping large pools of enclaves around. In our solution we seek a common implementation to help alleviate this cost.
Making fork-servers more efficient with enclave reuse
Our main objective is to be able to reuse enclaves after one workload finishes. This amortizes the enclave creation and memory acquisition cost over multiple workloads. Our intention is to use Gramine's ability to intercept the `exit()` system call and translate it into an `exec()` system call that restarts the application.

The following figure describes the state diagram of Gramine, a payload receiver (PR) and the application. Initially, Gramine creates the enclave (step 1) and starts the payload receiver (step 2). Based on the workload inputs from the scheduler of the cloud framework (step 3), the payload receiver prepares, measures, and attests/verifies the workload and execs into the actual workload application (step 4). The application starts to run and answers requests. Once the scheduler decides to terminate the application, it calls `exit()` (step 5), which Gramine intercepts and translates into an `exec()` system call restarting the payload receiver (step 6). At this point, steps 3 to 6 can repeat for as long as the enclave is kept alive. Once the execution environment is no longer needed (e.g., when the scheduler decides to reduce the number of available enclaves), the payload receiver terminates the enclave.

We suggest implementing the `exit() -> exec()` translation by enabling it in the manifest via a new option `sys.restart_on_exit = true`. When enabled, any regular `exit()` system call will restart the application specified in the manifest (`loader.entrypoint`) with its arguments. Since the option is part of the manifest, its existence can be attested via the Intel SGX attestation in MRENCLAVE.
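A minimal sketch of how this might look in the payload receiver's manifest (the option name follows this proposal and may change; the entrypoint value is a placeholder):

```toml
# Hypothetical payload-receiver manifest snippet (sketch only).
loader.entrypoint = "file:payload_receiver"   # application restarted on every exit()

# Proposed option: translate a payload's exit() into an exec() of the
# entrypoint above, keeping the enclave alive for reuse.
sys.restart_on_exit = true
```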
Security implications of reusing enclaves
HW attestation no longer reflects what executes: Similar to how Gramine extends the HW attestation via a manifest, this solution extends the attestation further to a trusted component controlling which application payloads to accept. For remote attestation, the payload receiver needs to measure, attest and verify any payloads. Any remote party then needs to trust the payload receiver to perform its software-based certificate generation. The trust can be extended, e.g., by using the enclave's MRSigner for any payload signature verification (so that both the enclave and the application payloads come from the same source). Alternatively, a restricted set of signers or valid payload measurements can be provided via trusted or protected files in the manifest of the payload receiver.
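As a sketch of the latter option (the file name below is made up for illustration), the allowed signer keys or payload measurements could be shipped as a trusted file of the payload receiver, so that their hash is covered by the SGX measurement:

```toml
# Hypothetical sketch: a list of allowed payload signers/measurements shipped
# as a trusted file; the payload receiver verifies payload signatures against
# it before exec()-ing into the workload.
sgx.trusted_files = [
  "file:payload_receiver",
  "file:allowed_payload_signers.pem",   # made-up file name for illustration
]
```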
How to protect Gramine from application tampering: Multiple applications will use the same enclave and could potentially compromise Gramine's memory or function pointers, or completely take over the enclave. As a result, without further HW mechanisms, we suggest reusing enclaves only when workloads come from a single tenant. Trust has to be placed in the workloads not to maliciously attack Gramine and the reusable enclave. We further suggest strengthening the security by protecting Gramine's code with RX-only pages and resetting as many data structures as possible during the reuse of an enclave. This is in addition to a regular cleanup during `exec()`, which writes zeros to allocated memory and closes file descriptors.
Specializing the manifest and security data structures in Gramine for each application
The manifest will be provided by the payload receiver and only include its own needed resources (like trusted or protected files) and mount points. To enable the application workload to run successfully, each application needs to temporarily extend the manifest/data structures inside Gramine. We suggest enabling manifest extensions during the `exec()` system call, controlled by a new option in the manifest of the payload receiver (`sys.enable_manifest_extension_on_exec`); the actual contents of the additional manifest will be sent via a pseudo-system call as a file in `/dev/gramine/exec_manifest_extension`.

Some of the manifest options, like the memory size, will only be checked against the available memory, whereas others (trusted files and mount points) will update the internal data structures within Gramine. The extension of the data structures occurs during the `exec()` system call and not during the write of `/dev/gramine/exec_manifest_extension`.
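To make this concrete, the extension written to `/dev/gramine/exec_manifest_extension` could be a small manifest fragment along these lines (a sketch only; the exact format of the extension is not yet defined and all values below are placeholders):

```toml
# Hypothetical manifest extension passed by the payload receiver before exec()
# (sketch only; the accepted set of options and their format are still open).
sgx.enclave_size = "1G"    # per this proposal, only checked against the memory already available
sgx.trusted_files = [
  "file:workload_app",     # the payload about to be exec()'d into
]
fs.mounts = [
  { path = "/data", uri = "file:workload_data", type = "chroot" },
]
```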
Todos
- Handle `exit()` by restarting the application as described in the initial manifest
- Extend the manifest between `exit()` and `exec()` via a pseudo system call
This work is the result of a collaboration with @vijaydhanraj @ying2liu @dimakuv @bigdata-memory and extensive discussions with customers.