LibOS does not know about PAL allocating pages #50
I ran into the same problem when playing with manifests of 50,000 trusted files. My application depleted the pre-allocated 64MB, such that internal PAL pages started to "eat" into the heap shared with LibOS. This led to LibOS panicking whenever it happened to allocate pages at the same addresses. The (hacky) solution that works for me now is here: gramineproject/graphene#1363. Basically, it does this:

--- a/Pal/src/host/Linux-SGX/db_main.c
+++ b/Pal/src/host/Linux-SGX/db_main.c
@@ -63,6 +63,16 @@ void _DkGetAvailableUserAddressRange (PAL_PTR * start, PAL_PTR * end,
 {
     *start = (PAL_PTR) pal_sec.heap_min;
     *end = (PAL_PTR) get_enclave_pages(NULL, g_page_size, /*is_pal_internal=*/false);
+
+    /* FIXME: hack to keep some heap for internal PAL objects allocated at runtime (recall that
+     * LibOS does not keep track of PAL memory, so without this hack it could overwrite internal
+     * PAL memory). This hack is probabilistic and brittle. */
+    *end = SATURATED_P_SUB(*end, 2 * 1024 * g_page_size, *start); /* 8MB reserved for PAL stuff */
+    if (*end <= *start) {
+        SGX_DBG(DBG_E, "Not enough enclave memory, please increase enclave size!\n");
+        ocall_exit(1, /*is_exitgroup=*/true);
+    }
+
     *hole_start = SATURATED_P_SUB(pal_sec.exec_addr, MEMORY_GAP, *start);
     *hole_end = SATURATED_P_ADD(pal_sec.exec_addr + pal_sec.exec_size, MEMORY_GAP, *end);
 }
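For context on the helpers the patch leans on: SATURATED_P_SUB/SATURATED_P_ADD do pointer arithmetic clamped to a bound, so the reservation can never underflow past the heap start. A minimal sketch of the subtraction semantics (my illustration, not Graphene's actual macro, which lives in its common headers):

#include <stdint.h>
#include <stddef.h>

/* Illustrative only: subtract val from ptr, but never return anything
 * below min. The patch uses this to carve the reservation off the top
 * of the heap while guaranteeing *end cannot drop below *start. */
static inline void* saturated_p_sub(void* ptr, size_t val, void* min) {
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t m = (uintptr_t)min;
    if (p < m || p - m < val)
        return min;               /* would underflow: clamp to min */
    return (void*)(p - val);
}

Note that with the usual 4096-byte g_page_size, 2 * 1024 * g_page_size is exactly the 8MB mentioned in the patch comment.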
@boryspoplawski Are you working on a comprehensive fix for this issue?
Not at the moment. I just thought that rewriting VMA bookkeeping in LibOS (which we need to do anyway) and adding support for this fit well together.
I just stumbled into this issue again, on non-SGX Graphene, with a PyTorch workload that spawns massive numbers of threads (created and destroyed multiple times). There are some leaks in Graphene that exhausted the 64MB of pre-allocated PAL space. The culprit is this: https://github.com/oscarlab/graphene/blob/master/Pal/src/slab.c#L55. For future reference, what happens in PAL is something like this:
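(The sketch below is my simplified reconstruction of the allocation path at that slab.c line; the function names and signatures follow the old Graphene PAL API from memory and may not match the actual code exactly.) Whenever a slab size class runs dry, the slab allocator refills itself through PAL's internal page-allocation path, and those pages are invisible to LibOS:

/* Simplified sketch of PAL's internal allocation path (see Pal/src/slab.c).
 * Key detail: pages requested as PAL-internal come out of the same enclave
 * memory, but LibOS VMA bookkeeping never hears about them. */
static inline void* pal_internal_malloc(size_t size) {
    void* addr = NULL;
    if (_DkVirtualMemoryAlloc(&addr, ALLOC_ALIGN_UP(size), PAL_ALLOC_INTERNAL,
                              PAL_PROT_READ | PAL_PROT_WRITE) < 0)
        return NULL;  /* allocation of fresh internal pages failed */
    return addr;
}

So a leak of slab-allocated objects (as with the short-lived PyTorch threads above) steadily drains the pre-allocated pool, and internal pages start eating into memory that LibOS believes it owns.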
We circumvent this by having …
This was resolved by #839 and co., closing.
Currently, when PAL allocates a page (for its internal purposes, e.g. memory for event objects), LibOS has no idea about it. This sometimes leads to LibOS allocating a page at the same address and lies of Man ALL IS LOST ALL IS LOST the pon̷y he comes he c̶̮omes he comes the ichor permeates all MY FACE MY FACE ᵒh god no NO NOO̼OO NΘ stop the an*gle̠̅s ͎a̧͈͖re not reaͨl ZALGΌ IS TOƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡ HE COMES (source)
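Zalgo aside, here is a hypothetical timeline of the collision to make the failure mode concrete. The helper names are illustrative, not actual Graphene APIs: pal_internal_malloc is the sketch from the earlier comment, and libos_pick_free_vma stands in for whatever LibOS uses to choose an unused address range.

/* Hypothetical timeline of the collision (illustrative helper names). */
static void collision_timeline(void) {
    /* 1. PAL allocates an internal page, e.g. backing an event object;
     *    LibOS records nothing, so this range still looks free to it. */
    void* event_page = pal_internal_malloc(PAGE_SIZE);

    /* 2. LibOS later serves an mmap() and, consulting only its own VMA
     *    list, may hand out the very same range. */
    void* user_page = libos_pick_free_vma(PAGE_SIZE); /* possibly == event_page */

    /* 3. The user mapping silently clobbers PAL's event object; the next
     *    PAL operation touching that object misbehaves or panics. */
    (void)event_page;
    (void)user_page;
}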