
LibOS does not know about PAL allocating pages #50

Closed
boryspoplawski opened this issue Oct 8, 2019 · 6 comments
Labels: enhancement (New feature or request), P: 2

Comments

boryspoplawski (Contributor) commented Oct 8, 2019

Currently, when PAL allocates a page (for its internal purposes, e.g. memory for event objects), LibOS has no idea about it. This sometimes leads to LibOS allocating a page at the same address and overwriting PAL's internal memory, which corrupts PAL state.
(source)
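
To make the failure mode concrete, here is a minimal stand-alone C illustration (not Graphene code; it only demonstrates the underlying Linux behavior) of why two allocators that do not share bookkeeping can silently corrupt each other: an mmap() with MAP_FIXED at an address that is already in use simply replaces the old mapping instead of failing.

/* Illustration only: two components that pick addresses independently.
 * mmap() with MAP_FIXED silently discards whatever was mapped there before. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    /* "PAL" maps a page for an internal object and writes into it. */
    char* pal_page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (pal_page == MAP_FAILED)
        return 1;
    strcpy(pal_page, "PAL internal data");

    /* "LibOS", whose bookkeeping does not know about that page, decides the
     * same address is free and maps over it -- the old contents are lost. */
    char* libos_page = mmap(pal_page, 4096, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (libos_page == MAP_FAILED)
        return 1;

    /* Prints "same address: 1, PAL data now: ''" -- the page was zeroed. */
    printf("same address: %d, PAL data now: '%s'\n",
           libos_page == pal_page, pal_page);
    return 0;
}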


dimakuv commented Mar 5, 2020

I ran into the same problem when playing with manifests containing 50,000 trusted files. My application depleted the pre-allocated 64MB, so internal PAL pages started to "eat" into the heap shared with LibOS. This led to LibOS panicking whenever it accidentally allocated pages at the same addresses.

The (hacky) solution that works for me now is here: gramineproject/graphene#1363. Basically, it does this:

--- a/Pal/src/host/Linux-SGX/db_main.c
+++ b/Pal/src/host/Linux-SGX/db_main.c
@@ -63,6 +63,16 @@ void _DkGetAvailableUserAddressRange (PAL_PTR * start, PAL_PTR * end,
 {
     *start = (PAL_PTR) pal_sec.heap_min;
     *end = (PAL_PTR) get_enclave_pages(NULL, g_page_size, /*is_pal_internal=*/false);
+
+    /* FIXME: hack to keep some heap for internal PAL objects allocated at runtime (recall that
+     * LibOS does not keep track of PAL memory, so without this hack it could overwrite internal
+     * PAL memory). This hack is probabilistic and brittle. */
+    *end = SATURATED_P_SUB(*end, 2 * 1024 * g_page_size, *start);  /* 8MB reserved for PAL stuff */
+    if (*end <= *start) {
+        SGX_DBG(DBG_E, "Not enough enclave memory, please increase enclave size!\n");
+        ocall_exit(1, /*is_exitgroup=*/true);
+    }
+
     *hole_start = SATURATED_P_SUB(pal_sec.exec_addr, MEMORY_GAP, *start);
     *hole_end = SATURATED_P_ADD(pal_sec.exec_addr + pal_sec.exec_size, MEMORY_GAP, *end);
 }
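
For readers who have not seen these helpers before: my reading of the diff (an assumption about their semantics, not the actual Graphene definitions) is that SATURATED_P_SUB(ptr, val, min) subtracts val from ptr but clamps the result at min, and SATURATED_P_ADD(ptr, val, max) adds and clamps at max. A sketch of equivalent functions:

/* Sketch of the assumed semantics of the saturating pointer helpers used in
 * the diff above (not the real Graphene macros): clamp at a bound instead of
 * going past it. */
#include <stddef.h>
#include <stdint.h>

static inline void* saturated_p_sub(void* ptr, size_t val, void* min) {
    if ((uintptr_t)ptr - (uintptr_t)min < val)
        return min;                 /* subtraction would fall below `min` */
    return (char*)ptr - val;
}

static inline void* saturated_p_add(void* ptr, size_t val, void* max) {
    if ((uintptr_t)max - (uintptr_t)ptr < val)
        return max;                 /* addition would go past `max` */
    return (char*)ptr + val;
}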


dimakuv commented Mar 5, 2020

@boryspoplawski Are you working on a comprehensive fix for this issue?

boryspoplawski (Contributor, Author) commented:

Not at the moment. I just thought that rewriting the VMA bookkeeping in LibOS (which we need to do) and adding support for this fit well together.


dimakuv commented Aug 18, 2020

I just stumbled into this issue again, on non-SGX Graphene with a PyTorch workload that creates and destroys massive numbers of threads. Some leaks in Graphene caused the 64MB of pre-allocated PAL space (g_mem_pool) to be depleted, so PAL started allocating memory for things like PAL_HANDLE events using host-level mmap (see below). At the same time, LibOS allocated memory at the same addresses for things like shim_thread. This led to memory corruptions and funny side effects.

The culprit is this: https://github.com/oscarlab/graphene/blob/master/Pal/src/slab.c#L55

For future reference, what happens in PAL is something like this: malloc(pal handle) -> slab_alloc() -> not enough space in slab-manager buffers, need to enlarge_slab_mgr() -> system_malloc() -> __malloc() -> out of the pre-allocated 64MB, so fall back to _DkVirtualMemoryAlloc() -> host-level mmap().
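
A simplified, self-contained sketch of that fallback (illustrative only; apart from the names quoted above, nothing here is the real Pal/src/slab.c): once the fixed pool is used up, allocations come straight from host mmap(), at addresses the LibOS VMA bookkeeping never learns about.

/* Sketch of the fallback path described above (not the real slab.c). */
#include <stddef.h>
#include <sys/mman.h>

#define POOL_SIZE (64 * 1024 * 1024)    /* models the pre-allocated 64MB pool */

static char   g_pool[POOL_SIZE];        /* stands in for g_mem_pool */
static size_t g_pool_used;

/* Stand-in for _DkVirtualMemoryAlloc(): ultimately a host-level mmap(). */
static void* host_alloc(size_t size) {
    void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return ptr == MAP_FAILED ? NULL : ptr;
}

/* Sketch of system_malloc(): serve from the fixed pool while it lasts, then
 * silently fall back to fresh host pages that LibOS knows nothing about. */
void* system_malloc_sketch(size_t size) {
    if (g_pool_used + size <= POOL_SIZE) {
        void* ptr = &g_pool[g_pool_used];   /* fast path: inside the known pool */
        g_pool_used += size;
        return ptr;
    }
    return host_alloc(size);    /* these pages are invisible to LibOS's VMA list */
}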


dimakuv commented Jul 14, 2021

We circumvent this with the loader.pal_internal_mem_size manifest option -- both our PALs allocate this amount of memory at startup and inform the LibOS to never use this PAL-allocated memory. This is just a workaround because it is static, not dynamic, but it works OK-ish for now. So keeping this issue open, but with very low priority.
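
For reference, the workaround is enabled from the application manifest; a typical entry (the exact syntax, in particular the quoted size string, depends on the Graphene/Gramine version) looks roughly like this:

# Reserve a fixed amount of memory at startup for PAL-internal allocations.
# The size is static and must be tuned per workload.
loader.pal_internal_mem_size = "64M"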


dimakuv commented Mar 9, 2023

This was resolved by #839 and co, closing.

dimakuv closed this as completed Mar 9, 2023