
LibOS does not know about PAL allocating pages #50

Closed
boryspoplawski opened this issue Oct 8, 2019 · 6 comments
Labels: enhancement (New feature or request), P: 2

Comments

boryspoplawski (Contributor) commented Oct 8, 2019

Currently, when PAL allocates a page (for its internal purposes, e.g. memory for event objects), LibOS has no idea about it. This sometimes leads to LibOS allocating a page at the same address and overwriting PAL's internal memory, which corrupts PAL state.
(source)
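
To make the failure mode concrete, here is a minimal stand-alone C illustration (not Graphene code; it only demonstrates the underlying Linux behavior) of why two allocators that do not share bookkeeping can silently corrupt each other: an mmap() with MAP_FIXED at an address that is already in use simply replaces the old mapping instead of failing.

/* Illustration only: two components that pick addresses independently.
 * mmap() with MAP_FIXED silently discards whatever was mapped there before. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    /* "PAL" maps a page for an internal object and writes into it. */
    char* pal_page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (pal_page == MAP_FAILED)
        return 1;
    strcpy(pal_page, "PAL internal data");

    /* "LibOS", whose bookkeeping does not know about that page, decides the
     * same address is free and maps over it -- the old contents are lost. */
    char* libos_page = mmap(pal_page, 4096, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (libos_page == MAP_FAILED)
        return 1;

    /* Prints "same address: 1, PAL data now: ''" -- the page was zeroed. */
    printf("same address: %d, PAL data now: '%s'\n",
           libos_page == pal_page, pal_page);
    return 0;
}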


dimakuv commented Mar 5, 2020

I ran into the same problem when playing with manifests containing 50,000 trusted files. My application depleted the pre-allocated 64MB, so internal PAL pages started to "eat" into the heap shared with LibOS. This led to LibOS panicking whenever it accidentally allocated pages at the same addresses.

The (hacky) solution that works for me now is here: gramineproject/graphene#1363. Basically, it does this:

--- a/Pal/src/host/Linux-SGX/db_main.c
+++ b/Pal/src/host/Linux-SGX/db_main.c
@@ -63,6 +63,16 @@ void _DkGetAvailableUserAddressRange (PAL_PTR * start, PAL_PTR * end,
 {
     *start = (PAL_PTR) pal_sec.heap_min;
     *end = (PAL_PTR) get_enclave_pages(NULL, g_page_size, /*is_pal_internal=*/false);
+
+    /* FIXME: hack to keep some heap for internal PAL objects allocated at runtime (recall that
+     * LibOS does not keep track of PAL memory, so without this hack it could overwrite internal
+     * PAL memory). This hack is probabilistic and brittle. */
+    *end = SATURATED_P_SUB(*end, 2 * 1024 * g_page_size, *start);  /* 8MB reserved for PAL stuff */
+    if (*end <= *start) {
+        SGX_DBG(DBG_E, "Not enough enclave memory, please increase enclave size!\n");
+        ocall_exit(1, /*is_exitgroup=*/true);
+    }
+
     *hole_start = SATURATED_P_SUB(pal_sec.exec_addr, MEMORY_GAP, *start);
     *hole_end = SATURATED_P_ADD(pal_sec.exec_addr + pal_sec.exec_size, MEMORY_GAP, *end);
 }
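
For readers who have not seen these helpers before: my reading of the diff (an assumption about their semantics, not the actual Graphene definitions) is that SATURATED_P_SUB(ptr, val, min) subtracts val from ptr but clamps the result at min, and SATURATED_P_ADD(ptr, val, max) adds and clamps at max. A sketch of equivalent functions:

/* Sketch of the assumed semantics of the saturating pointer helpers used in
 * the diff above (not the real Graphene macros): clamp at a bound instead of
 * going past it. */
#include <stddef.h>
#include <stdint.h>

static inline void* saturated_p_sub(void* ptr, size_t val, void* min) {
    if ((uintptr_t)ptr - (uintptr_t)min < val)
        return min;                 /* subtraction would fall below `min` */
    return (char*)ptr - val;
}

static inline void* saturated_p_add(void* ptr, size_t val, void* max) {
    if ((uintptr_t)max - (uintptr_t)ptr < val)
        return max;                 /* addition would go past `max` */
    return (char*)ptr + val;
}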


dimakuv commented Mar 5, 2020

@boryspoplawski Are you working on a comprehensive fix for this issue?

boryspoplawski (Contributor, Author) commented:

Not at the moment. I just thought that rewriting the VMA bookkeeping in LibOS (which we need to do) and adding support for this fit well together.


dimakuv commented Aug 18, 2020

I just stumbled into this issue again, on non-SGX Graphene with a PyTorch workload that creates and destroys massive numbers of threads. Some leaks in Graphene caused the 64MB of pre-allocated PAL space (g_mem_pool) to be depleted, so PAL started allocating memory for things like PAL_HANDLE events using host-level mmap (see below). At the same time, LibOS allocated memory at the same addresses for things like shim_thread. This led to memory corruptions and funny side effects.

The culprit is this: https://github.com/oscarlab/graphene/blob/master/Pal/src/slab.c#L55

For future reference, what happens in PAL is something like this: malloc(pal handle) -> slab_alloc() -> not enough space in slab-manager buffers, need to enlarge_slab_mgr() -> system_malloc() -> __malloc() -> out of the pre-allocated 64MB, so fall back to _DkVirtualMemoryAlloc() -> host-level mmap().
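
A simplified, self-contained sketch of that fallback (illustrative only; apart from the names quoted above, nothing here is the real Pal/src/slab.c): once the fixed pool is used up, allocations come straight from host mmap(), at addresses the LibOS VMA bookkeeping never learns about.

/* Sketch of the fallback path described above (not the real slab.c). */
#include <stddef.h>
#include <sys/mman.h>

#define POOL_SIZE (64 * 1024 * 1024)    /* models the pre-allocated 64MB pool */

static char   g_pool[POOL_SIZE];        /* stands in for g_mem_pool */
static size_t g_pool_used;

/* Stand-in for _DkVirtualMemoryAlloc(): ultimately a host-level mmap(). */
static void* host_alloc(size_t size) {
    void* ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return ptr == MAP_FAILED ? NULL : ptr;
}

/* Sketch of system_malloc(): serve from the fixed pool while it lasts, then
 * silently fall back to fresh host pages that LibOS knows nothing about. */
void* system_malloc_sketch(size_t size) {
    if (g_pool_used + size <= POOL_SIZE) {
        void* ptr = &g_pool[g_pool_used];   /* fast path: inside the known pool */
        g_pool_used += size;
        return ptr;
    }
    return host_alloc(size);    /* these pages are invisible to LibOS's VMA list */
}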


dimakuv commented Jul 14, 2021

We circumvent this with the loader.pal_internal_mem_size manifest option -- both our PALs allocate this amount of memory at startup and inform the LibOS to never use this PAL-allocated memory. This is just a workaround because it is static, not dynamic, but it works OK-ish for now. So keeping this issue open, but with very low priority.
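
For reference, the workaround is enabled from the application manifest; a typical entry (the exact syntax, in particular the quoted size string, depends on the Graphene/Gramine version) looks roughly like this:

# Reserve a fixed amount of memory at startup for PAL-internal allocations.
# The size is static and must be tuned per workload.
loader.pal_internal_mem_size = "64M"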


dimakuv commented Mar 9, 2023

This was resolved by #839 and co, closing.

dimakuv closed this as completed Mar 9, 2023