Workloads (Redis, Curl, R) failing with Out of memory PAL error after new manifest syntax to define lists of SGX trusted files. #2680
Comments
@jinengandhi-intel Could you attach a I assume that the difference is in how this final manifest file is generated on Ubuntu vs on RHEL/CentOS.
@jinengandhi-intel Also could you attach I want to look at two versions of this
The manifest.sgx files for Ubuntu are attached here. For RHEL, I am awaiting the manifest files from my colleague.
Please find the manifest.sgx files for RHEL attached.
RHEL manifest files are ~10MB in size... This feels like way too much for the initial 64MB pre-allocated by Graphene. @aniket-intelx @jinengandhi-intel Can any of you run the failing workload (e.g.,
@dimakuv @aniket-intelx will be sharing the rest of the details soon.
We debugged this, and the problem is in Graphene's pre-allocated internal PAL memory pool of 64MB. We fail at graphene/Pal/src/host/Linux-SGX/db_main.c, line 683 in 33a68bc.
But the corresponding read is at graphene/Pal/src/host/Linux-SGX/db_main.c, line 703 in 33a68bc, which comes after the failure point.
So we get a chicken-and-egg problem. Easy solution: if we detect that the manifest size is greater than some threshold (I recommend 1MB), then we immediately increase the internal PAL memory by an additional 64MB.
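A minimal sketch of the threshold check proposed above, not the actual Graphene patch. The function name `pal_pool_size_for_manifest` and all three constants are hypothetical names chosen for illustration; only the 64MB pool size, the additional 64MB, and the 1MB threshold come from this thread. The idea is that the decision depends only on the manifest's size in bytes, which is known before any parsing that needs the pool.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are hypothetical; they are not Graphene identifiers. */
#define PAL_INITIAL_POOL_SIZE   (64ULL * 1024 * 1024) /* 64MB pre-allocated pool */
#define PAL_EXTRA_POOL_SIZE     (64ULL * 1024 * 1024) /* proposed additional 64MB */
#define MANIFEST_SIZE_THRESHOLD (1ULL * 1024 * 1024)  /* recommended 1MB threshold */

/* Sanity check: the threshold only makes sense if it is below the pool size. */
static_assert(MANIFEST_SIZE_THRESHOLD < PAL_INITIAL_POOL_SIZE,
              "threshold must be smaller than the initial pool");

/*
 * Returns the number of bytes of internal PAL memory to make available
 * before the manifest is parsed. Manifests larger than the threshold get
 * the extra 64MB; everything else keeps the default pool.
 */
uint64_t pal_pool_size_for_manifest(uint64_t manifest_size) {
    if (manifest_size > MANIFEST_SIZE_THRESHOLD)
        return PAL_INITIAL_POOL_SIZE + PAL_EXTRA_POOL_SIZE;
    return PAL_INITIAL_POOL_SIZE;
}
```

With the ~10MB RHEL manifests from this issue, such a check would select 128MB, while a manifest of 1MB or less would stay at the default 64MB. Whether a fixed extra 64MB is enough for much larger manifests is a separate question that this sketch does not address.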
Description of the problem
On some systems (tried with RHEL and CentOS servers), we are seeing a regression with the workloads mentioned in the bug title. We do not see the same issue on Ubuntu clients or servers. The regression was introduced by this recent commit:
Define SGX allowed/trusted/protected files as TOML arrays
ddc01ba
We tried raising loader.pal_internal_mem_size to as high as 16G, but the tests still fail.
Logs are attached to this report:
R_example_trace_log_RHEL.txt
redis_trace_log_RHEL.txt
Curl_trace_log_RHEL.txt
Steps to reproduce
Take an SGX-enabled system, build Graphene, and run any of the above workloads.
Expected results
Workloads should PASS.
Actual results