Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Pal/Linux-SGX] Pre-allocate more PAL memory on huge manifest files #26

Merged
merged 1 commit into from
Sep 16, 2021

Conversation

dimakuv
Copy link

@dimakuv dimakuv commented Sep 10, 2021

Description of the changes

The tomlc99 library to parse TOML files uses a lot of memory during toml_parse(). Sometimes apps have very large manifest files (>10MB), and parsing these manifest files requires more than 64MB of preallocated memory pool in Linux-SGX PAL.

Since this function is called before PAL can read loader.pal_internal_mem_size, there is no way to learn the additional memory size for PAL from the manifest. Thus we use a heuristic to pre-allocate some PAL memory before toml_parse().

I understand that this fix is... dubious, but couldn't come up with something better.

Fixes gramineproject/graphene#2680

How to test this PR?

Try with huge manifest files. E.g., I did this in our Bash example:

diff --git a/Examples/bash/manifest.template b/Examples/bash/manifest.template
+# loader.pal_internal_mem_size = "128M"
+sgx.trusted_files.bla = "file:/home/dimakuv/graphene"

This resulted in an 18MB bash.manifest.sgx file. Without this PR, Graphene fails at toml_parse() step:

$ graphene-sgx bash -c ls
error: *** Out-of-memory in PAL (try increasing `loader.pal_internal_mem_size`) ***

(Increasing loader.pal_internal_mem_size doesn't help because toml_parse() fails before being able to read this manifest option.)

With this PR and without loader.pal_internal_mem_size = "128M", Graphene correctly complains:

$ graphene-sgx bash -c ls
error: Too small `loader.pal_internal_mem_size`, need at least 134217728 bytes

Finally, with this PR and with loader.pal_internal_mem_size = "128M", Graphene works:

$ graphene-sgx bash -c ls
Makefile
README.md
bash.manifest
... and so on ...

This change is Reviewable

Copy link
Contributor

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


Pal/src/host/Linux-SGX/db_main.c, line 686 at r1 (raw file):

     *
     * FIXME: this is a quick hack, we need proper memory allocation in PAL. */
    if (manifest_size > 10 * 1024 * 1024) {

log_warning about that? "This is a huge manifest, preallocating 128MB of internal memory"


Pal/src/host/Linux-SGX/db_main.c, line 723 at r1 (raw file):

    if (pal_internal_mem_size < g_pal_internal_mem_size) {
        log_error("Too small `loader.pal_internal_mem_size`, need at least %lu bytes",

This will be confusing for the user. How about something like "Too small loader.pal_internal_mem_size, need at least %lu bytes because the manifest is large"?

Also, maybe add %lu bytes (%luMB) since the user is likely to use megabytes in the manifest?

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @pwmarcz)


Pal/src/host/Linux-SGX/db_main.c, line 686 at r1 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

log_warning about that? "This is a huge manifest, preallocating 128MB of internal memory"

Done. Used log_always because log_warning will not be output (default log level is ERROR, and we didn't yet read the manifest options), and because log_error doesn't fit here logically.


Pal/src/host/Linux-SGX/db_main.c, line 723 at r1 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

This will be confusing for the user. How about something like "Too small loader.pal_internal_mem_size, need at least %lu bytes because the manifest is large"?

Also, maybe add %lu bytes (%luMB) since the user is likely to use megabytes in the manifest?

Done.

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @pwmarcz)

a discussion (no related file):
FYI: I accidentally commited my local changes to bash/manifest.template in one of the previous fixup commits. Removed it now.


Copy link
Contributor

@jinengandhi-intel jinengandhi-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @pwmarcz)

a discussion (no related file):
We went ahead and tested the PR where like Dmitrii mentioned it prompts the user to add the loader.pal_internal_mem_size relevant for the workload in the manifest file and the test passes post that.


pwmarcz
pwmarcz previously approved these changes Sep 14, 2021
Copy link
Contributor

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I went through your testing procedure manually and it works for me.

Reviewed 1 of 2 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I request a review from @boryspoplawski or @mkow ? I wonder if they maybe come up with something better than this hack...

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), "fixup! " found in commit messages' one-liners (waiting on @jinengandhi-intel)

a discussion (no related file):

Previously, jinengandhi-intel wrote…

We went ahead and tested the PR where like Dmitrii mentioned it prompts the user to add the loader.pal_internal_mem_size relevant for the workload in the manifest file and the test passes post that.

Thanks for this!


Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you meant 18MB not GB :)

Reviewed 1 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @jinengandhi-intel)


Pal/src/host/Linux-SGX/db_main.c, line 724 at r3 (raw file):

    }

    if (pal_internal_mem_size < g_pal_internal_mem_size) {

So if we do not specify loader.pal_internal_mem_size (default is 0) and the manifest is big, this will trigger and exit. Is that intended?


Pal/src/host/Linux-SGX/db_main.c, line 729 at r3 (raw file):

        ocall_exit(1, true);
    }
    g_pal_internal_mem_size = pal_internal_mem_size;

How does it work? In normal/default case this will be 0 (just like in the old version).

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look that bad. A bit hacky, but I don't see a better way, without major rework of manifest handling.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @jinengandhi-intel)

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the description in the PR. Sure it is MB.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @jinengandhi-intel)


Pal/src/host/Linux-SGX/db_main.c, line 724 at r3 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

So if we do not specify loader.pal_internal_mem_size (default is 0) and the manifest is big, this will trigger and exit. Is that intended?

This will not trigger. Because loader.pal_internal_mem_size == g_pal_internal_mem_size == 0, and this IF clause is a strict <


Pal/src/host/Linux-SGX/db_main.c, line 729 at r3 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

How does it work? In normal/default case this will be 0 (just like in the old version).

That was exactly my intention -- for normal cases (not-too-big manifest), the behavior is like it was before.

Is your question -- how do we have any PAL internal memory if g_pal_internal_mem_size == 0? This is because this g_pal_internal_mem_size variable denotes the dynamically allocated PAL memory. And we always have the additional *statically allocated 64MB of memory in the form of a BSS global variable g_mem_pool.

So the default behavior (when loader.pal_internal_mem_size == 0) is to only use 64MB of g_mem_pool. This is even described in our ReadTheDocs manifest-syntax page.

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @jinengandhi-intel)


Pal/src/host/Linux-SGX/db_main.c, line 724 at r3 (raw file):

and the manifest is big

By this I meant g_pal_internal_mem_size > 0 (and default loader.pal_internal_mem_size == 0 )


Pal/src/host/Linux-SGX/db_main.c, line 729 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

That was exactly my intention -- for normal cases (not-too-big manifest), the behavior is like it was before.

Is your question -- how do we have any PAL internal memory if g_pal_internal_mem_size == 0? This is because this g_pal_internal_mem_size variable denotes the dynamically allocated PAL memory. And we always have the additional *statically allocated 64MB of memory in the form of a BSS global variable g_mem_pool.

So the default behavior (when loader.pal_internal_mem_size == 0) is to only use 64MB of g_mem_pool. This is even described in our ReadTheDocs manifest-syntax page.

Thanks for explanation

Copy link
Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @jinengandhi-intel)


Pal/src/host/Linux-SGX/db_main.c, line 724 at r3 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

and the manifest is big

By this I meant g_pal_internal_mem_size > 0 (and default loader.pal_internal_mem_size == 0 )

Ah, sorry, misunderstood the first time.

Yes, this is intended -- your scenario would mean that the user fed a big manifest to Gramine, and Gramine decided to allocate 64/128 MB of additional PAL memory. Since this "allocation of additional PAL memory" is controlled by the loader.pal_internal_mem_size manifest option, Gramine cannot just silently continue working. Otherwise what does it mean? The user will think that Gramine is not using additional PAL memory, and if the user will decide later at some point to set loader.pal_internal_mem_size = 32M, how should Gramine react?

To remove this ambiguity, I added this check and I force the user to specify loader.pal_internal_mem_size explicitly.

boryspoplawski
boryspoplawski previously approved these changes Sep 15, 2021
Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion, "fixup! " found in commit messages' one-liners (waiting on @jinengandhi-intel)


Pal/src/host/Linux-SGX/db_main.c, line 724 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Ah, sorry, misunderstood the first time.

Yes, this is intended -- your scenario would mean that the user fed a big manifest to Gramine, and Gramine decided to allocate 64/128 MB of additional PAL memory. Since this "allocation of additional PAL memory" is controlled by the loader.pal_internal_mem_size manifest option, Gramine cannot just silently continue working. Otherwise what does it mean? The user will think that Gramine is not using additional PAL memory, and if the user will decide later at some point to set loader.pal_internal_mem_size = 32M, how should Gramine react?

To remove this ambiguity, I added this check and I force the user to specify loader.pal_internal_mem_size explicitly.

Sounds good

The tomlc99 library to parse TOML files uses a lot of memory during
`toml_parse()`. Sometimes apps have very large manifest files (>10MB),
and parsing these manifest files requires more than 128MB of
preallocated memory pool in Linux-SGX PAL.

Since this function is called before PAL can read
`loader.pal_internal_mem_size`, there is no way to learn the additional
memory size for PAL from the manifest. Thus we use a heuristic to
pre-allocate some PAL memory before `toml_parse()`.

Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
@dimakuv dimakuv dismissed stale reviews from boryspoplawski and pwmarcz via f52e492 September 16, 2021 09:37
@dimakuv dimakuv force-pushed the dimakuv/prealloc-more-pal-mem-on-manifest branch from fbbd9af to f52e492 Compare September 16, 2021 09:37
Copy link
Contributor

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jinengandhi-intel)

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jinengandhi-intel)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r2, 1 of 1 files at r4, all commit messages.
Dismissed @jinengandhi-intel from a discussion.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Thanks for this!

@jinengandhi-intel: You put a blocking comment, but from its contents it seems that wasn't intended, so I'm dismissing it.


@mkow mkow merged commit f52e492 into master Sep 16, 2021
@mkow mkow deleted the dimakuv/prealloc-more-pal-mem-on-manifest branch September 16, 2021 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants