This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

"Out-of-memory in library OS" when creating and destroying a lot of threads #969

Closed
lejunzhu opened this issue Sep 3, 2019 · 5 comments · Fixed by #1949

@lejunzhu
Contributor

lejunzhu commented Sep 3, 2019

I'm using the latest Graphene-SGX code on the main branch. I ran a C program that simply creates a thread, destroys it, and then goes on to create another one. After this loop has repeated 6445 times, the process dies with "Out-of-memory in library OS".

The C program I use is:

#include <pthread.h>
#include <stdio.h>
int i = 0;
void *threadproc(void *p)
{
        fprintf(stdout, "%d\n", i);
        fflush(stdout);
        i++;
        return NULL;
}
int main()
{
        pthread_t t;
        int j;
        for (j = 0; j < 10000; ++j) {
                while (i != j) {}
                pthread_create(&t, NULL, threadproc, NULL);
                pthread_join(t, NULL);
        }
        printf("Done\n");
        return 0;
}

The inline debug log is attached.
log.txt

@mkow
Member

mkow commented Sep 3, 2019

Thanks for including the repro code! Unfortunately, I'm not surprised by the outcome: our codebase is plagued by memory leaks.
Could you check whether it also fails without SGX (i.e., on the Linux PAL)? Does it also fail if you remove fprintf and fflush from the thread function?

@lejunzhu
Contributor Author

lejunzhu commented Sep 3, 2019

The same thing happens when I remove fprintf and fflush.
The non-SGX version completes all 10,000 iterations, but it also has much more memory available, so I'm not sure whether I hit a memory leak there or not.
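
For reference, a minimal sketch of the stripped-down variant discussed above: the thread body makes no stdio calls, and the loop checks the return values of pthread_create and pthread_join so that a failure is reported instead of ignored. The names `noop_thread` and `spawn_join_loop` are made up for this illustration and are not part of the original repro.

```c
#include <pthread.h>

/* Thread body with no stdio calls, so fprintf/fflush cannot be the source of a leak. */
static void *noop_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Create and join `count` threads, one at a time.
 * Returns the number of threads that were created and joined successfully;
 * a value smaller than `count` means pthread_create or pthread_join failed.
 */
int spawn_join_loop(int count)
{
    int done = 0;
    for (int j = 0; j < count; ++j) {
        pthread_t t;
        if (pthread_create(&t, NULL, noop_thread, NULL) != 0)
            break;
        if (pthread_join(t, NULL) != 0)
            break;
        ++done;
    }
    return done;
}
```

Calling `spawn_join_loop(10000)` from `main` and printing the result gives the same workload as the original program. Note that in the failure reported here the process dies inside the library OS, so the return value may never be observed under Graphene-SGX.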

@yamahata
Contributor

yamahata commented Oct 8, 2019

For the Pal/Linux case, see #1030:
The PAL/Linux stack and TCB are now allocated and freed with malloc/free.
However, free in PAL (in the PAL stack case) is actually a no-op!
See __free() in Pal/src/slab.c.

For the Pal/Linux-SGX case, I'm not sure.
Could you please enable SLAB_DEBUG (see slabmgr.h) and track down the source of the memory use?
(Or could it be something else?)
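
To illustrate why a no-op free exhausts a fixed memory budget under a create/destroy loop, here is a toy model. It is not Graphene's slab code: the pool size and the names `toy_alloc`, `toy_free`, and `toy_rounds_until_oom` are all hypothetical, and the bump allocator ignores alignment for brevity.

```c
#include <stddef.h>

#define TOY_POOL_SIZE 4096 /* hypothetical fixed pool, standing in for a bounded heap */

static unsigned char toy_pool[TOY_POOL_SIZE];
static size_t toy_used = 0;

/* Bump allocation: hand out the next `size` bytes, or NULL when the pool is exhausted. */
void *toy_alloc(size_t size)
{
    if (size > TOY_POOL_SIZE - toy_used)
        return NULL;
    void *p = &toy_pool[toy_used];
    toy_used += size;
    return p;
}

/* No-op free: the memory is never returned to the pool. */
void toy_free(void *p)
{
    (void)p;
}

/*
 * Allocate and "free" a block of `size` bytes repeatedly.
 * Returns how many rounds succeeded before the pool ran out
 * (or `max_rounds` if it never did).
 */
int toy_rounds_until_oom(size_t size, int max_rounds)
{
    int rounds = 0;
    while (rounds < max_rounds) {
        void *p = toy_alloc(size);
        if (p == NULL)
            break;
        toy_free(p);
        ++rounds;
    }
    return rounds;
}
```

With a 4096-byte pool and 64-byte blocks, the loop stops after 64 rounds even though every block is "freed" immediately. A small, bounded heap (such as an enclave) reaches this point much sooner than a process with a large address space, which would be consistent with the non-SGX run completing while the SGX run does not.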

@dimakuv
Contributor

dimakuv commented Dec 5, 2019

With #1199, I was able to surpass 30,000 threads under Graphene-SGX, and then I got bored and killed the process. This PR removes the shim_thread leak.

But I still observe leaks of some other objects. Maybe it's PAL_HANDLEs? I didn't look into PAL logic.

@boryspoplawski
Contributor

On the current master I'm hitting OOM (256 MB enclave) after ~5800 threads (with the exact same code as above). When I add a wait(NULL) call after pthread_join (bug #1068), I hit "Internal memory fault" after ~15 threads in SGX, and 50% of the time without SGX.
Summary: I don't think this got any better.
I'm currently trying to fix some parts of threading in Graphene; hopefully that will help (or at least reduce leaks and crashes).
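
For comparison, a sketch of what the wait(NULL) variant does per iteration, and what native Linux is expected to return. The exact placement of the call in the modified repro is an assumption, and the names `idle_thread` and `join_then_wait` are hypothetical. Threads created with pthread_create are not child processes, so with no children to reap, POSIX specifies that wait() fails with ECHILD rather than blocking or crashing.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/wait.h>

/* Thread body that returns immediately. */
static void *idle_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Create and join one thread, then call wait(NULL).
 * Returns the errno set by wait() if it failed, 0 if wait() reaped a child,
 * or -1 if the thread could not be created or joined.
 */
int join_then_wait(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, idle_thread, NULL) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;

    errno = 0;
    if (wait(NULL) == -1)
        return errno;
    return 0;
}
```

In a process with no child processes, `join_then_wait()` should return ECHILD on native Linux, which gives a baseline to compare against the fault described above.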
