Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gcc thread_local destructor cause use after free #2519

Open
sbabbi opened this issue May 25, 2017 · 26 comments
Open

gcc thread_local destructor cause use after free #2519

sbabbi opened this issue May 25, 2017 · 26 comments

Comments

@sbabbi
Copy link

sbabbi commented May 25, 2017

Hello,

I posted this issue on the gcc bug tracker, but I am not entirely sure if this is supposed to be a GCC bug or a mingw bug.

I see it has also been reported here, however at the moment I can not create a sourceforge account to comment on that. I apologize if this is not the proper place.


The issue here, to my best understanding, is the following.

  • At the very first access of a thread-local object in a new program, emutls_init is called via __emutls_get_address. This function allocates some memory for the storage using malloc (see emutls_alloc), and registers emutls_destroy to be executed at thread exit, using __gthread_key_create.

  • Immediately after, the TLS init function (generated by GCC) is called. This calls the C++ constructor on the storage provided at the previous point. It also uses __gthread_key_create to register the C++ destructor for execution at thread exit.

  • When a thread terminates, _pthread_cleanup_dest in winpthread is called. This function is responsible for calling all the destructors registered with pthread_key_create. Unfortunately, the execution order of such destructors is unspecified. In this case, it happens that emutls_destroy is called before the C++ destructor, which then executes on deallocated memory.


A quick workaround would be to reverse the loop in winpthread/src/thread.c:842. This will cause the destructors to be called in reverse order, and it will probably work correctly in most of the cases.
The only other solution would be to merge emutls+TLS init on GCC side, in such a way to have a single destructor callback to both invoke the C++ destructor and cleanup the memory.

@mati865
Copy link
Collaborator

mati865 commented May 25, 2017

CC @lhmouse

@lhmouse
Copy link
Contributor

lhmouse commented May 25, 2017

Yes I knew this bug a couple of years ago... but who the heck has been caring about this problem?

The emutls thing is utterly broken. I happened to report a similar issue a few hours ago (if you LoadLibrary() a DLL and the DLL creates some thread_local objects, then FreeLibrary()'ing that DLL is very likely to lead to a crash), not to mention that __emutls_get_address() aborts the process once it runs out of memory.

So here comes the rule of thumb: Don't use thread_local on GCC for MinGW targets.

@sbabbi
Copy link
Author

sbabbi commented May 26, 2017

Well.. pretty sad. I might post a patch to winpthread to workaround this specific issue.

@yeputons
Copy link

My understanding is slightly different:

  1. On thread_local variable creation GCC calls __emutls_get_address, which calls __gthread_key_create, which presumably calls pthread_key_create from winpthreads by MinGW. It adds corresponding "destructor" (emutls_destroy which deallocates memory) into an array of destructors. At the moment, logic is as follows: try inserting at _pthread_key_sch if available, go towards the end of the array, then wrap, then reallocate if the array is filled. However, it looks _pthread_key_sch is the smallest available place in the array.
  2. After completing the constructor successfully GCC calls __cxa_thread_atexit to register destructor. It's implemented in mingw-w64-crt (not winpthreads), it calls __mingw_cxa_thread_atexit, which adds destructor to a separate TLS no. tls_dtors_slot.
  3. When std::thread finishes, it returns control to pthread_create_wrapper from winpthreads, it's the routine which actually runs the thread and is passed _beginthreadex. This routine calls _pthread_cleanup_dest in the thread before terminating with _endthreadex. This in turn calls emutls_destroy and memory is deallocated.
  4. Now that _endthreadex is called and the system knows for sure the thread is terminating, tls_callback from mingw-w64-crt is called with DLL_THREAD_DETACH and runs destructors from bullet 2 above.

So, the core issue looks like there are two separate mechanisms for calling something at thread exit: pthread_key_create (which calls destructors before pthread-emulated thread is terminated) and __cxa_thread_atexit (which calls destructors when thread is actually terminating). Memory is allocated through the former, destructors are called through the latter, hence the use-after-free.

@yeputons
Copy link

yeputons commented Apr 28, 2021

Moreover, it looks like on Linux (at the very least) pthread_key_create destructors are called strictly after __cxa_thread_atexit, and on Windows it happens the other way around. Here is a sample code to test it:

#include <stdio.h>
#include <pthread.h>
#include <cxxabi.h>

int a, b, c, d, e;
extern int __dso_handle;

void mydestr_atexit(void *arg) {
    printf("mydestr_atexit(%p)\n", arg);
}

void mydestr_pthread_key(void *arg) {
    printf("mydestr_pthread_key(%p)\n", arg);
}

void* thread(void *arg) {
    printf("thread %p\n", arg);
    abi::__cxa_thread_atexit(mydestr_atexit, &b, &__dso_handle);

    {
        pthread_key_t key;
        pthread_key_create(&key, mydestr_pthread_key);
        pthread_setspecific(key, &c);
    }

    abi::__cxa_thread_atexit(mydestr_atexit, &d, &__dso_handle);

    {
        pthread_key_t key;
        pthread_key_create(&key, mydestr_pthread_key);
        pthread_setspecific(key, &e);
    }
    return nullptr;
}

int main() {
    printf("a=%p\nb=%p\nc=%p\nd=%p\ne=%p\n", &a, &b, &c, &d, &e);
    pthread_t th;
    pthread_create(&th, nullptr, thread, &a);
    pthread_join(th, nullptr);
}

Moreover, it looks like destructors by pthread_key can easily be called in a wrong order: from oldest keys to newest. So it feels kinda wrong to use it for TLS emulation.

@jeremyd2019
Copy link
Member

I "discovered" this and tried to make some progress, but there's still a case that results in use-after-free:
https://sourceforge.net/p/mingw-w64/mailman/message/37186085/

@yeputons
Copy link

Will it help if _pthread_cleanup_dest in winpthreads calls destructors registered in crt by __mingw_cxa_thread_atexit (i.e. run_dtor_list from mingw-w64-crt\crt\tls_atexit.c)?

I believe something similar was suggested in this message, possibly for another issue: https://sourceforge.net/p/mingw-w64/mailman/message/37140300/

  1. Make crt expose a function like __mingw_cxa_thread_finalize which calls callbacks registered by __mingw_cxa_thread_atexit and removes them from the list. That way we don't have to modify existing logic near DLL_THREAD_DETACH.
    void __mingw_cxa_thread_finalize(void) {
        assert(inited);  // Not sure why it's needed, but sounds logical.
        dtor_obj * p = (dtor_obj *)TlsGetValue(tls_dtors_slot);
        run_dtor_list(&p);
        TlsSetValue(tls_dtors_slot, p);
    }
  2. Make winpthreads call __mingw_cxa_thread_finalize in _pthread_cleanup_dest. _pthread_cleanup_dest is called either by pthread_exit, or a terminating thread created via pthread_create, or similar DLL_THREAD_DETACH event (in which case we care less about order).

Such behavior would be a little more consistent with what happens on Linux: "user-defined" thread_atexit callbacks run first, "low-level" pthread cleanup runs last. It is assumed by at least some consumers, e.g. emutls, maybe others.

Should help with this specific issue. Not sure how it plays with dynamic linking, configuring, etc. The patch would be quite small, though.

Also, I assume it's impossible for crt to be detached from a thread before winpthreads, as the latter depends(?) on the former.

@jeremyd2019
Copy link
Member

This particular mingw-w64 "crt" code is present in each module, ie always statically linked. So each module has its own copy, and winpthreads calling it would call its copy only. Unless I'm missing something

@jeremyd2019
Copy link
Member

Would probably be better to discuss such details on mingw-w64-public list though.

@lygstate
Copy link
Contributor

Can we turn the mingw's implementation of _Thread_local to be consistence with msvc?
It's seems msvc'c __declspec(thread) doesn't depends on winpthreads at all.

@lhmouse
Copy link
Contributor

lhmouse commented Sep 15, 2022

Can we turn the mingw's implementation of _Thread_local to be consistence with msvc?

It's seems msvc'c __declspec(thread) doesn't depends on winpthreads at all.

Please read this series of articles first: http://www.nynaeve.net/?p=183

@lygstate
Copy link
Contributor

Can we turn the mingw's implementation of _Thread_local to be consistence with msvc?
It's seems msvc'c __declspec(thread) doesn't depends on winpthreads at all.

Please read this series of articles first: http://www.nynaeve.net/?p=183

Yeap, I've read it before, so I post it here

@lhmouse
Copy link
Contributor

lhmouse commented Sep 16, 2022

Can we turn the mingw's implementation of _Thread_local to be consistence with msvc?
It's seems msvc'c __declspec(thread) doesn't depends on winpthreads at all.

Please read this series of articles first: http://www.nynaeve.net/?p=183

Yeap, I've read it before, so I post it here

Then you should have known that there is no such support in GCC. It is consequently impossible at least for now.

@lhmouse
Copy link
Contributor

lhmouse commented Sep 26, 2022

I think it's time that we give the posix thread model up. See #13259.

@oltolm
Copy link
Contributor

oltolm commented Dec 22, 2023

@lhmouse, sorry for the ping. Is this the same bug? Are there any plans to fix it? How does it affect the program?

Thread 10 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 2308.0x36ac]
0x0000000001ef8d0c in std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, unsigned long long>, true>::_M_next (this=0xfeeefeeefeeefeee)
    at C:/msys64/mingw64/include/c++/13.2.0/bits/hashtable_policy.h:377
377           { return static_cast<_Hash_node*>(this->_M_nxt); }
(gdb) bt
#0  0x0000000001ef8d0c in std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, unsigned long long>, true>::_M_next (
    this=0xfeeefeeefeeefeee) at C:/msys64/mingw64/include/c++/13.2.0/bits/hashtable_policy.h:377
#1  0x00000000023171a6 in std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, unsigned long long>, true> > >::_M_deallocate_nodes (this=0x335b49a8, __n=0xfeeefeeefeeefeee) at C:/msys64/mingw64/include/c++/13.2.0/bits/hashtable_policy.h:2041
#2  0x0000000001f3cac7 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, unsigned long long>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, unsigned long long> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::clear (this=0x335b49a8)
    at C:/msys64/mingw64/include/c++/13.2.0/bits/hashtable.h:2509
#3  0x0000000001f3cf08 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, unsigned long long>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, unsigned long long> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable (this=0x335b49a8, __in_chrg=<optimized out>)
    at C:/msys64/mingw64/include/c++/13.2.0/bits/hashtable.h:1593
#4  0x0000000002066308 in std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned long long, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, unsigned long long> > >::~unordered_map (this=0x335b49a8, __in_chrg=<optimized out>) at C:/msys64/mingw64/include/c++/13.2.0/bits/unordered_map.h:109
#5  0x000000000173a72d in run_dtor_list (ptr=<optimized out>) at C:/M/B/src/mingw-w64/mingw-w64-crt/crt/tls_atexit.c:62
#6  run_dtor_list (ptr=0x2fd1f3d0) at C:/M/B/src/mingw-w64/mingw-w64-crt/crt/tls_atexit.c:56
#7  tls_callback (hDllHandle=<optimized out>, dwReason=<optimized out>, lpReserved=<optimized out>) at C:/M/B/src/mingw-w64/mingw-w64-crt/crt/tls_atexit.c:165
#8  0x00007ffc020c9a1d in ntdll!RtlActivateActivationContextUnsafeFast () from C:\WINDOWS\SYSTEM32\ntdll.dll
#9  0x00007ffc020c9aff in ntdll!RtlActivateActivationContextUnsafeFast () from C:\WINDOWS\SYSTEM32\ntdll.dll
#10 0x00007ffc020c763b in ntdll!LdrShutdownThread () from C:\WINDOWS\SYSTEM32\ntdll.dll
#11 0x00007ffc0210468e in ntdll!RtlExitUserThread () from C:\WINDOWS\SYSTEM32\ntdll.dll
#12 0x00007ffc005aafa7 in msvcrt!_endthreadex () from C:\WINDOWS\System32\msvcrt.dll
#13 0x00000000006342b2 in thread_base::finalize (_self=7608) at C:/src/rpcs3/Utilities/Thread.cpp:2241
#14 0x000000003003d494 in ?? ()
#15 0x0000000000001db8 in ?? ()
#16 0x0000000000000000 in ?? ()

@lhmouse
Copy link
Contributor

lhmouse commented Dec 22, 2023

@oltolm Someone expressed negative thoughts in mingw-w64 so I am afraid there is currently no plan on it.

A proper fix for it requires hacking the CRT to invoke thread-local destructors before calling exit() 1 2. If you have time please try my personal toolchains here: https://gcc-mcf.lhmouse.com/

Footnotes

  1. https://github.com/lhmouse/mcfgthread/blob/2084f4d98e945cc12a5cafea1916da7256b01db8/patches/mingw-w64-10.0.patch#L87

  2. https://github.com/lhmouse/mcfgthread/blob/2084f4d98e945cc12a5cafea1916da7256b01db8/mcfgthread/cxa.c#L123

@Andarwinux
Copy link
Contributor

gcc14 win32 thread model seems to support c++11 thread. gcc14 will be released soon, so maybe consider use win32 thread model instead.

@lhmouse
Copy link
Contributor

lhmouse commented Feb 6, 2024

gcc14 win32 thread model seems to support c++11 thread. gcc14 will be released soon, so maybe consider use win32 thread model instead.

I don't think it will solve the issue: Destructors of static objects are called by the VCRT/UCRT exit() function, and destructors of thread-local objects are called by the TLS callback which happens after exit() calls ExitProcess(). The cause of this problem is always there.

And in addition, switching the thread model is an ABI break.

@oltolm
Copy link
Contributor

oltolm commented Mar 11, 2024

How does Clang solve this problem?

@lhmouse
Copy link
Contributor

lhmouse commented Mar 12, 2024

How does Clang solve this problem?

It has nothing to do with Clang; the magic lies in UCRT. We have https://github.com/mingw-w64/mingw-w64/blob/dbfdf802580c0d6b90b91995dab019f2a7787a8e/mingw-w64-crt/crt/tls_atexit.c#L108-L119 which causes UCRT to invoke destructors of thread-local objects before static ones. For MSVCRT we have an emulated (non-conforming) one in 'mingw-w64-crt/misc/register_tls_atexit.c'.

@oltolm
Copy link
Contributor

oltolm commented Mar 13, 2024

https://github.com/mingw-w64/mingw-w64/blob/dbfdf802580c0d6b90b91995dab019f2a7787a8e/mingw-w64-crt/misc/register_tls_atexit.c

So the advantage of MCF over MSVCRT is correctness and performance and the advantage of MCF over UCRT is performance? Is that correct?

Why not create a minimal environment/subsystem for MCF with development tools only? It is probably source compatible, so building packages would be easy. Sadly I can not use it without pacman. I would contribute to the MCF subsystem.

@lhmouse
Copy link
Contributor

lhmouse commented Mar 13, 2024

So the advantage of MCF over MSVCRT is correctness and performance and the advantage of MCF over UCRT is performance? Is that correct?

Yes.

Why not create a minimal environment/subsystem for MCF with development tools only? It is probably source compatible, so building packages would be easy. Sadly I can not use it without pacman. I would contribute to the MCF subsystem.

See https://gcc-mcf.lhmouse.com/ which is largely repackaged MSYS2 environments.

@oltolm
Copy link
Contributor

oltolm commented Mar 13, 2024

See https://gcc-mcf.lhmouse.com/ which is largely repackaged MSYS2 environments.

I know about it, it's great, but as I said I can't use it without pacman. pacman is too convenient.

@lhmouse
Copy link
Contributor

lhmouse commented Mar 14, 2024

I know about it, it's great, but as I said I can't use it without pacman. pacman is too convenient.

How about this? https://github.com/lhmouse/MINGW-packages

One reason that I do not upload pacman packages is that they are all 'drop-in' replacements for MSYS2 ones, and I have to configure my local pacman.conf to ignore them; otherwise they might be upgraded by pacman to MSYS2 ones.

I can upload a snapshot of my local packages. Once you have these packages you can bootstrap them yourself.

@lhmouse
Copy link
Contributor

lhmouse commented Mar 14, 2024

@oltolm pacman packages can be found here: https://files.lhmouse.com/pkg-snapshots/ . Source scripts are in https://github.com/lhmouse/MINGW-packages .

Installing these packages locally with pacman -U will produce environments for building everything else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants