Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive mutex returns ENORECOVERABLE (56) with wasi-threads #2466

Closed
denizsokmen opened this issue Aug 14, 2023 · 3 comments · Fixed by WebAssembly/wasi-libc#433
Closed

Comments

@denizsokmen
Copy link

denizsokmen commented Aug 14, 2023

I've encountered an issue where two threads locking the same recursive mutex secretly fails. Also I've noticed that the lock failure only happens from the spawned thread.

Here's a minimal reproduction case where the program will abort if lock returns non-zero:

#!/bin/bash

set -e

if [ ! -d "wasm-micro-runtime" ]; then
    git clone --depth=1 https://github.com/bytecodealliance/wasm-micro-runtime.git
    pushd wasm-micro-runtime/product-mini/platforms/linux
    cmake -DWAMR_BUILD_LIB_WASI_THREADS=1 .
    make -j 8
    popd
fi

if [ ! -d "wasi-sdk-20.0" ]; then
    wget https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-20/wasi-sdk-20.0-linux.tar.gz
    tar xzf wasi-sdk-20.0-linux.tar.gz
fi

wasi-sdk-20.0/bin/clang++ --target=wasm32-wasi-threads --sysroot=wasi-sdk-20.0/share/wasi-sysroot/ -Wl,--import-memory -Wl,--export-memory -Wl,--shared-memory -Wl,--initial-memory=655360 test.cpp -o test.wasm -pthread

for i in $(seq 1 1000); do
    echo "Running $i / 1000"
    ./wasm-micro-runtime/product-mini/platforms/linux/iwasm test.wasm
done

Where test.cpp is:

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>

static    pthread_mutex_t mutex;
static int cntr = 0;

int main(int argc, char **argv) {
    cntr = 0;
    pthread_mutexattr_t attr;
    pthread_mutexattr_init( &attr );
    pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_RECURSIVE );
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy( &attr);

    pthread_t threadID;
    pthread_create(&threadID, NULL, [](void* arg) -> void* {
        for(int i = 0; i < 10000; i++) {
            int ret = 0;
           // do {
                ret = pthread_mutex_lock(&mutex);
           // } while(ret != 0);

            cntr++;
            if (ret != 0) {
                printf("BOOM %d\n", ret);
                abort();
            }
            pthread_mutex_unlock(&mutex);
        }
        return NULL;
    }, NULL);

    for(int i = 0; i < 10000; i++) {
        int ret = 0;
        // do {
            ret = pthread_mutex_lock(&mutex);

        //} while(ret != 0);

        cntr++;
        if (ret != 0) {
            abort();
        }

        pthread_mutex_unlock(&mutex);
    }

    pthread_join(threadID, NULL);
    printf("counted: %d\n", cntr);
    return 0;
}
@denizsokmen denizsokmen changed the title Recursive mutex returns ENORECOVERABLE (56) when there's certain amount of initial memory allocated with wasi-threads Recursive mutex returns ENORECOVERABLE (56) with wasi-threads Aug 14, 2023
@denizsokmen
Copy link
Author

OK I think I figured out what the issue is, it's about main() being assigned threadid of 0x3fffffff by default: https://github.com/WebAssembly/wasi-libc/blob/9f51a7102085ec6a6ced5778f0864c9af9f50000/libc-top-half/musl/src/env/__init_tls.c#L83
e.g.

  (func $_start (type 9)
    (local i32)
    block  ;; label = @1
      block  ;; label = @2
        i32.const 0
        i32.const 0
        i32.const 1
        i32.atomic.rmw.cmpxchg offset=3824
        br_if 0 (;@2;)
        call $__wasi_init_tp  <----------------
        call $__wasm_call_ctors
        call $__main_void
        local.set 0
        call $__wasm_call_dtors
        local.get 0
        br_if 1 (;@1;)
        return
      end
      unreachable
      unreachable
    end
    local.get 0
    call $__wasi_proc_exit
    unreachable)

since it's not spawned by the runtime, it doesn't get assigned a tid by the host, so it stays 0x3fffffff.

And then later on pthread_mutex_lock goes to this line because it's not a PTHREAD_MUTEX_NORMAL mutex but a recursive one: https://github.com/WebAssembly/wasi-libc/blob/9f51a7102085ec6a6ced5778f0864c9af9f50000/libc-top-half/musl/src/thread/pthread_mutex_trylock.c#L75

which then goes to here and sees that the mutex is being held by a thread with the id of 0x3fffffff which is an invalid thread id according to this code but it's actually our main(): https://github.com/WebAssembly/wasi-libc/blob/9f51a7102085ec6a6ced5778f0864c9af9f50000/libc-top-half/musl/src/thread/pthread_mutex_trylock.c#L24

The same issue can't be reproduced if it's two wasi-threads locking the same mutex. But when it's main() vs. a wasi-thread, it's a problem.

yamt added a commit to yamt/wasi-libc that referenced this issue Aug 15, 2023
the robust mutex logic in musl seems to assume that
the bit 29 of TIDs is always zero for some reasons.

from https://git.musl-libc.org/cgit/musl/commit/?id=099b89d3840c30d7dd962e18668c2e6d39f0c626
> note that the kernel ABI also reserves bit 29
> not to appear in any tid,

i'm not sure if the assumption is true or not, given that
FUTEX_TID_MASK is 0x3fffffff.

anyway, when using non-default type of mutex like recursive mutex,
it causes problems as we actually use TID 0x3fffffff for the main thread.

as we don't support robust mutex anyway, this commit simply
comments out the problematic condition.

fixes: bytecodealliance/wasm-micro-runtime#2466
abrown pushed a commit to WebAssembly/wasi-libc that referenced this issue Aug 21, 2023
the robust mutex logic in musl seems to assume that
the bit 29 of TIDs is always zero for some reasons.

from https://git.musl-libc.org/cgit/musl/commit/?id=099b89d3840c30d7dd962e18668c2e6d39f0c626
> note that the kernel ABI also reserves bit 29
> not to appear in any tid,

i'm not sure if the assumption is true or not, given that
FUTEX_TID_MASK is 0x3fffffff.

anyway, when using non-default type of mutex like recursive mutex,
it causes problems as we actually use TID 0x3fffffff for the main thread.

as we don't support robust mutex anyway, this commit simply
comments out the problematic condition.

fixes: bytecodealliance/wasm-micro-runtime#2466
@loganek
Copy link
Collaborator

loganek commented Aug 23, 2023

@denizsokmen it looks like that was a wasi-libc issue. Can we resolve the ticket now given the fix has already been merged?

@denizsokmen
Copy link
Author

@loganek Yes, that fixes the issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants