
Ways to debug deadlocks #243

Open
gitmalong opened this issue Dec 29, 2022 · 13 comments

Comments

@gitmalong

Hi!

What is the recommended approach to debug dead locks associated with Dashmap? Is there some tooling for that purpose?

Thanks

@xacrimon
Owner

Hey, so this can be a bit tricky. Usually I just fire up lmdb and dump the backtraces of every thread when they happen and go through their callstacks. Does that help?

@gitmalong
Author

Thanks, I will give that a try. For tokio::sync::RwLock the tokio-console crate can be used. I think my issue is not related to this library, so I am going to close the issue.

@gitmalong
Author

Could something like this be integrated into Dashmap for debugging purposes? https://lib.rs/crates/no_deadlocks

@gitmalong gitmalong reopened this Dec 29, 2022
@gitmalong
Author

gitmalong commented Dec 29, 2022

Hey, so this can be a bit tricky. Usually I just fire up lmdb and dump the backtraces of every thread when they happen and go through their callstacks. Does that help?

Do you have any references or doc for that (lmdb)?

@gitmalong
Author

I wanted to give https://github.com/BurtonQin/lockbud a try but it does not work on macOS. Which approach do you normally take to get the backtraces of each thread?

@xacrimon
Owner

Sorry, I made a typo. I rely on the lldb debugger to do this. I run my program under lldb, then do `thread apply all bt` and sift through the backtraces.
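For reference, a typical session might look like this (the PID is a placeholder; `thread apply all bt` is the GDB-style spelling, which lldb accepts alongside its native `thread backtrace all`):

```
$ lldb -p 12345               # attach to the already-hung process
(lldb) thread backtrace all   # dump the call stack of every thread
(lldb) thread apply all bt    # GDB-compatible alias for the same thing
```

Look for two or more threads parked inside lock acquisition frames (e.g. `dashmap::lock::RawRwLock::lock_exclusive_slow`) and compare the frames above them to see which locks each thread already holds.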

@gitmalong
Author

gitmalong commented Dec 30, 2022

I ran `(lldb) thread backtrace all` and probably found the deadlocked call that I had also identified through my logs. However, I can't find another lock that blocks.

thread #8, name = 'tokio-runtime-worker'
    frame #0: 0x000000019209e5e4 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001920da638 libsystem_pthread.dylib`_pthread_cond_wait + 1232
    frame #2: 0x00000001010708cc butterbrot`dashmap::lock::RawRwLock::lock_exclusive_slow::h3eceab46f26c3724 + 624
    frame #3: 0x0000000100a4e75c butterbrot`butterbrot_rust::core::load_and_store::_$u7b$$u7b$closure$u7d$$u7d$::h07759cdff2954bfc + 3948

Is it correct that there must be another RawRwLock or dashmap::lock frame somewhere in the backtraces to confirm a deadlock (if they relate to the same DashMap)? Unfortunately, I have not found another one.

@gitmalong
Author

gitmalong commented Dec 31, 2022

Hi @xacrimon .

In #79 @notgull says

> I think "don't hold a lock across a .await" should be documented.

Might this be the root cause of my issue, since I have something like:

let account = dm.get_mut(&account);

if let Some(mut a) = account {
    a.value_mut().save("update").await; // <-- holding across an await point
}

@gitmalong
Author

Reproducer

    #[tokio::test]
    async fn dashmap_async_test() {
        struct CanAsyncSave;
        impl CanAsyncSave {
            pub async fn save(&mut self) {
                tokio::time::sleep(Duration::from_millis(1)).await;
            }
        }
        let dm: Arc<DashMap<String, CanAsyncSave>> = Default::default();
        // Insert before spawning, so the `unwrap` in the spawned task
        // cannot race with the first insert in the loop further down.
        dm.insert("1".into(), CanAsyncSave);
        let dm_clone = dm.clone();

        tokio::task::spawn(async move {
            for _ in 0..100 {
                let mut entry = dm_clone.get_mut("1").unwrap();
                let val = entry.value_mut();
                val.save().await; // guard held across this await
            }
        });

        for _ in 0..100 {
            dm.insert("1".into(), CanAsyncSave);
            let mut entry = dm.get_mut("1").unwrap();
            let val = entry.value_mut();
            val.save().await; // guard held across this await
        }
    }

@dariusc93

> Hi @xacrimon .
>
> In #79 @notgull says
>
> I think "don't hold a lock across a .await" should be documented.
>
> Might this be the root cause of my issue, since I have something like:
>
>     let account = dm.get_mut(&account);
>
>     if let Some(mut a) = account {
>         a.value_mut().save("update").await; // <-- holding across an await point
>     }

Sounds pretty similar to using a std mutex, parking_lot, etc., where you also want to avoid holding a lock across an await. In cases like that I would either stick with a lock that is await-aware, avoid awaiting while holding it, or restructure the functionality.

@matildasmeds

This very helpful post covers how to make Dashmap not deadlock in async code: https://draft.ryhl.io/blog/shared-mutable-state/ It also explains why the compiler does not warn about these deadlocks.

The gist is to never await anything while holding a lock. A lock is taken when accessing the DashMap and released when the guard is dropped... if I got that right, that is.

The recommended approach is to never access Dashmap directly in async code, but through a convenience wrapper.

@httpjamesm

Ran into this as I was looping over my DashMap's iterator and performing an async operation within it. This deadlocked my app unpredictably.

My solution was to collect the values from the iterator synchronously first, then loop over the collected values to perform my async operation.

@leontoeides

leontoeides commented Mar 22, 2024

I was reading Alice's blog that was mentioned in this thread.

Alice points out that the compiler doesn't complain about holding a guard (or reference) across an await because the guard is Send.

I wonder if the Send implementations for RefMut, RefMulti, etc. could be put behind a feature gate? Or maybe no_send could be a feature? I'm sure there's a good reason these types are Send, but if users could turn that off, it could make debugging easier.
