
Deadlocking with rwlock #1205

Open

arn-backpack opened this issue Oct 24, 2024 · 1 comment

arn-backpack commented Oct 24, 2024

Hello,

I have run into some strange behaviour where rayon seems to deadlock when interacting with an RwLock. I have seen other issues such as #592, but it doesn't quite seem to fit the example below.

In the example below we:

  • Let a task acquire a read lock.
  • Let another task block on acquiring the write lock.
  • Let several tasks block on acquiring a read lock.

However, the task which successfully acquired the read lock seems to deadlock in its rayon processing, even though there should be rayon threads available to process it. We only create CORES / 2 read-lock tasks, so there should be plenty of rayon threads that are not blocked on acquiring the read lock.

// Requires the rayon and parking_lot crates (versions not stated in the issue).
use parking_lot::RwLock;
use rayon::iter::ParallelIterator;
use rayon::prelude::IntoParallelRefIterator;
use rayon::ThreadPoolBuilder;
use std::sync::Arc;
use std::thread::sleep;


pub const CORES: usize = 16;


#[derive(Debug)]
struct Engine {
    data: u64,
}

fn main() {
    ThreadPoolBuilder::new()
        .thread_name(|i| format!("rayon-thread-{}", i))
        .build_global()
        .unwrap();

    let engine = Arc::new(RwLock::new(Engine { data: 0 }));

    rayon::spawn({
        let engine = engine.clone();
        move || loop {
            {
                // Attempt to acquire the lock after the first read task below has acquired the read lock.
                sleep(std::time::Duration::from_millis(50));
                println!("Acquiring write lock");
                let mut lock = engine.write();
                println!("Writing data");
                lock.data += 1;
            }
            println!("Data written");
            sleep(std::time::Duration::from_secs(3));
        }
    });

    let tasks: Vec<_> = (0..CORES / 2).collect();
    for task_number in &tasks {
        // The first task won't sleep, and will acquire the read lock before the write task above attempts to acquire the write lock.
        sleep(std::time::Duration::from_millis(*task_number as u64 * 100));
        my_task(*task_number, engine.clone());
    }

    sleep(std::time::Duration::from_secs(1_000));
}

fn my_task(task_number: usize, lock: Arc<RwLock<Engine>>) {
    rayon::spawn(move || {
        let thread = std::thread::current();
        println!("Attempting to acquire lock task={} on thread={}", task_number, thread.name().unwrap());

        let data = lock.read();
        println!("Successfully acquired lock task={}", task_number);
        sleep(std::time::Duration::from_millis(1_000));
        let mut list = Vec::new();
        list.extend(0..CORES);
        let _: Vec<_> = list
            .par_iter()
            .map(|idx| {
                let binding = std::thread::current();
                sleep(std::time::Duration::from_secs(1));
                println!("Task={} thread={} processing it={}", task_number, binding.name().unwrap(), idx);
                idx
            })
            .collect();
        let thread = std::thread::current();
        // This never prints.
        println!("Processing completed task={} thread={} data={:?}", task_number, thread.name().unwrap(), *data);
    });
}

cuviper (Member) commented Oct 24, 2024

I think it is the same fundamental issue as #592. Your task that's holding the read lock is splitting into many recursive joins via par_iter. When the first part of a join finishes, it will wait for its other half to finish as well, so it enters work-stealing. If that steals another read-lock job, it will block due to the waiting writer, and that's a deadlock since it prevents the current reader from finishing.

If you attach a debugger, it should be possible to see this in the thread backtraces.

Since you're using parking_lot, you could work around it in this particular example by calling read_recursive instead, since that ignores waiting writers -- "starving" them.
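
A minimal sketch of that workaround, applied to my_task from the example above (read_recursive is parking_lot's API; everything omitted here is unchanged from the original code):

fn my_task(task_number: usize, lock: Arc<RwLock<Engine>>) {
    rayon::spawn(move || {
        // read_recursive ignores queued writers, so a stolen copy of this
        // job can still acquire the lock even while a writer is waiting,
        // at the cost of potentially starving that writer.
        let data = lock.read_recursive();

        // ... the same par_iter processing as above ...

        println!("Processing completed task={} data={:?}", task_number, *data);
    });
}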

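Another option, not suggested in this thread, is to avoid holding any guard across the par_iter at all: copy what you need out of the lock and drop the guard before the parallel work starts. A sketch against the same Engine type from the report:

fn my_task(task_number: usize, lock: Arc<RwLock<Engine>>) {
    rayon::spawn(move || {
        // The temporary read guard is dropped at the end of this statement,
        // so no lock is held while this thread waits inside par_iter's joins.
        let data = lock.read().data;

        let list: Vec<usize> = (0..CORES).collect();
        let _: Vec<_> = list.par_iter().map(|idx| *idx + data as usize).collect();

        println!("Processing completed task={} data={}", task_number, data);
    });
}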