Hello,

I have run into some strange behaviour where rayon seems to deadlock when interacting with an RwLock. I have seen other issues such as #592, but it doesn't quite seem to fit the example below.
In the example below we:

1. Let a task acquire a read lock.
2. Let another task block on acquiring the write lock.
3. Let several tasks block on acquiring a read lock.
However, the task that successfully acquired the read lock seems to deadlock in its rayon processing, even though there should be rayon threads available to process it. We only create `CORES / 2` read-lock tasks, so there should be plenty of rayon threads that are not blocked on acquiring the read lock.
```rust
use parking_lot::RwLock;
use rayon::iter::ParallelIterator;
use rayon::prelude::IntoParallelRefIterator;
use rayon::ThreadPoolBuilder;
use std::sync::Arc;
use std::thread::sleep;

pub const CORES: usize = 16;

#[derive(Debug)]
struct Engine {
    data: u64,
}

fn main() {
    ThreadPoolBuilder::new()
        .thread_name(|i| format!("rayon-thread-{}", i))
        .build_global()
        .unwrap();

    let engine = Arc::new(RwLock::new(Engine { data: 0 }));

    rayon::spawn({
        let engine = engine.clone();
        move || loop {
            {
                // Attempt to acquire the lock after the first read task below has acquired the read lock.
                sleep(std::time::Duration::from_millis(50));
                println!("Acquiring write lock");
                let mut lock = engine.write();
                println!("Writing data");
                lock.data += 1;
            }
            println!("Data written");
            sleep(std::time::Duration::from_secs(3));
        }
    });

    let tasks: Vec<_> = (0..CORES / 2).collect();
    for task_number in &tasks {
        // The first task won't sleep and will acquire the read lock before the above write task attempts to acquire the write lock.
        sleep(std::time::Duration::from_millis(*task_number as u64 * 100));
        my_task(*task_number, engine.clone());
    }

    sleep(std::time::Duration::from_secs(1_000));
}

fn my_task(task_number: usize, lock: Arc<RwLock<Engine>>) {
    rayon::spawn(move || {
        let thread = std::thread::current();
        println!("Attempting to acquire lock task={} on thread={}", task_number, thread.name().unwrap());
        let data = lock.read();
        println!("Successfully acquired lock task={}", task_number);
        sleep(std::time::Duration::from_millis(1_000));

        let mut list = Vec::new();
        list.extend(0..CORES);
        let _: Vec<_> = list
            .par_iter()
            .map(|idx| {
                let binding = std::thread::current();
                sleep(std::time::Duration::from_secs(1));
                println!("Task={} thread={} processing it={}", task_number, binding.name().unwrap(), idx);
                idx
            })
            .collect();

        let thread = std::thread::current();
        // This never prints.
        println!("Processing completed task={} thread={} data={:?}", task_number, thread.name().unwrap(), *data);
    });
}
```
I think it is the same fundamental issue as #592. Your task that's holding the read lock is splitting into many recursive joins via par_iter. When the first part of a join finishes, it will wait for its other half to finish as well, so it enters work-stealing. If that steals another read-lock job, it will block due to the waiting writer, and that's a deadlock since it prevents the current reader from finishing.
If you attach a debugger, it should be possible to see this in the thread backtraces.
Since you're using parking_lot, you could work around it in this particular example by calling read_recursive instead, since that ignores waiting writers -- "starving" them.
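For concreteness, here is a minimal sketch of that workaround applied to `my_task` from the reproduction above (reusing its imports, `Engine`, and `CORES`; logging trimmed). The only substantive change is `read()` -> `read_recursive()`:

```rust
fn my_task(task_number: usize, lock: Arc<RwLock<Engine>>) {
    rayon::spawn(move || {
        // `read_recursive` ignores queued writers, so a rayon thread that
        // steals another read-lock job while this guard is held will not
        // block behind the waiting write task -- at the cost of potentially
        // starving that writer.
        let data = lock.read_recursive();

        let list: Vec<usize> = (0..CORES).collect();
        let _: Vec<_> = list
            .par_iter()
            .map(|idx| {
                sleep(std::time::Duration::from_secs(1));
                idx
            })
            .collect();

        println!("Processing completed task={} data={:?}", task_number, *data);
    });
}
```

If starving the writer is a concern, another option (not specific to parking_lot) would be to clone whatever you need out of the lock and drop the read guard before starting the `par_iter`, so that no lock is held across rayon's work-stealing.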