Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dlib is not thread safe #25

Closed
jalil-salame opened this issue Jul 29, 2023 · 11 comments · Fixed by #26
Closed

dlib is not thread safe #25

jalil-salame opened this issue Jul 29, 2023 · 11 comments · Fixed by #26

Comments

@jalil-salame
Copy link
Contributor

Trying to speed up predictions by using rayon causes a core dump within dlib's allocator, some of the structures should be marked as either !Sync or !Send (or both).

@JSanchesDovichi
Copy link
Contributor

JSanchesDovichi commented Jul 29, 2023

Hello there!
Previously i successfully parallelize predictions using this crate (i didn't use rayon, but tokio, but i believe that wouldn't be a problem), but i believe we tried different approaches.
Do you have any code snippet to show us so we can figure out if anything went wrong?
I'm unsure if you're trying to use rayon to speed up a process inside this crate, or a high-level understanding of a prediction.

Thanks for bringing this up :D

@jalil-salame
Copy link
Contributor Author

jalil-salame commented Jul 29, 2023

I am running the face detector and landmark predictor inside of rayon.

This is the code (without rayon), it breaks if you do into_par_iter instead of into_iter

https://github.com/jalil-salame/face-stabilizer/blob/484c02a72f622307968a2933de1b8174ca460c4b/src/main.rs#L89-L103

This is the code that calls into dlib:

https://github.com/jalil-salame/face-stabilizer/blob/484c02a72f622307968a2933de1b8174ca460c4b/landmark-extractor/src/lib.rs#L119-L132

Using gdb I find the crash on the memory allocator from dlib (C++):

#0  0x00007ffff7b6d009 in __memset_avx2_unaligned_erms () from /nix/store/ayg065nw0xi1zsyi8glfh5pn4sfqd8xg-glibc-2.37-8/lib/libc.so.6
#1  0x0000555555fd89d1 in void dlib::impl_fhog::init_hog<float, dlib::memory_manager_stateless_kernel_1<char>, dlib::memory_manager_stateless_kernel_1<char> >(dlib::array<dlib::array2d<float, dlib::memory_manager_stateless_kernel_1<char> >, dlib::memory_manager_stateless_kernel_1<char> >&, int, int, int, int) ()
#2  0x0000555555fe2189 in void dlib::impl_fhog::impl_extract_fhog_features<dlib::matrix<dlib::rgb_pixel, 0l, 0l, dlib::memory_manager_stateless_kernel_1<char>, dlib::row_major_layout>, dlib::array<dlib::array2d<float, dlib::memory_manager_stateless_kernel_1<char> >, dlib::memory_manager_stateless_kernel_1<char> > >(dlib::matrix<dlib::rgb_pixel, 0l, 0l, dlib::memory_manager_stateless_kernel_1<char>, dlib::row_major_layout> const&, dlib::array<dlib::array2d<float, dlib::memory_manager_stateless_kernel_1<char> >, dlib::memory_manager_stateless_kernel_1<char> >&, int, int, int) ()
#3  0x0000555555ff0de9 in void dlib::object_detector<dlib::scan_fhog_pyramid<dlib::pyramid_down<6u>, dlib::default_fhog_feature_extractor> >::operator()<dlib::matrix<dlib::rgb_pixel, 0l, 0l, dlib::memory_manager_stateless_kernel_1<char>, dlib::row_major_layout> >(dlib::matrix<dlib::rgb_pixel, 0l, 0l, dlib::memory_manager_stateless_kernel_1<char>, dlib::row_major_layout> const&, std::vector<dlib::rect_detection, std::allocator<dlib::rect_detection> >&, double) [clone .constprop.0] ()
#4  0x0000555555ff3dfc in __cpp_closure_9405994709588434887 ()
#5  0x0000555555fc9277 in <dlib_face_recognition::face_detection::hog::FaceDetector as dlib_face_recognition::face_detection::base::FaceDetectorTrait>::face_locations::h13a9c7e3fb1a1f33 ()
#6  0x00005555556d6a50 in landmark_extractor::extract_landmarks::h19c74ed3caa2461f ()
#7  0x00005555556c9016 in <rayon::iter::map::MapFolder<C,F> as rayon::iter::plumbing::Folder<T>>::consume::h29a4ebbc12e017fa ()
#8  0x000055555568f708 in rayon::iter::plumbing::Folder::consume_iter::h22c3cde534b77f5d ()
#9  0x00005555556c01d7 in rayon::iter::plumbing::bridge_producer_consumer::helper::h85b815c6acc52d9c ()
#10 0x00005555556cabaf in rayon_core::join::join_context::{{closure}}::hd8c172b9dc1ce682 ()
#11 0x00005555556cb0ea in rayon_core::registry::in_worker::heda56fb9e44646f1 ()
#12 0x00005555556c0340 in rayon::iter::plumbing::bridge_producer_consumer::helper::h85b815c6acc52d9c ()
#13 0x00005555556cabaf in rayon_core::join::join_context::{{closure}}::hd8c172b9dc1ce682 ()
#14 0x00005555556cb0ea in rayon_core::registry::in_worker::heda56fb9e44646f1 ()
#15 0x00005555556c0340 in rayon::iter::plumbing::bridge_producer_consumer::helper::h85b815c6acc52d9c ()
#16 0x00005555556cabaf in rayon_core::join::join_context::{{closure}}::hd8c172b9dc1ce682 ()
#17 0x00005555556cb0ea in rayon_core::registry::in_worker::heda56fb9e44646f1 ()
#18 0x00005555556c0340 in rayon::iter::plumbing::bridge_producer_consumer::helper::h85b815c6acc52d9c ()
#19 0x00005555556cabaf in rayon_core::join::join_context::{{closure}}::hd8c172b9dc1ce682 ()
#20 0x00005555556cb0ea in rayon_core::registry::in_worker::heda56fb9e44646f1 ()
#21 0x00005555556c0340 in rayon::iter::plumbing::bridge_producer_consumer::helper::h85b815c6acc52d9c ()
#22 0x00005555556d5797 in <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::h7e18d7c6e4f01ef3 ()
#23 0x0000555555680323 in rayon_core::registry::WorkerThread::wait_until_cold::h7f249316ded4d7d9 ()
#24 0x00005555561456c8 in rayon_core::registry::ThreadBuilder::run::h9dd80136da8f4724 ()
#25 0x000055555614962a in std::sys_common::backtrace::__rust_begin_short_backtrace::haffca54ca4ae1e6b ()
#26 0x0000555556149c15 in core::ops::function::FnOnce::call_once{{vtable.shim}}::h7a25643fd4689018 ()
#27 0x00005555561abb35 in std::sys::unix::thread::Thread::new::thread_start::hfffad9b0b70f2f00 ()
#28 0x00007ffff7a9fdd4 in start_thread () from /nix/store/ayg065nw0xi1zsyi8glfh5pn4sfqd8xg-glibc-2.37-8/lib/libc.so.6
#29 0x00007ffff7b219b0 in clone3 () from /nix/store/ayg065nw0xi1zsyi8glfh5pn4sfqd8xg-glibc-2.37-8/lib/libc.so.6

It is flaky (seems to work about every other time).

You can run it like face-stabilizer extract-features /path/to/dir/with/images/of/faces

@jalil-salame
Copy link
Contributor Author

I am running it on master so that I can use your C++14 fix

@JSanchesDovichi
Copy link
Contributor

JSanchesDovichi commented Jul 30, 2023

Hello! Sorry for the late reply, i took some time to try and understand your code and what you wanted to achieve.

Please do correct me if i`m wrong, but i guess you wanted to use rayon to find rectangles in multiple images in an asynchronous way.

If that's correct, i believe there may be some mistake in your code sample.

I did a small reproducible code to test this:

use dlib_face_recognition::{FaceDetector, FaceDetectorTrait, ImageMatrix};
use rayon::prelude::*;
use serde::{Deserialize, Serialize};
use std::{collections::HashMap, fs::read_dir};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SerializableRectangle {
   pub left: i64,
   pub top: i64,
   pub right: i64,
   pub bottom: i64,
}

fn find_rectangles(image: &String) -> Vec<SerializableRectangle> {
   println!("{image}");

   let mut rect_results: Vec<SerializableRectangle> = vec![];

   let image = image::open(image).unwrap().to_rgb8();
   let matrix = ImageMatrix::from_image(&image);

   let detector = FaceDetector::default();

   let face_locations = detector.face_locations(&matrix);

   for found_rectangle in face_locations.iter() {
       let serializable_rectangle = SerializableRectangle {
           left: found_rectangle.left,
           right: found_rectangle.right,
           top: found_rectangle.top,
           bottom: found_rectangle.bottom,
       };

       rect_results.push(serializable_rectangle);
   }

   rect_results
}

fn main() {
   let Ok(paths) = read_dir("./images") else {
       panic!("Unable to read images folder!")
   };

   let mut images: HashMap<String, Vec<SerializableRectangle>> = HashMap::new();

   for path in paths.flatten() {
       let string = path.path().display().to_string();
       images.insert(string, vec![]);
   }

   images.par_iter_mut().for_each(|entry| {
       let rects = find_rectangles(entry.0);
       for rect in rects {
           entry.1.push(rect);
       }
   });

   for entry in images {
       println!("{:?}", entry.1)
   }
}

This works without any errors, proving that the structs in this crate are indeed thread-safe (at least, with this code approach).
In another possibility, it could be something about trying to load the detector files in such a way, i have this happen to me before, but at this point, i'm unsure about that direction.

I really hope to have been able to help or guess your code objective correctly.

Please do try this out, and keep us informed!

@jalil-salame
Copy link
Contributor Author

That is basically what I do, with a few minor tweaks:

  • I do fewer allocations
  • I keep the detector around (and pass only a &Detector referece),
  • I also run the landmark predictor on the rects

So your code would look like this (might not compile, I haven't tested it):

use dlib_face_recognition::{FaceDetector, FaceDetectorTrait, ImageMatrix, LandmarkPredictor, LandmarkPredictorTrait};
use rayon::prelude::*;
use serde::{Deserialize, Serialize};
use std::{collections::HashMap, fs::read_dir};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SerializableRectangle {
   pub left: i64,
   pub top: i64,
   pub right: i64,
   pub bottom: i64,
}

fn find_rectangles(image: &String, detector: &FaceDetector, predictor: &LandmarkPredictor) -> Vec<SerializableRectangle> {
   println!("{image}");

   let mut rect_results: Vec<SerializableRectangle> = vec![];

   let image = image::open(image).unwrap().to_rgb8();
   let matrix = ImageMatrix::from_image(&image);

   let face_locations = detector.face_locations(&matrix);

   for found_rectangle in face_locations.iter() {
       let serializable_rectangle = SerializableRectangle {
           left: found_rectangle.left,
           right: found_rectangle.right,
           top: found_rectangle.top,
           bottom: found_rectangle.bottom,
       };

       rect_results.push(serializable_rectangle);
       // Also save the facial landmarks here
   }

   rect_results
}

fn main() {
   let Ok(paths) = read_dir("./images") else {
       panic!("Unable to read images folder!")
   };

   let mut images: HashMap<String, Vec<SerializableRectangle>> = HashMap::new();

   for path in paths.flatten() {
       let string = path.path().display().to_string();
       images.insert(string, vec![]);
   }

   let detector = FaceDetector::default();
   let predictor = LandmarkPredictor::new("/path/to/model");

   images.par_iter_mut().for_each(|entry| {
       let rects = find_rectangles(entry.0, &detector, &predictor);
       for rect in rects {
           entry.1.push(rect);
       }
   });

   for entry in images {
       println!("{:?}", entry.1)
   }
}

@jalil-salame
Copy link
Contributor Author

I'll try to write a minimal example next weekend when I have time

@JSanchesDovichi
Copy link
Contributor

Hello!
After some more investigation, i did reach the same segfault as @jalil-salame, and can confirm that truly the structs should be marked for !Sync and !Send, except the LandmarkPredictor, which is thread-safe as per my previous extensive-testing with a personal project.

The problem is: negative impls are a nightly locked feature of the language, meaning we cannot do this PR and still use this crate in a stable Rust compiler toolchain. This would also force whatever projects are using this crate to also be forced to run Nightly Rust.

The negative impls RFC is here: rust-lang/rust#68318.

The code/PR itself is pretty easy by itself, so not a problem except that.
I do find this important, and would vouch for this to be implemented at some point, but the question of the nightly toolchain still remains.

Ultimately, the decision should be by @kerryeon.

@jalil-salame
Copy link
Contributor Author

jalil-salame commented Jul 30, 2023

The problem is: negative impls are a nightly locked feature of the language, meaning we cannot do this PR and still use this crate in a stable Rust compiler toolchain. This would also force whatever projects are using this crate to also be forced to run Nightly Rust.

Ultimately, the decision should be by @kerryeon.

You can work around that limitation by adding some phantom data to them (specifically !Sync/!Send phantom data, like UnsafeCell)

If I remember correctly

@jalil-salame
Copy link
Contributor Author

The problem is: negative impls are a nightly locked feature of the language, meaning we cannot do this PR and still use this crate in a stable Rust compiler toolchain. This would also force whatever projects are using this crate to also be forced to run Nightly Rust.

Ultimately, the decision should be by @kerryeon.

You can work around that limitation by adding some phantom data to them (specifically !Sync/!Send phantom data, like UnsafeCell)

If I remember correctly

PhantomData will work, using PhantomData<UnsafeCell<()>> won't, as UnsafeCell will be Send.

The right choice should be adding a field with PhatomData<*const ()> as raw pointers are neither Send nor Sync.

I don't know how the dlib types work, if they have interior mutability, they should be !Sync, if they have thread local data, they should be !Send.

Having them be !Send and !Sync would be the safe choice.

@JSanchesDovichi
Copy link
Contributor

The problem is: negative impls are a nightly locked feature of the language, meaning we cannot do this PR and still use this crate in a stable Rust compiler toolchain. This would also force whatever projects are using this crate to also be forced to run Nightly Rust.
Ultimately, the decision should be by @kerryeon.

You can work around that limitation by adding some phantom data to them (specifically !Sync/!Send phantom data, like UnsafeCell)
If I remember correctly

PhantomData will work, using PhantomData<UnsafeCell<()>> won't, as UnsafeCell will be Send.

The right choice should be adding a field with PhatomData<*const ()> as raw pointers are neither Send nor Sync.

I don't know how the dlib types work, if they have interior mutability, they should be !Sync, if they have thread local data, they should be !Send.

Having them be !Send and !Sync would be the safe choice.

Just did some code testing, PhantomData solution seems to work, albeit a little jank, but it avoids the nightly problem.
Will take some time later on tonight to write a PR for that + documentation on the negative impl issue.

Thanks for all the info and cooperation!

@HoKim98
Copy link
Member

HoKim98 commented Aug 31, 2023

Hi, I read the issue and confirmed that dlib is not thread safe as the following comment: davisking/dlib#340 (comment)

And it seems that both Send and Sync features are not supported.

HoKim98 added a commit that referenced this issue Aug 31, 2023
fix(#25): Mark certain structs as !Sync
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants