fract4dc sanitize #154
Conversation
That's a pretty nice way to solve the problem; there's a certain elegance to that vs. starting threads paused.
There's a third approach: using pthread_self inside the thread itself. I was wondering about the possibility of gtkfracal.py connecting to
I really have no clue about the clever stuff! I have applied the patch and run the tests locally, and got them all to pass. Still some segfaults, though (I'll see if I get a chance to collect some data).
This feels like something that would benefit from std::thread, and I agree that calling set_tid() in the thread itself would be a far better idea, as that removes all data races from that code path.
Force-pushed from 723b4fe to 667fa16
Tried with std::thread, and the code looks neater. Not sure I'm doing it the canonical way, though.
Force-pushed from 667fa16 to 8c9c196
I have this interesting output from ThreadSanitizer:
==================
WARNING: ThreadSanitizer: data race (pid=67153)
Write of size 8 at 0x0001153544e0 by main thread:
#0 _Py_INCREF(_object*) object.h:459 (fract4dc.cpython-38-darwin.so:x86_64+0x2c0b1)
#1 _Py_XINCREF(_object*) object.h:532 (fract4dc.cpython-38-darwin.so:x86_64+0x2b9eb)
#2 calc_args::set_cmap(_object*) calcargs.cpp:24 (fract4dc.cpython-38-darwin.so:x86_64+0x2b979)
#3 parse_calc_args(_object*, _object*) calcs.cpp:172 (fract4dc.cpython-38-darwin.so:x86_64+0x12c31)
#4 calcs::pycalc(_object*, _object*, _object*) calcs.cpp:41 (fract4dc.cpython-38-darwin.so:x86_64+0x10ef1)
#5 pycalc(_object*, _object*, _object*) fract4dmodule.cpp:147 (fract4dc.cpython-38-darwin.so:x86_64+0x34d0)
#6 cfunction_call_varargs <null> (Python:x86_64+0x1a4c1)
Previous write of size 8 at 0x0001153544e0 by thread T11:
#0 _Py_DECREF(char const*, int, _object*) object.h:470 (fract4dc.cpython-38-darwin.so:x86_64+0x2c168)
#1 _Py_XDECREF(_object*) object.h:541 (fract4dc.cpython-38-darwin.so:x86_64+0x2bf77)
#2 calc_args::~calc_args() calcargs.cpp:53 (fract4dc.cpython-38-darwin.so:x86_64+0x2bea0)
#3 calc_args::~calc_args() calcargs.cpp:48 (fract4dc.cpython-38-darwin.so:x86_64+0x2c018)
#4 calculation_thread(calc_args*) calcs.cpp:97 (fract4dc.cpython-38-darwin.so:x86_64+0x13947)
#5 decltype(std::__1::forward<void* (*)(calc_args*)>(fp)(std::__1::forward<calc_args*>(fp0))) std::__1::__invoke<void* (*)(calc_args*), calc_args*>(void* (*&&)(calc_args*), calc_args*&&) type_traits:3530 (fract4dc.cpython-38-darwin.so:x86_64+0x173af)
#6 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*>&, std::__1::__tuple_indices<2ul>) thread:277 (fract4dc.cpython-38-darwin.so:x86_64+0x16e88)
#7 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*> >(void*) thread:287 (fract4dc.cpython-38-darwin.so:x86_64+0x144b0)
Thread T11 (tid=3087829, finished) created by main thread at:
#0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x8aad)
#1 std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:336 (fract4dc.cpython-38-darwin.so:x86_64+0x14306)
#2 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:303 (fract4dc.cpython-38-darwin.so:x86_64+0x13d39)
#3 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:295 (fract4dc.cpython-38-darwin.so:x86_64+0x13a1a)
#4 calcs::pycalc(_object*, _object*, _object*) calcs.cpp:53 (fract4dc.cpython-38-darwin.so:x86_64+0x1154f)
#5 pycalc(_object*, _object*, _object*) fract4dmodule.cpp:147 (fract4dc.cpython-38-darwin.so:x86_64+0x34d0)
#6 cfunction_call_varargs <null> (Python:x86_64+0x1a4c1)
SUMMARY: ThreadSanitizer: data race object.h:459 in _Py_INCREF(_object*)
==================
It's complaining because we're increasing and decreasing the ref count of the same pycmap object in two different threads: the main one and the one created for the calculation. That happens, I guess, because another call to pycalc came in before the thread created in a previous call had ended.
More importantly, it's complaining because the decref is in the destructor, meaning the object's lifetime ends and, at the same time, the object is incref'd. This feels like maybe there's a mutex missing around the calc_args object?
Actually, there was a missing mutex in there. I had to use a static mutex because it's not the calc_args instance itself causing the problem, but multiple instances of calc_args increasing and decreasing the ref count of the same instances of cmap, pfo, image and site. I also got this one fixed (hopefully):
WARNING: ThreadSanitizer: data race (pid=69030)
Read of size 1 at 0x7b28000172b4 by thread T9:
#0 FDSite::image_changed(int, int, int, int) site.cpp:62 (fract4dc.cpython-38-darwin.so:x86_64+0x43d66)
#1 fractFunc::image_changed(int, int, int, int) const fractfunc.h:58 (fract4dc.cpython-38-darwin.so:x86_64+0x3760f)
#2 fractFunc::reset_progress(float) fractfunc.cpp:129 (fract4dc.cpython-38-darwin.so:x86_64+0x39f7d)
#3 fractFunc::draw(int, int, float, float) fractfunc.cpp:246 (fract4dc.cpython-38-darwin.so:x86_64+0x3dd17)
#4 fractFunc::draw_all() fractfunc.cpp:170 (fract4dc.cpython-38-darwin.so:x86_64+0x3b65f)
#5 calc calcfunc.cpp:38 (fract4dc.cpython-38-darwin.so:x86_64+0x34869)
#6 calculation_thread(calc_args*) calcs.cpp:85 (fract4dc.cpython-38-darwin.so:x86_64+0x13de5)
#7 decltype(std::__1::forward<void* (*)(calc_args*)>(fp)(std::__1::forward<calc_args*>(fp0))) std::__1::__invoke<void* (*)(calc_args*), calc_args*>(void* (*&&)(calc_args*), calc_args*&&) type_traits:3530 (fract4dc.cpython-38-darwin.so:x86_64+0x178af)
#8 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*>&, std::__1::__tuple_indices<2ul>) thread:277 (fract4dc.cpython-38-darwin.so:x86_64+0x17388)
#9 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*> >(void*) thread:287 (fract4dc.cpython-38-darwin.so:x86_64+0x149b0)
Previous write of size 1 at 0x7b28000172b4 by main thread:
[failed to restore the stack]
Location is heap block of size 152 at 0x7b28000172a0 allocated by main thread:
#0 operator new(unsigned long) <null>:5353440 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x5492b)
#1 sites::pyfdsite_create(_object*, _object*) sites.cpp:18 (fract4dc.cpython-38-darwin.so:x86_64+0xacd0)
#2 pyfdsite_create(_object*, _object*) fract4dmodule.cpp:128 (fract4dc.cpython-38-darwin.so:x86_64+0x35c0)
#3 cfunction_call_varargs <null>:5353440 (Python:x86_64+0x1a555)
Thread T9 (tid=3141054, running) created by main thread at:
#0 pthread_create <null>:5353488 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x8aad)
#1 std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:336 (fract4dc.cpython-38-darwin.so:x86_64+0x14806)
#2 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:303 (fract4dc.cpython-38-darwin.so:x86_64+0x14239)
#3 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:295 (fract4dc.cpython-38-darwin.so:x86_64+0x13f1a)
#4 calcs::pycalc(_object*, _object*, _object*) calcs.cpp:53 (fract4dc.cpython-38-darwin.so:x86_64+0x11a4f)
#5 pycalc(_object*, _object*, _object*) fract4dmodule.cpp:147 (fract4dc.cpython-38-darwin.so:x86_64+0x39d0)
#6 cfunction_call_varargs <null>:5353488 (Python:x86_64+0x1a4c1)
SUMMARY: ThreadSanitizer: data race site.cpp:62 in FDSite::image_changed(int, int, int, int)
And finally, I have this last one identified by ThreadSanitizer, which I run into when closing the program:
==================
WARNING: ThreadSanitizer: thread leak (pid=69316)
Thread T4 (tid=3146887, finished) created by main thread at:
#0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x8aad)
#1 std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:336 (fract4dc.cpython-38-darwin.so:x86_64+0x146f6)
#2 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:303 (fract4dc.cpython-38-darwin.so:x86_64+0x14129)
#3 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:295 (fract4dc.cpython-38-darwin.so:x86_64+0x13e0a)
#4 calcs::pycalc(_object*, _object*, _object*) calcs.cpp:53 (fract4dc.cpython-38-darwin.so:x86_64+0x1193f)
#5 pycalc(_object*, _object*, _object*) fract4dmodule.cpp:147 (fract4dc.cpython-38-darwin.so:x86_64+0x38c0)
#6 cfunction_call_varargs <null> (Python:x86_64+0x1a4c1)
And 1 more similar thread leaks.
SUMMARY: ThreadSanitizer: thread leak __threading_support:336 in std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*)
==================
I haven't figured this one out yet.
By the way, I had three pushes in a row passing the tests; that's encouraging with regard to #73.
fract4d/c/model/site.cpp (Outdated)
@@ -96,6 +93,7 @@ void FDSite::status_changed(int status_val)
 bool FDSite::is_interrupted()
 {
     //fprintf(stderr,"int: %d\n",interrupted);
+    const std::lock_guard<std::mutex> lock(interrupt_lock);
I don't quite understand why you need a mutex for the interrupted var. We're just checking the current value of a bool; nothing bad happens if we don't detect that we were interrupted until the next time we check.
I think it's because of the volatile qualifier that ThreadSanitizer complains.
I didn't want to remove the qualifier; I'm not sure whether I'm missing something about the reason for having it. But maybe it makes sense to just remove it.
void calc_args::set_cmap(PyObject *pycmap_)
{
    pycmap = pycmap_;
    cmap = colormaps::cmap_fromcapsule(pycmap);
    const std::lock_guard<std::mutex> lock(ref_count_mutex);
I feel like the ideal pattern would be if only the main thread ever dealt with Python refcounts and all the calc threads could ignore them. But I haven't thought much about how complicated that would be.
I've been playing around with GIL acquisition to try to address the last ThreadSanitizer report. No luck so far.
I guess the GTK signals triggering the calculation process outside the interpreter thread have something to do with it.
interrupted should most probably be std::atomic<>, and that makes the mutex and the data race go away. volatile is not the right language feature in this case: it doesn't mean thread-safe. It only causes the compiler to always emit read instructions when the variable is referenced; it doesn't make the compiler ensure cache coherency or atomicity.
Thanks a ton. I have some open-ended comments, but I don't feel they are must-fix.
You are very welcome. I appreciate your feedback so much.
Force-pushed from 7b20f04 to 9417883
I've just made interrupted std::atomic and removed the mutex. The ThreadSanitizer warning is gone, as @DX-MON predicted.
Address issues detected with UBSan, ASan and TSan
I've already managed to find one: #73 (comment)
Having #133 would help, thanks to Meson's ability to add sanitizers out of the box.