fract4dc sanitize #154

Merged: 5 commits into fract4d:master from fract4dc-sanitize, Jul 19, 2020

Conversation

@mindhells (Member)

Address issues detected with UBSAN, ASAN, and TSAN.

I've already managed to find one: #73 (comment)

Having #133 would help, thanks to Meson's ability to add sanitizers out of the box.

@dragonmux (Member)

That's a pretty nice way to solve the problem; there's a certain elegance to that vs. starting threads paused.

@mindhells (Member Author) commented Jul 14, 2020

> That's a pretty nice way to solve the problem; there's a certain elegance to that vs. starting threads paused.

There's a third approach: using pthread_self inside calculation_thread and calling site->set_tid(0) before creating the thread.
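In sketch form, that third approach looks something like this (toy FDSite and calc_args stand-ins here, not the real fract4dc definitions):

```cpp
#include <pthread.h>

// Toy stand-ins for the real fract4dc types, just to show the shape of the idea.
struct FDSite
{
    pthread_t tid{};
    void set_tid(pthread_t t) { tid = t; }
};

struct calc_args
{
    FDSite *site;
};

static void *calculation_thread(void *vargs)
{
    auto *args = static_cast<calc_args *>(vargs);
    args->site->set_tid(pthread_self()); // the thread records its own id
    // ... run the fractal calculation ...
    return nullptr;
}

void start_calc(calc_args *args)
{
    args->site->set_tid(0); // sentinel: no calculation thread yet
    pthread_t t;
    pthread_create(&t, nullptr, calculation_thread, args);
}
```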

I was wondering about the possibility of site->wait being called after pthread_create and before site->set_tid(tid). Since the site object is reused by fractal.py, that seems to be a possibility, but I assume we are protected by the Python global interpreter lock. Are we?

gtkfractal.py connects to Gtk.DrawingArea to trigger fract4dc.calc. Can we guarantee the GIL is not released by the GTK main loop? As stated in the manual:

> One of the weirder parts of the code is how we deal with threading. Basically we want the calculation of the fractal to happen on a different thread (or multiple threads for SMP) from the main UI, so you can interrupt at any point. This is complicated by the fact that Python only allows a single thread in the Global Interpreter Lock, and that PyGTK is often compiled by Linux distribution vendors without thread support, meaning this lock is not released when running the GTK main loop. (This last is probably not true any more, but I haven't checked.)

@mindhells requested review from cjmayo and edyoung, July 14, 2020 09:24
@cjmayo (Collaborator) left a comment

I really have no clue about the clever stuff! I applied the patch and ran the tests locally, and they all pass. Still some segfaults, though (I'll see if I get a chance to collect some data).

@dragonmux (Member)

This feels like something that would benefit from std::thread, and I agree that calling set_tid() in the thread itself would be a far better idea, as that removes all data races from that code path.

@mindhells (Member Author)

> This feels like something that would benefit from std::thread, and I agree that calling set_tid() in the thread itself would be a far better idea, as that removes all data races from that code path.

Tried with std::thread, and the code looks neater. Not sure I'm doing it the canonical way, though.
On the other hand, if there's consensus about using this approach, we might need to change the thread pool as well, to keep things consistent.
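Roughly, the std::thread version looks like this (a minimal sketch, not the exact calcs.cpp code):

```cpp
#include <thread>

struct calc_args; // opaque here; see calcargs.cpp for the real type

void *calculation_thread(calc_args *args); // same entry point as before

void start_calc(calc_args *args)
{
    // std::thread takes the function and its arguments directly,
    // with no void* casting as pthread_create requires.
    std::thread t(calculation_thread, args);
    t.detach(); // or keep the handle on the site object and join it later
}
```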

@mindhells (Member Author)

I have this interesting output from ThreadSanitizer:

==================
WARNING: ThreadSanitizer: data race (pid=67153)
  Write of size 8 at 0x0001153544e0 by main thread:
    #0 _Py_INCREF(_object*) object.h:459 (fract4dc.cpython-38-darwin.so:x86_64+0x2c0b1)
    #1 _Py_XINCREF(_object*) object.h:532 (fract4dc.cpython-38-darwin.so:x86_64+0x2b9eb)
    #2 calc_args::set_cmap(_object*) calcargs.cpp:24 (fract4dc.cpython-38-darwin.so:x86_64+0x2b979)
    #3 parse_calc_args(_object*, _object*) calcs.cpp:172 (fract4dc.cpython-38-darwin.so:x86_64+0x12c31)
    #4 calcs::pycalc(_object*, _object*, _object*) calcs.cpp:41 (fract4dc.cpython-38-darwin.so:x86_64+0x10ef1)
    #5 pycalc(_object*, _object*, _object*) fract4dmodule.cpp:147 (fract4dc.cpython-38-darwin.so:x86_64+0x34d0)
    #6 cfunction_call_varargs <null> (Python:x86_64+0x1a4c1)

  Previous write of size 8 at 0x0001153544e0 by thread T11:
    #0 _Py_DECREF(char const*, int, _object*) object.h:470 (fract4dc.cpython-38-darwin.so:x86_64+0x2c168)
    #1 _Py_XDECREF(_object*) object.h:541 (fract4dc.cpython-38-darwin.so:x86_64+0x2bf77)
    #2 calc_args::~calc_args() calcargs.cpp:53 (fract4dc.cpython-38-darwin.so:x86_64+0x2bea0)
    #3 calc_args::~calc_args() calcargs.cpp:48 (fract4dc.cpython-38-darwin.so:x86_64+0x2c018)
    #4 calculation_thread(calc_args*) calcs.cpp:97 (fract4dc.cpython-38-darwin.so:x86_64+0x13947)
    #5 decltype(std::__1::forward<void* (*)(calc_args*)>(fp)(std::__1::forward<calc_args*>(fp0))) std::__1::__invoke<void* (*)(calc_args*), calc_args*>(void* (*&&)(calc_args*), calc_args*&&) type_traits:3530 (fract4dc.cpython-38-darwin.so:x86_64+0x173af)
    #6 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*>&, std::__1::__tuple_indices<2ul>) thread:277 (fract4dc.cpython-38-darwin.so:x86_64+0x16e88)
    #7 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*> >(void*) thread:287 (fract4dc.cpython-38-darwin.so:x86_64+0x144b0)

  Thread T11 (tid=3087829, finished) created by main thread at:
    #0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x8aad)
    #1 std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:336 (fract4dc.cpython-38-darwin.so:x86_64+0x14306)
    #2 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:303 (fract4dc.cpython-38-darwin.so:x86_64+0x13d39)
    #3 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:295 (fract4dc.cpython-38-darwin.so:x86_64+0x13a1a)
    #4 calcs::pycalc(_object*, _object*, _object*) calcs.cpp:53 (fract4dc.cpython-38-darwin.so:x86_64+0x1154f)
    #5 pycalc(_object*, _object*, _object*) fract4dmodule.cpp:147 (fract4dc.cpython-38-darwin.so:x86_64+0x34d0)
    #6 cfunction_call_varargs <null> (Python:x86_64+0x1a4c1)

SUMMARY: ThreadSanitizer: data race object.h:459 in _Py_INCREF(_object*)
==================

It's complaining because we're increasing and decreasing the ref count of the same pycmap object in two different threads: the main one and the one created for the calculation. That happens, I guess, because another call to pycalc came in before the thread created by a previous call had ended.

@dragonmux (Member)

More importantly, it's complaining because the decref is in the destructor, meaning the object's lifetime ends at the same time the object is incref'd. This feels like there may be a mutex missing around the calc_args object?

@mindhells (Member Author)

> More importantly, it's complaining because the decref is in the destructor, meaning the object's lifetime ends at the same time the object is incref'd. This feels like there may be a mutex missing around the calc_args object?

Actually, there was a mutex missing there. I had to use a static mutex because it's not the calc_args instance itself causing the problem, but multiple instances of calc_args increasing and decreasing the ref count of the same instances of cmap, pfo, image and site.
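In sketch form, the fix looks something like this (simplified from calcargs.cpp, with only the cmap member shown; the real class handles pfo, image and site the same way):

```cpp
#include <mutex>
#include <Python.h>

// One mutex shared by every calc_args instance, because they all
// manipulate the ref counts of the same underlying Python objects.
static std::mutex ref_count_mutex;

struct calc_args
{
    PyObject *pycmap = nullptr;

    void set_cmap(PyObject *pycmap_)
    {
        pycmap = pycmap_;
        const std::lock_guard<std::mutex> lock(ref_count_mutex);
        Py_XINCREF(pycmap);
    }

    ~calc_args()
    {
        // The decref that raced against the main thread's incref
        // now takes the same lock.
        const std::lock_guard<std::mutex> lock(ref_count_mutex);
        Py_XDECREF(pycmap);
    }
};
```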

I also got this fixed (hopefully):

WARNING: ThreadSanitizer: data race (pid=69030)
  Read of size 1 at 0x7b28000172b4 by thread T9:
    #0 FDSite::image_changed(int, int, int, int) site.cpp:62 (fract4dc.cpython-38-darwin.so:x86_64+0x43d66)
    #1 fractFunc::image_changed(int, int, int, int) const fractfunc.h:58 (fract4dc.cpython-38-darwin.so:x86_64+0x3760f)
    #2 fractFunc::reset_progress(float) fractfunc.cpp:129 (fract4dc.cpython-38-darwin.so:x86_64+0x39f7d)
    #3 fractFunc::draw(int, int, float, float) fractfunc.cpp:246 (fract4dc.cpython-38-darwin.so:x86_64+0x3dd17)
    #4 fractFunc::draw_all() fractfunc.cpp:170 (fract4dc.cpython-38-darwin.so:x86_64+0x3b65f)
    #5 calc calcfunc.cpp:38 (fract4dc.cpython-38-darwin.so:x86_64+0x34869)
    #6 calculation_thread(calc_args*) calcs.cpp:85 (fract4dc.cpython-38-darwin.so:x86_64+0x13de5)
    #7 decltype(std::__1::forward<void* (*)(calc_args*)>(fp)(std::__1::forward<calc_args*>(fp0))) std::__1::__invoke<void* (*)(calc_args*), calc_args*>(void* (*&&)(calc_args*), calc_args*&&) type_traits:3530 (fract4dc.cpython-38-darwin.so:x86_64+0x178af)
    #8 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*>&, std::__1::__tuple_indices<2ul>) thread:277 (fract4dc.cpython-38-darwin.so:x86_64+0x17388)
    #9 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*> >(void*) thread:287 (fract4dc.cpython-38-darwin.so:x86_64+0x149b0)

  Previous write of size 1 at 0x7b28000172b4 by main thread:
    [failed to restore the stack]

  Location is heap block of size 152 at 0x7b28000172a0 allocated by main thread:
    #0 operator new(unsigned long) <null>:5353440 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x5492b)
    #1 sites::pyfdsite_create(_object*, _object*) sites.cpp:18 (fract4dc.cpython-38-darwin.so:x86_64+0xacd0)
    #2 pyfdsite_create(_object*, _object*) fract4dmodule.cpp:128 (fract4dc.cpython-38-darwin.so:x86_64+0x35c0)
    #3 cfunction_call_varargs <null>:5353440 (Python:x86_64+0x1a555)

  Thread T9 (tid=3141054, running) created by main thread at:
    #0 pthread_create <null>:5353488 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x8aad)
    #1 std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:336 (fract4dc.cpython-38-darwin.so:x86_64+0x14806)
    #2 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:303 (fract4dc.cpython-38-darwin.so:x86_64+0x14239)
    #3 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:295 (fract4dc.cpython-38-darwin.so:x86_64+0x13f1a)
    #4 calcs::pycalc(_object*, _object*, _object*) calcs.cpp:53 (fract4dc.cpython-38-darwin.so:x86_64+0x11a4f)
    #5 pycalc(_object*, _object*, _object*) fract4dmodule.cpp:147 (fract4dc.cpython-38-darwin.so:x86_64+0x39d0)
    #6 cfunction_call_varargs <null>:5353488 (Python:x86_64+0x1a4c1)

SUMMARY: ThreadSanitizer: data race site.cpp:62 in FDSite::image_changed(int, int, int, int)

And finally, here's one last report identified by ThreadSanitizer, which I run into when closing the program:

==================
WARNING: ThreadSanitizer: thread leak (pid=69316)
  Thread T4 (tid=3146887, finished) created by main thread at:
    #0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x8aad)
    #1 std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:336 (fract4dc.cpython-38-darwin.so:x86_64+0x146f6)
    #2 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:303 (fract4dc.cpython-38-darwin.so:x86_64+0x14129)
    #3 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:295 (fract4dc.cpython-38-darwin.so:x86_64+0x13e0a)
    #4 calcs::pycalc(_object*, _object*, _object*) calcs.cpp:53 (fract4dc.cpython-38-darwin.so:x86_64+0x1193f)
    #5 pycalc(_object*, _object*, _object*) fract4dmodule.cpp:147 (fract4dc.cpython-38-darwin.so:x86_64+0x38c0)
    #6 cfunction_call_varargs <null> (Python:x86_64+0x1a4c1)

  And 1 more similar thread leaks.

SUMMARY: ThreadSanitizer: thread leak __threading_support:336 in std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*)
==================

I haven't figured this one out yet.
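For context, ThreadSanitizer reports a thread leak whenever a thread finishes without ever being joined or detached. A minimal illustration of the distinction (not necessarily the fix that was eventually adopted):

```cpp
#include <thread>

void run_calculation()
{
    std::thread t([] { /* calculation work */ });
    // Without exactly one of the following, the thread's resources are
    // never reclaimed and TSAN flags the thread as leaked:
    t.join();      // wait for it to finish, or
    // t.detach(); // let it clean up after itself
}
```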

@mindhells (Member Author)

Btw, I've had 3 pushes in a row pass the tests; that's encouraging with regard to #73.

@@ -96,6 +93,7 @@ void FDSite::status_changed(int status_val)
 bool FDSite::is_interrupted()
 {
     //fprintf(stderr,"int: %d\n",interrupted);
+    const std::lock_guard<std::mutex> lock(interrupt_lock);
@edyoung (Member) commented Jul 18, 2020

I don't quite understand why you need a mutex for the interrupted var. We're just checking the current value of a bool; nothing bad happens if we don't detect that we were interrupted until the next time we check.

@mindhells (Member Author)

I think it's the volatile qualifier that ThreadSanitizer complains about.
I didn't want to remove the qualifier, since I'm not sure whether I'm missing something about the reason to have it. But maybe it makes sense to just remove it.

void calc_args::set_cmap(PyObject *pycmap_)
{
    pycmap = pycmap_;
    cmap = colormaps::cmap_fromcapsule(pycmap);
    const std::lock_guard<std::mutex> lock(ref_count_mutex);
A Member commented:

I feel like the ideal pattern would be for only the main thread to ever deal with Python ref counts, so all the calc threads could ignore them. But I haven't thought much about how complicated that would be.

@mindhells (Member Author)

I've been playing around with GIL acquisition to try to address the last ThreadSanitizer report. No luck so far.
I guess the GTK signals triggering the calculation process outside the interpreter thread have something to do with it.
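For reference, the standard CPython pattern for taking the GIL from a non-interpreter thread is PyGILState_Ensure/PyGILState_Release; a minimal sketch of the kind of experiment described above:

```cpp
#include <Python.h>

static void *calculation_thread(void *args)
{
    // ... the pure C++ calculation runs without holding the GIL ...

    PyGILState_STATE gstate = PyGILState_Ensure(); // acquire the GIL
    // ... only here is it safe to touch PyObject ref counts ...
    PyGILState_Release(gstate);                    // hand it back

    return nullptr;
}
```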

A Member commented:

interrupted should most probably be std::atomic<>, and that makes the mutex and the data race go away. volatile is not the right language feature in this case: it doesn't mean thread-safe, and it only causes the compiler to always emit read instructions when the variable is referenced; it doesn't make the compiler ensure cache coherency or atomicity.
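A minimal sketch of that suggestion (member names follow the FDSite snippets above):

```cpp
#include <atomic>

class FDSite
{
    std::atomic<bool> interrupted{false};
public:
    void interrupt()            { interrupted = true; }  // any thread may set it
    bool is_interrupted() const { return interrupted; }  // no mutex needed
};
```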

@edyoung (Member) left a comment

Thanks a ton. I have some open-ended comments, but I don't feel they are must-fix.

@mindhells (Member Author)

You are very welcome; I appreciate your feedback very much.
Don't hesitate to leave any comments, open-ended or not.

@mindhells (Member Author)

I've just made interrupted std::atomic and removed the mutex. The ThreadSanitizer warning is gone, as @DX-MON predicted.
I'd like to keep playing with the GIL and find a more robust solution for the ref counting in calc_args. I'll do that in a different PR.

@mindhells marked this pull request as ready for review, July 19, 2020 16:58
@mindhells mentioned this pull request, Jul 19, 2020
@edyoung merged commit 50642b2 into fract4d:master, Jul 19, 2020
@mindhells deleted the fract4dc-sanitize branch, August 7, 2020 08:45