fract4dc sanitize #154

Merged: 5 commits into fract4d:master from fract4dc-sanitize, Jul 19, 2020

Conversation

@mindhells (Member)

Address issues detected with UBSAN, ASAN, and TSAN.

I've already managed to find one: #73 (comment)

Having #133 would help, thanks to Meson's ability to add sanitizers out of the box.

@dragonmux (Member)

That's a pretty nice way to solve the problem; there's a certain elegance to that vs. starting threads paused.

@mindhells (Member Author) commented Jul 14, 2020

> That's a pretty nice way to solve the problem; there's a certain elegance to that vs. starting threads paused.

There's a third approach: using pthread_self inside calculation_thread and calling site->set_tid(0) before creating the thread.
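In sketch form, that third approach looks something like this (toy FDSite and calc_args stand-ins here, not the real fract4dc definitions):

```cpp
#include <pthread.h>

// Toy stand-ins for the real fract4dc types, just to show the shape of the idea.
struct FDSite
{
    pthread_t tid{};
    void set_tid(pthread_t t) { tid = t; }
};

struct calc_args
{
    FDSite *site;
};

static void *calculation_thread(void *vargs)
{
    auto *args = static_cast<calc_args *>(vargs);
    args->site->set_tid(pthread_self()); // the thread records its own id
    // ... run the fractal calculation ...
    return nullptr;
}

void start_calc(calc_args *args)
{
    args->site->set_tid(0); // sentinel: no calculation thread yet
    pthread_t t;
    pthread_create(&t, nullptr, calculation_thread, args);
}
```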

I was wondering about the possibility of site->wait being called after pthread_create and before site->set_tid(tid). Since the site object is reused by fractal.py, that seems to be a possibility, but I assume we are protected by the Python global interpreter lock. Are we?

gtkfractal.py connects to Gtk.DrawingArea to trigger fract4dc.calc. Can we guarantee the GIL is not released by the GTK main loop? As stated in the manual:

> One of the weirder parts of the code is how we deal with threading. Basically we want the calculation of the fractal to happen on a different thread (or multiple threads for SMP) from the main UI, so you can interrupt at any point. This is complicated by the fact that Python only allows a single thread in the Global Interpreter Lock, and that PyGTK is often compiled by Linux distribution vendors without thread support, meaning this lock is not released when running the GTK main loop. (This last is probably not true any more, but I haven't checked.)

@mindhells requested review from cjmayo and edyoung, July 14, 2020 09:24
@cjmayo (Collaborator) left a comment

I really have no clue about the clever stuff! I applied the patch and ran the tests locally, and they all pass. Still some segfaults, though (I'll see if I get a chance to collect some data).

@dragonmux (Member)

This feels like something that would benefit from std::thread, and I agree that calling set_tid() in the thread itself would be a far better idea, as that removes all data races from that code path.

@mindhells (Member Author)

> This feels like something that would benefit from std::thread, and I agree that calling set_tid() in the thread itself would be a far better idea, as that removes all data races from that code path.

Tried with std::thread, and the code looks neater. Not sure I'm doing it the canonical way, though.
On the other hand, if there's consensus about using this approach, we might need to change the thread pool as well, to keep things consistent.
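Roughly, the std::thread version looks like this (a minimal sketch, not the exact calcs.cpp code):

```cpp
#include <thread>

struct calc_args; // opaque here; see calcargs.cpp for the real type

void *calculation_thread(calc_args *args); // same entry point as before

void start_calc(calc_args *args)
{
    // std::thread takes the function and its arguments directly,
    // with no void* casting as pthread_create requires.
    std::thread t(calculation_thread, args);
    t.detach(); // or keep the handle on the site object and join it later
}
```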

@mindhells (Member Author)

I have this interesting output from ThreadSanitizer:

==================
WARNING: ThreadSanitizer: data race (pid=67153)
  Write of size 8 at 0x0001153544e0 by main thread:
    #0 _Py_INCREF(_object*) object.h:459 (fract4dc.cpython-38-darwin.so:x86_64+0x2c0b1)
    #1 _Py_XINCREF(_object*) object.h:532 (fract4dc.cpython-38-darwin.so:x86_64+0x2b9eb)
    #2 calc_args::set_cmap(_object*) calcargs.cpp:24 (fract4dc.cpython-38-darwin.so:x86_64+0x2b979)
    #3 parse_calc_args(_object*, _object*) calcs.cpp:172 (fract4dc.cpython-38-darwin.so:x86_64+0x12c31)
    #4 calcs::pycalc(_object*, _object*, _object*) calcs.cpp:41 (fract4dc.cpython-38-darwin.so:x86_64+0x10ef1)
    #5 pycalc(_object*, _object*, _object*) fract4dmodule.cpp:147 (fract4dc.cpython-38-darwin.so:x86_64+0x34d0)
    #6 cfunction_call_varargs <null> (Python:x86_64+0x1a4c1)

  Previous write of size 8 at 0x0001153544e0 by thread T11:
    #0 _Py_DECREF(char const*, int, _object*) object.h:470 (fract4dc.cpython-38-darwin.so:x86_64+0x2c168)
    #1 _Py_XDECREF(_object*) object.h:541 (fract4dc.cpython-38-darwin.so:x86_64+0x2bf77)
    #2 calc_args::~calc_args() calcargs.cpp:53 (fract4dc.cpython-38-darwin.so:x86_64+0x2bea0)
    #3 calc_args::~calc_args() calcargs.cpp:48 (fract4dc.cpython-38-darwin.so:x86_64+0x2c018)
    #4 calculation_thread(calc_args*) calcs.cpp:97 (fract4dc.cpython-38-darwin.so:x86_64+0x13947)
    #5 decltype(std::__1::forward<void* (*)(calc_args*)>(fp)(std::__1::forward<calc_args*>(fp0))) std::__1::__invoke<void* (*)(calc_args*), calc_args*>(void* (*&&)(calc_args*), calc_args*&&) type_traits:3530 (fract4dc.cpython-38-darwin.so:x86_64+0x173af)
    #6 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*>&, std::__1::__tuple_indices<2ul>) thread:277 (fract4dc.cpython-38-darwin.so:x86_64+0x16e88)
    #7 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*> >(void*) thread:287 (fract4dc.cpython-38-darwin.so:x86_64+0x144b0)

  Thread T11 (tid=3087829, finished) created by main thread at:
    #0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x8aad)
    #1 std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:336 (fract4dc.cpython-38-darwin.so:x86_64+0x14306)
    #2 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:303 (fract4dc.cpython-38-darwin.so:x86_64+0x13d39)
    #3 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:295 (fract4dc.cpython-38-darwin.so:x86_64+0x13a1a)
    #4 calcs::pycalc(_object*, _object*, _object*) calcs.cpp:53 (fract4dc.cpython-38-darwin.so:x86_64+0x1154f)
    #5 pycalc(_object*, _object*, _object*) fract4dmodule.cpp:147 (fract4dc.cpython-38-darwin.so:x86_64+0x34d0)
    #6 cfunction_call_varargs <null> (Python:x86_64+0x1a4c1)

SUMMARY: ThreadSanitizer: data race object.h:459 in _Py_INCREF(_object*)
==================

It's complaining because we're increasing and decreasing the ref count of the same pycmap object in two different threads: the main one and the one created for the calculation. That happens, I guess, because another call to pycalc came in before the thread created by a previous call had ended.

@dragonmux (Member)

More importantly, it's complaining because the decref is in the destructor, meaning the object's lifetime ends at the same time the object is incref'd. This feels like there may be a mutex missing around the calc_args object?

@mindhells (Member Author)

> More importantly, it's complaining because the decref is in the destructor, meaning the object's lifetime ends at the same time the object is incref'd. This feels like there may be a mutex missing around the calc_args object?

Actually, there was a mutex missing there. I had to use a static mutex because it's not the calc_args instance itself causing the problem, but multiple instances of calc_args increasing and decreasing the ref count of the same instances of cmap, pfo, image and site.
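In sketch form, the fix looks something like this (simplified from calcargs.cpp, with only the cmap member shown; the real class handles pfo, image and site the same way):

```cpp
#include <mutex>
#include <Python.h>

// One mutex shared by every calc_args instance, because they all
// manipulate the ref counts of the same underlying Python objects.
static std::mutex ref_count_mutex;

struct calc_args
{
    PyObject *pycmap = nullptr;

    void set_cmap(PyObject *pycmap_)
    {
        pycmap = pycmap_;
        const std::lock_guard<std::mutex> lock(ref_count_mutex);
        Py_XINCREF(pycmap);
    }

    ~calc_args()
    {
        // The decref that raced against the main thread's incref
        // now takes the same lock.
        const std::lock_guard<std::mutex> lock(ref_count_mutex);
        Py_XDECREF(pycmap);
    }
};
```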

I also got this fixed (hopefully):

WARNING: ThreadSanitizer: data race (pid=69030)
  Read of size 1 at 0x7b28000172b4 by thread T9:
    #0 FDSite::image_changed(int, int, int, int) site.cpp:62 (fract4dc.cpython-38-darwin.so:x86_64+0x43d66)
    #1 fractFunc::image_changed(int, int, int, int) const fractfunc.h:58 (fract4dc.cpython-38-darwin.so:x86_64+0x3760f)
    #2 fractFunc::reset_progress(float) fractfunc.cpp:129 (fract4dc.cpython-38-darwin.so:x86_64+0x39f7d)
    #3 fractFunc::draw(int, int, float, float) fractfunc.cpp:246 (fract4dc.cpython-38-darwin.so:x86_64+0x3dd17)
    #4 fractFunc::draw_all() fractfunc.cpp:170 (fract4dc.cpython-38-darwin.so:x86_64+0x3b65f)
    #5 calc calcfunc.cpp:38 (fract4dc.cpython-38-darwin.so:x86_64+0x34869)
    #6 calculation_thread(calc_args*) calcs.cpp:85 (fract4dc.cpython-38-darwin.so:x86_64+0x13de5)
    #7 decltype(std::__1::forward<void* (*)(calc_args*)>(fp)(std::__1::forward<calc_args*>(fp0))) std::__1::__invoke<void* (*)(calc_args*), calc_args*>(void* (*&&)(calc_args*), calc_args*&&) type_traits:3530 (fract4dc.cpython-38-darwin.so:x86_64+0x178af)
    #8 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*>&, std::__1::__tuple_indices<2ul>) thread:277 (fract4dc.cpython-38-darwin.so:x86_64+0x17388)
    #9 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void* (*)(calc_args*), calc_args*> >(void*) thread:287 (fract4dc.cpython-38-darwin.so:x86_64+0x149b0)

  Previous write of size 1 at 0x7b28000172b4 by main thread:
    [failed to restore the stack]

  Location is heap block of size 152 at 0x7b28000172a0 allocated by main thread:
    #0 operator new(unsigned long) <null>:5353440 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x5492b)
    #1 sites::pyfdsite_create(_object*, _object*) sites.cpp:18 (fract4dc.cpython-38-darwin.so:x86_64+0xacd0)
    #2 pyfdsite_create(_object*, _object*) fract4dmodule.cpp:128 (fract4dc.cpython-38-darwin.so:x86_64+0x35c0)
    #3 cfunction_call_varargs <null>:5353440 (Python:x86_64+0x1a555)

  Thread T9 (tid=3141054, running) created by main thread at:
    #0 pthread_create <null>:5353488 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x8aad)
    #1 std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:336 (fract4dc.cpython-38-darwin.so:x86_64+0x14806)
    #2 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:303 (fract4dc.cpython-38-darwin.so:x86_64+0x14239)
    #3 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:295 (fract4dc.cpython-38-darwin.so:x86_64+0x13f1a)
    #4 calcs::pycalc(_object*, _object*, _object*) calcs.cpp:53 (fract4dc.cpython-38-darwin.so:x86_64+0x11a4f)
    #5 pycalc(_object*, _object*, _object*) fract4dmodule.cpp:147 (fract4dc.cpython-38-darwin.so:x86_64+0x39d0)
    #6 cfunction_call_varargs <null>:5353488 (Python:x86_64+0x1a4c1)

SUMMARY: ThreadSanitizer: data race site.cpp:62 in FDSite::image_changed(int, int, int, int)

And finally, here's one last report identified by ThreadSanitizer, which I run into when closing the program:

==================
WARNING: ThreadSanitizer: thread leak (pid=69316)
  Thread T4 (tid=3146887, finished) created by main thread at:
    #0 pthread_create <null> (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x8aad)
    #1 std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:336 (fract4dc.cpython-38-darwin.so:x86_64+0x146f6)
    #2 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:303 (fract4dc.cpython-38-darwin.so:x86_64+0x14129)
    #3 std::__1::thread::thread<void* (&)(calc_args*), calc_args*&, void>(void* (&)(calc_args*), calc_args*&) thread:295 (fract4dc.cpython-38-darwin.so:x86_64+0x13e0a)
    #4 calcs::pycalc(_object*, _object*, _object*) calcs.cpp:53 (fract4dc.cpython-38-darwin.so:x86_64+0x1193f)
    #5 pycalc(_object*, _object*, _object*) fract4dmodule.cpp:147 (fract4dc.cpython-38-darwin.so:x86_64+0x38c0)
    #6 cfunction_call_varargs <null> (Python:x86_64+0x1a4c1)

  And 1 more similar thread leaks.

SUMMARY: ThreadSanitizer: thread leak __threading_support:336 in std::__1::__libcpp_thread_create(_opaque_pthread_t**, void* (*)(void*), void*)
==================

I haven't figured this one out yet.
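For context, ThreadSanitizer reports a thread leak whenever a thread finishes without ever being joined or detached. A minimal illustration of the distinction (not necessarily the fix that was eventually adopted):

```cpp
#include <thread>

void run_calculation()
{
    std::thread t([] { /* calculation work */ });
    // Without exactly one of the following, the thread's resources are
    // never reclaimed and TSAN flags the thread as leaked:
    t.join();      // wait for it to finish, or
    // t.detach(); // let it clean up after itself
}
```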

@mindhells (Member Author)

Btw, I've had 3 pushes in a row pass the tests; that's encouraging with regard to #73.

@@ -96,6 +93,7 @@ void FDSite::status_changed(int status_val)
 bool FDSite::is_interrupted()
 {
     //fprintf(stderr,"int: %d\n",interrupted);
+    const std::lock_guard<std::mutex> lock(interrupt_lock);
@edyoung (Member) commented Jul 18, 2020

I don't quite understand why you need a mutex for the interrupted var. We're just checking the current value of a bool; nothing bad happens if we don't detect that we were interrupted until the next time we check.

@mindhells (Member Author)

I think it's the volatile qualifier that ThreadSanitizer complains about.
I didn't want to remove the qualifier, since I'm not sure whether I'm missing something about the reason to have it. But maybe it makes sense to just remove it.

void calc_args::set_cmap(PyObject *pycmap_)
{
    pycmap = pycmap_;
    cmap = colormaps::cmap_fromcapsule(pycmap);
    const std::lock_guard<std::mutex> lock(ref_count_mutex);
A Member commented:

I feel like the ideal pattern would be for only the main thread to ever deal with Python ref counts, so all the calc threads could ignore them. But I haven't thought much about how complicated that would be.

@mindhells (Member Author)

I've been playing around with GIL acquisition to try to address the last ThreadSanitizer report. No luck so far.
I guess the GTK signals triggering the calculation process outside the interpreter thread have something to do with it.
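For reference, the standard CPython pattern for taking the GIL from a non-interpreter thread is PyGILState_Ensure/PyGILState_Release; a minimal sketch of the kind of experiment described above:

```cpp
#include <Python.h>

static void *calculation_thread(void *args)
{
    // ... the pure C++ calculation runs without holding the GIL ...

    PyGILState_STATE gstate = PyGILState_Ensure(); // acquire the GIL
    // ... only here is it safe to touch PyObject ref counts ...
    PyGILState_Release(gstate);                    // hand it back

    return nullptr;
}
```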

A Member commented:

interrupted should most probably be std::atomic<>, and that makes the mutex and the data race go away. volatile is not the right language feature in this case: it doesn't mean thread-safe, and it only causes the compiler to always emit read instructions when the variable is referenced; it doesn't make the compiler ensure cache coherency or atomicity.
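A minimal sketch of that suggestion (member names follow the FDSite snippets above):

```cpp
#include <atomic>

class FDSite
{
    std::atomic<bool> interrupted{false};
public:
    void interrupt()            { interrupted = true; }  // any thread may set it
    bool is_interrupted() const { return interrupted; }  // no mutex needed
};
```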

@edyoung (Member) left a comment

Thanks a ton. I have some open-ended comments, but I don't feel they are must-fix.

@mindhells (Member Author)

You are very welcome; I appreciate your feedback very much.
Don't hesitate to leave any comments, open-ended or not.

@mindhells (Member Author)

I've just made interrupted std::atomic and removed the mutex. The ThreadSanitizer warning is gone, as @DX-MON predicted.
I'd like to keep playing with the GIL and find a more robust solution for the ref counting in calc_args. I'll do that in a different PR.

@mindhells marked this pull request as ready for review, July 19, 2020 16:58
@mindhells mentioned this pull request, Jul 19, 2020
@edyoung merged commit 50642b2 into fract4d:master, Jul 19, 2020
@mindhells deleted the fract4dc-sanitize branch, August 7, 2020 08:45