Skip to content

Conversation

@dkl
Copy link
Contributor

@dkl dkl commented Dec 8, 2020

If a unit test throws an exception, usually the test framework will catch it cleanly. However, for multi-threaded tests, the test code needs to catch exceptions explicitly, because apparently the Boost Unit Test Framework is not thread-safe.

Here I've seen the multithreaded_async_pipe unit test triggering multiple exceptions in parallel (because the async_pipe constructor failed), which resulted in multiple parallel std::terminate() calls. Then each std::terminate() triggered a SIGABRT for that thread, causing each thread to invoke the Boost Unit Test Framework's signal handler (boost_execution_monitor_jumping_signal_handler()), which however is not thread-safe (doing the same longjmp() in all threads, resulting in stack corruption, SIGSEGV, sometimes even hangs).

As solution, I think using std::future should work – it supports transporting the exceptions from the thread to the parent and will re-throw them there (but note that obviously we'll only see the 1st, not all exceptions from all threads).

Although I've only had problems with the multithreaded_async_pipe test, and that was essentially only triggered by coincidence (it tries to create N_cores * 100 pipes, which fails here due to hitting the maximum number of open files soft-limit (ulimit -n, 1024)), I still think it's worth it to fix the test suite behaviour in this case, especially because it sometimes ended up hanging to the stack smashing. Thus I've also converted all other unit tests from std::thread to std::future + std::async.

If a unit test throws an exception, usually the test framework will
catch it cleanly. However, for multi-threaded tests, the test code needs
to catch exceptions explicitly, because apparently the Boost Unit Test
Framework is not thread-safe.

Here I've seen the multithreaded_async_pipe unit test triggering
multiple exceptions in parallel (because the async_pipe constructor
failed), which resulted in multiple parallel std::terminate() calls.
Then each std::terminate() triggered a SIGABRT for that thread,
causing each thread to invoke the Boost Unit Test Framework's signal
handler (boost_execution_monitor_jumping_signal_handler()),
which however is not thread-safe (doing the same longjmp() in all
threads, resulting in stack corruption, SIGSEGV, sometimes even hangs).

As solution, I think using std::future should work – it supports
transporting the exceptions from the thread to the parent and will
re-throw them there (but note that obviously we'll only see the 1st,
not all exceptions from all threads).

Although I've only had problems with the multithreaded_async_pipe test,
and that was essentially only triggered by coincidence (it tries to
create N_cores * 100 pipes, which fails here due to hitting the maximum
number of open files soft-limit (ulimit -n, 1024)), I still think it's
worth it to fix the test suite behaviour in this case, especially
because it sometimes ended up hanging to the stack smashing.
Thus I've also converted all other unit tests from std::thread to
std::future + std::async.
@klemens-morgenstern
Copy link
Collaborator

What is this addressing? A false negative or just a crash on test failure?

@dkl
Copy link
Contributor Author

dkl commented Oct 14, 2021

The main problem is undefined behaviour in the test suite, in case of multi-threading and exceptions. It could cause the test suite to randomly pass, fail, print garbage messages, crash, or even hang. Obviously that's not great for debugging, regardless of whether the actual error that has happened is a false-positive or real bug. In this patch I tried to fix all potential cases which could run into that problem, to ensure the test suite has consistent behaviour.

@klemens-morgenstern
Copy link
Collaborator

I think adding noexcept to the threads is better, causing a reliable std::terminate .

@dkl
Copy link
Contributor Author

dkl commented Oct 15, 2021

Interesting idea; if it works, it should be applied to all the threads in the test suite, not only in test/async.cpp. (thinking of test/async_pipe.cpp, which is where I saw the problem)

However I'm not sure about std::terminate. As mentioned in the description, multiple parallel std::terminate was part of the problem (something weird about it using abort()/SIGABRT internally, causing it to invoke the test framework's non-thread-safe signal handler...). I tried std::future with std::launch::async specifically to avoid the multiple parallel std::terminate calls. (std::thread::~thread calls std::terminate if the thread is still running during stack unwinding, while std::future from std::launch::async waits for the thread to finish)

klemens-morgenstern added a commit that referenced this pull request May 18, 2022
klemens-morgenstern added a commit that referenced this pull request May 19, 2022
Squash after invalid branch & merge conflict.

* Fixed file_descriptor move assignment operator to return a reference to 'this'. Issue # 219

* Returning *this instead of erroneous *this. Issue # 219

* Removed unneeded WNOHANG.

* Closes #190

* Closes #121

* Attempting to fix wchar_t build error on circle.

* Closes #197.

* Changed child(pid_t) signature.

* Multiple fixes.

* Closes #189.

* Closes #191.

* Added missing work guard on windows.

* Trying to catch windows early complete.

* Increased log level on windows.

* Multiple windows test fixes

* Removed overly constraint tests.

* fix missing headers

* Closes klemens-morgenstern/boost-process#218

* Update executor.hpp

explicit cast to int to silence this: `error: non-constant-expression cannot be narrowed from type 'unsigned long' to 'int' in initializer list [-Wc++11-narrowing]`

* Fix posix implementation of move constructor/assignment in file_descriptor

* Adjust docs `@boost` relative paths

* Fixed UB for large environment names.

* Closes #207.

* Drone setup

* Added include for filesystem::fstream.

* Disabled useless tests.

* Fixed environment length checks.

* Pipe test & warning fixes.

* Disabled warnings & added windows include fix.

* More test fixes.

* Removed some tests from apple build.

* Removed some tests from apple build.

* Disabled OSX tests via build script & fixed windows examples.

* TSA fix attempt.

Co-authored-by: James Baker <[email protected]>
Co-authored-by: silent <[email protected]>
Co-authored-by: ikrijan <[email protected]>
Co-authored-by: Shauren <[email protected]>
Co-authored-by: alandefreitas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants