Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible miscompilation of Boost.Test 1.65.1 with NDK r15c on armeabi-v7a #542

Closed
b-spencer opened this issue Oct 12, 2017 · 17 comments
Closed
Assignees
Labels

Comments

@b-spencer
Copy link

Description

I've found what appears to be a miscompilation issue when using Android NDK r15c with out-of-the-box Boost.Test from boost-1.65.1 on Android ARM (32-bit) using clang. It looks like the void register_parameters( rt::parameters_store& store ) anonymous namespace function in boost/test/impl/unit_test_parameters.ipp ends up with multiple copies of its small string literals (such as "--") stored as PC-relative data within the function. Some such small strings are repeated many times in the function, and while some of them end up backed by the same storage in the compiled code, there seem to be multiple constant pools even within the function.

This would be fine, except it seems like the code in the Boost.Test boost::unit_test::basic_cstring that finds the end of the small string literals (for its m_end member, when constructed from a string literal) seems to get "hauled up" as a common subexpression by an optimization pass (this is a guess) and then the wrong m_end value gets supplied for some of the instances of "--" or other small repeated literals. So, the code ends up being compiled such that an expression likebasic_cstring("--")gets compiled with m_begin set to the first byte of one copy of "--" and m_end set to the '\0' byte of another copy of "--". Oops. Sometimes this ends up setting m_end to a position in memory before m_begin. It may be easier to understand when looking at it live in the debugger.

I did notice some code in basic_cstring that seems like it's prone to undefined behaviour (some iffy pointer arithmetic that allows pointers to move outside of the allocated regions and perhaps expects a pointer plus size_t to wrap around), but I don't think any of that is actually happening, and I tried rewriting those functions with safer implementations and it made no difference.

Oddly, one thing that does make a difference is removing the basic_cstring copy constructor implementation and replacing it with an = default in-class definition. This seems to result in significantly different code generated and avoids the problem.

Thankfully, this is easy to reproduce with out-of-the-box NDK r15c and boost-1.65.1. I've prepared a small repository that demonstrates the error:

https://github.com/b-spencer/boost-test-vs-android-ndk-bug

Here's an example of running the scripts in that repository (built on Linux and run on an actual Android device):

$ ./0-prep.sh
--2017-10-12 15:04:11--  https://dl.bintray.com/boostorg/release/1.65.1/source/boost_1_65_1.tar.bz2
...
$ ./1-build.sh
[armeabi-v7a] Compile++ thumb: program <= compiler_log_formatter.cpp
[armeabi-v7a] Compile++ thumb: program <= debug.cpp
[armeabi-v7a] Compile++ thumb: program <= decorator.cpp
[armeabi-v7a] Compile++ thumb: program <= execution_monitor.cpp
[armeabi-v7a] Compile++ thumb: program <= framework.cpp
[armeabi-v7a] Compile++ thumb: program <= junit_log_formatter.cpp
[armeabi-v7a] Compile++ thumb: program <= plain_report_formatter.cpp
[armeabi-v7a] Compile++ thumb: program <= progress_monitor.cpp
[armeabi-v7a] Compile++ thumb: program <= results_collector.cpp
[armeabi-v7a] Compile++ thumb: program <= results_reporter.cpp
[armeabi-v7a] Compile++ thumb: program <= test_framework_init_observer.cpp
[armeabi-v7a] Compile++ thumb: program <= test_tools.cpp
[armeabi-v7a] Compile++ thumb: program <= test_tree.cpp
[armeabi-v7a] Compile++ thumb: program <= unit_test_log.cpp
[armeabi-v7a] Compile++ thumb: program <= unit_test_main.cpp
[armeabi-v7a] Compile++ thumb: program <= unit_test_monitor.cpp
[armeabi-v7a] Compile++ thumb: program <= unit_test_parameters.cpp
[armeabi-v7a] Compile++ thumb: program <= xml_log_formatter.cpp
[armeabi-v7a] Compile++ thumb: program <= xml_report_formatter.cpp
[armeabi-v7a] Compile++ thumb: program <= test.cpp
[armeabi-v7a] Prebuilt       : libc++_shared.so <= <NDK>/sources/cxx-stl/llvm-libc++/libs/armeabi-v7a/
[armeabi-v7a] Install        : libc++_shared.so => libs/armeabi-v7a/libc++_shared.so
[armeabi-v7a] Executable     : program
[armeabi-v7a] Install        : program => libs/armeabi-v7a/program
$ ./2-run.sh
libs/armeabi-v7a/libc++_shared.so: 1 file pushed. 0.9 MB/s (677448 bytes in 0.721s)
libs/armeabi-v7a/program: 1 file pushed. 1.1 MB/s (321108 bytes in 0.269s)
2 files pushed. 0.7 MB/s (998556 bytes in 1.285s)
Boost.Test framework internal error: unknown reason

The last line is an unexpected error caused by an exception being thrown from code that tries to validate the contents of some of the strings that are mal-constructed. It so happens that the exception is thrown and caught before the mismatched pointers crash the program.

This does not seem to happen with android-ndk-r13b (also using clang), which is the only other version I tried.

If this issue is more appropriately directed to clang itself (or Boost), please let me know.

Thanks.

Environment Details

Not all of these will be relevant to every bug, but please provide as much
information as you can.

  • NDK Version: 15.2.4203891 (r15c)
  • Build sytem: ndk-build
  • Host OS: Linux (Debian 9)
  • Compiler: clang
  • ABI: armeabi-v7a
  • STL: c++_shared
  • NDK API level: Tried 19, 24, and 26
  • Device API level: 24
@DanAlbert
Copy link
Member

@stephenhines: fyi, this does behave differently at -O0

libs/armeabi-v7a/libc++_shared.so: 1 file pushed. 9.1 MB/s (636520 bytes in 0.066s)
libs/armeabi-v7a/program: 1 file pushed. 6.9 MB/s (669272 bytes in 0.093s)
2 files pushed. 7.4 MB/s (1305792 bytes in 0.168s)
Test setup error: test tree is empty

Given that there are no tests, I'm guessing that's the expected error.

@stephenhines
Copy link
Collaborator

Yi is going to take a closer look at this.

@kongy
Copy link
Collaborator

kongy commented Oct 13, 2017

Building unit_test_parameters.o with ToT clang fixes the error.

@b-spencer
Copy link
Author

Test setup error: test tree is empty

It seems I forgot to state what the expected behaviour was, but yes, that's it. If you need me to pinpoint where the error is with some details from gdb, I can do that, but it looks like you're reproduced it.

Thanks.

@kongy
Copy link
Collaborator

kongy commented Oct 14, 2017

#define BOOST_TEST_MAIN
#include <boost/test/unit_test.hpp>

BOOST_AUTO_TEST_CASE( empty_test ) {
    BOOST_TEST( true );
}

Adding an empty test makes this more obvious. I will bisect the upstream change that fixed the issue.

@b-spencer
Copy link
Author

Just curious: Is it a matter of waiting for r16 for a fix? Thanks!

@DanAlbert
Copy link
Member

I'm testing a Clang update for the NDK now. I don't know if Yi (from the compiler team) ever got a chance to look at this. Most of the time when we run into issues like this the bug has already been fixed upstream and all we need to do is update, so hopefully it's fixed.

I'll post an update when the canary build has a new compiler and you can try it out.

@b-spencer
Copy link
Author

It appears that r16 does not fix this.

@DanAlbert
Copy link
Member

The clang update was not ready in time for r16.

@pirama-arumuga-nainar
Copy link
Collaborator

It's not clear if it is fixed in current AOSP Clang (which will presumably go to r17). @kongy you should test this for the current AOSP clang and also on the update you are working on.

@kongy
Copy link
Collaborator

kongy commented Nov 15, 2017

Current aosp Clang has already fixed this issue.

@DanAlbert
Copy link
Member

Referring to clang-4393122, right? If so, this is fixed in the canary builds of r17.

@b-spencer
Copy link
Author

b-spencer commented Nov 15, 2017

Actually, it looks like NDK r16 is possibly miscompiling other far-away code. I've got some code here that is assigning a short std::string's 2-byte "\x80\x80" data to register r6 (as 0x8080 in the low bytes of the register) and then trying to use it when inline-constructing multiple strings. But after using r6 to construct the first string, an exception is raised and caught, and the value of r6 is not preserved for the remainder of the call. So the next 2-byte std::string is initialized with whatever arbitrary values are sitting in the bottom of r6. I'll try to work up a separate report, but it may take some time to narrow it down enough to post.

@b-spencer
Copy link
Author

b-spencer commented Nov 17, 2017

Separate bug report coming soon for r16. Seems there are some issues with -Oz. (#573)

@DanAlbert
Copy link
Member

Confirmed fixed in master. Will be in r17.

@rprichard
Copy link
Collaborator

FWIW: Given this description:

It looks like the void register_parameters( rt::parameters_store& store ) anonymous namespace function in boost/test/impl/unit_test_parameters.ipp ends up with multiple copies of its small string literals (such as "--") stored as PC-relative data within the function. Some such small strings are repeated many times in the function, and while some of them end up backed by the same storage in the compiled code, there seem to be multiple constant pools even within the function.

This looks like a duplicate of #642 (which is fixed in the r17 compiler by leaving an optimization off by default). The simplest workaround is -mllvm -arm-promote-constant=0. The upstream bug is https://bugs.llvm.org/show_bug.cgi?id=32780.

@b-spencer
Copy link
Author

I can confirm that adding -mllvm -arm-promote-constant=0 with r16b prevents the original problem that lead to this bug report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants