Potential bug in NDK r16 #579

Closed
planethcom opened this issue Nov 27, 2017 · 8 comments

@planethcom

Hi
We integrated Ableton Link into all our apps a few months ago.
Ableton Link is a technology that synchronizes beat, phase and tempo of Ableton Live and Link-enabled apps over a wireless network.
The initial integration was done in April this year with NDK r14. The current live version is compiled with NDK r15c.

The integration is based on the Ableton Link cross-platform library (https://github.com/Ableton/link) and partially on Peter Brinkmann’s Ableton Link for PD Android example (https://github.com/libpd/abl_link).
Ableton Link uses ifaddrs for communication over WLAN: https://github.com/libpd/abl_link/tree/master/external/android-ifaddrs

Last week I ran some first tests with NDK r16.
Compiling with NDK r16 worked without any problems.

But the release builds (not the debug builds) ran into crashes on ARMv7 devices as soon as Ableton Link starts to communicate with other peers over the wireless network.
All devices with ARMv7 CPU architecture ran into these crashes, independent of the installed Android version. I’ve tested it on Android 5.x, 6.x and 7.x (devices: Nexus 7 2013, Samsung Galaxy S5, Nexus 6). I don’t know how it is on Android 8.x, as we only have ARM64v8a devices running on the latest OS. But I’m pretty sure that the behavior will be the same.

Devices with ARM64v8a CPU architecture work without any problems (tested on Android 7.x and 8.x).
When compiling the same code with NDK r15c, all CPU architectures work perfectly fine.

Here's the log output of the r16 crashes:

11-23 08:00:30.531 4396-4438/? A/libc: Fatal signal 7 (SIGBUS), code 1, fault addr 0x449d in tid 4438 (neth.linktomidi)
11-23 08:00:30.588 190-190/? A/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
11-23 08:00:30.588 190-190/? A/DEBUG: Build fingerprint: 'google/razor/flo:6.0.1/MOB30X/3036618:user/release-keys'
11-23 08:00:30.588 190-190/? A/DEBUG: Revision: '0'
11-23 08:00:30.588 190-190/? A/DEBUG: ABI: 'arm'
11-23 08:00:30.588 190-190/? A/DEBUG: pid: 4396, tid: 4438, name: neth.linktomidi >>> com.planeth.linktomidi <<<
11-23 08:00:30.588 190-190/? A/DEBUG: signal 7 (SIGBUS), code 1 (BUS_ADRALN), fault addr 0x449d
11-23 08:00:30.612 190-190/? A/DEBUG: r0 b6c994b5 r1 0000449d r2 a03fce3c r3 a03fce8e
11-23 08:00:30.612 190-190/? A/DEBUG: r4 00004499 r5 a0adb968 r6 d66ab45d r7 a0cf37c8
11-23 08:00:30.612 190-190/? A/DEBUG: r8 a03e6000 r9 b3abb3d8 sl a0aaff40 fp a0cf389c
11-23 08:00:30.612 190-190/? A/DEBUG: ip a0fdf804 sp a0cf37c0 lr a0f633cb pc a0f633e4 cpsr 000f0030
11-23 08:00:30.627 190-190/? A/DEBUG: backtrace:
11-23 08:00:30.627 190-190/? A/DEBUG: #00 pc 0006e3e4 /data/app/com.planeth.linktomidi-1/lib/arm/libabl-link-facade.so
11-23 08:00:30.627 190-190/? A/DEBUG: #1 pc 0006e3c7 /data/app/com.planeth.linktomidi-1/lib/arm/libabl-link-facade.so (_ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EED2Ev+10)
11-23 08:00:30.628 190-190/? A/DEBUG: #2 pc 0007e34b /data/app/com.planeth.linktomidi-1/lib/arm/libabl-link-facade.so (_ZNK7ableton4util16SafeAsyncHandlerINS_9platforms4asio6SocketILj512EE4ImplEEclIJRKSt10error_codeRKjEEEvDpOT_+82)
11-23 08:00:30.628 190-190/? A/DEBUG: #3 pc 0007e24b /data/app/com.planeth.linktomidi-1/lib/arm/libabl-link-facade.so (_ZN4asio6detail27reactive_socket_recvfrom_opINS_17mutable_buffers_1ENS_2ip14basic_endpointINS3_3udpEEEN7ableton4util16SafeAsyncHandlerINS7_9platforms4asio6SocketILj512EE4ImplEEEE11do_completeEPvPNS0_19scheduler_operationERKSt10error_codej+90)
11-23 08:00:30.628 190-190/? A/DEBUG: #4 pc 00070d43 /data/app/com.planeth.linktomidi-1/lib/arm/libabl-link-facade.so (_ZN4asio6detail9scheduler10do_run_oneERNS0_11scoped_lockINS0_11posix_mutexEEERNS0_21scheduler_thread_infoERKSt10error_code+218)
11-23 08:00:30.628 190-190/? A/DEBUG: #5 pc 00070bbd /data/app/com.planeth.linktomidi-1/lib/arm/libabl-link-facade.so (_ZN4asio6detail9scheduler3runERSt10error_code+92)
11-23 08:00:30.628 190-190/? A/DEBUG: #6 pc 00070b35 /data/app/com.planeth.linktomidi-1/lib/arm/libabl-link-facade.so (_ZN4asio10io_context3runEv+32)
11-23 08:00:30.628 190-190/? A/DEBUG: #7 pc 000766d9 /data/app/com.planeth.linktomidi-1/lib/arm/libabl-link-facade.so (_ZZN7ableton9platforms4asio7ContextINS0_5posix13ScanIpIfAddrsENS_4util7NullLogEEC1INS_4link10ControllerISt8functionIFvjEESB_IFvNS9_5TempoEEENS0_5linux5ClockILi1EEES7_E23UdpSendExceptionHandlerEEET_ENKUlRN4asio10io_contextESL_E_clESP_SL_+38)
11-23 08:00:30.628 190-190/? A/DEBUG: #8 pc 000b53cf /data/app/com.planeth.linktomidi-1/lib/arm/libabl-link-facade.so
11-23 08:00:30.629 190-190/? A/DEBUG: #9 pc 0003f45f /system/lib/libc.so (_ZL15__pthread_startPv+30)
11-23 08:00:30.629 190-190/? A/DEBUG: #10 pc 00019b43 /system/lib/libc.so (__start_thread+6)
11-23 08:00:31.391 190-190/? A/DEBUG: Tombstone written to: /data/tombstones/tombstone_09
11-23 08:00:31.391 190-190/? E/DEBUG: AM write failed: Broken pipe


I did the same tests with Peter Brinkmann’s Ableton Link for PD Android example (https://github.com/libpd/abl_link ).
Compiling with NDK r15c: All devices work without problems.
Compiling with NDK r16: ARMv7 devices run into crashes, as soon as network traffic between the app and at least one other Ableton Link peer occurs.
The crash can occur right after joining the Ableton Link session, but usually does not.
It can be forced either by doing tempo changes on one of the other peers, or you can force it by turning the wireless LAN off and then on again (on the peer with the r16 compiled app).

Here's the log of the example app:

11-23 17:01:36.388 17870-17899/? A/libc: Fatal signal 7 (SIGBUS), code 1, fault addr 0xb510b375 in tid 17899 (r.abllinksample)
11-23 17:01:36.448 193-193/? A/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
11-23 17:01:36.448 193-193/? A/DEBUG: Build fingerprint: 'google/razor/flo:6.0.1/MOB30X/3036618:user/release-keys'
11-23 17:01:36.448 193-193/? A/DEBUG: Revision: '0'
11-23 17:01:36.448 193-193/? A/DEBUG: ABI: 'arm'
11-23 17:01:36.449 193-193/? A/DEBUG: pid: 17870, tid: 17899, name: r.abllinksample >>> com.noisepages.nettoyeur.abllinksample <<<
11-23 17:01:36.449 193-193/? A/DEBUG: signal 7 (SIGBUS), code 1 (BUS_ADRALN), fault addr 0xb510b375
11-23 17:01:36.476 193-193/? A/DEBUG: r0 b6cf34b5 r1 b510b375 r2 b38a683c r3 b38a688e
11-23 17:01:36.476 193-193/? A/DEBUG: r4 b510b371 r5 b3877768 r6 77f5e481 r7 a19c37c8
11-23 17:01:36.477 193-193/? A/DEBUG: r8 ab338d00 r9 b36ab368 sl ab338740 fp a19c389c
11-23 17:01:36.477 193-193/? A/DEBUG: ip aecb78c8 sp a19c37c0 lr aec3ef7b pc aec3ef94 cpsr 800f0030
11-23 17:01:36.493 193-193/? A/DEBUG: backtrace:
11-23 17:01:36.493 193-193/? A/DEBUG: #00 pc 0006df94 /data/app/com.noisepages.nettoyeur.abllinksample-2/lib/arm/libabl_link_tilde.so
11-23 17:01:36.494 193-193/? A/DEBUG: #1 pc 0006df77 /data/app/com.noisepages.nettoyeur.abllinksample-2/lib/arm/libabl_link_tilde.so (_ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EED2Ev+10)
11-23 17:01:36.494 193-193/? A/DEBUG: #2 pc 0007956b /data/app/com.noisepages.nettoyeur.abllinksample-2/lib/arm/libabl_link_tilde.so (_ZNK7ableton4util16SafeAsyncHandlerINS_9platforms4asio6SocketILj512EE4ImplEEclIJRKSt10error_codeRKjEEEvDpOT_+82)
11-23 17:01:36.494 193-193/? A/DEBUG: #3 pc 0007946b /data/app/com.noisepages.nettoyeur.abllinksample-2/lib/arm/libabl_link_tilde.so (_ZN4asio6detail27reactive_socket_recvfrom_opINS_17mutable_buffers_1ENS_2ip14basic_endpointINS3_3udpEEEN7ableton4util16SafeAsyncHandlerINS7_9platforms4asio6SocketILj512EE4ImplEEEE11do_completeEPvPNS0_19scheduler_operationERKSt10error_codej+90)
11-23 17:01:36.494 193-193/? A/DEBUG: #4 pc 0006da8f /data/app/com.noisepages.nettoyeur.abllinksample-2/lib/arm/libabl_link_tilde.so (_ZN4asio6detail9scheduler10do_run_oneERNS0_11scoped_lockINS0_11posix_mutexEEERNS0_21scheduler_thread_infoERKSt10error_code+218)
11-23 17:01:36.495 193-193/? A/DEBUG: #5 pc 0006d8b1 /data/app/com.noisepages.nettoyeur.abllinksample-2/lib/arm/libabl_link_tilde.so (_ZN4asio6detail9scheduler3runERSt10error_code+92)
11-23 17:01:36.495 193-193/? A/DEBUG: #6 pc 0006d829 /data/app/com.noisepages.nettoyeur.abllinksample-2/lib/arm/libabl_link_tilde.so (_ZN4asio10io_context3runEv+32)
11-23 17:01:36.495 193-193/? A/DEBUG: #7 pc 0007019f /data/app/com.noisepages.nettoyeur.abllinksample-2/lib/arm/libabl_link_tilde.so (_ZZN7ableton9platforms4asio7ContextINS0_5posix13ScanIpIfAddrsENS_4util7NullLogEEC1INS_4link10ControllerISt8functionIFvjEESB_IFvNS9_5TempoEEENS0_3stl5ClockES7_E23UdpSendExceptionHandlerEEET_ENKUlRN4asio10io_contextESK_E_clESO_SK_+40)
11-23 17:01:36.495 193-193/? A/DEBUG: #8 pc 000b30a7 /data/app/com.noisepages.nettoyeur.abllinksample-2/lib/arm/libabl_link_tilde.so
11-23 17:01:36.495 193-193/? A/DEBUG: #9 pc 0003f45f /system/lib/libc.so (_ZL15__pthread_startPv+30)
11-23 17:01:36.496 193-193/? A/DEBUG: #10 pc 00019b43 /system/lib/libc.so (__start_thread+6)
11-23 17:01:37.095 193-193/? W/debuggerd: type=1400 audit(0.0:344): avc: denied { read } for name="kgsl-3d0" dev="tmpfs" ino=4008 scontext=u:r:debuggerd:s0 tcontext=u:object_r:gpu_device:s0 tclass=chr_file permissive=0
11-23 17:01:37.095 193-193/? W/debuggerd: type=1400 audit(0.0:345): avc: denied { read } for name="kgsl-3d0" dev="tmpfs" ino=4008 scontext=u:r:debuggerd:s0 tcontext=u:object_r:gpu_device:s0 tclass=chr_file permissive=0
11-23 17:01:37.293 193-193/? A/DEBUG: Tombstone written to: /data/tombstones/tombstone_00
11-23 17:01:37.293 193-193/? E/DEBUG: AM write failed: Broken pipe

Any help would be appreciated.
Thanks in advance

@DanAlbert
Member

DanAlbert commented Nov 27, 2017

Could you try building with -Os explicitly in your cflags? It sounds like there may be a bad assumption about alignment requirements with -Oz (which is the default in r16). @pirama-arumuga-nainar @stephenhines: did anything ever come of that? I'm still catching up on mail after vacation.

@pirama-arumuga-nainar
Collaborator

The -Oz vs. -Os internal issue ended up being a bug in the application code.

I wonder if this issue is surfacing now due to changing the default from armv5 to armv7. Clang didn't change between r15 and r16.

@DanAlbert
Member

> I wonder if this issue is surfacing now due to changing the default from armv5 to armv7.

What changed here? We didn't update any of the toolchain libraries (right?), and none of the build systems have changed that much (there is no default ABI for CMake, it's user selected; ndk-build has previously built armeabi-v7a and armeabi, but now doesn't build armeabi; arm standalone toolchains have built armeabi-v7a by default for a long time).

@planethcom: Just in case, what build system do you use?

@stephenhines
Collaborator

https://buganizer.corp.google.com/issues/69176075#comment33 says that the ldm/stm issue was fixed in NDK r14. Is this using Clang at all, or is it using GCC? That isn't clear in the original message. I should also point out that misalignment can happen when changing other libraries too (i.e. something that is providing the data in this case could be misaligning it now).
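As a general illustration of the fault class reported in the logs (BUS_ADRALN), here is a minimal C++ sketch of checking a pointer's alignment and of reading a 32-bit value from a possibly unaligned buffer with `std::memcpy` instead of dereferencing a cast pointer. The helper names `isAligned` and `loadU32` are hypothetical and are not taken from Ableton Link, asio, or the NDK; this only shows what "misaligned data" means and does not claim to identify the cause of this crash.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true if ptr is a multiple of `alignment`.
// `alignment` is assumed to be a power of two (e.g. 4 for a 32-bit word).
inline bool isAligned(const void* ptr, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Hypothetical helper: reads four bytes in host byte order from `bytes`
// without requiring `bytes` to be 4-byte aligned. Copying through memcpy
// avoids the undefined behavior of *reinterpret_cast<const uint32_t*>(bytes)
// on a misaligned address.
inline std::uint32_t loadU32(const unsigned char* bytes) {
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof(value));
    return value;
}
```

Both fault addresses in the logs above (0x449d and 0xb510b375) are odd, so `isAligned` would return false for them with an alignment of 4, which is consistent with the BUS_ADRALN code.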

@rprichard
Collaborator

If rebuilding the app with -Os (overriding the -Oz default in r16's Clang) fixes the problem, then maybe #573 is the problem? That issue only applies when:

  • Clang is used to compile (GCC is unaffected)
  • There is C++ code that throws and catches an exception
  • AFAIK it affects only 32-bit ARM, not ARM64
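The three conditions above can be approximated at compile time with predefined compiler macros. The following is a rough sketch under stated assumptions: the constant and function names are hypothetical, "exceptions enabled" is only a proxy for "code that throws and catches an exception", and the check does not look at the optimization level, so it cannot tell an -Oz build from an -Os build.

```cpp
// Compile-time facts about the current build, derived from macros that
// Clang and GCC predefine. All names below are hypothetical.

#if defined(__clang__)
constexpr bool kBuiltWithClang = true;
#else
constexpr bool kBuiltWithClang = false;
#endif

// __arm__ is defined for 32-bit ARM targets; __aarch64__ for ARM64.
#if defined(__arm__) && !defined(__aarch64__)
constexpr bool kTargets32BitArm = true;
#else
constexpr bool kTargets32BitArm = false;
#endif

// Exceptions being enabled is a weaker condition than actually
// throwing and catching one, so this can over-report.
#if defined(__cpp_exceptions) || defined(__EXCEPTIONS)
constexpr bool kExceptionsEnabled = true;
#else
constexpr bool kExceptionsEnabled = false;
#endif

// Hypothetical predicate mirroring the three listed conditions.
constexpr bool matchesReportedConditions(bool clang, bool arm32, bool exceptions) {
    return clang && arm32 && exceptions;
}

constexpr bool kBuildMatchesReportedConditions =
    matchesReportedConditions(kBuiltWithClang, kTargets32BitArm, kExceptionsEnabled);
```

A build could log `kBuildMatchesReportedConditions` at startup as a hint that the explicit -Os workaround discussed in this thread is worth trying.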

@stephenhines
Collaborator

It looks like exceptions are being used (https://github.com/Ableton/link/blob/master/include/ableton/test/serial_io/Timer.hpp#L68). I would suggest switching to -Os, since this has a very high chance of being #573 with all other variables removed.

@planethcom
Author

Thanks a lot for the fast answers.

> @planethcom: Just in case, what build system do you use?

Sorry for the missing details.

  • Build system: cmake
  • Compiler: clang

> It looks like exceptions are being used (https://github.com/Ableton/link/blob/master/include/ableton/test/serial_io/Timer.hpp#L68). I would suggest switching to -Os, since this has a very high chance of being #573 with all other variables removed.

Switching to -Os in the release builds solved the problem.
Thank you so much, you made my day;)

@DanAlbert
Member

Dup of #573 then
