baseline 300% CPU usage with ARMv7 binaries on quad-core AArch64 ARM #13522

Closed
schmrlng opened this issue Oct 10, 2015 · 21 comments
Labels
system:arm ARMv7 and AArch64 upstream The issue is with an upstream dependency, e.g. LLVM

Comments

@schmrlng
Contributor

I'm trying out the 0.4 ARM binaries linked here on a quad-core Snapdragon 600-based single-board computer (in particular, the IFC6410P) running Linaro. As far as I can tell, everything works great except for the fact that even when idling, julia consumes three out of four cores. See the attached screenshot for an example from the REPL; julia -e 'sleep(10)' gives similar results. Has anyone else seen anything similar?

[Screenshot: julia_arm_300]

@pao added the "system:arm (ARMv7 and AArch64)" label on Oct 10, 2015
@pao
Member

pao commented Oct 10, 2015

Didn't run into the problems I hit in #10791? Wonder if that was magically fixed by @vtjnash's big codegen refactoring?

EDIT: never mind, these are 32-bit binaries on AArch64...I need to just try to build native again.

@pao changed the title from "baseline 300% CPU usage on quad-core AArch64 ARM" to "baseline 300% CPU usage with ARMv7 binaries on quad-core AArch64 ARM" on Oct 10, 2015
@schmrlng
Contributor Author

I had those same issues trying to build a month or two ago (I patched Make.inc and then ran into similar problems), but I was hoping that the prebuilt binaries would save me. I didn't realize that those binaries are ARMv7, but in any case I don't know enough about ARM to know if that could be the underlying cause (the internet suggests that ARMv7 has some flavor of multicore support?).

@vtjnash
Member

vtjnash commented Oct 10, 2015

it would be helpful if you could run this in gdb and periodically interrupt the process and grab a backtrace of the other threads to figure out why they are spinning. these are probably either libuv or libopenblas worker threads that should be sitting quietly at some mutex or syscall waiting for work.
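
One way to collect the backtraces described above without an interactive session is to attach gdb in batch mode a few times in a row. The sketch below assumes gdb is installed and that you are allowed to attach to the process. The function names and the sampling defaults are illustrative only; they are not part of Julia or OpenBLAS.

```python
import subprocess
import time


def gdb_backtrace_command(pid):
    """Return a gdb command line that attaches to `pid`, prints every
    thread's backtrace, and then exits (which detaches from the process)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def sample_backtraces(pid, samples=3, interval=1.0):
    """Run the command above `samples` times, `interval` seconds apart,
    and return the captured gdb output of each run."""
    outputs = []
    for i in range(samples):
        result = subprocess.run(
            gdb_backtrace_command(pid),
            capture_output=True,
            text=True,
            check=False,  # gdb may exit non-zero if the process has gone away
        )
        outputs.append(result.stdout)
        if i < samples - 1:
            time.sleep(interval)
    return outputs
```

If the same frame shows up at the top of a thread in every sample, that thread is most likely the one spinning.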

@tkelman
Contributor

tkelman commented Oct 10, 2015

I have seen similar behavior on an odroid board which I think is all 32 bit. Will try to run gdb on it next time I get a chance.

@schmrlng
Contributor Author

I followed vtjnash's suggestion a couple of times and consistently got this result:

root@linaro-gnome:/home/linaro/julia/bin# gdb --args ./julia -e 'println("sleeping"); sleep(10)'
GNU gdb (Ubuntu 7.7-0ubuntu3) 7.7
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./julia...done.
(gdb) r
Starting program: /home/linaro/julia/bin/julia -e println\(\"sleeping\"\)\;\ sleep\(10\)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0x64b40450 (LWP 8937)]
[New Thread 0x65340450 (LWP 8938)]
[New Thread 0x66b40450 (LWP 8939)]
sleeping
^C
Program received signal SIGINT, Interrupt.
syscall () at ../ports/sysdeps/unix/sysv/linux/arm/syscall.S:38
38  ../ports/sysdeps/unix/sysv/linux/arm/syscall.S: No such file or directory.
(gdb) info threads
  Id   Target Id         Frame
  4    Thread 0x66b40450 (LWP 8939) "julia" 0x41146c96 in gettimeofday () at ../sysdeps/unix/syscall-template.S:81
  3    Thread 0x65340450 (LWP 8938) "julia" 0x41146c96 in gettimeofday () at ../sysdeps/unix/syscall-template.S:81
  2    Thread 0x64b40450 (LWP 8937) "julia" 0x41146c96 in gettimeofday () at ../sysdeps/unix/syscall-template.S:81
* 1    Thread 0x4001b3f0 (LWP 8934) "julia" syscall () at ../ports/sysdeps/unix/sysv/linux/arm/syscall.S:38
(gdb) thread 2
[Switching to thread 2 (Thread 0x64b40450 (LWP 8941))]
#0  0x41146c96 in gettimeofday () at ../sysdeps/unix/syscall-template.S:81
81  ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x41146c96 in gettimeofday () at ../sysdeps/unix/syscall-template.S:81
#1  0x63b3cf9c in blas_thread_server () from /home/linaro/julia/bin/../lib/julia/libopenblas.so
#2  0x64b3ff90 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

Threads 3 and 4 have the same backtrace, and thread 1 (the main thread) has a very long but standard-looking sleep backtrace.
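
As a cross-check on which threads are actually burning CPU, per-thread CPU time can be read from /proc on Linux without a debugger. This is a minimal sketch, assuming the usual layout of /proc/<pid>/task/<tid>/stat (utime and stime as the 14th and 15th fields); the helper names are hypothetical.

```python
import os


def parse_stat_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc stat line.

    The command name (field 2) is wrapped in parentheses and may contain
    spaces, so split after the last ')' before counting fields.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return int(rest[11]) + int(rest[12])


def thread_cpu_seconds(pid):
    """Map each thread id of `pid` to the CPU seconds it has consumed so far."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    task_dir = "/proc/{}/task".format(pid)
    usage = {}
    for tid in os.listdir(task_dir):
        with open("{}/{}/stat".format(task_dir, tid)) as stat_file:
            usage[int(tid)] = parse_stat_cpu_ticks(stat_file.read()) / ticks_per_second
    return usage
```

Calling thread_cpu_seconds twice a few seconds apart and comparing the values shows which thread ids keep accumulating CPU time while the process is supposed to be idle.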

@schmrlng
Contributor Author

Also, it should be noted that gettimeofday has nothing to do with the fact that I'm running a sleep method. Running other stuff (e.g. large dense matrix multiplication to see if BLAS will wake those threads up) gives the same result.

@schmrlng
Contributor Author

I'm not sure which version of OpenBLAS ARM julia is compiled against, but this loop is the best I could find: https://github.com/xianyi/OpenBLAS/blob/develop/driver/others/blas_server.c#L308, where rpcc (https://github.com/xianyi/OpenBLAS/blob/develop/common.h#L426) calling gettimeofday seems to be inlined.

@vtjnash
Member

vtjnash commented Oct 10, 2015

if you single-step through that main loop in gdb, does the value returned by rpcc ever change and allow it to enter the pthread_cond_wait SLEEPING state? or perhaps is one of the other variables not initialized right and causing it to loop?
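
The question above can be illustrated with a toy model of a spin-then-sleep worker. This is not the OpenBLAS code: the function names, the injected clock, and the spin limit are all hypothetical, and the real loop also checks a work queue and yields the CPU on each pass.

```python
def spins_until_sleep(clock, timeout_ticks, max_spins=10_000):
    """Busy-wait until `clock` has advanced by more than `timeout_ticks`.

    Returns the number of spins before the worker would go to sleep, or
    None if the timeout test never fires within `max_spins` iterations.
    """
    last_tick = clock()
    for spin in range(1, max_spins + 1):
        # A real worker would look for queued work and yield here.
        if clock() - last_tick > timeout_ticks:
            return spin  # at this point the worker would block on a condition variable
    return None


def advancing_clock(step):
    """Return a fake clock that moves forward by `step` ticks per call."""
    tick = 0

    def clock():
        nonlocal tick
        tick += step
        return tick

    return clock


def frozen_clock():
    """A fake clock whose value never changes."""
    return 0
```

With an advancing clock the loop ends after a bounded number of spins. With a clock that never changes, or one that advances far more slowly than the threshold expects, the worker never reaches the sleeping state and keeps a core busy.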

@schmrlng
Contributor Author

I've exhausted the extent of my gdb abilities, and I don't think I'll be able to figure out what's going on in the libopenblas code without debug info. Unfortunately bin/julia-debug only has debug symbols pertaining to the actual julia code, not dependencies, but I'll try my hand at building a debug version of lib/julia/libopenblas.so (probably easier than getting julia to build on AArch64?) and maybe that'll help.

@ViralBShah
Member

You can also just use a system BLAS to avoid using OpenBLAS.

@schmrlng
Contributor Author

Actually, looking at the OpenBLAS code, it seems that since the release of OpenBLAS v0.2.14 (the version that the x86_64 build of release-0.4 compiles from source), the unsigned long long returned by rpcc() has changed from units of milliseconds to units of nanoseconds: OpenMathLib/OpenBLAS@e12cf11.
This matters because the timeout delta is hardcoded (counted in nanoseconds or CPU cycles, depending on architecture), so I'll try building OpenBLAS from the develop tip and hopefully it'll "just work" without any debugging.
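
To see why the unit matters, here is a rough worked calculation. It assumes, purely for illustration, a hardcoded threshold of 2^28 ticks; check blas_server.c in the OpenBLAS version you build for the actual value. The names below are hypothetical.

```python
# Assumption for illustration only: the hardcoded timeout is 2**28 ticks.
ASSUMED_TIMEOUT_TICKS = 1 << 28


def timeout_seconds(timeout_ticks, ticks_per_second):
    """Wall-clock time a worker spins before its tick-based timeout fires."""
    return timeout_ticks / ticks_per_second


# If the clock counts nanoseconds, the spin phase lasts a fraction of a second.
spin_if_nanoseconds = timeout_seconds(ASSUMED_TIMEOUT_TICKS, 1_000_000_000)

# If the clock counts milliseconds, the same threshold takes days to reach.
spin_if_milliseconds = timeout_seconds(ASSUMED_TIMEOUT_TICKS, 1_000)

print("nanosecond clock:  {:.3f} s".format(spin_if_nanoseconds))
print("millisecond clock: {:.1f} h".format(spin_if_milliseconds / 3600))
```

Under that assumption, a millisecond clock would keep each idle worker spinning for roughly three days before it sleeps, which would look like permanently busy cores.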

@schmrlng
Contributor Author

Building from the tip of the OpenBLAS develop branch solves the 300% CPU issue. It introduces this warning:

linaro@linaro-gnome:~/julia$ julia
WARNING: Error during initialization of module LinAlg:
ErrorException("symbol "dpotrf_" could not be found: /home/linaro/julia/bin/../lib/julia/libopenblas.so: undefined symbol: dpotrf_")
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-rc3 (2015-09-27 20:34 UTC)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |  arm-linux-gnueabihf

but other than that it might be worth baking into future ARM binary releases.

@ViralBShah
Member

cc: @xianyi

@ViralBShah added the "upstream (The issue is with an upstream dependency, e.g. LLVM)" label on Oct 10, 2015
@Keno
Member

Keno commented Oct 11, 2015

Shall we bump openblas, or are we still following official releases? I also recently submitted a Makefile patch upstream, which would be great to have.

@tkelman
Contributor

tkelman commented Oct 11, 2015

We've been following releases for at least 2 years. We can ask nicely if upstream is ready to tag a new version.

@Keno
Member

Keno commented Oct 11, 2015

We have, but we also switched openblas to be a git-external, so we could theoretically now bump to non-release versions.

@ViralBShah
Member

The non-release versions may have other bugs. Can we bump only for arm?

@xianyi

xianyi commented Oct 12, 2015

I am working on the OpenBLAS cmake branch. After that, I will release a new version. I hope to finish it this week.

@simonbyrne
Contributor

Was this ever resolved upstream? If so, what version do I need?

@ViralBShah
Member

This issue doesn't exist in the 0.5 release binaries, I believe.

@tkelman
Contributor

tkelman commented Mar 6, 2017

that commit OpenMathLib/OpenBLAS@e12cf11 says it's present in 0.2.15 and newer


8 participants