baseline 300% CPU usage with ARMv7 binaries on quad-core AArch64 ARM #13522
Comments
I had those same issues trying to build a month or two ago (I patched Make.inc and then ran into similar problems), but I was hoping that the prebuilt binaries would save me. I didn't realize that those binaries are ARMv7, but in any case I don't know enough about ARM to know if that could be the underlying cause (the internet suggests that ARMv7 has some flavor of multicore support?). |
It would be helpful if you could run this in gdb, periodically interrupt the process, and grab a backtrace of the other threads to figure out why they are spinning. These are probably either libuv or libopenblas worker threads that should be sitting quietly at some mutex or syscall waiting for work. |
I have seen similar behavior on an odroid board, which I think is all 32-bit. Will try to run gdb on it next time I get a chance. |
I followed vtjnash's suggestion a couple of times and consistently got this result:
Threads 3 and 4 have the same backtrace, and thread 1 (the main thread) has a very long but standard-looking |
Also, it should be noted that |
I'm not sure which version of OpenBLAS ARM julia is compiled against, but this loop is the best I could find: https://github.com/xianyi/OpenBLAS/blob/develop/driver/others/blas_server.c#L308, where |
If you single-step through that main loop in gdb, does the value returned by rpcc ever change and allow it to enter the pthread_cond_wait SLEEPING state? Or perhaps one of the other variables is not initialized correctly and is causing it to loop? |
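For readers following along, here is a minimal sketch of the spin-then-sleep pattern being asked about. It is not the OpenBLAS source: the names `timeout_expired` and `polls_before_sleep` are hypothetical, and the tick counter is simulated by adding a fixed step per poll instead of calling rpcc. It only illustrates why a counter that does not advance, or advances very slowly relative to the timeout, keeps an idle worker polling instead of sleeping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is a sketch, not the OpenBLAS code. */

/* Returns 1 once more than `timeout` ticks have elapsed since `last_tick`.
   Unsigned subtraction keeps the comparison valid across counter wraparound. */
static int timeout_expired(uint32_t now, uint32_t last_tick, uint32_t timeout) {
    return (uint32_t)(now - last_tick) > timeout;
}

/* Simulates an idle worker that polls a tick counter advancing by
   `ticks_per_poll` on every iteration. Returns the number of polls made
   before the worker would go to sleep, or 0 if it is still spinning
   after `max_polls` iterations. */
static uint64_t polls_before_sleep(uint32_t ticks_per_poll, uint32_t timeout,
                                   uint64_t max_polls) {
    assert(max_polls > 0);
    uint32_t last_tick = 0;
    uint32_t now = 0;
    for (uint64_t polls = 1; polls <= max_polls; polls++) {
        now += ticks_per_poll;   /* stand-in for reading the tick counter */
        if (timeout_expired(now, last_tick, timeout)) {
            return polls;        /* real code would block on a condition variable here */
        }
    }
    return 0;                    /* never reached the sleeping state */
}
```

With a counter that advances 1000 ticks per poll and a timeout of `1u << 20` ticks, the simulated worker sleeps on poll 1049. With a counter that never changes (`ticks_per_poll` of 0), it is still spinning when the poll budget runs out, which is the kind of behavior the gdb single-stepping would reveal.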
I've exhausted the extent of my gdb abilities, and I don't think I'll be able to figure out what's going on in the libopenblas code without debug info. Unfortunately bin/julia-debug only has debug symbols pertaining to the actual julia code, not dependencies, but I'll try my hand at building a debug version of lib/julia/libopenblas.so (probably easier than getting julia to build on AArch64?) and maybe that'll help. |
You can also just use a system BLAS to avoid using OpenBLAS. |
Actually, looking at the OpenBLAS code, it seems that since the release of OpenBLAS v0.2.14 (the version that the x86_64 build of release-0.4 builds from source), the unsigned long long that rpcc() returns has been changed from units of milliseconds to units of nanoseconds: OpenMathLib/OpenBLAS@e12cf11. |
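To put rough numbers on why the counter's unit matters, here is a small sketch. The helper name `seconds_to_reach` is hypothetical, and the threshold of `1 << 28` ticks used in the usage note is an assumption for illustration, not a value confirmed anywhere in this thread. The units are taken from the comment above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: seconds needed for a counter to advance by
   `threshold_ticks` when it advances `ticks_per_second` ticks each second. */
static double seconds_to_reach(uint64_t threshold_ticks, uint64_t ticks_per_second) {
    assert(ticks_per_second > 0);
    return (double)threshold_ticks / (double)ticks_per_second;
}
```

Under the assumed threshold, `seconds_to_reach(1ULL << 28, 1000ULL)` (a millisecond counter) comes to about 268,435 seconds, roughly 74.6 hours, while `seconds_to_reach(1ULL << 28, 1000000000ULL)` (a nanosecond counter) comes to about 0.27 seconds. If the real threshold is of that order, idle workers on a millisecond counter would effectively never reach the sleeping state, which would be consistent with the constant CPU usage reported here.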
Building from the tip of the OpenBLAS develop branch solves the 300% CPU issue. It introduces this warning:
but other than that it might be worth baking into future ARM binary releases. |
cc: @xianyi |
Shall we bump openblas or are we still following official release? I also recently submitted a Makefile patch upstream, which would be great to have. |
We've been following releases for at least 2 years. We can ask nicely if upstream is ready to tag a new version. |
We have, but we also switched openblas to be a git-external, so we could theoretically now bump to non-release versions. |
The non-release versions may have other bugs. Can we bump only for arm? |
I am working on the OpenBLAS cmake branch. After that, I will release a new version. I hope I can finish this week. |
Was this ever resolved upstream? If so, what version do I need? |
This issue doesn't exist in the 0.5 release binaries, I believe. |
That commit (OpenMathLib/OpenBLAS@e12cf11) says it's present in 0.2.15 and newer.
I'm trying out the 0.4 ARM binaries linked here on a quad-core Snapdragon 600-based single-board computer (in particular, the IFC6410P) running Linaro. As far as I can tell, everything works great except for the fact that even when idling, julia consumes three out of four cores. See the attached screenshot for an example from the REPL;
julia -e 'sleep(10)'
gives similar results. Has anyone else seen anything similar?