OpenBLAS thread pinning re-assigns one Julia thread to wrong CPU thread #105
I will look into the fundamental issue when I find the time for it. In the meantime, why are you using SysInfo directly? ThreadPinning.jl should be enough. It re-exports all the functions you're using (if not, it's probably an oversight on my end). Also, I think that …
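Just for reference, everything needed here comes with `using ThreadPinning` alone; a minimal sketch, using only calls that also appear further down in this thread:

```julia
using ThreadPinning   # no `using SysInfo` required

getcpuids()        # CPU IDs the Julia threads are currently running on
cores(1:4)         # CPU IDs for pinning to the first four cores
pinthreads(:cores) # pin the Julia threads, one per core
```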
Had overlooked that - thanks, I've updated the example above. As for …
I could reproduce this on Perlmutter (no efficiency cores):

```
crstnbr@login22 ThreadPinning.jl git:(main)
➜ OPENBLAS_NUM_THREADS=6 julia --project -t 6 -q
julia> using ThreadPinning
julia> pinthreads(:cores)
julia> getcpuids() |> print
[0, 1, 2, 3, 4, 5]
julia> openblas_pinthreads([128, 129, 130, 131, 132, 133]) # hyperthreads in the same cores
julia> openblas_getcpuids() |> print
[128, 129, 130, 131, 132, 133]
julia> getcpuids() |> print
[133, 1, 2, 3, 4, 5]
```
However, my gut feeling tells me that it is not a problem with ThreadPinning.jl but a fundamental/upstream issue. Will investigate.

Thanks!
Goes both ways...

```
crstnbr@login22 ThreadPinning.jl git:(main)
➜ OPENBLAS_NUM_THREADS=6 julia --project -t 6 -q
julia> using ThreadPinning
julia> openblas_pinthreads([128, 129, 130, 131, 132, 133]) # hyperthreads only
julia> openblas_getcpuids() |> print
[128, 129, 130, 131, 132, 133]
julia> pinthreads(:cores)
julia> getcpuids() |> print
[0, 1, 2, 3, 4, 5]
julia> openblas_getcpuids() |> print
[128, 129, 130, 131, 132, 0]
```

...and isn't related to hyperthreads and/or efficiency cores:

```
crstnbr@login22 ThreadPinning.jl git:(main)
➜ OPENBLAS_NUM_THREADS=6 julia --project -t 6 -q
julia> using ThreadPinning
julia> pinthreads(cores(1:6))
julia> getcpuids() |> print
[0, 1, 2, 3, 4, 5]
julia> openblas_pinthreads(cores(7:12))
julia> openblas_getcpuids() |> print
[6, 7, 8, 9, 10, 11]
julia> getcpuids() |> print
[11, 1, 2, 3, 4, 5]
```
Update: I've added to the example above what happens to the OpenBLAS threads when re-pinning the Julia threads.
Indeed, it doesn't seem to have anything to do with HT/non-HT or the specific CPU numbers chosen: pinning the Julia threads and the OpenBLAS threads to non-overlapping sets of CPU threads in general seems to always result in the behavior above.
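For completeness, a compact way to flag the overlap programmatically; a sketch using only the calls from the sessions above, assuming a machine with at least twelve cores and a session started with `OPENBLAS_NUM_THREADS=6 julia -t 6`:

```julia
using ThreadPinning

# Pin Julia and OpenBLAS threads to two disjoint sets of cores.
pinthreads(cores(1:6))
openblas_pinthreads(cores(7:12))

jlcpus   = getcpuids()
blascpus = openblas_getcpuids()

# Expected: the two sets of CPU threads stay disjoint.
# Observed (see above): the first Julia thread ends up on one of the OpenBLAS CPUs.
overlap = intersect(jlcpus, blascpus)
isempty(overlap) || @warn "Julia and OpenBLAS threads share CPU threads" overlap jlcpus blascpus
```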
Speculation: Maybe the reason for this issue lies in https://github.com/OpenMathLib/OpenBLAS/blob/d92cc96978c17a35355101a1901981970dec25b6/driver/others/blas_server.c#L357-L359. Maybe the call to …
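If those lines are indeed the culprit (my reading, which may be off: the last OpenBLAS thread index seems to get mapped to the calling thread, i.e. `pthread_self()`, rather than to one of the worker threads), then the last OpenBLAS "thread" would really be the Julia thread that issues the pinning call. A quick sketch to probe this, again only using the functions from above:

```julia
using ThreadPinning   # started with: OPENBLAS_NUM_THREADS=6 julia -t 6

pinthreads(cores(1:6))
before = getcpuids()

# Issued from Julia thread 1; if the hypothesis holds, pinning the *last* OpenBLAS
# thread actually re-pins this calling Julia thread.
openblas_pinthreads(cores(7:12))

after    = getcpuids()
blascpus = openblas_getcpuids()

@show before after blascpus
@show blascpus[end] == after[1]   # true would support the hypothesis
@show before[1] != after[1]       # the calling Julia thread got moved
```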
Varying which Julia thread makes the `openblas_pinthreads` call (via `juliathreadid`):

```
julia> using ThreadPinning
julia> openblas_pinthreads([128, 129, 130, 131, 132, 133]; juliathreadid=2) # hyperthreads only
julia> openblas_getcpuids() |> print
ERROR: The affinity mask of OpenBLAS thread 6 includes multiple CPU threads. This likely indicates that this OpenBLAS hasn't been pinned yet.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] openblas_getcpuid(; threadid::Int64, juliathreadid::Int64)
@ ThreadPinningCore.Internals /pscratch/sd/c/crstnbr/.julia/packages/ThreadPinningCore/fdkhT/src/openblas.jl:70
[3] openblas_getcpuid
@ /pscratch/sd/c/crstnbr/.julia/packages/ThreadPinningCore/fdkhT/src/openblas.jl:59 [inlined]
[4] openblas_getcpuids(; kwargs::@Kwargs{})
@ ThreadPinningCore.Internals /pscratch/sd/c/crstnbr/.julia/packages/ThreadPinningCore/fdkhT/src/openblas.jl:80
[5] openblas_getcpuids
@ /pscratch/sd/c/crstnbr/.julia/packages/ThreadPinningCore/fdkhT/src/openblas.jl:76 [inlined]
[6] openblas_getcpuids()
@ ThreadPinning.Querying /pscratch/sd/c/crstnbr/ThreadPinning.jl/src/querying.jl:317
[7] top-level scope
@ REPL[3]:1
julia> pinthreads(:cores)
julia> getcpuids() |> print
[0, 1, 2, 3, 4, 5]
julia> openblas_getcpuids() |> print
[128, 129, 130, 131, 132, 0]
```

Going to a lower level, trying to isolate the issue:

```
julia> using ThreadPinning
julia> import ThreadPinningCore: LibCalls
julia> cpuset_ref = Ref{LibCalls.Ccpu_set_t}(LibCalls.Ccpu_set_t([0]));
julia> LibCalls.openblas_setaffinity(5, sizeof(cpuset_ref[]), cpuset_ref)
0
julia> cpuset_ref = Ref{LibCalls.Ccpu_set_t}(LibCalls.Ccpu_set_t([1]));
julia> LibCalls.openblas_getaffinity(5, sizeof(cpuset_ref[]), cpuset_ref)
0
julia> cpuset_ref
Base.RefValue{ThreadPinningCore.LibCalls.Ccpu_set_t}(Ccpu_set_t(1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000))
# restart session (to be safe)
julia> using ThreadPinning
julia> import ThreadPinningCore: LibCalls
julia> cpuset_ref = Ref{LibCalls.Ccpu_set_t}(LibCalls.Ccpu_set_t([0]));
julia> ThreadPinning.@fetchfrom 1 LibCalls.openblas_setaffinity(5, sizeof(cpuset_ref[]), cpuset_ref)
0
julia> cpuset_ref = Ref{LibCalls.Ccpu_set_t}(LibCalls.Ccpu_set_t([1]));
julia> LibCalls.openblas_getaffinity(5, sizeof(cpuset_ref[]), cpuset_ref)
0
julia> cpuset_ref
Base.RefValue{ThreadPinningCore.LibCalls.Ccpu_set_t}(Ccpu_set_t(1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000))
# restart session (to be safe)
julia> using ThreadPinning
julia> import ThreadPinningCore: LibCalls
julia> cpuset_ref = Ref{LibCalls.Ccpu_set_t}(LibCalls.Ccpu_set_t([0]));
julia> ThreadPinning.@fetchfrom 2 LibCalls.openblas_setaffinity(5, sizeof(cpuset_ref[]), cpuset_ref)
0
julia> cpuset_ref = Ref{LibCalls.Ccpu_set_t}(LibCalls.Ccpu_set_t([1]));
julia> LibCalls.openblas_getaffinity(5, sizeof(cpuset_ref[]), cpuset_ref)
0
julia> cpuset_ref
Base.RefValue{ThreadPinningCore.LibCalls.Ccpu_set_t}(Ccpu_set_t(1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)) |
```

@vchuravy: Do you have any ideas what might be going on here?
With `JULIA_NUM_THREADS=6` and `OPENBLAS_NUM_THREADS=6`, pinning the Julia threads to the "non-HT" performance CPU-threads works as expected. But after pinning the OpenBLAS threads to the "other half", i.e. the "HT" CPU-threads of the performance cores, the Julia thread on CPU 0 gets reassigned to CPU 11, sharing that CPU thread with OpenBLAS and another Julia thread, which is obviously not what we want. When trying to fix this by re-pinning the Julia threads, we end up with an OpenBLAS thread that shares a CPU thread with a Julia thread and another OpenBLAS thread.

(ThreadPinning v1.0.2, SysInfo v0.3.0)
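For illustration, the described sequence in code form; a sketch where the CPU IDs are taken from the Perlmutter sessions in the comments and will differ on other machines:

```julia
using ThreadPinning   # JULIA_NUM_THREADS=6, OPENBLAS_NUM_THREADS=6

# 1) Pin the Julia threads to the "non-HT" CPU-threads of the performance cores.
pinthreads(:cores)
getcpuids()             # works as expected, e.g. [0, 1, 2, 3, 4, 5]

# 2) Pin the OpenBLAS threads to the HT siblings ("other half"); IDs are illustrative.
openblas_pinthreads([128, 129, 130, 131, 132, 133])
openblas_getcpuids()    # [128, 129, 130, 131, 132, 133]

# 3) One Julia thread has silently been moved onto one of those CPU-threads,
#    now sharing it with an OpenBLAS thread.
getcpuids()             # e.g. [133, 1, 2, 3, 4, 5]

# 4) Re-pinning the Julia threads in turn displaces an OpenBLAS thread onto a
#    CPU-thread that a Julia thread is already pinned to.
pinthreads(:cores)
openblas_getcpuids()    # e.g. [128, 129, 130, 131, 132, 0]
```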