OpenBLAS thread pinning re-assigns one Julia thread to wrong CPU thread #105

Open
oschulz opened this issue Aug 8, 2024 · 11 comments

@oschulz commented Aug 8, 2024

With JULIA_NUM_THREADS=6 and OPENBLAS_NUM_THREADS=6 and the following setup:

julia> using ThreadPinning
julia> using ThreadPinning: cpuids

julia> threadinfo()
Hostname:       ...
CPU(s):         1 x 13th Gen Intel(R) Core(TM) i9-13900H
CPU target:     goldmont
Cores:          14 (20 CPU-threads due to 2-way SMT)
Core kinds:     8 "efficiency cores", 6 "performance cores".
NUMA domains:   1 (14 cores each)

Julia threads:  6

CPU socket 1
  0,1, 2,3, 4,5, 6,7, 8,9, 10,11, 12, 13, 14, 15, 
  16, 17, 18, 19

julia> perf_cpus = filter(i -> !isefficiencycore(i), cpuids()); string(perf_cpus)
"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]"

julia> non_ht_threads = filter(!ishyperthread, perf_cpus); string(non_ht_threads)
"[0, 2, 4, 6, 8, 10]"

julia> ht_threads = filter(ishyperthread, perf_cpus); string(ht_threads)
"[1, 3, 5, 7, 9, 11]"

Pinning the Julia threads to the "non-HT" performance CPU-threads works as expected:

julia> pinthreads(non_ht_threads)

julia> string(getcpuids())
"[0, 2, 4, 6, 8, 10]"

julia> string(ThreadPinning.openblas_getcpuids())
ERROR: The affinity mask of OpenBLAS thread 1 includes multiple CPU threads. This likely indicates that this OpenBLAS hasn't been pinned yet.

But after pinning the OpenBLAS threads to the "other half", i.e. the "HT" CPU-threads of the performance cores,

julia> ThreadPinning.openblas_pinthreads(ht_threads)

the Julia thread on CPU-thread 0 gets reassigned to CPU-thread 11, sharing that CPU-thread with an OpenBLAS thread (and its physical core with another Julia thread):

julia> string(getcpuids())
"[11, 2, 4, 6, 8, 10]"

julia> string(ThreadPinning.openblas_getcpuids())
"[1, 3, 5, 7, 9, 11]"

which is obviously not what we want. When trying to fix this by re-pinning the Julia threads

julia> pinthreads(non_ht_threads)

julia> string(getcpuids())
"[0, 2, 4, 6, 8, 10]"

julia> string(ThreadPinning.openblas_getcpuids())
"[1, 3, 5, 7, 9, 0]"

we end up with an OpenBLAS thread that shares a CPU-thread with a Julia thread (and its physical core with another OpenBLAS thread).

(ThreadPinning v1.0.2, SysInfo v0.3.0).
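
For completeness, here is a quick way to check for such overlaps; a minimal sketch that only uses the query functions already shown above (the variable name overlap is just illustrative):

using ThreadPinning

# CPU-threads that currently host both a Julia thread and an OpenBLAS thread;
# after pinning to disjoint sets this should be empty.
overlap = intersect(getcpuids(), openblas_getcpuids())
isempty(overlap) || @warn "Julia and OpenBLAS threads share CPU-threads" overlap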

@carstenbauer (Owner)

I will look into the fundamental issue when I find the time for it.

In the meantime, why are you using SysInfo directly? ThreadPinning.jl should be enough; it re-exports all the functions you're using (if not, it's probably an oversight on my end). Also, I think threadinfo is better than sysinfo, so I wonder why you use the latter in combination with explicit getcpuids() calls instead.

@oschulz (Author) commented Aug 8, 2024

ThreadPinning.jl should be enough. It re-exports all the functions you're using

Had overlooked that - thanks, I've updated the example above.

As for sysinfo() I just wanted to use the shiny new functionality. :-) But you're right, threadinfo() is more detailed.

@carstenbauer (Owner)

I could reproduce this on Perlmutter (no efficiency cores):

crstnbr@login22 ThreadPinning.jl git:(main)
➜ OPENBLAS_NUM_THREADS=6 julia --project -t 6 -q
julia> using ThreadPinning

julia> pinthreads(:cores)

julia> getcpuids() |> print
[0, 1, 2, 3, 4, 5]
julia> openblas_pinthreads([128, 129, 130, 131, 132, 133]) # hyperthreads in the same cores

julia> openblas_getcpuids() |> print
[128, 129, 130, 131, 132, 133]
julia> getcpuids() |> print
[133, 1, 2, 3, 4, 5]

@carstenbauer (Owner) commented Aug 8, 2024

However, my gut feeling tells me that it is not a problem with ThreadPinning.jl but a fundamental/upstream issue. Will investigate.

@oschulz (Author) commented Aug 8, 2024

Will investigate

Thanks!

@carstenbauer (Owner)

Goes both ways...

crstnbr@login22 ThreadPinning.jl git:(main)
➜ OPENBLAS_NUM_THREADS=6 julia --project -t 6 -q
julia> using ThreadPinning

julia> openblas_pinthreads([128, 129, 130, 131, 132, 133]) # hyperthreads only

julia> openblas_getcpuids() |> print
[128, 129, 130, 131, 132, 133]
julia> pinthreads(:cores)

julia> getcpuids() |> print
[0, 1, 2, 3, 4, 5]
julia> openblas_getcpuids() |> print
[128, 129, 130, 131, 132, 0]

and it isn't related to hyperthreads and/or efficiency cores:

crstnbr@login22 ThreadPinning.jl git:(main)
➜ OPENBLAS_NUM_THREADS=6 julia --project -t 6 -q
julia> using ThreadPinning

julia> pinthreads(cores(1:6))

julia> getcpuids() |> print
[0, 1, 2, 3, 4, 5]
julia> openblas_pinthreads(cores(7:12))

julia> openblas_getcpuids() |> print
[6, 7, 8, 9, 10, 11]
julia> getcpuids() |> print
[11, 1, 2, 3, 4, 5]

@oschulz (Author) commented Aug 8, 2024

Update: I've added to the example above what happens to the OpenBLAS threads when re-pinning the Julia threads.

@oschulz (Author) commented Aug 8, 2024

Indeed, it doesn't seem to have anything to do with HT/non-HT or the specific CPU numbers chosen: trying to pin the Julia threads and the OpenBLAS threads to non-overlapping sets of CPU-threads in general always seems to result in the behavior above (a generic sketch follows below).
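
A minimal sketch of such a generic reproducer (assuming a machine with at least 12 CPU-threads; the concrete ID sets below are arbitrary, any two disjoint sets appear to trigger it):

using ThreadPinning

julia_cpus    = collect(0:5)   # two arbitrary, non-overlapping sets of CPU IDs
openblas_cpus = collect(6:11)

pinthreads(julia_cpus)
openblas_pinthreads(openblas_cpus)

getcpuids() |> print          # expected [0, 1, 2, 3, 4, 5]; observed: one entry jumps into the OpenBLAS set
openblas_getcpuids() |> print # [6, 7, 8, 9, 10, 11]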

@carstenbauer (Owner)

Speculation: Maybe the reason for this issue lies in https://github.com/OpenMathLib/OpenBLAS/blob/d92cc96978c17a35355101a1901981970dec25b6/driver/others/blas_server.c#L357-L359. Maybe the call to pthread_self() is problematic because the calling Julia thread is also a pthread.
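
To illustrate why that could matter (a hedged sketch, assuming Linux/pthreads, not a fix): every Julia thread is itself a pthread and reports its own pthread handle, so a pthread_self()-based lookup inside OpenBLAS may resolve relative to whichever Julia thread happens to make the call:

using Base.Threads

pthread_ids = zeros(UInt, nthreads())
@threads :static for i in 1:nthreads()
    # pthread_self is plain libc; each Julia thread has its own pthread handle
    pthread_ids[threadid()] = ccall(:pthread_self, Culong, ())
end
print(pthread_ids)  # nthreads() distinct handles, one per Julia thread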

@carstenbauer (Owner) commented Aug 8, 2024

Varying which Julia thread makes the openblas_setaffinity call:

julia> using ThreadPinning

julia> openblas_pinthreads([128, 129, 130, 131, 132, 133]; juliathreadid=2) # hyperthreads only

julia> openblas_getcpuids() |> print
ERROR: The affinity mask of OpenBLAS thread 6 includes multiple CPU threads. This likely indicates that this OpenBLAS hasn't been pinned yet.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] openblas_getcpuid(; threadid::Int64, juliathreadid::Int64)
   @ ThreadPinningCore.Internals /pscratch/sd/c/crstnbr/.julia/packages/ThreadPinningCore/fdkhT/src/openblas.jl:70
 [3] openblas_getcpuid
   @ /pscratch/sd/c/crstnbr/.julia/packages/ThreadPinningCore/fdkhT/src/openblas.jl:59 [inlined]
 [4] openblas_getcpuids(; kwargs::@Kwargs{})
   @ ThreadPinningCore.Internals /pscratch/sd/c/crstnbr/.julia/packages/ThreadPinningCore/fdkhT/src/openblas.jl:80
 [5] openblas_getcpuids
   @ /pscratch/sd/c/crstnbr/.julia/packages/ThreadPinningCore/fdkhT/src/openblas.jl:76 [inlined]
 [6] openblas_getcpuids()
   @ ThreadPinning.Querying /pscratch/sd/c/crstnbr/ThreadPinning.jl/src/querying.jl:317
 [7] top-level scope
   @ REPL[3]:1

julia> pinthreads(:cores)

julia> getcpuids() |> print
[0, 1, 2, 3, 4, 5]
julia> openblas_getcpuids() |> print
[128, 129, 130, 131, 132, 0]

Going to a lower level, trying to isolate the issue:

julia> using ThreadPinning

julia> import ThreadPinningCore: LibCalls

julia> cpuset_ref = Ref{LibCalls.Ccpu_set_t}(LibCalls.Ccpu_set_t([0]));

julia> LibCalls.openblas_setaffinity(5, sizeof(cpuset_ref[]), cpuset_ref)
0

julia> cpuset_ref = Ref{LibCalls.Ccpu_set_t}(LibCalls.Ccpu_set_t([1]));

julia> LibCalls.openblas_getaffinity(5, sizeof(cpuset_ref[]), cpuset_ref)
0

julia> cpuset_ref
Base.RefValue{ThreadPinningCore.LibCalls.Ccpu_set_t}(Ccpu_set_t(1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000))


# restart session (to be safe)


julia> using ThreadPinning

julia> import ThreadPinningCore: LibCalls

julia> cpuset_ref = Ref{LibCalls.Ccpu_set_t}(LibCalls.Ccpu_set_t([0]));

julia> ThreadPinning.@fetchfrom 1 LibCalls.openblas_setaffinity(5, sizeof(cpuset_ref[]), cpuset_ref)
0

julia> cpuset_ref = Ref{LibCalls.Ccpu_set_t}(LibCalls.Ccpu_set_t([1]));

julia> LibCalls.openblas_getaffinity(5, sizeof(cpuset_ref[]), cpuset_ref)
0

julia> cpuset_ref
Base.RefValue{ThreadPinningCore.LibCalls.Ccpu_set_t}(Ccpu_set_t(1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000))


# restart session (to be safe)


julia> using ThreadPinning

julia> import ThreadPinningCore: LibCalls

julia> cpuset_ref = Ref{LibCalls.Ccpu_set_t}(LibCalls.Ccpu_set_t([0]));

julia> ThreadPinning.@fetchfrom 2 LibCalls.openblas_setaffinity(5, sizeof(cpuset_ref[]), cpuset_ref)
0

julia> cpuset_ref = Ref{LibCalls.Ccpu_set_t}(LibCalls.Ccpu_set_t([1]));

julia> LibCalls.openblas_getaffinity(5, sizeof(cpuset_ref[]), cpuset_ref)
0

julia> cpuset_ref
Base.RefValue{ThreadPinningCore.LibCalls.Ccpu_set_t}(Ccpu_set_t(1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000))

@carstenbauer (Owner)

@vchuravy: Do you have any ideas what might be going on here?
