
addprocs() consistently fails after 68 on a 64-core system #21958

Closed · sbromberger opened this issue May 19, 2017 · 43 comments
Labels: parallelism (Parallel or distributed computation)

@sbromberger (Contributor)

On an 8-CPU system (8 cores per CPU, 64 cores total) with 0.5 TB of shared memory:

Julia Version 0.7.0-DEV.181
Commit 8518fcb (2017-05-15 22:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-4610 v2 @ 2.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)
julia> addprocs()
64-element Array{Int64,1}:
  2
  3
# removed a bunch of lines here
 64
 65

julia> addprocs(1)
1-element Array{Int64,1}:
 66

julia> addprocs(1)
1-element Array{Int64,1}:
 67

julia> addprocs(1)
1-element Array{Int64,1}:
 68

julia> addprocs(1)
ERROR: Timed out waiting to read host:port string from worker.
read_worker_host_port(::Pipe) at ./distributed/cluster.jl:250
connect(::Base.Distributed.LocalManager, ::Int64, ::WorkerConfig) at ./distributed/managers.jl:390
create_worker(::Base.Distributed.LocalManager, ::WorkerConfig) at ./distributed/cluster.jl:443
setup_launched_worker(::Base.Distributed.LocalManager, ::WorkerConfig, ::Array{Int64,1}) at ./distributed/cluster.jl:389
(::Base.Distributed.##33#36{Base.Distributed.LocalManager,WorkerConfig,Array{Int64,1}})() at ./task.jl:335
Stacktrace:
 [1] sync_end() at ./task.jl:287
 [2] macro expansion at ./task.jl:303 [inlined]
 [3] #addprocs_locked#30(::Array{Any,1}, ::Function, ::Base.Distributed.LocalManager) at ./distributed/cluster.jl:344
 [4] #addprocs#29(::Array{Any,1}, ::Function, ::Base.Distributed.LocalManager) at ./distributed/cluster.jl:319
 [5] #addprocs#243(::Bool, ::Array{Any,1}, ::Function, ::Int64) at ./distributed/managers.jl:311
 [6] addprocs(::Int64) at ./distributed/managers.jl:310

So far this is 100% repeatable after 68 processes.

@ararslan added the parallelism (Parallel or distributed computation) label on May 19, 2017.
@amitmurthy (Contributor)

It is possible that system limits are being hit. What is the output of ulimit -a?

@sbromberger (Contributor, Author)

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 2063007
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

@amitmurthy (Contributor)

Hmm, I still think it has something to do with system limits for your account. FWIW, addprocs(70) works fine on my local OSX machine, and I know of others who launch upwards of 100 workers on a single box.

I assume the failing addprocs call is timing out after approximately 60 seconds, i.e., the default JULIA_WORKER_TIMEOUT value? Is any other output written to the screen? Do you see the same problem with addprocs(70; topology=:master_slave)?
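(For reference, a minimal way to test the timeout theory; JULIA_WORKER_TIMEOUT is the knob referenced above, and the 120-second value is purely an illustrative choice:)

julia> ENV["JULIA_WORKER_TIMEOUT"] = "120"  # default is 60 seconds; read by the master while waiting for workers
"120"

julia> addprocs(1)  # if this now succeeds, the worker was merely slow to report its host:port string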

@sbromberger (Contributor, Author)

addprocs(70; topology=:master_slave) works.

@sbromberger (Contributor, Author) commented May 19, 2017

However, the next addprocs(70; topology=:master_slave) fails immediately with

julia> addprocs(70; topology=:master_slave)
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR (unhandled task failure): could not spawn setenv(`/home/seth/dev/julia7/bin/julia -Cx86-64 -J/home/seth/dev/julia7/lib/julia/sys.so --compile=yes --depwarn=yes --bind-to 127.0.0.1 --worker FkKXzU0Rr0TtkDQq`; dir="/home/seth/dev/divy/LightGraphs.jl"): resource temporarily unavailable (EAGAIN)
Stacktrace:
 [1] _jl_spawn(::String, ::Array{String,1}, ::Ptr{Void}, ::Base.Process, ::Base.DevNullStream, ::Base.PipeEndpoint, ::Base.TTY) at ./process.jl:363
 [2] #373 at ./process.jl:515 [inlined]
 [3] setup_stdio(::Base.##373#374{Cmd}, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:502
 [4] #spawn#372(::Nullable{Base.ProcessChain}, ::Function, ::Cmd, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:514
 [5] spawn(::Cmd, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:509
 [6] open(::Cmd, ::String, ::Base.DevNullStream) at ./process.jl:588
 [7] open(::Cmd) at ./process.jl:578
 [8] launch(::Base.Distributed.LocalManager, ::Dict{Any,Any}, ::Array{WorkerConfig,1}, ::Condition) at ./distributed/managers.jl:324
 [9] (::Base.Distributed.##31#34{Base.Distributed.LocalManager,Dict{Any,Any},Array{WorkerConfig,1},Condition})() at ./event.jl:73

and then about a minute later,

ERROR: Timed out waiting to read host:port string from worker.
read_worker_host_port(::Pipe) at ./distributed/cluster.jl:250
connect(::Base.Distributed.LocalManager, ::Int64, ::WorkerConfig) at ./distributed/managers.jl:390
create_worker(::Base.Distributed.LocalManager, ::WorkerConfig) at ./distributed/cluster.jl:443
setup_launched_worker(::Base.Distributed.LocalManager, ::WorkerConfig, ::Array{Int64,1}) at ./distributed/cluster.jl:389
(::Base.Distributed.##33#36{Base.Distributed.LocalManager,WorkerConfig,Array{Int64,1}})() at ./task.jl:335

...and 41 more exception(s).

Stacktrace:
 [1] sync_end() at ./task.jl:287
 [2] macro expansion at ./task.jl:303 [inlined]
 [3] #addprocs_locked#30(::Array{Any,1}, ::Function, ::Base.Distributed.LocalManager) at ./distributed/cluster.jl:344
 [4] (::Base.Distributed.#kw##addprocs_locked)(::Array{Any,1}, ::Base.Distributed.#addprocs_locked, ::Base.Distributed.LocalManager) at ./<missing>:0
 [5] #addprocs#29(::Array{Any,1}, ::Function, ::Base.Distributed.LocalManager) at ./distributed/cluster.jl:319
 [6] (::Base.Distributed.#kw##addprocs)(::Array{Any,1}, ::Base.Distributed.#addprocs, ::Base.Distributed.LocalManager) at ./<missing>:0
 [7] #addprocs#243(::Bool, ::Array{Any,1}, ::Function, ::Int64) at ./distributed/managers.jl:311
 [8] (::Base.Distributed.#kw##addprocs)(::Array{Any,1}, ::Base.Distributed.#addprocs, ::Int64) at ./<missing>:0

@sbromberger (Contributor, Author)

When I try doing addprocs(140; topology=:master_slave), I get (immediately) a bunch of errors. Among them:

OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 65536 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 65536 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 65536 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable

repeated. This seems to suggest I'm hitting that max user processes limit, but aside from two shells and a top command, I'm not running anything other than Julia.

@yuyichao (Contributor)

Note that sqrt(4096) is 64. How many openblas threads are you spawning?
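(One possible reading of that observation: RLIMIT_NPROC counts threads per user, so if each worker initially comes up with 64 OpenBLAS threads, 64 workers already account for 64 × 64 = 4096 threads, exactly the max user processes limit shown in the ulimit output above.)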

@sbromberger (Contributor, Author)

@yuyichao I don't know. I'm starting Julia up without any flags....

@yuyichao (Contributor)

I mean you can check with htop / ps how many threads you are using.

@sbromberger (Contributor, Author) commented May 19, 2017

When I do a ps -eLf | grep julia | wc -l after starting a single instance of Julia, I get

-bash-4.2$ ps -eLf | grep julia | wc -l
150

That seems... high.

Edit: oops. I forgot to killall julia after the last failure. It's now

-bash-4.2$ ps -eLf | grep julia | wc -l
65

@yuyichao (Contributor)

That seems consistent with openblas starting ncores threads.

@andreasnoack (Member)

Maybe try whether you can add more workers if you launch with OMP_NUM_THREADS=1 julia.

@sbromberger (Contributor, Author)

Yes. However:

julia> ccall((:openblas_get_num_threads64_, Base.libblas_name), Cint, ())
1

Still results in

-bash-4.2$ ps -eLf | grep julia | wc -l
50

@amitmurthy (Contributor)

By default, each worker sets the BLAS thread count to 1 as part of worker init. A worker may come up with 64 blas threads, but the count should immediately be set to 1.
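(A minimal way to verify this from the master, reusing the ccall already shown in this thread; the function is bound anonymously so it serializes to the worker, and worker id 2 is just an example:)

julia> blas_threads = () -> ccall((:openblas_get_num_threads64_, Base.libblas_name), Cint, ())

julia> remotecall_fetch(blas_threads, 2)  # on worker 2, after its init: should report 1
1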

@amitmurthy (Contributor)

How many total threads do you see with julia -p1? It should be 64 blas threads on the master and 1 on the worker.

@sbromberger (Contributor, Author) commented May 19, 2017

julia> addprocs(140; topology=:master_slave)
ERROR (unhandled task failure): could not spawn setenv(`/home/seth/dev/julia7/bin/julia -Cx86-64 -J/home/seth/dev/julia7/lib/julia/sys.so --compile=yes --depwarn=yes --bind-to 127.0.0.1 --worker WddJ9NavwZxE2Sa8`; dir="/home/seth"): resource temporarily unavailable (EAGAIN)
Stacktrace:
 [1] _jl_spawn(::String, ::Array{String,1}, ::Ptr{Void}, ::Base.Process, ::Base.DevNullStream, ::Base.PipeEndpoint, ::Base.TTY) at ./process.jl:363
 [2] #373 at ./process.jl:515 [inlined]
 [3] setup_stdio(::Base.##373#374{Cmd}, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:502
 [4] #spawn#372(::Nullable{Base.ProcessChain}, ::Function, ::Cmd, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:514
 [5] spawn(::Cmd, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:509
 [6] open(::Cmd, ::String, ::Base.DevNullStream) at ./process.jl:588
 [7] open(::Cmd) at ./process.jl:578
 [8] launch(::Base.Distributed.LocalManager, ::Dict{Any,Any}, ::Array{WorkerConfig,1}, ::Condition) at ./distributed/managers.jl:324
 [9] (::Base.Distributed.##31#34{Base.Distributed.LocalManager,Dict{Any,Any},Array{WorkerConfig,1},Condition})() at ./event.jl:73

signal (6): Aborted
while loading no file, in expression starting on line 0
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
init_once at /home/centos/buildbot/slave/package_tarball64/build/deps/srccache/libuv-52d72a52cc7ccd570929990f010ed16e2ec604c8/src/threadpool.c:174
pthread_once at /lib64/libpthread.so.0 (unknown line)
uv_once at /home/centos/buildbot/slave/package_tarball64/build/deps/srccache/libuv-52d72a52cc7ccd570929990f010ed16e2ec604c8/src/unix/thread.c:239
uv__work_submit at /home/centos/buildbot/slave/package_tarball64/build/deps/srccache/libuv-52d72a52cc7ccd570929990f010ed16e2ec604c8/src/threadpool.c:184
uv_getaddrinfo at /home/centos/buildbot/slave/package_tarball64/build/deps/srccache/libuv-52d72a52cc7ccd570929990f010ed16e2ec604c8/src/unix/getaddrinfo.c:186
jl_getaddrinfo at /home/centos/buildbot/slave/package_tarball64/build/src/jl_uv.c:715
getaddrinfo at ./socket.jl:624
unknown function (ip: 0x7fcd1ea110ef)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:348 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:367 [inlined]
jl_invoke at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:41
getaddrinfo at ./socket.jl:637
connect! at ./socket.jl:747
connect at ./stream.jl:982 [inlined]
connect_to_worker at ./distributed/managers.jl:488
unknown function (ip: 0x7fcd1e860682)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:348 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:367 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1923
connect at ./distributed/managers.jl:430
unknown function (ip: 0x7fcd1e85e409)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:348 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:367 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1923
create_worker at ./distributed/cluster.jl:443
setup_launched_worker at ./distributed/cluster.jl:389
#33 at ./task.jl:335
unknown function (ip: 0x7fcd0c09781f)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:348 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:367 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1923
jl_apply at /home/centos/buildbot/slave/package_tarball64/build/src/julia.h:1422 [inlined]
start_task at /home/centos/buildbot/slave/package_tarball64/build/src/task.c:267
unknown function (ip: 0xffffffffffffffff)
Allocations: 1235851 (Pool: 1234289; Big: 1562); GC: 0
Aborted (core dumped)

and back to shell.

@sbromberger (Contributor, Author) commented May 19, 2017

@amitmurthy I don't think that's what I'm seeing: julia -p1 shows

-bash-4.2$ ps -eLf | grep julia | wc -l
125

(although I have to admit the flags to ps are new to me; I'm assuming they do what the manual says.)

@amitmurthy (Contributor)

With julia -p1, can you try

julia> @everywhere get_num_threads = function() # anonymous so it will be serialized when called
                  blas = BLAS.vendor()
                  # Wrap in a try to catch unsupported blas versions
                  try
                      if blas == :openblas
                          return ccall((:openblas_get_num_threads, Base.libblas_name), Cint, ())
                      elseif blas == :openblas64
                          return ccall((:openblas_get_num_threads64_, Base.libblas_name), Cint, ())
                      elseif blas == :mkl
                          return ccall((:MKL_Get_Max_Num_Threads, Base.libblas_name), Cint, ())
                      end

                      # OSX BLAS looks at an environment variable
                      if is_apple()
                          return ENV["VECLIB_MAXIMUM_THREADS"]
                      end
                  end

                  return nothing
              end

julia> get_num_threads()
4

julia> remotecall_fetch(get_num_threads, 2)
1

@yuyichao (Contributor)

Setting the thread count to a lower number does not seem to destroy the extra threads.

@sbromberger (Contributor, Author) commented May 19, 2017

julia> get_num_threads()
16

julia> remotecall_fetch(get_num_threads, 2)
1

but still

-bash-4.2$ ps -eLf | grep julia | wc -l
125

@sbromberger (Contributor, Author) commented May 19, 2017

Does it matter that I'm running julia out of the build directory (as opposed to having run make install)? That is, /home/seth/dev/julia7/bin/julia.

@sbromberger (Contributor, Author)

Technically, the ps output should be 124: it's counting the grep itself, so

-bash-4.2$ ps -eLf | grep julia | grep -v grep | wc -l
124

@sbromberger (Contributor, Author) commented May 19, 2017

With julia -p2, I get

-bash-4.2$ ps -eLf | grep julia | grep -v grep | wc -l
184

and with julia -p3, I get

-bash-4.2$ ps -eLf | grep julia | grep -v grep | wc -l
244

So it appears it's 64 + (60*p).
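(Checking that formula against the corrected counts above: 64 + 60×1 = 124, 64 + 60×2 = 184, and 64 + 60×3 = 244.)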

@yuyichao (Contributor)

Have you tried #21958 (comment)?

@amitmurthy (Contributor)

I think @yuyichao is right. It is possible that blas is using the thread count to limit the number of active threads to schedule work on, and probably does not destroy the excess threads.

Also, get_num_threads() on the master should ideally be 64, unless blas is actually creating 64 threads but only using 16 by default.

@yuyichao (Contributor)

> And probably does not destroy the excess threads.

According to the openblas source, it might also leak threads if you set the count back up again.

@sbromberger (Contributor, Author) commented May 19, 2017

With OMP_NUM_THREADS=1 julia7, I get

-bash-4.2$ ps -eLf | grep julia | grep -v grep | wc -l
49

with -p1, I get

-bash-4.2$ ps -eLf | grep julia | grep -v grep | wc -l
102

with -p2, I get

-bash-4.2$ ps -eLf | grep julia | grep -v grep | wc -l
155

So 49 + (53*p).

@amitmurthy (Contributor)

What about with OPENBLAS_NUM_THREADS=1?

@yuyichao (Contributor)

Also set UV_THREADPOOL_SIZE=1.

@sbromberger (Contributor, Author)

> What about with OPENBLAS_NUM_THREADS=1?

Same numbers as with OMP_NUM_THREADS=1, above.

> Also set UV_THREADPOOL_SIZE=1.

49, 99, and 149, respectively, for default, -p1, and -p2.
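(For completeness, a sketch of the combined launch that would produce those numbers, assuming bash and that the thread variables were kept at 1, as the "also" suggests:)

-bash-4.2$ OPENBLAS_NUM_THREADS=1 UV_THREADPOOL_SIZE=1 julia7 -p2
# then, from a second shell:
-bash-4.2$ ps -eLf | grep julia | grep -v grep | wc -l
149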

@sbromberger (Contributor, Author)

Gah. I may have found the issue. One sec.

@sbromberger (Contributor, Author) commented May 19, 2017

Yup, found it.

The issue: a few months ago, I was playing around with multithreading on another machine and set export JULIA_NUM_THREADS=48 in my .bash_profile. When my account was created on this machine, the admins copied over my home directory. I've removed the entry, and now I can do addprocs(192; topology=:master_slave).
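(A quick way to spot leftovers like this; the variable name and file are from this thread, and the command output is illustrative:)

-bash-4.2$ env | grep JULIA
JULIA_NUM_THREADS=48
-bash-4.2$ grep JULIA ~/.bash_profile
export JULIA_NUM_THREADS=48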

@sbromberger (Contributor, Author)

Apologies for having wasted your time on this issue.

@amitmurthy (Contributor)

No worries. Just as we set OpenBLAS threads to 1 on the workers, we may need a similar level of configurability for the number of Julia threads on the workers too.

@StefanKarpinski (Member)

Is there anything we could have done to make it clearer what was going on here?

@sbromberger (Contributor, Author)

@StefanKarpinski the issue was my failure to realize that an environment variable affecting Julia was set. It might be a good idea to have a command that prints the environment variables that have been processed by (and are relevant to) Julia; that would've uncovered this issue immediately.
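(A minimal sketch of such a check that already works today, assuming the relevant variables all carry the JULIA_ prefix, which isn't guaranteed:)

julia> for (k, v) in ENV
           startswith(k, "JULIA_") && println(k, " = ", v)
       end
JULIA_NUM_THREADS = 48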

Really sorry for the distraction here. Environment should've been my first check.

@StefanKarpinski (Member)

Having a way to display all significant environment variables would be a good idea.

@JeffreySarnoff (Contributor)

Everyone seems comfortable with versioninfo(); how about an environmentinfo()?

@StefanKarpinski (Member)

Actually, we should just display the values of any Julia-significant environment variables in versioninfo() since that's a thing people are already posting to GitHub when they have issues.

@StefanKarpinski (Member)

Case in point: in this issue, @sbromberger would have noticed it right away since the first thing he posted in his report was the output of versioninfo().

@sbromberger (Contributor, Author)

@StefanKarpinski two things:

> in this issue, @sbromberger would have noticed it right away

You might be giving me too much credit.

Also, we want to make sure that if we do this, we don't send specific environment variables to versioninfo (ref #21949 (comment))

@sbromberger (Contributor, Author)

Can I hijack this for a silly question? What are the advantages of the different topologies? That is, I suppose there are cases where workers communicate with each other without going through the master, but does that happen with a standard pmap?

@amitmurthy (Contributor)

+1 for versioninfo displaying Julia-specific env vars.

A pmap from the master only needs master-to-worker connectivity. You can also execute a pmap from a worker, in which case it needs worker-worker connectivity.

With the default all-to-all setup and N workers, we are looking at N^2 socket fds. I think we need to move to a model where worker-worker connections are set up lazily, i.e., at the time of the first communication that requires a direct worker-to-worker connection.
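(Worked out with the numbers from this issue: all-to-all over N workers needs one connection per worker pair, N(N-1)/2 of them, so at the N = 192 achieved above that is 192 × 191 / 2 = 18,336 connections, i.e. tens of thousands of socket fds across the cluster, versus just 192 master-to-worker connections under topology=:master_slave.)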
