
addprocs() consistently fails after 68 on a 64-core system #21958

Closed · sbromberger opened this issue May 19, 2017 · 43 comments
Labels: parallelism (Parallel or distributed computation)

@sbromberger (Contributor)

On an 8-CPU system (8 cores per CPU, 64 cores total) with 0.5 TB of shared memory:

Julia Version 0.7.0-DEV.181
Commit 8518fcb (2017-05-15 22:30 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-4610 v2 @ 2.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)
julia> addprocs()
64-element Array{Int64,1}:
  2
  3
# removed a bunch of lines here
 64
 65

julia> addprocs(1)
1-element Array{Int64,1}:
 66

julia> addprocs(1)
1-element Array{Int64,1}:
 67

julia> addprocs(1)
1-element Array{Int64,1}:
 68

julia> addprocs(1)
ERROR: Timed out waiting to read host:port string from worker.
read_worker_host_port(::Pipe) at ./distributed/cluster.jl:250
connect(::Base.Distributed.LocalManager, ::Int64, ::WorkerConfig) at ./distributed/managers.jl:390
create_worker(::Base.Distributed.LocalManager, ::WorkerConfig) at ./distributed/cluster.jl:443
setup_launched_worker(::Base.Distributed.LocalManager, ::WorkerConfig, ::Array{Int64,1}) at ./distributed/cluster.jl:389
(::Base.Distributed.##33#36{Base.Distributed.LocalManager,WorkerConfig,Array{Int64,1}})() at ./task.jl:335
Stacktrace:
 [1] sync_end() at ./task.jl:287
 [2] macro expansion at ./task.jl:303 [inlined]
 [3] #addprocs_locked#30(::Array{Any,1}, ::Function, ::Base.Distributed.LocalManager) at ./distributed/cluster.jl:344
 [4] #addprocs#29(::Array{Any,1}, ::Function, ::Base.Distributed.LocalManager) at ./distributed/cluster.jl:319
 [5] #addprocs#243(::Bool, ::Array{Any,1}, ::Function, ::Int64) at ./distributed/managers.jl:311
 [6] addprocs(::Int64) at ./distributed/managers.jl:310

So far this is 100% repeatable after 68 processes.

@ararslan added the parallelism (Parallel or distributed computation) label on May 19, 2017.
@amitmurthy (Contributor)

It is possible that system limits are being hit. What is the output of ulimit -a?

@sbromberger (Contributor, Author)

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 2063007
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

@amitmurthy (Contributor)

Hmm, I still think it has something to do with system limits for your account. FWIW, addprocs(70) works fine on my local OSX machine, and I know of others who launch upwards of 100 workers on a single box.

I assume the failing addprocs call is timing out after approximately 60 seconds, i.e., the default JULIA_WORKER_TIMEOUT value? Is any other output written to the screen? Do you see the same problem with addprocs(70; topology=:master_slave)?
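(For reference, a minimal way to test the timeout theory; JULIA_WORKER_TIMEOUT is the knob referenced above, and the 120-second value is purely an illustrative choice:)

julia> ENV["JULIA_WORKER_TIMEOUT"] = "120"  # default is 60 seconds; read by the master while waiting for workers
"120"

julia> addprocs(1)  # if this now succeeds, the worker was merely slow to report its host:port string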

@sbromberger (Contributor, Author)

addprocs(70; topology=:master_slave) works.

@sbromberger (Contributor, Author) commented May 19, 2017

However, the next addprocs(70; topology=:master_slave) fails immediately with

julia> addprocs(70; topology=:master_slave)
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR: pthread_create(signal_listener) failed
ERROR (unhandled task failure): could not spawn setenv(`/home/seth/dev/julia7/bin/julia -Cx86-64 -J/home/seth/dev/julia7/lib/julia/sys.so --compile=yes --depwarn=yes --bind-to 127.0.0.1 --worker FkKXzU0Rr0TtkDQq`; dir="/home/seth/dev/divy/LightGraphs.jl"): resource temporarily unavailable (EAGAIN)
Stacktrace:
 [1] _jl_spawn(::String, ::Array{String,1}, ::Ptr{Void}, ::Base.Process, ::Base.DevNullStream, ::Base.PipeEndpoint, ::Base.TTY) at ./process.jl:363
 [2] #373 at ./process.jl:515 [inlined]
 [3] setup_stdio(::Base.##373#374{Cmd}, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:502
 [4] #spawn#372(::Nullable{Base.ProcessChain}, ::Function, ::Cmd, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:514
 [5] spawn(::Cmd, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:509
 [6] open(::Cmd, ::String, ::Base.DevNullStream) at ./process.jl:588
 [7] open(::Cmd) at ./process.jl:578
 [8] launch(::Base.Distributed.LocalManager, ::Dict{Any,Any}, ::Array{WorkerConfig,1}, ::Condition) at ./distributed/managers.jl:324
 [9] (::Base.Distributed.##31#34{Base.Distributed.LocalManager,Dict{Any,Any},Array{WorkerConfig,1},Condition})() at ./event.jl:73

and then about a minute later,

ERROR: Timed out waiting to read host:port string from worker.
read_worker_host_port(::Pipe) at ./distributed/cluster.jl:250
connect(::Base.Distributed.LocalManager, ::Int64, ::WorkerConfig) at ./distributed/managers.jl:390
create_worker(::Base.Distributed.LocalManager, ::WorkerConfig) at ./distributed/cluster.jl:443
setup_launched_worker(::Base.Distributed.LocalManager, ::WorkerConfig, ::Array{Int64,1}) at ./distributed/cluster.jl:389
(::Base.Distributed.##33#36{Base.Distributed.LocalManager,WorkerConfig,Array{Int64,1}})() at ./task.jl:335

...and 41 more exception(s).

Stacktrace:
 [1] sync_end() at ./task.jl:287
 [2] macro expansion at ./task.jl:303 [inlined]
 [3] #addprocs_locked#30(::Array{Any,1}, ::Function, ::Base.Distributed.LocalManager) at ./distributed/cluster.jl:344
 [4] (::Base.Distributed.#kw##addprocs_locked)(::Array{Any,1}, ::Base.Distributed.#addprocs_locked, ::Base.Distributed.LocalManager) at ./<missing>:0
 [5] #addprocs#29(::Array{Any,1}, ::Function, ::Base.Distributed.LocalManager) at ./distributed/cluster.jl:319
 [6] (::Base.Distributed.#kw##addprocs)(::Array{Any,1}, ::Base.Distributed.#addprocs, ::Base.Distributed.LocalManager) at ./<missing>:0
 [7] #addprocs#243(::Bool, ::Array{Any,1}, ::Function, ::Int64) at ./distributed/managers.jl:311
 [8] (::Base.Distributed.#kw##addprocs)(::Array{Any,1}, ::Base.Distributed.#addprocs, ::Int64) at ./<missing>:0

@sbromberger (Contributor, Author)

When I try doing addprocs(140; topology=:master_slave), I get (immediately) a bunch of errors. Among them:

OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 65536 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 65536 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 65536 max
OpenBLAS blas_thread_init: pthread_create: Resource temporarily unavailable

repeated. This seems to suggest I'm hitting that max user processes limit, but aside from two shells and a top command, I'm not running anything other than Julia.

@yuyichao (Contributor)

Note that sqrt(4096) is 64. How many openblas threads are you spawning?
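(One possible reading of that observation: RLIMIT_NPROC counts threads per user, so if each worker initially comes up with 64 OpenBLAS threads, 64 workers already account for 64 × 64 = 4096 threads, exactly the max user processes limit shown in the ulimit output above.)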

@sbromberger (Contributor, Author)

@yuyichao I don't know. I'm starting Julia up without any flags....

@yuyichao (Contributor)

I mean you can check with htop / ps how many threads you are using.

@sbromberger (Contributor, Author) commented May 19, 2017

When I do a ps -eLf | grep julia | wc -l after starting a single instance of Julia, I get

-bash-4.2$ ps -eLf | grep julia | wc -l
150

That seems... high.

Edit: oops. I forgot to killall julia after the last failure. It's now

-bash-4.2$ ps -eLf | grep julia | wc -l
65

@yuyichao (Contributor)

That seems consistent with openblas starting ncores threads.

@andreasnoack (Member)

Maybe try whether you can add more workers if you launch with OMP_NUM_THREADS=1 julia.

@sbromberger (Contributor, Author)

Yes. However:

julia> ccall((:openblas_get_num_threads64_, Base.libblas_name), Cint, ())
1

Still results in

-bash-4.2$ ps -eLf | grep julia | wc -l
50

@amitmurthy (Contributor)

By default, each worker sets the BLAS thread count to 1 as part of worker init. A worker may come up with 64 blas threads, but the count should immediately be set to 1.
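(A minimal way to verify this from the master, reusing the ccall already shown in this thread; the function is bound anonymously so it serializes to the worker, and worker id 2 is just an example:)

julia> blas_threads = () -> ccall((:openblas_get_num_threads64_, Base.libblas_name), Cint, ())

julia> remotecall_fetch(blas_threads, 2)  # on worker 2, after its init: should report 1
1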

@amitmurthy (Contributor)

How many total threads do you see with julia -p1? It should be 64 blas threads on the master and 1 on the worker.

@sbromberger (Contributor, Author) commented May 19, 2017

julia> addprocs(140; topology=:master_slave)
ERROR (unhandled task failure): could not spawn setenv(`/home/seth/dev/julia7/bin/julia -Cx86-64 -J/home/seth/dev/julia7/lib/julia/sys.so --compile=yes --depwarn=yes --bind-to 127.0.0.1 --worker WddJ9NavwZxE2Sa8`; dir="/home/seth"): resource temporarily unavailable (EAGAIN)
Stacktrace:
 [1] _jl_spawn(::String, ::Array{String,1}, ::Ptr{Void}, ::Base.Process, ::Base.DevNullStream, ::Base.PipeEndpoint, ::Base.TTY) at ./process.jl:363
 [2] #373 at ./process.jl:515 [inlined]
 [3] setup_stdio(::Base.##373#374{Cmd}, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:502
 [4] #spawn#372(::Nullable{Base.ProcessChain}, ::Function, ::Cmd, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:514
 [5] spawn(::Cmd, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:509
 [6] open(::Cmd, ::String, ::Base.DevNullStream) at ./process.jl:588
 [7] open(::Cmd) at ./process.jl:578
 [8] launch(::Base.Distributed.LocalManager, ::Dict{Any,Any}, ::Array{WorkerConfig,1}, ::Condition) at ./distributed/managers.jl:324
 [9] (::Base.Distributed.##31#34{Base.Distributed.LocalManager,Dict{Any,Any},Array{WorkerConfig,1},Condition})() at ./event.jl:73

signal (6): Aborted
while loading no file, in expression starting on line 0
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
init_once at /home/centos/buildbot/slave/package_tarball64/build/deps/srccache/libuv-52d72a52cc7ccd570929990f010ed16e2ec604c8/src/threadpool.c:174
pthread_once at /lib64/libpthread.so.0 (unknown line)
uv_once at /home/centos/buildbot/slave/package_tarball64/build/deps/srccache/libuv-52d72a52cc7ccd570929990f010ed16e2ec604c8/src/unix/thread.c:239
uv__work_submit at /home/centos/buildbot/slave/package_tarball64/build/deps/srccache/libuv-52d72a52cc7ccd570929990f010ed16e2ec604c8/src/threadpool.c:184
uv_getaddrinfo at /home/centos/buildbot/slave/package_tarball64/build/deps/srccache/libuv-52d72a52cc7ccd570929990f010ed16e2ec604c8/src/unix/getaddrinfo.c:186
jl_getaddrinfo at /home/centos/buildbot/slave/package_tarball64/build/src/jl_uv.c:715
getaddrinfo at ./socket.jl:624
unknown function (ip: 0x7fcd1ea110ef)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:348 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:367 [inlined]
jl_invoke at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:41
getaddrinfo at ./socket.jl:637
connect! at ./socket.jl:747
connect at ./stream.jl:982 [inlined]
connect_to_worker at ./distributed/managers.jl:488
unknown function (ip: 0x7fcd1e860682)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:348 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:367 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1923
connect at ./distributed/managers.jl:430
unknown function (ip: 0x7fcd1e85e409)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:348 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:367 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1923
create_worker at ./distributed/cluster.jl:443
setup_launched_worker at ./distributed/cluster.jl:389
#33 at ./task.jl:335
unknown function (ip: 0x7fcd0c09781f)
jl_call_fptr_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:348 [inlined]
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:367 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1923
jl_apply at /home/centos/buildbot/slave/package_tarball64/build/src/julia.h:1422 [inlined]
start_task at /home/centos/buildbot/slave/package_tarball64/build/src/task.c:267
unknown function (ip: 0xffffffffffffffff)
Allocations: 1235851 (Pool: 1234289; Big: 1562); GC: 0
Aborted (core dumped)

and back to shell.

@sbromberger (Contributor, Author) commented May 19, 2017

@amitmurthy I don't think that's what I'm seeing: julia -p1 shows

-bash-4.2$ ps -eLf | grep julia | wc -l
125

(although I have to admit the flags to ps are new to me; I'm assuming they do what the manual says.)

@amitmurthy (Contributor)

With julia -p1, can you try

julia> @everywhere get_num_threads = function() # anonymous so it will be serialized when called
                  blas = BLAS.vendor()
                  # Wrap in a try to catch unsupported blas versions
                  try
                      if blas == :openblas
                          return ccall((:openblas_get_num_threads, Base.libblas_name), Cint, ())
                      elseif blas == :openblas64
                          return ccall((:openblas_get_num_threads64_, Base.libblas_name), Cint, ())
                      elseif blas == :mkl
                          return ccall((:MKL_Get_Max_Num_Threads, Base.libblas_name), Cint, ())
                      end

                      # OSX BLAS looks at an environment variable
                      if is_apple()
                          return ENV["VECLIB_MAXIMUM_THREADS"]
                      end
                  end

                  return nothing
              end

julia> get_num_threads()
4

julia> remotecall_fetch(get_num_threads, 2)
1

@yuyichao (Contributor)

Setting the thread count to a lower number does not seem to destroy the extra threads.

@sbromberger (Contributor, Author) commented May 19, 2017

julia> get_num_threads()
16

julia> remotecall_fetch(get_num_threads, 2)
1

but still

-bash-4.2$ ps -eLf | grep julia | wc -l
125

@sbromberger (Contributor, Author) commented May 19, 2017

Does it matter that I'm running julia out of the build directory (as opposed to having run make install)? That is, /home/seth/dev/julia7/bin/julia.

@sbromberger (Contributor, Author)

Technically, the ps output should be 124: it's counting the grep itself, so

-bash-4.2$ ps -eLf | grep julia | grep -v grep | wc -l
124

@sbromberger (Contributor, Author) commented May 19, 2017

With julia -p2, I get

-bash-4.2$ ps -eLf | grep julia | grep -v grep | wc -l
184

and with julia -p3, I get

-bash-4.2$ ps -eLf | grep julia | grep -v grep | wc -l
244

So it appears it's 64 + (60*p).
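(Checking that formula against the corrected counts above: 64 + 60×1 = 124, 64 + 60×2 = 184, and 64 + 60×3 = 244.)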

@yuyichao (Contributor)

Have you tried #21958 (comment)?

@amitmurthy (Contributor)

I think @yuyichao is right. It is possible that blas is using the thread count to limit the number of active threads to schedule work on, and probably does not destroy the excess threads.

Also, get_num_threads() on the master should ideally be 64, unless blas is actually creating 64 threads but only using 16 by default.

@yuyichao (Contributor)

> And probably does not destroy the excess threads.

According to the openblas source, it might also leak threads if you set the count back up again.

@sbromberger (Contributor, Author) commented May 19, 2017

With OMP_NUM_THREADS=1 julia7, I get

-bash-4.2$ ps -eLf | grep julia | grep -v grep | wc -l
49

with -p1, I get

-bash-4.2$ ps -eLf | grep julia | grep -v grep | wc -l
102

with -p2, I get

-bash-4.2$ ps -eLf | grep julia | grep -v grep | wc -l
155

So 49 + (53*p).

@amitmurthy (Contributor)

What about with OPENBLAS_NUM_THREADS=1?

@yuyichao (Contributor)

Also set UV_THREADPOOL_SIZE=1.

@sbromberger (Contributor, Author)

> What about with OPENBLAS_NUM_THREADS=1?

Same numbers as with OMP_NUM_THREADS=1, above.

> Also set UV_THREADPOOL_SIZE=1.

49, 99, and 149, respectively, for default, -p1, and -p2.
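(For completeness, a sketch of the combined launch that would produce those numbers, assuming bash and that the thread variables were kept at 1, as the "also" suggests:)

-bash-4.2$ OPENBLAS_NUM_THREADS=1 UV_THREADPOOL_SIZE=1 julia7 -p2
# then, from a second shell:
-bash-4.2$ ps -eLf | grep julia | grep -v grep | wc -l
149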

@sbromberger (Contributor, Author)

Gah. I may have found the issue. One sec.

@sbromberger (Contributor, Author) commented May 19, 2017

Yup, found it.

The issue: a few months ago, I was playing around with multithreading on another machine and set export JULIA_NUM_THREADS=48 in my .bash_profile. When my account was created on this machine, the admins copied over my home directory. I've removed the entry, and now I can do addprocs(192; topology=:master_slave).
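(A quick way to spot leftovers like this; the variable name and file are from this thread, and the command output is illustrative:)

-bash-4.2$ env | grep JULIA
JULIA_NUM_THREADS=48
-bash-4.2$ grep JULIA ~/.bash_profile
export JULIA_NUM_THREADS=48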

@sbromberger (Contributor, Author)

Apologies for having wasted your time on this issue.

@amitmurthy (Contributor)

No worries. Just as we set OpenBLAS threads to 1 on the workers, we may need a similar level of configurability for the number of Julia threads on the workers too.

@StefanKarpinski (Member)

Is there anything we could have done to make it clearer what was going on here?

@sbromberger (Contributor, Author)

@StefanKarpinski the issue was my failure to realize that an environment variable affecting Julia was set. It might be a good idea to have a command that prints the environment variables that have been processed by (and are relevant to) Julia; that would've uncovered this issue immediately.
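(A minimal sketch of such a check that already works today, assuming the relevant variables all carry the JULIA_ prefix, which isn't guaranteed:)

julia> for (k, v) in ENV
           startswith(k, "JULIA_") && println(k, " = ", v)
       end
JULIA_NUM_THREADS = 48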

Really sorry for the distraction here. Environment should've been my first check.

@StefanKarpinski (Member)

Having a way to display all significant environment variables would be a good idea.

@JeffreySarnoff (Contributor)

Everyone seems comfortable with versioninfo(); how about an environmentinfo()?

@StefanKarpinski (Member)

Actually, we should just display the values of any Julia-significant environment variables in versioninfo() since that's a thing people are already posting to GitHub when they have issues.

@StefanKarpinski (Member)

Case in point: in this issue, @sbromberger would have noticed it right away since the first thing he posted in his report was the output of versioninfo().

@sbromberger (Contributor, Author)

@StefanKarpinski two things:

> in this issue, @sbromberger would have noticed it right away

You might be giving me too much credit.

Also, we want to make sure that if we do this, we don't send specific environment variables to versioninfo (ref #21949 (comment))

@sbromberger (Contributor, Author)

Can I hijack this for a silly question? What are the advantages of the different topologies? That is, I suppose there are cases where workers communicate with each other without going through the master, but does that happen with a standard pmap?

@amitmurthy (Contributor)

+1 for versioninfo displaying Julia-specific env vars.

A pmap from the master only needs master-to-worker connectivity. You can also execute a pmap from a worker, in which case it needs worker-worker connectivity.

With the default all-to-all setup and N workers, we are looking at N^2 socket fds. I think we need to move to a model where worker-worker connections are set up lazily, i.e., at the time of the first communication that requires a direct worker-to-worker connection.
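(Worked out with the numbers from this issue: all-to-all over N workers needs one connection per worker pair, N(N-1)/2 of them, so at the N = 192 achieved above that is 192 × 191 / 2 = 18,336 connections, i.e. tens of thousands of socket fds across the cluster, versus just 192 master-to-worker connections under topology=:master_slave.)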
