addprocs() consistently fails after 68 on a 64-core system #21958
On an 8-CPU system, each CPU having 8 cores, with a total single memory of 0.5TB: […]

So far this is 100% repeatable after 68 processes.

Comments
It is possible that system limits are being hit. What is the output of `ulimit -a`?
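For anyone retracing this, the limits can be checked without leaving Julia; a minimal sketch, assuming a Unix shell is available (`ulimit` is a shell builtin, so it has to go through `sh`):

```julia
# ulimit is a shell builtin, so run it via sh;
# on Linux, "max user processes" (ulimit -u) also caps the number of threads
run(`sh -c "ulimit -a"`)
```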
Hmm, I still think it has something to do with system limits for your account. FWIW, addprocs(70) works fine on my local OSX machine, and I know of others launching upwards of 100 workers on a single box. I assume the failing `addprocs` […]
however, the next […]

and then about a minute later, […]
When I try doing […], I get […] repeated. This seems to suggest I'm hitting that max user process limit, but outside of two shells and a Julia session, nothing else is running.
Note that the `max user processes` limit counts threads, too.
@yuyichao I don't know. I'm starting Julia up without any flags...
I mean you can check with htop / ps how many threads you are using.
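For the record, here is one way to get that count from inside Julia on Linux; a sketch assuming a GNU `ps` that understands the `nlwp` (number of light-weight processes, i.e. threads) column:

```julia
# count the OS threads backing this Julia process (Linux-specific)
os_threads = parse(Int, readchomp(`ps -o nlwp= -p $(getpid())`))
println("this process has $os_threads OS threads")
```

Counting across all julia processes on the box (e.g. `ps -efL | grep julia` in a shell) gives the total that is charged against the `max user processes` limit.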
when I do a […]

That seems... high. Edit: oops, I forgot to […]
That seems consistent with openblas starting `ncores` worker threads.
Maybe try whether you can add more workers if you launch with `OPENBLAS_NUM_THREADS=1`?
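That experiment can also be run from a single REPL; a sketch, relying on the fact that locally spawned workers inherit the master's environment (the worker count here is arbitrary):

```julia
using Distributed

# set the variable only for the duration of the spawn; the workers inherit it,
# so each one's OpenBLAS pool should start with a single thread
withenv("OPENBLAS_NUM_THREADS" => "1") do
    addprocs(4)
end
nworkers()
```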
yes. However: […]

Still results in […]
By default each worker sets blas threads to 1 as part of the worker init. A worker may come up with 64 blas threads, but it should immediately be set to 1.
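On recent Julia versions that per-worker setting can be checked directly (assuming `LinearAlgebra.BLAS.get_num_threads`, which exists on modern releases):

```julia
using Distributed, LinearAlgebra
addprocs(2)
@everywhere using LinearAlgebra

# ask each worker how many BLAS threads it believes it has
for w in workers()
    n = remotecall_fetch(BLAS.get_num_threads, w)
    println("worker $w reports $n BLAS thread(s)")
end
```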
How many total threads do you see with […], and then back in the shell?
@amitmurthy I don't think that's what I'm seeing: […]

(although I have to admit the flags to `ps` are confusing)
With `julia -p1` can you try […]?
Setting the thread count to a lower number does not seem to destroy the threads.
but still […]
Does it matter that I'm running julia out of the build directory (as opposed to having a proper install)?
technically, the […]
With […]

and with […]

So it appears it's […]
Have you tried #21958 (comment)?
I think @yuyichao is right. It is possible that blas is using the thread count to limit the number of active threads it schedules work on, and probably does not destroy the excess threads. Also […]
According to the openblas source, it might also leak threads if you set the count back up again.
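That hypothesis can be probed with the same `ps` trick as above; a rough sketch for Linux (what actually happens will depend on the OpenBLAS build):

```julia
using LinearAlgebra

# OS-thread count of this process, via the Linux nlwp field
count_os_threads() = parse(Int, readchomp(`ps -o nlwp= -p $(getpid())`))

println("start:          ", count_os_threads())
BLAS.set_num_threads(1)
println("after set to 1: ", count_os_threads())   # excess threads may linger
BLAS.set_num_threads(Sys.CPU_THREADS)
println("after set back: ", count_os_threads())   # ...and more may be spawned
```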
with […]
with […]
with […]
So […]
What about with […]? Also set […].
Same numbers as with […].
gah. I may have found the issue. One sec.
Yup, found it. The issue: a few months ago I was playing around on another machine with multithreading and set `JULIA_NUM_THREADS` […]; that variable was still set in my environment here, so every worker came up with that many Julia threads.
apologies for having wasted your time on this issue. |
No worries. Just as we set openblas threads to 1 on the workers, we may need a similar level of configurability for the number of Julia threads on the workers too.
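In the meantime, worker flags can be forced by hand through the documented `exeflags` keyword of `addprocs`; a sketch assuming a Julia new enough to understand the `--threads` flag (the command-line flag overrides `JULIA_NUM_THREADS`):

```julia
using Distributed

# start workers pinned to a single Julia thread, regardless of what
# JULIA_NUM_THREADS is set to in the parent environment
addprocs(4; exeflags=`--threads=1`)
```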
Is there anything we could have done to make it clearer what was going on here? |
@StefanKarpinski the issue was my failure to realize that an environment variable that affected Julia was set. It might be a good idea to have a command that prints the environment variables that have been processed by (and are relevant to) Julia; that would've uncovered this issue immediately. Really sorry for the distraction here. Environment should've been my first check.
Having a way to display all significant environment variables would be a good idea. |
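As a stopgap, something like this lists every `JULIA_*` variable in the current environment (the prefix filter is a heuristic; not everything Julia reads starts with `JULIA_`):

```julia
# list environment variables that most likely influence Julia's behavior
for name in sort!(collect(keys(ENV)))
    startswith(name, "JULIA_") && println(name, " = ", ENV[name])
end
```

For what it's worth, newer releases did grow something along these lines: `versioninfo()` now prints `JULIA_*` variables under an "Environment" heading.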
everyone seems comfortable with […]
Actually, we should just display the values of any Julia-significant environment variables in `versioninfo()`.
Case in point: in this issue, @sbromberger would have noticed it right away, since the first thing he posted in his report was the output of `versioninfo()`.
@StefanKarpinski two things: you might be giving me too much credit. Also, we want to make sure that if we do this, we don't expose sensitive environment variables through versioninfo (ref #21949 (comment)).
Can I hijack this for a silly question? What are the advantages of the different topologies? That is, I suppose there are cases where workers communicate with each other without going through the master, but does that happen with a standard […]?
+1 for […]
With the default all-to-all setup and N workers we are looking at N^2 socket fds. I think we need to move to a model where worker-worker connections are set up lazily, i.e., at the time of the first communication that requires a direct worker-to-worker connection.
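For reference, `addprocs` already takes a `topology` keyword, and later versions added a `lazy` option for exactly this all-to-all case (whether a given version has `lazy` is worth checking against its docs); a sketch:

```julia
using Distributed

# workers talk only to the master; no worker-to-worker connections at all
addprocs(4; topology=:master_worker)

# or keep all-to-all, but only dial worker-worker connections on first use
addprocs(4; topology=:all_to_all, lazy=true)
```

As far as I know, a plain `pmap` only ever communicates master-to-worker, so it works fine even under `:master_worker`, which also answers the topology question above.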