Here is the previous behavior when running two tasks on two nodes with only the sm and self BTLs and mpi_add_procs_cutoff=1024. Obviously this cannot work, and the error message is explicit:
$ mpirun --host n1,n2 -np 2 --mca mpi_add_procs_cutoff 1024 --mca btl sm,self ./hw
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.
Process 1 ([[39443,1],0]) is on host: n1
Process 2 ([[39443,1],1]) is on host: n2
BTLs attempted: self
Your MPI job is now going to abort; sorry.
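The source of the ./hw test program is not included here. As a stand-in (an assumption, not the actual program), any minimal MPI code that performs inter-node point-to-point traffic is enough to force a BTL to be selected for the remote peer; the "There are 2 tasks" line in the output below is reproduced here only for illustration:

```c
/* hw.c -- minimal stand-in for the test program (hypothetical). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, token = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (0 == rank) {
        printf("There are %d tasks\n", size);
    }

    /* Force real point-to-point traffic between the two nodes so a
     * BTL must be chosen for the remote peer. */
    if (2 == size) {
        if (0 == rank) {
            token = 42;
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
    }

    MPI_Finalize();
    return 0;
}
```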
Since the default cutoff is now zero, this check is no longer performed at MPI_Init() time, but no equivalent check is performed at runtime either. Instead, a cryptic error message is issued, followed by a crash:
$ mpirun --host n1,n2 -np 2 --mca mpi_add_procs_cutoff 0 --mca btl sm,self ./hw
[n1:03052] mca_bml_base_btl_array_get_next: invalid array size
[n2:01380] mca_bml_base_btl_array_get_next: invalid array size
There are 2 tasks
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3052 on node n1 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
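For illustration, here is a sketch of the kind of runtime guard that appears to be missing. The struct and helper names are hypothetical stand-ins, not Open MPI's real BML internals (only the mca_bml_base_btl_array_get_next name from the log above is real); the point is that an empty per-peer BTL array should produce an explicit "unreachable peer" error rather than a NULL dereference and a segfault:

```c
/* Sketch only: hypothetical stand-ins for the per-peer BTL list. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    void **btls;   /* modules able to reach the peer */
    size_t size;   /* number of usable modules       */
    size_t next;   /* round-robin cursor             */
} btl_array_t;

/* Current behavior: called on an empty array, prints
 * "invalid array size" and returns NULL, which the caller
 * then dereferences. */
static void *btl_array_get_next(btl_array_t *a)
{
    if (0 == a->size) {
        fprintf(stderr, "btl_array_get_next: invalid array size\n");
        return NULL;
    }
    return a->btls[a->next++ % a->size];
}

/* Missing check: when a peer is added lazily (cutoff=0), verify it
 * is reachable at all and fail with an explicit error instead of
 * crashing later. */
static void *btl_array_get_next_checked(btl_array_t *a, int peer)
{
    if (0 == a->size) {
        fprintf(stderr,
                "No BTL can reach process %d; aborting with an "
                "explicit error instead of segfaulting.\n", peer);
        exit(EXIT_FAILURE);   /* or propagate an MPI error upward */
    }
    return a->btls[a->next++ % a->size];
}

int main(void)
{
    btl_array_t empty = { NULL, 0, 0 };  /* peer reachable by no BTL */
    (void) btl_array_get_next(&empty);          /* cryptic message   */
    (void) btl_array_get_next_checked(&empty, 1); /* explicit error  */
    return 0;
}
```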