
error message when no btl is available and cutoff is used #1501

@ggouaillardet

Here is the previous behavior
(running two tasks on two nodes with only the sm and self BTLs, and cutoff=1024).

Obviously, this cannot work, and the error message is explicit:

$ mpirun --host n1,n2 -np 2 --mca mpi_add_procs_cutoff 1024 --mca btl  sm,self ./hw
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[39443,1],0]) is on host: n1
  Process 2 ([[39443,1],1]) is on host: n2
  BTLs attempted: self

Your MPI job is now going to abort; sorry.

Since the default cutoff is now zero, this check is no longer performed at MPI_Init() time, and no equivalent check exists at runtime.
Instead, a cryptic error message is issued, followed by a crash:

$ mpirun --host n1,n2 -np 2 --mca mpi_add_procs_cutoff 0 --mca btl  sm,self ./hw
[n1:03052] mca_bml_base_btl_array_get_next: invalid array size
[n2:01380] mca_bml_base_btl_array_get_next: invalid array size
There are 2 tasks
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3052 on node n1 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
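
To illustrate the kind of runtime check that appears to be missing, here is a minimal, self-contained sketch in C. It is not Open MPI's code: `transport_array_t`, `transport_array_get_next` and `send_to_peer` are hypothetical names invented for this example, and the real BML/BTL structures are more involved. The idea is only that an empty per-peer transport list should be turned into an explicit "cannot reach peer" error instead of being used and crashing later.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-peer list of usable transports (BTLs).
 * This is NOT the Open MPI data structure, only an illustration. */
typedef struct {
    const char **names; /* transport names, e.g. "sm", "self" */
    size_t size;        /* number of transports that can reach this peer */
    size_t next;        /* round-robin cursor */
} transport_array_t;

/* Return the next transport in round-robin order,
 * or NULL when no transport can reach the peer. */
static const char *transport_array_get_next(transport_array_t *a)
{
    if (a == NULL || a->size == 0) {
        return NULL; /* caller must handle the unreachable-peer case */
    }
    const char *name = a->names[a->next];
    a->next = (a->next + 1) % a->size;
    return name;
}

/* Hypothetical send path: report an explicit error and return -1
 * instead of dereferencing a missing transport. */
static int send_to_peer(transport_array_t *a, int my_rank, int peer_rank)
{
    const char *transport = transport_array_get_next(a);
    if (transport == NULL) {
        fprintf(stderr,
                "rank %d cannot reach rank %d: no transport is available\n",
                my_rank, peer_rank);
        return -1;
    }
    /* A real implementation would hand the message to the transport here. */
    return 0;
}
```

With an empty list, `send_to_peer` prints a diagnostic and returns -1, so the caller can abort cleanly with a message comparable to the one shown for cutoff=1024.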

@rhc54 @hjelmn FYI
