
v4.0.0 hangs for simple message send & recv in mca_btl_vader_component_progress? #6258

@q-p


Background information

What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)

v4.0.0

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Installed from source with the default configure options (i.e. ./configure --prefix=... && make -j10 && make install). The same hang also occurs with the Homebrew installation on macOS.

Both Open MPI and the example were compiled with the system compiler, GCC 4.3.4.

Please describe the system on which you are running

  • Operating system/version: Linux (SLED11SP2)
  • Computer hardware: x86_64 (Intel Xeon E3-1276)
  • Network type: n/a (Ethernet)

Details of the problem

A relatively simple reproducer (attached) involves 2 processes: one does a non-blocking send followed by a wait, the other probes for the message and then posts a matching non-blocking receive followed by a wait. The program hangs once the message exceeds a certain size (6185592 bytes => OK, 6185593 bytes => hang).

When the send and receive are changed to their blocking counterparts (MPI_Send/MPI_Recv), the hang still occurs.

The problem did not occur in previous versions of Open MPI; in particular, 3.1.3 appears to work fine.

```c
#include <stdio.h>

#include "mpi.h"

static const MPI_Datatype Datatype = MPI_PACKED;
static const int Tag = 42;
static const int RecvProc = 0;
static const int SendProc = 1;

// 6185592 does not hang w/ Open-MPI 4.0.0, 6185593 does hang in the Wait()s
#define MessageSize (6185592 + 1)
static unsigned char data[MessageSize] = {0};

int main(int argc, char *argv[])
{
  MPI_Init(&argc, &argv);
  MPI_Comm comm = MPI_COMM_WORLD;
  
  int myID = 0;
  int nProcs = 1;
  MPI_Comm_size(comm, &nProcs);
  MPI_Comm_rank(comm, &myID);

  if (nProcs != 2)
  {
    if (myID == 0)
      printf("Must be run on 2 procs\n");
    MPI_Finalize();
    return -1;
  }

  int result = 0;
  if (myID == RecvProc)
  {
    MPI_Status probeStatus;
    result = MPI_Probe(SendProc, MPI_ANY_TAG, comm, &probeStatus);
    printf("[%i] MPI_Probe => %i\n", myID, result);
    int size = 0;
    result = MPI_Get_count(&probeStatus, Datatype, &size);
    printf("[%i] MPI_Get_count => %i, size = %i\n", myID, result, size);

    MPI_Request recvRequest;
    result = MPI_Irecv(data, size, Datatype, SendProc, Tag, comm, &recvRequest);
    printf("[%i] MPI_Irecv(size = %i) => %i\n", myID, size, result);
    MPI_Status recvStatus;
    result = MPI_Wait(&recvRequest, &recvStatus);
    printf("[%i] MPI_Wait => %i\n", myID, result);
  }
  else
  { // myID == SendProc
    MPI_Request sendRequest;
    result = MPI_Isend(data, MessageSize, Datatype, RecvProc, Tag, comm, &sendRequest);
    printf("[%i] MPI_Isend(size = %i) => %i\n", myID, MessageSize, result);
    MPI_Status sendStatus;
    result = MPI_Wait(&sendRequest, &sendStatus);
    printf("[%i] MPI_Wait => %i\n", myID, result);
  }

  printf("[%i] Done\n", myID);
  MPI_Finalize();
  return 0;
}
```

open-mpi4_hang_repo.c.zip
