@ggouaillardet
Contributor

If PMIx is unreachable, but a PMI1/2 or SLURM environment is detected,
issue a warning before "falling back" to singleton mode.

Refs. #10286

Signed-off-by: Gilles Gouaillardet [email protected]

@ggouaillardet
Contributor Author

@bwbarrett here is an idea on how to tackle #10286

We could also add an MCA parameter to select which action should be taken:

  • none
  • warn
  • abort
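
A minimal sketch of how such a parameter's value could be mapped to an action. Everything here is hypothetical: the enum, the function name, and the choice of `warn` as the behavior when the parameter is unset are assumptions for illustration, not part of this PR or of the Open MPI MCA API.

```c
#include <string.h>

/* Hypothetical actions for the proposed parameter (names are illustrative). */
typedef enum {
    FALLBACK_ACTION_NONE,    /* silently continue as a singleton */
    FALLBACK_ACTION_WARN,    /* print a warning, then continue */
    FALLBACK_ACTION_ABORT,   /* refuse to start */
    FALLBACK_ACTION_INVALID  /* unrecognized value */
} fallback_action_t;

/* Map the parameter's string value to an action.
 * Assumption: an unset parameter (NULL) behaves like "warn". */
static fallback_action_t parse_fallback_action(const char *value)
{
    if (NULL == value) {
        return FALLBACK_ACTION_WARN;
    }
    if (0 == strcmp(value, "none")) {
        return FALLBACK_ACTION_NONE;
    }
    if (0 == strcmp(value, "warn")) {
        return FALLBACK_ACTION_WARN;
    }
    if (0 == strcmp(value, "abort")) {
        return FALLBACK_ACTION_ABORT;
    }
    return FALLBACK_ACTION_INVALID;
}
```

Returning a distinct "invalid" value lets the caller decide how to treat a misspelled setting instead of silently picking one of the three actions.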

FWIW, I tested both SLURM and PMI in order to support

  • flux: PMI but not necessarily SLURM
  • SLURM with no default mpi option (or srun --mpi=none -n 2 a.out)

rank_str = getenv("SLURM_PROCID");
}
int rank = (NULL != rank_str)?atoi(rank_str):0;
if (0 == rank) {
Contributor

What if this really is rank 0? Should -1 be used instead?

Contributor Author

The rationale for defaulting to 0 is that we do our best to limit the warning message to rank 0. But if we cannot figure out the rank (likely because of a busted environment?), I'd rather have all the ranks print the warning message than none.

Member

I agree with @ggouaillardet on this one.
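
To make the trade-off in this thread concrete, here is a small sketch that isolates the default-rank choice. The function name and the `default_rank` parameter are hypothetical and only serve to compare the two options discussed (0 versus -1); the patch itself hard-codes 0.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Decide whether this process should print the warning.
 * rank_str:     value of the rank environment variable, or NULL if unset.
 * default_rank: rank to assume when rank_str is NULL (hypothetical knob). */
static bool should_print_warning(const char *rank_str, int default_rank)
{
    int rank = (NULL != rank_str) ? atoi(rank_str) : default_rank;
    return 0 == rank;
}
```

Called as `should_print_warning(getenv("SLURM_PROCID"), 0)`, every process whose rank cannot be determined prints the warning. With a default of -1, none of them would, which is the outcome the author wants to avoid.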

if (NULL == size_str) {
size_str = getenv("SLURM_NPROCS");
}
int size = (NULL != size_str)?atoi(size_str):1;
Member

you made my brain hurt for no reason :).
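
For readers skimming the snippet above, the lookup can be sketched as a standalone helper. The function and parameter names are hypothetical. The first variable consulted is not visible in the excerpt, so it is passed in as an argument here rather than guessed.

```c
#include <stdlib.h>

/* Return the job size from the first available source.
 * primary:  value of the variable checked first, or NULL if unset.
 * fallback: value of the variable checked second (e.g. SLURM_NPROCS), or NULL.
 * If neither is set, assume a single process. */
static int job_size_from(const char *primary, const char *fallback)
{
    const char *size_str = (NULL != primary) ? primary : fallback;
    return (NULL != size_str) ? atoi(size_str) : 1;
}
```

Defaulting to 1 when nothing is set matches the snippet: with no launcher information, the process is treated as a plain singleton.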

@awlauria merged commit 353153e into open-mpi:main on Apr 21, 2022