schizo/ompi: add ompi5 personality to prterun#795

Closed
acolinisi wants to merge 1 commit into openpmix:master from acolinisi:PR--schizo-ompi5-personality-for-prterun

@acolinisi
Contributor

Fixes propagation of environment variables with prterun -x.

Note that propagation already works for mpirun -x without this commit,
and does NOT work for prun -x with or without this commit.

	$ cat echo-FOO.sh
	echo $(hostname): $FOO

	$ env FOO=bar mpirun -x FOO -n 1 --map-by ppr:1:node:NOLOCAL bash echo-FOO.sh
	e25n15: bar

Before this commit:

	$ env FOO=bar prterun -x FOO -n 1 --map-by ppr:1:node:NOLOCAL bash echo-FOO.sh
	e25n15:

	$ env FOO=bar prterun --mca schizo_base_personalities prte,ompi5 \
		-x FOO -n 1 --map-by ppr:1:node:NOLOCAL bash echo-FOO.sh
	e25n15: bar

After this commit:

	$ env FOO=bar prterun -x FOO -n 1 --map-by ppr:1:node:NOLOCAL bash echo-FOO.sh
	e25n15: bar

Note: the only known workaround for prun -x is to explicitly add the
personality via an MCA parameter:

	$ prte --daemonize
	$ env FOO=bar prun --mca schizo_base_personalities prte,ompi5 \
		-x FOO -n 1 --map-by ppr:1:node:NOLOCAL bash echo-FOO.sh
	e25n15: bar
	$ pterm

Sidenote: without :NOLOCAL, we see the env var propagate to the HNP (batch5)
node, but not to the worker compute nodes. I am not attaching output without
:NOLOCAL because at the time of this commit running without it is broken for an
unrelated reason (see note in PR #793), but the behavior without :NOLOCAL was observed on
an earlier version of master (2020-12-02). In short, env var propagation
on the HNP is different because it doesn't go through the whole path of
ssh launcher, then prted daemon, then launch.

Signed-off-by: Alexei Colin <acolin@isi.edu>

@jjhursey
Member

jjhursey commented Mar 3, 2021

ok to test

@rhc54
Contributor

rhc54 commented Mar 3, 2021

Not sure we really want to do this - what we are doing is proclaiming prterun to be an OMPI tool so that PMIx picks up and forwards envars for it. That just seems wrong.

The -x option is, at least right now, considered to be a strictly OMPI option. prterun should therefore not accept that option unless specifically told to operate as an OMPI proxy. We may decide to change that someday - at that time, it would be better to add the option directly to the prte personality instead of confusing things by declaring a prtetool to be an OMPI one.

@rhc54
Contributor

rhc54 commented Mar 4, 2021

I've taken a look at this and I'm convinced we really don't want prterun doing this by default. We do want prterun --personality ompi to do it, but that seems to already work (it is the equivalent of mpirun).

The prun issue is the same thing. It seems to work if you specify --personality ompi, as it should since -x is solely an OMPI option. I therefore think we won't take this patch.
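For reference, a minimal sketch of the invocations described above, reusing echo-FOO.sh and the mapping options from the PR description. It assumes `prterun`, `prte`, `prun` and `pterm` from a PRRTE build of this era are on the PATH and that `--personality ompi` behaves as described in this comment; the `command -v` guard only keeps the snippet harmless on a machine without PRRTE.

```shell
#!/bin/bash
# Recreate the test script from the PR description.
printf '%s\n' 'echo $(hostname): $FOO' > echo-FOO.sh

# Only attempt the launches if PRRTE is installed (assumption: tools on PATH).
if command -v prterun >/dev/null 2>&1; then
  # One-shot launch: select the OMPI personality explicitly so -x is accepted.
  env FOO=bar prterun --personality ompi \
    -x FOO -n 1 --map-by ppr:1:node:NOLOCAL bash echo-FOO.sh

  # Persistent DVM: start it, launch with prun using the same personality,
  # then shut the DVM down.
  prte --daemonize
  env FOO=bar prun --personality ompi \
    -x FOO -n 1 --map-by ppr:1:node:NOLOCAL bash echo-FOO.sh
  pterm
fi
```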

@rhc54 rhc54 closed this Mar 4, 2021
@alexeicolin

Thanks for looking into it. This probably deserves an issue for documentation, then, because -x is listed in the prun man page and in the output of prterun -h and prun -h. Currently it takes gdb and an untold amount of time to discover the magic --personality ompi or --mca schizo_base_personalities prte,ompi5 incantation.

If the fix is to remove -x, then RIP -x, the useful flag -- I will resurrect you even if it means carrying a patch, because wrapping commands with env VAR="value" command, or with a script, would be inferior.
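For comparison, the `env` wrapping mentioned above could look like the following sketch: instead of forwarding FOO with -x, the launcher starts `env` as the application, which sets FOO on the target node and then runs the script. It assumes `env`, `bash` and echo-FOO.sh are available at the same paths on the compute nodes; the `command -v` guard just skips the launch when `prterun` is not installed.

```shell
#!/bin/bash
# Recreate the test script from the PR description.
printf '%s\n' 'echo $(hostname): $FOO' > echo-FOO.sh

if command -v prterun >/dev/null 2>&1; then
  # No -x: `env` is the launched program, so FOO is set on the remote node
  # itself, independent of which personality the launcher uses.
  prterun -n 1 --map-by ppr:1:node:NOLOCAL env FOO=bar bash echo-FOO.sh
fi
```

The trade-off the comment alludes to is that the value has to be spelled out in the launched command, rather than picked up by name from the caller's environment as -x does.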

@rhc54
Contributor

rhc54 commented Mar 4, 2021

-x hasn't died - it was always just an OMPI cmd line option, never a PRRTE one. Now that OMPI is adopting PRRTE as its runtime, we have to accommodate both communities. Hence the personality designator.

@jjhursey has been updating the documentation - see #773

@alexeicolin

Then, should I open an issue to "add an option for forwarding env vars to PRRTE, specifically to prun" since it doesn't have one? Is prun --personality ompi a workaround until that feature is added or does one essentially depend on OMPI feature set when using PRRTE tools (prun)?

@rhc54
Contributor

rhc54 commented Mar 4, 2021

> Then, should I open an issue to "add an option for forwarding env vars to PRRTE, specifically to prun" since it doesn't have one? Is prun --personality ompi a workaround until that feature is added or does one essentially depend on OMPI feature set when using PRRTE tools (prun)?

It was never really intended to be a workaround. We honestly haven't had time to fully complete the separation of cmd line options - which should be OMPI-only, which PRRTE-only, and which are both. It is a little hard because the community is predominantly (but not entirely) composed of OMPI users right now, and so we sorta want all our favorite options to be easily available.

However, MPICH and other groups are beginning to migrate this direction, and they have very different ideas about the cmd line. So we've been somewhat hesitant to "go too far" one way.

All that said, I would indeed suggest opening an issue requesting we make -x a PRRTE cmd line option. Having another voice push for that direction only helps us to get off the fence 😄

@jjhursey
Member

jjhursey commented Mar 5, 2021

👍

> All that said, I would indeed suggest opening an issue requesting we make -x a PRRTE cmd line option. Having another voice push for that direction only helps us to get off the fence 😄

I filed Issue #801 to work on this change. It shouldn't take much to do. I can put it on my to-do list for next week unless someone gets to it before me.
