Skip to content

Add prterun manpage, cleanup other manpages#773

Merged
jjhursey merged 6 commits intoopenpmix:masterfrom
jjhursey:prterun-md
Mar 9, 2021
Merged

Add prterun manpage, cleanup other manpages#773
jjhursey merged 6 commits intoopenpmix:masterfrom
jjhursey:prterun-md

Conversation

@jjhursey
Copy link
Member

  • Add a prterun.1 manpage that mostly references other pages to
    avoid too much redundancy.
  • Split out the map/rank/bind discussion to a stand alone man
    page since it is shared by prun and prterun (and, one could
    argue PMIx_Spawn in the context of PRTE). Additionally, it
    is quite expansive so this provides a single place to provide
    additional examples and notes on the interplay of these three
    operations.
    • Add a soft link for prte-rank.1 and prte-bind.1 which
      link to prte-map.1 for ease of use.
  • Closes Need to update prun.1 manpage with CLI changes #450

@jjhursey
Copy link
Member Author

This PR reflects the behavior of the master branch as of today. It does not reflect the changes that we wish to make as noted here. We can update the man pages as those changes come in.

@jjhursey
Copy link
Member Author

There are three open issues noted in the documentation:

I can remove those examples from the man pages for now, and we can add them back when the tickets are fixed. Or leave the examples, and when those tickets are fixed they can remove the TODO comment. I'd prefer the latter - that way the author can update the example as needed to reflect actual behavior.

@jjhursey jjhursey force-pushed the prterun-md branch 2 times, most recently from 05bc45d to 40cb52a Compare February 23, 2021 02:49
 * Add a `prterun.1` manpage that mostly references other pages to
   avoid too much redundancy.
 * Split out the map/rank/bind discussion to a stand alone man
   page since it is shared by `prun` and `prterun` (and, one could
   argue `PMIx_Spawn` in the context of PRTE). Additionally, it
   is quite expansive so this provides a single place to provide
   additional examples and notes on the interplay of these three
   operations.
   - Add a soft link for `prte-rank.1` and `prte-bind.1` which
     link to `prte-map.1` for ease of use.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
@jjhursey
Copy link
Member Author

This should be ready for review now (had to fix a couple of build issues). It's probably easier to view the whole files than try to read the diffs since a bunch of stuff moved around.

@jjhursey
Copy link
Member Author

jjhursey commented Feb 23, 2021

PR #774 added the qualifier DONOTRESOLVE and DONOTLAUNCH to --map-by

Note to self - there may be other qualifiers that need to be documented. I just picked up the ones advertised in the --help output.

 * Add some more deprecated CLI options
 * Replace the use of deprecated CLI options with their new forms.
 * Remove the 'physical' mode of rankfile since it does not exist
 * Fix rankfile examples to not use the socket specifier logic since
   it is not working. The new examples just use a core listing.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
@jjhursey
Copy link
Member Author

Looking at the man page after my comment above I noticed a few other deprecated options that needed to be listed. I also cleaned up the rankfile examples.

I commented out the examples associated with open bugs. That makes the manpage correct when rendered, and easy for us to add back those examples once the bugs are fixed.

 * Now that openpmix#772 is fixed
 * Add Difference between overloading and oversubscription

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
@jjhursey
Copy link
Member Author

I just pushed a couple of additions

I'll work on the review comments this evening.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
@jjhursey
Copy link
Member Author

Added back the MIMD example now that Issue #771 is fixed

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
@jjhursey
Copy link
Member Author

Interesting CI failed in cycle with gist

20:30:29 [cffe4b8c5458:03719] UNEXPECTED MESSAGE tag = 39680 from source prte-3cd6154b28f4-346@0:9
20:30:29 --------------------------------------------------------------------------
20:30:29 A received msg header indicates a size that is too large:
20:30:29 
20:30:29   Requested size:  909194548
20:30:29   Size limit:      16777216
20:30:29 
20:30:29 If you believe this msg is legitimate, please increase the
20:30:29 max msg size via the ptl_base_max_msg_size parameter.
20:30:29 --------------------------------------------------------------------------
20:30:29 

There are no code changes in this PR. But this is something we should track down.

bot:ibm:gnu:retest

@jjhursey
Copy link
Member Author

bot:ibm:gnu:retest

@rhc54
Copy link
Contributor

rhc54 commented Feb 26, 2021

Here is my take on that error, though I haven't investigated enough to prove/disprove it. You are running a lot of prun calls in tight sequence. The socket timeout is fairly long in comparison, and every so often the new prun gets an old socket that hasn't been harvested yet. So there is leftover cruft sitting in that socket which manifests itself as an "unexpected message" as soon as prun opens it. Because it is incomplete, the bytes are totally out of sequence from what we expect, resulting in our parsing it into a ridiculous size.

Issue would be: what happens if the leftover bytes didn't result in a ridiculous size, and we actually tried to parse that as a legitimate message? My guess would be: ouch!

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
@jjhursey
Copy link
Member Author

I just pushed the clarification from #673 (comment)

@jjhursey
Copy link
Member Author

For the error I mentioned above I filed #787 for further discussion/debug - since it's not related to this PR.

@jjhursey
Copy link
Member Author

jjhursey commented Mar 2, 2021

Reviewers: This is ready for review.

There is one outstanding bug (Issue #770) with max_slots that I'm working on. I'll update the man page with that fix.

There are probably other improvements to the man page that we can bring in over time as well.


: Synonym for `--wdir`.

`-x <env>`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like CLI details are in flux and tricky with OMPI/PRRTE separation (#795), but a simple "requires --personality ompi" in the description for -x here might save a lot of pain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. @rhc54 is -x intended to only be for the OMPI personality? It seems like something that should be part of the core.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now see the discussion here and filed Issue #801 to fix the code.

This was referenced Mar 5, 2021
@jjhursey
Copy link
Member Author

jjhursey commented Mar 8, 2021

We are starting to get conflicts with other PRs making changes to the man pages. Can I get a review on this so we can all work from the same base?

Copy link
Contributor

@rhc54 rhc54 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - we can always update as we go, but this is a solid base. Thx!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Need to update prun.1 manpage with CLI changes

4 participants