@devreal devreal commented Nov 16, 2020

While testing HAN with ADAPT I noticed that ADAPT is not selected without jumping through some hoops:

  1. The preference treatment in coll/base should append the coll modules to the selectable list (the list is ordered by ascending priority), and it should preserve the priority order provided by the user (decreasing priority order in the ompi_comm_coll_preference info key).
  2. Set the default priority of ADAPT and SM to 1 to allow for selection by HAN without also setting the coll_adapt_priority and coll_sm_priority MCA parameters.
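
To illustrate the ordering in item 1, here is a minimal sketch of the idea, not the actual coll/base code. It assumes the selectable list is a plain array of component names sorted from lowest to highest priority; `move_to_tail` and `apply_preferences` are hypothetical helpers invented for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: move the entry called `name` to the tail of
 * list[0..n-1], keeping the relative order of the other entries.
 * Returns 1 if the name was found, 0 otherwise. */
static int move_to_tail(const char **list, int n, const char *name)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(list[i], name) == 0) {
            const char *hit = list[i];
            /* Close the gap left by the removed entry. */
            memmove(&list[i], &list[i + 1],
                    (size_t)(n - 1 - i) * sizeof(list[0]));
            list[n - 1] = hit;
            return 1;
        }
    }
    return 0;
}

/* Hypothetical helper: apply user preferences (highest priority first)
 * to a list sorted from lowest to highest priority. Walking the
 * preferences back to front means prefs[0] is appended last and
 * therefore ends up at the tail, the highest-priority position.
 * Names that are not in the list are ignored. */
static void apply_preferences(const char **list, int n,
                              const char **prefs, int nprefs)
{
    assert(list != NULL && prefs != NULL);
    for (int i = nprefs - 1; i >= 0; i--) {
        move_to_tail(list, n, prefs[i]);
    }
}
```

For example, starting from the hypothetical list `adapt, basic, tuned, han` and the preference string `adapt,tuned`, `apply_preferences` leaves the list as `basic, han, tuned, adapt`, so `adapt` sits at the highest-priority end with `tuned` right below it.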

@devreal devreal requested a review from bosilca November 16, 2020 19:34
@devreal devreal force-pushed the fix-coll-base-preference branch 2 times, most recently from e13e0a7 to 476a6e5 Compare November 16, 2020 20:22
devreal commented Nov 16, 2020

I removed the priority bump for coll/sm because latency goes through the roof when used with HAN in my tests. This needs some more investigation.

The selectable list is sorted from lowest to highest priority, so the
user-defined preferences should be appended to the list.
The preference treatment should also maintain the order provided by the user
(first item has highest priority), so switch the loop order.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
…e components

Also make coll/tuned the default for shared memory communication
as coll/sm has shown performance issues that need investigation.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
This has been shown to be more effective in achieving overlap
of inter- and intra-node communication and reduces the initial
delay before hitting the network.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
@devreal devreal force-pushed the fix-coll-base-preference branch from 476a6e5 to 1cdc855 Compare November 19, 2020 18:23
devreal commented Nov 19, 2020

Some more small changes that would be good to have in 4.1:

  1. Removed references to experimental collective modules shared and solo that are not available
  2. Made sure coll/sm, coll/adapt, and coll/han are selectable with their default priorities
  3. Changed default segment size for reduce and allreduce, which has reduced latency in my experiments.

@rajachan rajachan merged commit 38d2f12 into open-mpi:master Nov 20, 2020
@devreal devreal deleted the fix-coll-base-preference branch October 3, 2022 15:52