Conversation

devreal (Contributor) commented Nov 10, 2020

This PR addresses a potential performance issue with the algorithm selection in coll/tuned and some minor issues found while digging into it:

  1. Performance: some bcast and allreduce algorithms require the number of elements to be at least as large as the number of ranks and fall back to a linear implementation if that is not the case. This hit me bitterly with a 4B bcast on 128 ranks on a single node, where latency increased 10x compared to 127 ranks (see the sketch after this list).
  2. The documentation for the coll_tuned_*_algorithm MCA variables should mention that they only take effect if the coll_tuned_use_dynamic_rules variable is set to true.
  3. Mark the global static array used for the MCA parameters as const.
  4. Fix some glitches in comments.
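
To make item 1 concrete, here is a minimal C sketch. It is not the actual Open MPI coll/tuned code; the function names, the threshold, and the algorithm enum are invented for illustration. It only mimics the described interaction: the fixed rules pick scatter_allgather for a large communicator, but the algorithm needs at least one element per rank and otherwise degrades to a linear broadcast.

```c
/* Illustrative sketch only -- not the Open MPI coll/tuned source.
 * All names and thresholds are hypothetical. */
#include <stdio.h>

enum bcast_alg { BCAST_LINEAR, BCAST_BINOMIAL, BCAST_SCATTER_ALLGATHER };

/* Hypothetical fixed-rule selection: large communicators get scatter_allgather. */
static enum bcast_alg fixed_rule(int comm_size)
{
    return (comm_size >= 128) ? BCAST_SCATTER_ALLGATHER : BCAST_BINOMIAL;
}

/* Hypothetical implementation-side guard: scatter_allgather requires
 * count >= comm_size, otherwise it falls back to the linear algorithm. */
static enum bcast_alg effective_alg(enum bcast_alg chosen, int count, int comm_size)
{
    if (chosen == BCAST_SCATTER_ALLGATHER && count < comm_size) {
        return BCAST_LINEAR;   /* the fallback behind the 10x latency hit */
    }
    return chosen;
}

int main(void)
{
    int count = 1;   /* one 4-byte element */
    printf("127 ranks -> algorithm %d\n", effective_alg(fixed_rule(127), count, 127));
    printf("128 ranks -> algorithm %d\n", effective_alg(fixed_rule(128), count, 128));
    return 0;
}
```

With one element, 127 ranks stay on a structured (here: binomial) path while 128 ranks land on the linear fallback, which matches the latency jump described above.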

Backport of #8186 to v4.1.x

The MCA parameters coll_tuned_*_algorithm are ignored unless coll_tuned_use_dynamic_rules is true, so mention that in the description.

Signed-off-by: Joseph Schuchart <[email protected]>
(cherry picked from commit 06f605c)
Signed-off-by: Joseph Schuchart <[email protected]>
(cherry picked from commit 7261255)
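
As a rough illustration of the behavior this commit documents (a sketch under assumptions, not the real coll/tuned decision code; the function and the example values are hypothetical), the forced algorithm value is only consulted on the dynamic-rules path:

```c
/* Hypothetical sketch of the selection gate described above -- not the
 * real Open MPI code. A forced algorithm (coll_tuned_bcast_algorithm)
 * only matters when dynamic rules are enabled. */
#include <stdbool.h>
#include <stdio.h>

/* Returns the forced algorithm id if dynamic rules are on and a non-zero
 * algorithm was forced; -1 stands in for "use the fixed decision rules". */
static int bcast_decision(bool use_dynamic_rules, int forced_algorithm)
{
    if (use_dynamic_rules && forced_algorithm != 0) {
        return forced_algorithm;
    }
    return -1;
}

int main(void)
{
    /* Forcing algorithm 6 has no effect without dynamic rules ... */
    printf("dynamic rules off: %d\n", bcast_decision(false, 6));  /* -1 */
    /* ... and is honored once coll_tuned_use_dynamic_rules is true. */
    printf("dynamic rules on:  %d\n", bcast_decision(true, 6));   /*  6 */
    return 0;
}
```
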
…d fall back to linear

Bcast: scatter_allgather and scatter_allgather_ring expect N_elem >= N_procs
Allreduce: rabenseifner expects N_elem >= pow2 nearest to N_procs

In all cases, these algorithms fall back to a linear implementation,
which will most likely yield the worst performance (observed for a 4B bcast on 128 ranks)

Signed-off-by: Joseph Schuchart <[email protected]>
(cherry picked from commit 04d198f)
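
The two guards named in this commit message can be spelled out in a small sketch. This is illustrative only, not the Open MPI implementation: the helper names are mine, and "pow2 nearest to N_procs" is read here as the largest power of two not exceeding N_procs.

```c
/* Illustrative only -- not the Open MPI implementation. A false result
 * means the algorithm would fall back to its linear variant. */
#include <stdbool.h>
#include <stdio.h>

/* Largest power of two that does not exceed n (assumes n >= 1). */
static int pow2_below(int n)
{
    int p = 1;
    while (2 * p <= n) {
        p *= 2;
    }
    return p;
}

/* Bcast: scatter_allgather and scatter_allgather_ring need N_elem >= N_procs. */
static bool bcast_scatter_allgather_ok(int n_elem, int n_procs)
{
    return n_elem >= n_procs;
}

/* Allreduce: rabenseifner needs N_elem >= the power of two nearest to N_procs. */
static bool allreduce_rabenseifner_ok(int n_elem, int n_procs)
{
    return n_elem >= pow2_below(n_procs);
}

int main(void)
{
    /* One element on 128 ranks fails both checks -> linear fallback. */
    printf("bcast ok: %d, rabenseifner ok: %d\n",
           bcast_scatter_allgather_ok(1, 128),
           allreduce_rabenseifner_ok(1, 128));
    return 0;
}
```
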
devreal (Contributor, Author) commented Nov 11, 2020

I added a commit that removes the selection of linear algorithms in allreduce and allgather. In my measurements, the latency in these ranges is higher than necessary, and I don't see how the linear selection is motivated by previous measurement data (it seems unlikely to me that linear algorithms perform well at several dozen or hundreds of ranks).
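
To illustrate the kind of change meant here: a schematic sketch, not the actual coll/tuned decision tables; the communicator-size threshold, message-size cutoff, and algorithm set are invented. Removing the linear entry simply means the fixed rules pick a structured algorithm in that range instead.

```c
/* Schematic before/after sketch of a fixed decision rule with the linear
 * selection removed -- not the actual coll/tuned decision table. */
#include <stdio.h>

enum allgather_alg { AG_LINEAR, AG_BRUCK, AG_RECURSIVE_DOUBLING, AG_RING };

/* Hypothetical "before": small messages pick linear regardless of comm size. */
static enum allgather_alg decide_before(int comm_size, size_t msg_size)
{
    if (msg_size < 1024) {
        return AG_LINEAR;           /* poor choice at dozens/hundreds of ranks */
    }
    return (comm_size <= 32) ? AG_RECURSIVE_DOUBLING : AG_RING;
}

/* Hypothetical "after": small messages use a structured algorithm instead. */
static enum allgather_alg decide_after(int comm_size, size_t msg_size)
{
    if (msg_size < 1024) {
        return AG_BRUCK;
    }
    return (comm_size <= 32) ? AG_RECURSIVE_DOUBLING : AG_RING;
}

int main(void)
{
    /* 4-byte message on 128 ranks: linear before, bruck after. */
    printf("before: %d, after: %d\n", decide_before(128, 4), decide_after(128, 4));
    return 0;
}
```
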

rajachan (Member) commented:

@devreal ICYMI, something was off with allgatherv too (I'd tested with the 4.1.x branch, see #8186 (comment)). Is that something you are seeing?

devreal (Contributor, Author) commented Nov 11, 2020

@rajachan I have not yet looked at allgatherv. I can run some tests for that overnight and see. Do you remember at what scales things were weird?

rajachan (Member) commented:

I was running with ~1K ranks (32 nodes with 36 ranks per node).

rajachan (Member) commented:

Btw, your master PR is missing the allreduce/allgather commit.

devreal (Contributor, Author) commented Nov 11, 2020

> Btw, your master PR is missing the allreduce/allgather commit.

Oops, pushed to the wrong branch. Will fix in a minute

wckzhang (Contributor) commented:

> Btw, your master PR is missing the allreduce/allgather commit.

Nice catch

Signed-off-by: Joseph Schuchart <[email protected]>
(cherry picked from commit 22e289b)
…lgather

These selections seem harmful in my measurements and don't seem to be
motivated by previous measurement data.

Signed-off-by: Joseph Schuchart <[email protected]>
(cherry picked from commit a15e5dc)
devreal force-pushed the fix-tuned-dynamic-v4.1.x branch from 0f89397 to 3cae9f7 on November 11, 2020 17:43
jsquyres merged commit 6f21a39 into open-mpi:v4.1.x on Nov 12, 2020