
mpi: Enhance flexibility for custom topologies #2134

Merged
merged 9 commits into master from test_custom_topology_w_star
Jun 15, 2023

Conversation

georgebisbas
Contributor

This PR enhances the available custom-topology MPI decompositions.

The requirement for the number of stars to evenly divide the number of MPI processes is dropped.
You can see some topology-to-decomposition combinations in the added tests.
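As a rough illustration of the `'*'` semantics (a sketch with a hypothetical helper name, not the Devito API): integer entries in the topology are honoured as-is, and whatever ranks remain are handed to the starred slots.

```python
import math

def remaining_for_stars(topology, nprocs):
    """Ranks left for the '*' slots once the fixed (integer)
    entries have claimed their share (illustrative sketch only)."""
    fixed = math.prod(i for i in topology if i != '*')
    assert nprocs % fixed == 0, "fixed entries must divide nprocs"
    return nprocs // fixed
```

For example, topology `('*', 1, 2)` on 8 ranks leaves 4 ranks for the star, i.e. the decomposition `(4, 1, 2)`.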

@georgebisbas georgebisbas added the MPI mpi-related label May 22, 2023
@georgebisbas georgebisbas self-assigned this May 22, 2023
@georgebisbas georgebisbas force-pushed the test_custom_topology_w_star branch from 0a8ecf0 to c8d3b54 Compare May 22, 2023 16:34
@codecov

codecov bot commented May 22, 2023

Codecov Report

Merging #2134 (f26e986) into master (9b9c45e) will decrease coverage by 0.03%.
The diff coverage is 35.71%.

@@            Coverage Diff             @@
##           master    #2134      +/-   ##
==========================================
- Coverage   87.10%   87.07%   -0.03%     
==========================================
  Files         223      223              
  Lines       39813    39832      +19     
  Branches     5166     5169       +3     
==========================================
+ Hits        34679    34685       +6     
- Misses       4556     4569      +13     
  Partials      578      578              
Impacted Files                           Coverage Δ
examples/seismic/test_seismic_utils.py   100.00% <ø> (ø)
devito/mpi/distributed.py                87.37% <25.00%> (-2.86%) ⬇️
tests/test_mpi.py                        98.85% <57.14%> (-0.23%) ⬇️
devito/data/data.py                      95.79% <100.00%> (ø)

devito/mpi/distributed.py Outdated Show resolved Hide resolved
# Decompose the processes remaining for allocation to prime factors
alloc_procs = np.prod([i for i in items if i != '*'])
remprocs = int(input_comm.size // alloc_procs)
prime_factors = primefactors(remprocs)
Contributor

This looks a bit overly intricate.

If you use factorint(remprocs) you directly get the dict of factors, which you then just need to split, i.e.

factors = factorint(remprocs)
vals = [k for (k, v) in factors.items() for _ in range(v)]
vals = vals + [1 for _ in range(nstars - len(vals))]
starvals = (*vals[:nstars-1], prod(vals[nstars-1:]))

and that should be it
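For reference, the suggestion above can be made runnable with a stdlib stand-in for sympy's factorint (the helper names here are illustrative). Note that it folds any surplus factors into the last starred slot, which is the behaviour the author objects to below:

```python
import math
from collections import Counter

def factorint(n):
    # stdlib stand-in for sympy.factorint: {prime: multiplicity}
    f, d = Counter(), 2
    while d * d <= n:
        while n % d == 0:
            f[d] += 1
            n //= d
        d += 1
    if n > 1:
        f[n] += 1
    return f

def suggested_split(remprocs, nstars):
    # flatten the factor dict, pad with 1s, then fold any surplus
    # factors into the last starred slot
    factors = factorint(remprocs)
    vals = [k for (k, v) in factors.items() for _ in range(v)]
    vals = vals + [1] * max(0, nstars - len(vals))
    return (*vals[:nstars - 1], math.prod(vals[nstars - 1:]))

# suggested_split(24, 3) -> (2, 2, 6)
# suggested_split(32, 3) -> (2, 2, 8)
```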

Contributor Author

Well, not really, since that prioritises neither the outermost dimension nor the overall balance;
e.g. your approach gives:

FAILED tests/test_mpi.py::TestDistributor::test_custom_topology_3d_dummy[24-topology20-dist_topology20] - assert (2, 2, 6) == (6, 2, 2)
FAILED tests/test_mpi.py::TestDistributor::test_custom_topology_3d_dummy[32-topology21-dist_topology21] - assert (2, 2, 8) == (4, 4, 2)
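For illustration, the priority being defended here (outermost dimension first, overall balance second) can be sketched by handing out the prime factors cyclically in descending order. This is a stdlib-only reading of the thread, with hypothetical helper names, not necessarily the merged code:

```python
def prime_factors_desc(n):
    # prime factors of n with multiplicity, largest first
    out, d = [], 2
    while d * d <= n:
        while n % d == 0:
            out.append(d)
            n //= d
        d += 1
    if n > 1:
        out.append(n)
    return sorted(out, reverse=True)

def outermost_first_split(remprocs, nstars):
    # assign factors cyclically, outermost slot first, so the
    # outermost dimension ends up largest while staying balanced
    vals = [1] * nstars
    for i, f in enumerate(prime_factors_desc(remprocs)):
        vals[i % nstars] *= f
    return tuple(vals)

# outermost_first_split(24, 3) -> (6, 2, 2)
# outermost_first_split(32, 3) -> (4, 4, 2)
```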

Contributor

@mloubout mloubout Jun 7, 2023

That's just a minor change to the last line: cycle through the remainder instead of multiplying it in

split = np.array_split(vals, nstars)
starvals = np.prod([np.pad(s, (0, nstars - len(s)), constant_values=1) for s in split], axis=1)

Contributor

And pad `(nstars - len(s), 0)` if you want it left-heavy instead

Contributor Author

I think I pushed a simplified version; let me know if you like it better.
I would dare say I find your approach less readable, at least to my eyes.
By the way, I could not make it work; would you mind trying your code locally?

@georgebisbas georgebisbas force-pushed the test_custom_topology_w_star branch from 15761fe to 3ae6798 Compare June 4, 2023 12:42
tests/test_mpi.py Outdated Show resolved Hide resolved
@georgebisbas georgebisbas force-pushed the test_custom_topology_w_star branch from 3ae6798 to 958579c Compare June 5, 2023 09:00
Contributor

@FabioLuporini FabioLuporini left a comment

Several improvements are possible, in my opinion

tests/test_mpi.py Outdated Show resolved Hide resolved
devito/mpi/distributed.py Outdated Show resolved Hide resolved
for index, value in zip(int_pos, int_vals):
processed[index] = value

if dd_list:
Contributor

why do you need this if?

Contributor

and it's probably doable with a comprehension

Contributor Author

Really? How, without iterating in O(len(processed))?

devito/mpi/distributed.py Outdated Show resolved Hide resolved
the outermost dimension

Assuming N=6 and the requested topology is `('*', '*', 1)`,
since there is no integer k such that k*k == 6, we resort to the closest factors to
Contributor

Could you rewrite this (with Grammarly, maybe)? It's not very clear to me at first glance.

Contributor Author

I simplified it to let the examples speak for themselves.
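The docstring example above (N=6 with `('*', '*', 1)` falling back to the closest factor pair) could be sketched as follows; the helper name is hypothetical and this need not match the merged implementation:

```python
import math

def closest_factor_pair(n):
    # factor pair (a, b) of n with a >= b and a - b minimal;
    # the larger factor goes to the outermost starred dimension
    b = math.isqrt(n)
    while n % b:
        b -= 1
    return n // b, b
```

So `('*', '*', 1)` on 6 ranks would give `(3, 2, 1)`.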

devito/mpi/distributed.py Outdated Show resolved Hide resolved
# Start by assigning the max prime factor to the first starred position,
# then cyclically decompose as evenly as possible until the whole of
# `remprocs` has been distributed
while remprocs != 1:
Contributor

nitpicking: if one passes a dummy comm with `.size == 0`, this gets stuck in an infinite loop. Safer to use `while remprocs > 1`.

Also, I see an inconsistent use of `_` in variable names

Contributor Author

Switched to rem_procs.

Contributor Author

I think passing 0 is bad usage; maybe we can jump straight to the assert statement?
Asking for a custom topology on 0 ranks looks bad to me

devito/mpi/distributed.py Outdated Show resolved Hide resolved
# Decompose the processes remaining for allocation to prime factors
alloc_procs = np.prod([i for i in items if i != '*'])
remprocs = int(input_comm.size // alloc_procs)
prime_factors = primefactors(remprocs)
Contributor

instead of having this both here and inside the while loop, you should have it only once, at the top of the while loop...

which, if necessary, you can turn into:

while True:
    if rem_procs <= 1:
        break
    ...

thus emulating a do-while loop

Contributor

probably the same story with `rem_procs`

Contributor Author

Dropped and improved; let me know if it is fine

try:
    assert np.prod(processed) == input_comm.size
except AssertionError:
    raise ValueError("Invalid `topology`", processed, " for given nprocs:",
Contributor

if it's an assert, why bother with this? Just leave the assert alone and that's it

Contributor Author

I think it is more helpful as a message to the user.
I can drop it if you prefer.

@georgebisbas georgebisbas force-pushed the test_custom_topology_w_star branch from 5fd03f7 to 83e8640 Compare June 8, 2023 15:30
@georgebisbas georgebisbas force-pushed the test_custom_topology_w_star branch from 14b1fdc to f26e986 Compare June 9, 2023 10:58
@FabioLuporini FabioLuporini merged commit 4c54253 into master Jun 15, 2023
@FabioLuporini
Contributor

Merged, thanks

@FabioLuporini FabioLuporini deleted the test_custom_topology_w_star branch June 15, 2023 08:34
Labels
MPI mpi-related
3 participants