Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mpi data gatherer #1376

Merged
merged 4 commits into from
Sep 16, 2020
Merged

mpi data gatherer #1376

merged 4 commits into from
Sep 16, 2020

Conversation

rhodrin
Copy link
Contributor

@rhodrin rhodrin commented Jul 6, 2020

This PR adds a method to Function's to enable the gathering of distributed Data onto a single rank.

NOTE: PR'ing for CI spin - needs lots of clean-up, fixes and many more tests. Will update here with more details when appropriate.

@rhodrin rhodrin added enhancement WIP Still work in progress testing MPI mpi-related labels Jul 6, 2020
@codecov
Copy link

codecov bot commented Jul 6, 2020

Codecov Report

Merging #1376 into master will decrease coverage by 10.13%.
The diff coverage is 14.28%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #1376       +/-   ##
===========================================
- Coverage   86.27%   76.14%   -10.14%     
===========================================
  Files         193      187        -6     
  Lines       27777    27432      -345     
  Branches     3758     3741       -17     
===========================================
- Hits        23965    20888     -3077     
- Misses       3387     6075     +2688     
- Partials      425      469       +44     
Impacted Files Coverage Δ
devito/data/data.py 49.01% <8.10%> (-45.61%) ⬇️
tests/test_data.py 40.69% <16.94%> (-57.00%) ⬇️
devito/types/dense.py 82.82% <50.00%> (-10.87%) ⬇️
tests/test_mpi.py 0.96% <0.00%> (-92.07%) ⬇️
devito/mpi/routines.py 25.91% <0.00%> (-69.21%) ⬇️
devito/data/utils.py 34.93% <0.00%> (-57.65%) ⬇️
devito/mpi/distributed.py 54.83% <0.00%> (-36.56%) ⬇️
tests/conftest.py 52.94% <0.00%> (-31.94%) ⬇️
... and 55 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 65babe9...14550f6. Read the comment docs.

@@ -174,9 +175,9 @@ def __str__(self):
return super(Data, self._local).__str__()

@_check_idx
def __getitem__(self, glb_idx, comm_type):
def __getitem__(self, glb_idx, comm_type, _gather=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variable name == method name ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the whole variable possibly needs an overhaul.

So I'm thinking should it be allowed to be a tuple? i.e. you can gather simultaneously on multiple ranks. I'm not sure how often users would want this though + it adds a decent bit of complexity to the routine to save users (probably?) a couple of lines of code.

So would only be worth it if users are going to frequently do something along the lines of

a = f.data_gather(rank='all')

local_dat = recval.wait()
loc_ind = local_si[local_dat[0]]
retval.data[loc_ind] = local_dat[1]
# TODO: These statements can potentially be fused?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I can see there's some overlap... also the while-if-elif-else structure is identical AFAICT. We might want to abstract it away into a separate routine which takes callbacks (the bodies each each if-branch) as input? or is it just painful over-engineering?
in the future, we might want to add other gather-like operations -- scatter, broadcast, ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is identical at the moment. It possibly should stay identical depending on what functionality we want to allow - but we can discuss that.

We might want to abstract it away into a separate routine

Yes, we definitely want to do that - it will make the fusion much cleaner. I'm just keeping them separate for the time being until I get all required functionality working with the gather, after that I'll have a good idea re. the best way to fuse them.

@rhodrin rhodrin added RFC and removed POC WIP Still work in progress testing labels Sep 14, 2020
Copy link
Contributor

@FabioLuporini FabioLuporini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments left, but basically lovely and ready to go

def _gather(self, start=None, stop=None, step=1, rank=0):
"""
Method for gathering distributed data into a NumPy array
on a single rank. See the the public ``data_gather`` method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double the

devito/types/dense.py Show resolved Hide resolved
The final point of the `slice` to include.
Notes
-----
Notes will go here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

maybe also add something along the lines of "users should be aware that calling this method might blow up the memory since... so use it judiciously" -- what do you say?

loc_idx = self._index_glb_to_loc(glb_idx)
if comm_type is index_by_index:
if comm_type is index_by_index or isinstance(gather, int):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a default gather=True equivalent to gather=0? Or is it overkill

loc_idx = self._index_glb_to_loc(glb_idx)
if comm_type is index_by_index:
if comm_type is index_by_index or isinstance(gather, int):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also there is a lot of isinstance(gather, int), wouldn't it be nicer to have gather = int(gather) if isinstance(gather, int) else None at the beginning and then only needs if gather everywhere below. The int(gather) probably not necessary actually but just here for safety.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gather can be, and often is, 0 which is why is instance isinstance is used. But we could have the kwarg be gather_rank and then introduce gather as a True/False - leads to one more line of code, but code is then a bit tidier I guess.

@pytest.mark.parallel(mode=4)
@pytest.mark.parametrize('rank', [0, 1, 2, 3])
def test_simple_gather(self, rank):
""" Fill me in."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. the 3 instances of it will be fixed on next push.

loc_idx = self._index_glb_to_loc(glb_idx)
if comm_type is index_by_index:
gather = True if isinstance(gather_rank, int) else False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicking for code homogeneity, perhaps for another PR: typically we call these variables "is_..." in this case "is_gather". The "is_" encodes the information that the variable is a boolean

(1, 8, 3),
((0, 4, 4), None, (2, 1, 1))])
def test_sliced_gather_3D(self, start, stop, step):
""" Test gather for various 2D slices."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3D 😬

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, bleh, thanks. I'll add this and the thing @FabioLuporini mentioned to #1457.

@rhodrin rhodrin merged commit 994aff7 into master Sep 16, 2020
@rhodrin rhodrin deleted the data_gather branch January 5, 2021 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
MPI mpi-related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants