Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler: Revamp code generation for asynchronous operations #2376

Merged
merged 58 commits into from
Jun 18, 2024

Conversation

FabioLuporini
Copy link
Contributor

This PR is the result of the last 40 days of work, more or less. It would deserve to be split over multiple PRs, as what I've done here is really really bad, but point is, 1) I didn't have the time to go back and forth and 2) it's all intertwined, somehow...

Strictly speaking, there are no new features here. "Just" a major revamp to generalise code generation for asynchronous operations, which will mostly impact PRO.

A lot of compiler passes have been rewritten from scratch, and in my opinion, the code quality has increased significantly. Some legacy code has finally gone for good...

Despite being a giant PR, it's worth saying that all edits go in the same direction and contribute to the same goal; in other words, this PR isn't the sum of N orthogonal changes, even though, I agree, the appearance may be deceptive

writes = c1.scope.writes[target]
except KeyError:
# No read-write dependency, ignore
def _schedule_waitlocks(self, c0, d, clusters, locks, syncs):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: this is mostly the same as before (with same exceptions), but split over different methods for better code readability

Copy link

codecov bot commented May 27, 2024

Codecov Report

Attention: Patch coverage is 61.75772% with 483 lines in your changes missing coverage. Please review.

Project coverage is 86.77%. Comparing base (210f899) to head (0decbb2).

Files Patch % Lines
devito/passes/clusters/asynchrony.py 9.77% 157 Missing ⚠️
devito/passes/iet/asynchrony.py 9.47% 86 Missing ⚠️
devito/passes/iet/engine.py 45.00% 29 Missing and 4 partials ⚠️
devito/passes/iet/orchestration.py 25.00% 33 Missing ⚠️
devito/passes/clusters/buffering.py 88.50% 15 Missing and 15 partials ⚠️
devito/passes/clusters/implicit.py 72.81% 25 Missing and 3 partials ⚠️
tests/test_gpu_common.py 0.00% 23 Missing ⚠️
devito/core/gpu.py 30.00% 14 Missing ⚠️
devito/operator/operator.py 54.16% 9 Missing and 2 partials ⚠️
devito/ir/support/syncs.py 67.74% 10 Missing ⚠️
... and 18 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2376      +/-   ##
==========================================
- Coverage   86.81%   86.77%   -0.05%     
==========================================
  Files         234      234              
  Lines       44004    44245     +241     
  Branches     8127     8181      +54     
==========================================
+ Hits        38204    38393     +189     
- Misses       5093     5137      +44     
- Partials      707      715       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


__all__ = ['buffering']


@timed_pass()
def buffering(clusters, callback, sregistry, options, **kwargs):
def buffering(clusters, key, sregistry, options, **kwargs):
Copy link
Contributor Author

@FabioLuporini FabioLuporini May 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: unlike passes/clusters/asynchrony.py, which for a good 85-90% is the same code but refactored, this pass has been totally revamped and the result is, imho, vastly better (more robust, easier to read, dramatically easier to understand and modify) code

devito/operator/operator.py Show resolved Hide resolved
devito/operator/profiling.py Show resolved Hide resolved

k = PerfKey(name, rank)

if not ops:
if not ops or any(np.isnan(i) for i in [ops, points, traffic]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah same

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.isfinite to catch the Inf cases too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so something like

any(not np.isfinite(i) ...)

instead of

any(np.isnan(i) ...)

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reminder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

οκ

devito/passes/iet/misc.py Outdated Show resolved Hide resolved
Copy link
Contributor

@georgebisbas georgebisbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big PR, I need some more time, to have another look and most importantly understand a few things!

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

devito/core/gpu.py Show resolved Hide resolved
@@ -215,6 +215,13 @@ def functions(self):
launch_args += (self.stream.function,)
return super().functions + launch_args

@cached_property
def expr_symbols(self):
launch_symbols = (self.grid, self.block,)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need second comma in this tuple

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

devito/ir/stree/algorithms.py Show resolved Hide resolved
if callable(maybe_callable):
dims = {i.dim for i in self if maybe_callable(i.dim)}
else:
dims = set(as_tuple(maybe_callable))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: add an as_set utility to do this. They could both be thin wrappers around some as_iterable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

devito/operator/operator.py Show resolved Hide resolved
devito/passes/clusters/implicit.py Show resolved Hide resolved
intervals = IntervalGroup(intervals, relations=relations)

return IterationSpace(intervals, sub_iterators, directions)
return self.ispace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be dropped then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe, given the major change I prefer to keep it like that for now, it also helps me remember what .ispace actually represents (and why) when obtained via .written

devito/tools/utils.py Show resolved Hide resolved
devito/types/misc.py Show resolved Hide resolved
Copy link
Contributor

@mloubout mloubout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good

options0 = kwargs0.pop('options')

def wrapper(expressions, **kwargs1):
options = {**options0, **kwargs1.pop('options', {})}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wrapper(expressions, options=None, **kwargs1): and skip the pop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

kwargs = {'options': options, **kwargs0, **kwargs1}

# User-provided openmp flag has precedence over defaults
if not options['openmp']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

options.get ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no square brackets is fine because by construction openmp must be there

devito/core/cpu.py Outdated Show resolved Hide resolved
devito/core/gpu.py Outdated Show resolved Hide resolved
processed.append(c.rebuild(exprs=None, guards=c.guards, syncs=c.syncs))
elif c.is_critical_region:
if c.is_wait:
processed.append(c.rebuild(exprs=[], syncs=c.syncs))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can probably drop c.sync it's unchanged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

"""Shorthand to retrieve the DiscreteFunctions in `exprs`."""
indexeds = search(exprs, q_indexed, mode, 'dfs')

functions = search(exprs, q_function, mode, 'dfs')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be in a deep mode like for retrieve_dimensions

devito/types/basic.py Show resolved Hide resolved
devito/types/dimension.py Outdated Show resolved Hide resolved
@@ -672,29 +672,6 @@ class Dummy(SubDomainSet):
assert z.is_Parallel


class TestMultiSubDimension:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were issues rebuilding MultiSubDimension because of how the thickness worked, so it has a bunch of extra gubbins to handle this. This PR removes the thickness as a kwarg and all the extra machinery, so this test isn't really needed anymore

tests/test_buffering.py Show resolved Hide resolved
devito/ir/iet/visitors.py Show resolved Hide resolved
devito/ir/stree/algorithms.py Show resolved Hide resolved
devito/ir/support/guards.py Show resolved Hide resolved
@FabioLuporini FabioLuporini force-pushed the revamp-async-codegen-final branch 2 times, most recently from 57c7acd to 2e5d1eb Compare June 6, 2024 14:00

@cached_property
def readfrom(self):
def last_idx(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is actually useful for the xdsl lowering, I had no idea it existed

Comment on lines +83 to +87
def _make_snapout(self, iet, *args):
return self._make_callable('write_snapshot', iet, *args)

def _make_snapin(self, iet, *args):
return self._make_callable('read_snapshot', iet, *args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep the func name as the callable arg for being homogeneous to 'init_array'

examples/performance/01_gpu.ipynb Show resolved Hide resolved
tests/test_linearize.py Show resolved Hide resolved
# 2-tuples representing ranges
imask = []
for v in o.imask:
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The single dispatch of uxrteplace doesn't handle these two cases?


k = PerfKey(name, rank)

if not ops:
if not ops or any(np.isnan(i) for i in [ops, points, traffic]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes


k = PerfKey(name, rank)

if not ops:
if not ops or any(np.isnan(i) for i in [ops, points, traffic]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reminder

@@ -36,6 +36,16 @@ def is_on_device(obj, gpu_fit):
return all(f in gpu_fit for f in fsave)


def stream_dimensions(f):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really related to this PR but is it good that these functions live in init?
Maybe some better place? other/new file?

return clusters

try:
pd = prefix[-2].dim
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not equivalent to dim.parent?

right = parent.symbolic_max - rst
def __init_finalize__(self, name, parent, msd):
# NOTE: a MultiSubDimension stashes a reference to the originating
# MultiSubDomain. This creates a circular pattern as the `msd` itself
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two spaces?

@FabioLuporini FabioLuporini force-pushed the revamp-async-codegen-final branch from 2e5d1eb to 0decbb2 Compare June 18, 2024 09:51
@FabioLuporini FabioLuporini merged commit be7c403 into master Jun 18, 2024
31 checks passed
@FabioLuporini FabioLuporini deleted the revamp-async-codegen-final branch June 18, 2024 12:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants