Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler: Revamp CIRE exploiting EvalDerivative #1688

Merged
merged 38 commits into from
Jun 7, 2021

Conversation

FabioLuporini
Copy link
Contributor

@FabioLuporini FabioLuporini commented May 3, 2021

By searching and optimizing EvalDerivatives, that is sub-expressions produced by the evaluation of Derivatives, the CIRE pass is now more rigorous, robust, effective, and quick. It also opens up the possibility of easily synthesizing different variants of the same code.

The PR is equipped with many new tests, and adjust all of the existing ones to the new CIRE infrastructure.

Some optimization options , such as cire-mincost-* and cire-repeats-* have finally been dropped and replaced by two simple knobs, namely cire-mingain and cire-schedule, which are suitably documented

EDIT: I will also leave some inline comments for ease of reviewing

Note 1: AFAICT, there is no compilation speed regression

Note 2: this PR may impact the code generated by the compiler for some Operators. In all my experiments, I have not seen even a single regression in execution time.

So, Note 1 + Note 2 => all sort of timings should be improving. It'd be great to have TheMatrix to easily test this on multiple architectures simultaneously, but for now we'll need some beta testing I guess

@FabioLuporini FabioLuporini requested a review from mloubout May 3, 2021 13:46
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@codecov
Copy link

codecov bot commented May 3, 2021

Codecov Report

Merging #1688 (cadb4e1) into master (299b116) will increase coverage by 0.03%.
The diff coverage is 95.78%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1688      +/-   ##
==========================================
+ Coverage   86.68%   86.72%   +0.03%     
==========================================
  Files         218      218              
  Lines       33389    33712     +323     
  Branches     4342     4408      +66     
==========================================
+ Hits        28943    29236     +293     
- Misses       3959     3988      +29     
- Partials      487      488       +1     
Impacted Files Coverage Δ
devito/finite_differences/coefficients.py 91.86% <ø> (ø)
tests/test_gpu_common.py 1.89% <0.00%> (-0.01%) ⬇️
devito/symbolics/queries.py 48.69% <8.69%> (-10.45%) ⬇️
devito/ir/support/vector.py 93.81% <33.33%> (-0.43%) ⬇️
devito/operator/operator.py 88.26% <85.71%> (-0.28%) ⬇️
devito/passes/clusters/aliases.py 97.44% <98.54%> (+0.37%) ⬆️
devito/core/cpu.py 100.00% <100.00%> (ø)
devito/core/gpu.py 94.68% <100.00%> (-0.03%) ⬇️
devito/finite_differences/derivative.py 86.41% <100.00%> (-0.17%) ⬇️
devito/finite_differences/differentiable.py 95.10% <100.00%> (-0.08%) ⬇️
... and 33 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 299b116...cadb4e1. Read the comment docs.

clusters = fuse(clusters)
clusters = eliminate_arrays(clusters)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: yep, we can drop a whole pass, as thanks to the many generalisations it has become unnecessary (while being much weaker then what we have now)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By unecessary you mean that it is not anymore giving improvements?

clusters = eliminate_arrays(clusters)

# Reduce flops
clusters = cse(clusters, sregistry)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: note that now cse is performed after fuse. That's fine, it was wrong before

Copy link
Contributor

@georgebisbas georgebisbas May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was it worng?

@@ -359,8 +356,4 @@ def _eval_fd(self, expr):
for e in self._ppsubs:
res = res.xreplace(e)

# Step 5: Cast to EvaluatedDerivative
assert res.is_Add
res = EvalDerivative(*res.args, evaluate=False)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: now unnecessary as done directly inside finite_difference.py

@@ -117,24 +117,3 @@ def _fetch_properties(self, clusters, prefix):
for i in prefix:
properties[i.dim].update(v.get(i.dim, set()))
return properties


class Context(Queue):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: dropping as unnecessary now

@@ -78,8 +78,11 @@ def collect_const(expr):
# Back to the running example
# -> (a + c)
add = Add(*v)
# -> 3.*(a + c)
mul = Mul(k, add, evaluate=False)
if add == 0:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: these are small bug fixes... without, in some corner circumstances we were generating expressions such as a + 0, while now we would generate just a

@@ -185,7 +185,7 @@ def _(expr):

derivs, others = split(args, lambda a: isinstance(a, sympy.Derivative))
if not derivs:
return expr
return reuse_if_untouched(expr, args)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: this is a performance-bug fix, and it's now tested

# E.g., `x0_blk0` or (`a[y_m+1]` => `y not in imapper`)
intervals.append(i)
continue
if i.dim in a.free_symbols:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: this is a small bug fix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it fix? What is the bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember exactly anymore, but I've added a test. It had to do with hierarchical blocking and maybe nested parallelism

def _generate(self, exprs, exclude):
# E.g., extract `u.dx*a*b` and `u.dx*a*c` from `[(u.dx*a*b).dy`, `(u.dx*a*c).dy]`
def cbk_search(expr):
if isinstance(expr, EvalDerivative) and not expr.base.is_Function:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: this is the game changer -- we can finally look for EvalDerivative, rather than speculatively searching all sort of expressions in sum-of-product form

@FabioLuporini FabioLuporini force-pushed the cire-ultimate-revamp branch from ee22a3e to 76d17bd Compare May 4, 2021 08:21
Copy link
Contributor

@mloubout mloubout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. looks good

@@ -24,17 +22,17 @@ class Cpu64OperatorMixin(object):
3 => "blocks", "sub-blocks", and "sub-sub-blocks", ...
"""

CIRE_MINCOST_INV = 50
CIRE_MINGAIN = 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want backward compatibility?

CIRE_MINCOST_INV = 0

And still use it if set (or throw a warning).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I'm going to deprecate stuff and raise warnings accordingly

"""

CIRE_MINCOST_SOPS = 10
CIRE_SCHEDULE = 'automatic'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Higher (lower) values
this higher lower seems a bit unclear to me

@@ -26,17 +25,17 @@ class DeviceOperatorMixin(object):
3 => "blocks", "sub-blocks", and "sub-sub-blocks", ...
"""

CIRE_MINCOST_INV = 50
CIRE_MINGAIN = 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as cpu

'sops': oo.pop('cire-mincost-sops', cls.CIRE_MINCOST_SOPS)
}
o['cire-mingain'] = oo.pop('cire-mingain', cls.CIRE_MINGAIN)
o['cire-schedule'] = oo.pop('cire-schedule', cls.CIRE_SCHEDULE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be easier to have a an abstract operator class to avoid this code duplication in cpu and gpu?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it (obviously) but in reality there are quite a few differences between the two, so I'd rather keep it separated

devito/finite_differences/differentiable.py Show resolved Hide resolved
try:
obj.base = base
except AttributeError:
# This might happen if e.g. one attempts a (re)construction with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there may be even weirder case. Like consider first order derivative with coefficient 1/h_x, 0 then it would evaluate to a Mul (1/h_x*u)which is not anAdd` so may break the whole constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried, but I wasn't able to produce them, not sure why... any chance you can try writing a reproducer?

obj = super().__new__(cls, *args, **kwargs)

try:
obj.base = base
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the need for that one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CIRE won't optimize EvalDerivatives if the base is a pure Function, such as u.dx, simply because there would be nothing to optimize. It needs to be an expression to be worth optimizing, ie expr.dx . CTRL+F for .base and you'll see where it's used in aliases.py

# `score > M` => optimized
# `m <= score <= M` => maybe discarded, maybe optimized -- depends on heuristics
m = mingain
M = mingain*3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*3?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heuristic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. Heuristics are in master too in this same function, though the "3" is new. It works perfectly for what we want and don't want to optimize given the other chosen values, what can I say..

# TODO: Once we'll be able to keep Derivative intact down to this point,
# we won't probably need this function anymore, because, in essence, what
# this function is doing is reconstructing prematurely lowered information
grids = {getattr(i.function, 'grid', None) for i in expr.free_symbols} - {None}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this done here? And why not expr.grid

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

Copy link
Contributor

@georgebisbas georgebisbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will shortly do another pass

clusters = fuse(clusters)
clusters = eliminate_arrays(clusters)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By unecessary you mean that it is not anymore giving improvements?

'sops': oo.pop('cire-mincost-sops', cls.CIRE_MINCOST_SOPS)
}
o['cire-mingain'] = oo.pop('cire-mingain', cls.CIRE_MINGAIN)
o['cire-schedule'] = oo.pop('cire-schedule', cls.CIRE_SCHEDULE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

clusters = eliminate_arrays(clusters)

# Reduce flops
clusters = cse(clusters, sregistry)
Copy link
Contributor

@georgebisbas georgebisbas May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was it worng?

@@ -164,7 +164,7 @@ def used_dimensions(self):
example, reduction or redundant (i.e., invariant) Dimensions won't
appear in an expression.
"""
return set().union(*[i._defines for i in self.free_symbols if i.is_Dimension])
return {i for i in self.free_symbols if i.is_Dimension}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is that helping?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, now it adheres to the specification of the method (docstring)
before, it was mistakenly adding all dimensions, including the parent ones, which may actually not appear directly in the cluster

raise NotImplementedError

def _in_writeto(self, dim, cluster):
def _betterv(self, delta_flops, delta_ws):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deliver typo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, gonna find a better name, current one is leftover


return mapper
# Rule out extractions that are *not* independent of the Dimension
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are not independent \equivalent to are dependent (?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, will fix

@@ -485,91 +543,91 @@ def collect(extracted, ispace, min_storage):
distance = [(d, set(v)) for d, v in LabeledVector.transpose(*distance)]
distances.append(LabeledVector([(d, v.pop()) for d, v in distance]))

aliases.add(alias, list(mapper), aliaseds, distances)
# Compute the alias score
na = len(aliaseds)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other var name?

# `score > M` => optimized
# `m <= score <= M` => maybe discarded, maybe optimized -- depends on heuristics
m = mingain
M = mingain*3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heuristic?

# E.g., `x0_blk0` or (`a[y_m+1]` => `y not in imapper`)
intervals.append(i)
continue
if i.dim in a.free_symbols:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it fix? What is the bug?

@@ -294,9 +294,10 @@ def subs_op_args(expr, args):
return expr.subs({i.name: args[i.name] for i in expr.free_symbols if i.name in args})


def rebuild_if_untouched(expr, args, evaluate=False):
def reuse_if_untouched(expr, args, evaluate=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

"""

CIRE_MINCOST_SOPS = 10
CIRE_SCHEDULE = 'automatic'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Higher (lower) values
this higher lower seems a bit unclear to me

@@ -81,3 +81,7 @@ def normalize_properties(*args):
properties.update(p - drop)

return properties


def relax_properties(properties):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mentioned that in the past? To drop properties?

assert len(arrays) == 5
assert len(FindNodes(VExpanded).visit(op._func_table['bf0'])) == 3
xs, ys, zs = self.get_params(op, 'x0_blk0_size', 'y0_blk0_size', 'z_size')
# The three kind of derivatives taken -- in x, y, z -- all are over
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

three kinds?


# Also check against expected operation count to make sure
# all redundancies have been detected correctly
assert summary[('section1', None)].ops == 80
assert summary1[('section1', None)].ops == 75
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHy summary1 instead of summary?


class TestTTIv2(object):

@switchconfig(profiling='advanced')
@pytest.mark.parametrize('space_order,expected', [
(4, 212), (12, 420)
(4, 191), (12, 383)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@FabioLuporini FabioLuporini force-pushed the cire-ultimate-revamp branch 2 times, most recently from 20d5870 to 68f8e0c Compare May 11, 2021 14:47
@FabioLuporini FabioLuporini force-pushed the cire-ultimate-revamp branch from 951d5f6 to cadb4e1 Compare May 12, 2021 08:41
@@ -918,6 +918,14 @@ def parse_kwargs(**kwargs):
options.setdefault('mpi', configuration['mpi'])
for k, v in configuration['opt-options'].items():
options.setdefault(k, v)
# Handle deprecations
deprecated_options = ('cire-mincost-inv', 'cire-mincost-sops')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cire-maxalias as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added commit already in follow-up PR

@FabioLuporini FabioLuporini merged commit 8d60c7b into master Jun 7, 2021
@FabioLuporini FabioLuporini deleted the cire-ultimate-revamp branch June 7, 2021 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants