-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compiler: Revamp CIRE exploiting EvalDerivative #1688
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Codecov Report
@@ Coverage Diff @@
## master #1688 +/- ##
==========================================
+ Coverage 86.68% 86.72% +0.03%
==========================================
Files 218 218
Lines 33389 33712 +323
Branches 4342 4408 +66
==========================================
+ Hits 28943 29236 +293
- Misses 3959 3988 +29
- Partials 487 488 +1
Continue to review full report at Codecov.
|
clusters = fuse(clusters) | ||
clusters = eliminate_arrays(clusters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for reviewers: yep, we can drop a whole pass, as thanks to the many generalisations it has become unnecessary (while being much weaker then what we have now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By unecessary
you mean that it is not anymore giving improvements?
clusters = eliminate_arrays(clusters) | ||
|
||
# Reduce flops | ||
clusters = cse(clusters, sregistry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for reviewers: note that now cse
is performed after fuse
. That's fine, it was wrong before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was it worng?
@@ -359,8 +356,4 @@ def _eval_fd(self, expr): | |||
for e in self._ppsubs: | |||
res = res.xreplace(e) | |||
|
|||
# Step 5: Cast to EvaluatedDerivative | |||
assert res.is_Add | |||
res = EvalDerivative(*res.args, evaluate=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for reviewers: now unnecessary as done directly inside finite_difference.py
@@ -117,24 +117,3 @@ def _fetch_properties(self, clusters, prefix): | |||
for i in prefix: | |||
properties[i.dim].update(v.get(i.dim, set())) | |||
return properties | |||
|
|||
|
|||
class Context(Queue): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for reviewers: dropping as unnecessary now
@@ -78,8 +78,11 @@ def collect_const(expr): | |||
# Back to the running example | |||
# -> (a + c) | |||
add = Add(*v) | |||
# -> 3.*(a + c) | |||
mul = Mul(k, add, evaluate=False) | |||
if add == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for reviewers: these are small bug fixes... without, in some corner circumstances we were generating expressions such as a + 0
, while now we would generate just a
@@ -185,7 +185,7 @@ def _(expr): | |||
|
|||
derivs, others = split(args, lambda a: isinstance(a, sympy.Derivative)) | |||
if not derivs: | |||
return expr | |||
return reuse_if_untouched(expr, args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for reviewers: this is a performance-bug fix, and it's now tested
# E.g., `x0_blk0` or (`a[y_m+1]` => `y not in imapper`) | ||
intervals.append(i) | ||
continue | ||
if i.dim in a.free_symbols: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for reviewers: this is a small bug fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it fix? What is the bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember exactly anymore, but I've added a test. It had to do with hierarchical blocking and maybe nested parallelism
def _generate(self, exprs, exclude): | ||
# E.g., extract `u.dx*a*b` and `u.dx*a*c` from `[(u.dx*a*b).dy`, `(u.dx*a*c).dy]` | ||
def cbk_search(expr): | ||
if isinstance(expr, EvalDerivative) and not expr.base.is_Function: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for reviewers: this is the game changer -- we can finally look for EvalDerivative
, rather than speculatively searching all sort of expressions in sum-of-product form
ee22a3e
to
76d17bd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments. looks good
@@ -24,17 +22,17 @@ class Cpu64OperatorMixin(object): | |||
3 => "blocks", "sub-blocks", and "sub-sub-blocks", ... | |||
""" | |||
|
|||
CIRE_MINCOST_INV = 50 | |||
CIRE_MINGAIN = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want backward compatibility?
CIRE_MINCOST_INV = 0
And still use it if set (or throw a warning).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, I'm going to deprecate stuff and raise warnings accordingly
""" | ||
|
||
CIRE_MINCOST_SOPS = 10 | ||
CIRE_SCHEDULE = 'automatic' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Higher (lower) values
this higher lower seems a bit unclear to me
@@ -26,17 +25,17 @@ class DeviceOperatorMixin(object): | |||
3 => "blocks", "sub-blocks", and "sub-sub-blocks", ... | |||
""" | |||
|
|||
CIRE_MINCOST_INV = 50 | |||
CIRE_MINGAIN = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as cpu
'sops': oo.pop('cire-mincost-sops', cls.CIRE_MINCOST_SOPS) | ||
} | ||
o['cire-mingain'] = oo.pop('cire-mingain', cls.CIRE_MINGAIN) | ||
o['cire-schedule'] = oo.pop('cire-schedule', cls.CIRE_SCHEDULE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be easier to have a an abstract operator class to avoid this code duplication in cpu and gpu?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it (obviously) but in reality there are quite a few differences between the two, so I'd rather keep it separated
try: | ||
obj.base = base | ||
except AttributeError: | ||
# This might happen if e.g. one attempts a (re)construction with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually there may be even weirder case. Like consider first order derivative with coefficient 1/h_x, 0
then it would evaluate to a Mul
(1/h_x*u)which is not an
Add` so may break the whole constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, but I wasn't able to produce them, not sure why... any chance you can try writing a reproducer?
obj = super().__new__(cls, *args, **kwargs) | ||
|
||
try: | ||
obj.base = base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the need for that one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CIRE won't optimize EvalDerivatives if the base is a pure Function, such as u.dx
, simply because there would be nothing to optimize. It needs to be an expression to be worth optimizing, ie expr.dx
. CTRL+F for .base
and you'll see where it's used in aliases.py
# `score > M` => optimized | ||
# `m <= score <= M` => maybe discarded, maybe optimized -- depends on heuristics | ||
m = mingain | ||
M = mingain*3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heuristic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. Heuristics are in master too in this same function, though the "3" is new. It works perfectly for what we want and don't want to optimize given the other chosen values, what can I say..
# TODO: Once we'll be able to keep Derivative intact down to this point, | ||
# we won't probably need this function anymore, because, in essence, what | ||
# this function is doing is reconstructing prematurely lowered information | ||
grids = {getattr(i.function, 'grid', None) for i in expr.free_symbols} - {None} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this done here? And why not expr.grid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will shortly do another pass
clusters = fuse(clusters) | ||
clusters = eliminate_arrays(clusters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By unecessary
you mean that it is not anymore giving improvements?
'sops': oo.pop('cire-mincost-sops', cls.CIRE_MINCOST_SOPS) | ||
} | ||
o['cire-mingain'] = oo.pop('cire-mingain', cls.CIRE_MINGAIN) | ||
o['cire-schedule'] = oo.pop('cire-schedule', cls.CIRE_SCHEDULE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right
clusters = eliminate_arrays(clusters) | ||
|
||
# Reduce flops | ||
clusters = cse(clusters, sregistry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was it worng?
@@ -164,7 +164,7 @@ def used_dimensions(self): | |||
example, reduction or redundant (i.e., invariant) Dimensions won't | |||
appear in an expression. | |||
""" | |||
return set().union(*[i._defines for i in self.free_symbols if i.is_Dimension]) | |||
return {i for i in self.free_symbols if i.is_Dimension} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is that helping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, now it adheres to the specification of the method (docstring)
before, it was mistakenly adding all dimensions, including the parent ones, which may actually not appear directly in the cluster
devito/passes/clusters/aliases.py
Outdated
raise NotImplementedError | ||
|
||
def _in_writeto(self, dim, cluster): | ||
def _betterv(self, delta_flops, delta_ws): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deliver typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, gonna find a better name, current one is leftover
devito/passes/clusters/aliases.py
Outdated
|
||
return mapper | ||
# Rule out extractions that are *not* independent of the Dimension |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are not independent \equivalent to are dependent (?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, will fix
@@ -485,91 +543,91 @@ def collect(extracted, ispace, min_storage): | |||
distance = [(d, set(v)) for d, v in LabeledVector.transpose(*distance)] | |||
distances.append(LabeledVector([(d, v.pop()) for d, v in distance])) | |||
|
|||
aliases.add(alias, list(mapper), aliaseds, distances) | |||
# Compute the alias score | |||
na = len(aliaseds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other var name?
# `score > M` => optimized | ||
# `m <= score <= M` => maybe discarded, maybe optimized -- depends on heuristics | ||
m = mingain | ||
M = mingain*3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heuristic?
# E.g., `x0_blk0` or (`a[y_m+1]` => `y not in imapper`) | ||
intervals.append(i) | ||
continue | ||
if i.dim in a.free_symbols: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it fix? What is the bug?
@@ -294,9 +294,10 @@ def subs_op_args(expr, args): | |||
return expr.subs({i.name: args[i.name] for i in expr.free_symbols if i.name in args}) | |||
|
|||
|
|||
def rebuild_if_untouched(expr, args, evaluate=False): | |||
def reuse_if_untouched(expr, args, evaluate=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right
""" | ||
|
||
CIRE_MINCOST_SOPS = 10 | ||
CIRE_SCHEDULE = 'automatic' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Higher (lower) values
this higher lower seems a bit unclear to me
@@ -81,3 +81,7 @@ def normalize_properties(*args): | |||
properties.update(p - drop) | |||
|
|||
return properties | |||
|
|||
|
|||
def relax_properties(properties): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mentioned that in the past? To drop properties?
assert len(arrays) == 5 | ||
assert len(FindNodes(VExpanded).visit(op._func_table['bf0'])) == 3 | ||
xs, ys, zs = self.get_params(op, 'x0_blk0_size', 'y0_blk0_size', 'z_size') | ||
# The three kind of derivatives taken -- in x, y, z -- all are over |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
three kinds?
|
||
# Also check against expected operation count to make sure | ||
# all redundancies have been detected correctly | ||
assert summary[('section1', None)].ops == 80 | ||
assert summary1[('section1', None)].ops == 75 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WHy summary1 instead of summary?
|
||
class TestTTIv2(object): | ||
|
||
@switchconfig(profiling='advanced') | ||
@pytest.mark.parametrize('space_order,expected', [ | ||
(4, 212), (12, 420) | ||
(4, 191), (12, 383) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
20d5870
to
68f8e0c
Compare
951d5f6
to
cadb4e1
Compare
@@ -918,6 +918,14 @@ def parse_kwargs(**kwargs): | |||
options.setdefault('mpi', configuration['mpi']) | |||
for k, v in configuration['opt-options'].items(): | |||
options.setdefault(k, v) | |||
# Handle deprecations | |||
deprecated_options = ('cire-mincost-inv', 'cire-mincost-sops') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cire-maxalias
as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added commit already in follow-up PR
By searching and optimizing EvalDerivatives, that is sub-expressions produced by the evaluation of Derivatives, the CIRE pass is now more rigorous, robust, effective, and quick. It also opens up the possibility of easily synthesizing different variants of the same code.
The PR is equipped with many new tests, and adjust all of the existing ones to the new CIRE infrastructure.
Some optimization options , such as
cire-mincost-*
andcire-repeats-*
have finally been dropped and replaced by two simple knobs, namelycire-mingain
andcire-schedule
, which are suitably documentedEDIT: I will also leave some inline comments for ease of reviewing
Note 1: AFAICT, there is no compilation speed regression
Note 2: this PR may impact the code generated by the compiler for some Operators. In all my experiments, I have not seen even a single regression in execution time.
So, Note 1 + Note 2 => all sort of timings should be improving. It'd be great to have TheMatrix to easily test this on multiple architectures simultaneously, but for now we'll need some beta testing I guess