-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compiler: Add blockrelax tests and refresh advisor profiling #1929
Conversation
georgebisbas
commented
May 31, 2022
- Avoid autotuning when children's block shapes are the same as the parent's block shapes
- Slightly refresh advisor
# We must be able to do thread pinning, otherwise any results would be | ||
# meaningless. Currently, we only support doing that via numactl | ||
# Thread pinning is strongly recommended for reliable results. | ||
# We support thread pinning via numactl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or KMP_AFFINITY
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"we support" doesn't also make much sense. We don't support anything, thread/process placement is up to the user (though we might one day devise an option to do that automatically at least for openmp since there are special for-loop clauses that allow you to specify the thread affinity for that loop. But not today)
KMP_AFFINITY is also intel stuff
The OpenMP standard env vars for thread pinning are OMP_PLACES and OMP_PROC_BIND , IIRC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rephrased a bit. Hope its better now
stepper = None | ||
timesteps = 1 | ||
elif len(steppers) == 1: | ||
if len(steppers) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove that 0
case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if CI doesn't fail, either test suite is flawed or we somewhat changed things over time such that it's never the case.
Now, the second case to me is unlikely...one could definitely try auto-tuning without a time loop. Hence, the test suite requires an update !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 0 case is removed since devito does not apply loop blocking for non time-iterative computations.
devito/devito/passes/clusters/blocking.py
Line 148 in 45c75a7
# Heuristic: TILABLE not worth it if not within a SEQUENTIAL Dimension |
CI passes.
Idea: blockrelax should allow blocking non-time-iterative loops, right? It skips the heuristics. Then we need to add some tests for this.
i.e. block relax on matrix multiplication example?
I need to check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good to me yeah
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests updated with blockrelax in linear algebra.
Need to add some tests that check the structure too
Codecov Report
@@ Coverage Diff @@
## master #1929 +/- ##
=======================================
Coverage 89.60% 89.60%
=======================================
Files 211 211
Lines 35941 35941
Branches 5414 5414
=======================================
Hits 32205 32205
Misses 3232 3232
Partials 504 504 Continue to review full report at Codecov.
|
f0e6956
to
eaf2f13
Compare
devito/core/autotuning.py
Outdated
if all(v <= i and i % v == 0 for _, i in bs): | ||
# To be a valid block size, it must be smaller than | ||
# and divide evenly the parent's block size. | ||
# Blocksizes equal to the parent's block size are not included |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also why not? it would be sort-of like defaulting to the non-hierarchical case no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect that if someone asks for >1 levels would not be interested in that....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as in, why would you remove a bunch of points from the exploration space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well in order to reduce the time needed. But I agree that for typical space blocking, this is cheap, so I am happy to bring them back
eaf2f13
to
ac4b61f
Compare
ac4b61f
to
dd52a3a
Compare
dd52a3a
to
f08a52a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good here
2bb1247
to
6b45b07
Compare