-
Notifications
You must be signed in to change notification settings - Fork 229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compiler: Introduce gpu-create parameter for buffers initialised on device #2107
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome !
I just left a few minor comments
devito/core/gpu.py
Outdated
@@ -73,6 +73,7 @@ def _normalize_kwargs(cls, **kwargs): | |||
o['par-nested'] = np.inf # Never use nested parallelism | |||
o['par-disabled'] = oo.pop('par-disabled', True) # No host parallelism by default | |||
o['gpu-fit'] = as_tuple(oo.pop('gpu-fit', cls._normalize_gpu_fit(**kwargs))) | |||
o['devicecreate'] = as_tuple(oo.pop('devicecreate', ())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, in retrospect, maybe gpu-create
, is a better candidate. Or perhaps gpu-zero
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed it to gpu-create
, I think gpu-zero
gives the impression that it is only zeroed within the device and gives no information about where the buffer is generated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
devito/operator/operator.py
Outdated
@@ -1086,10 +1090,14 @@ def parse_kwargs(**kwargs): | |||
else: | |||
openmp = kwopenmp | |||
|
|||
# `opt`, devicecreate | |||
devicecreate = kwargs.get('devicecreate', options.get('devicecreate', ())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know where to write this comment, so I'll write it here.
Currently the API is:
Operator(... devicecreate=... )
But devicecreate
is a so called opt-option (optimization option), hence it should rather be passed as such:
Operator(..., opt=('advanced', {'devicecreate': ...} # or rather gpu-zero, see above
So you can drop this change here, and rather add it here
https://github.com/devitocodes/devito/blob/master/devito/core/cpu.py#L72
and here
https://github.com/devitocodes/devito/blob/master/devito/core/gpu.py#L75
(in fact, you've already started editing at the second link...)
EDIT: uops, sorry, just saw that you indeed have performed edits at both links already 👍
devito/passes/__init__.py
Outdated
@@ -39,6 +39,36 @@ def is_on_device(obj, gpu_fit): | |||
return all(f in gpu_fit for f in fsave) | |||
|
|||
|
|||
def is_device_created(obj, devicecreate): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_gpu_zero
?
Worth mentioning in the doc that the Functions will be zero-initialized
devito/passes/iet/definitions.py
Outdated
@@ -393,7 +393,7 @@ def place_casts(self, iet, **kwargs): | |||
|
|||
return iet, {} | |||
|
|||
def process(self, graph): | |||
def process(self, graph, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're stashing devicecreate
(gpu-zero
or whatever) in __init__
-- correctly! Hence you don't need to add **kwargs
, neither here nor at the caller sites
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These **kwargs
are needed to have is_rcompile
reach the iet_pass
wrapper
devito/passes/iet/definitions.py
Outdated
mmap = (mmap, init) | ||
else: | ||
mmap = self.lang._map_to(obj) | ||
efunc = () | ||
|
||
if read_only is False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the case ìs_device_created
+ read_only
make sense? ie, both ifs are true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not now that I think of it
devito/passes/iet/definitions.py
Outdated
if is_device_created(obj, self.devicecreate): | ||
mmap = self.lang._map_alloc(obj) | ||
|
||
cdims = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we move this entire segment of code -- from cdims = []
to the end of the for loop (line 465) to a separate utility function? Something at the bottom of this file would do. Something like make_zero_init
or whatever. I could then use it from other places as well, thus avoiding redundant code.
tests/test_gpu_common.py
Outdated
@@ -1128,6 +1128,44 @@ def test_xcor_from_saved(self, opt, opt_options, gpu_fit): | |||
|
|||
assert np.all(g.data == 30) | |||
|
|||
def test_devicecreate_forward(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on second thought, I think we need to check codegen. Something very minimal -- eg to make sure that there's indeed a create(...) and not a mapto().
if openacc:
assert 'create(u)' in str(...)
elif openmp
assert '...'
(it's target-language specific; you can pull the target language from configuration
)
devito/core/cpu.py
Outdated
@@ -179,7 +180,7 @@ def _specialize_iet(cls, graph, **kwargs): | |||
hoist_prodders(graph) | |||
|
|||
# Symbol definitions | |||
cls._Target.DataManager(**kwargs).process(graph) | |||
cls._Target.DataManager(**kwargs).process(graph, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comments below -- but basically you don't need to pass kwargs here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still needed to pass is_rcompile
to the iet_pass
wrapper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments, looks nice
devito/passes/__init__.py
Outdated
f = i | ||
fcreate.append(f) | ||
|
||
return all(f in gpu_create for f in fcreate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpicking in case list is "long". You don't actually need other fcreate
you just need to return false
at the first i
not in gpu_create
and return true at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's true, hadn't thought of it
|
||
assert np.all(u.data[0] == 36) | ||
assert np.all(u.data[1] == 35) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should place-transfers
be tested out as well separatrly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that. Should I add the test to test_gpu_common.py::TestStreaming
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left tiny comments, but imho we can land this
gpu_create : list of Function | ||
The Function's which are expected to be created in device memory. This | ||
information is given directly by the user through the compiler option | ||
`devicecreate` and is propagated down here through the various stages of lowering. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gpu-create
devito/passes/__init__.py
Outdated
@@ -39,6 +39,37 @@ def is_on_device(obj, gpu_fit): | |||
return all(f in gpu_fit for f in fsave) | |||
|
|||
|
|||
def is_gpu_create(obj, gpu_create): | |||
""" | |||
True if the given object is created and not copied in the device memory, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:%s/object is/objects are
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, running tests
Codecov Report
@@ Coverage Diff @@
## master #2107 +/- ##
==========================================
- Coverage 87.73% 87.60% -0.13%
==========================================
Files 221 221
Lines 38961 39100 +139
Branches 5064 5080 +16
==========================================
+ Hits 34181 34253 +72
- Misses 4213 4280 +67
Partials 567 567
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GTG. Next in the merge pipeline... Thanks!
Introduces a parameter
devicecreate
that defines a list of buffers to be created on the device and initialised to zero instead of copied from host to device.For a buffer
u
to be created on the device, an operator can be created usingop(..., devicecreate=u, ...)
.For a series of buffers
u
andv
the operator can be created usingop(..., devicecreate=(u, v), ...)
.The PR also marks recursive compilations with the flag
is_rcompile
and changes theiet_pass
decorator to accept an optional parameteriet_pass(skipif_rcompile=True)
that skips passes if part of a recursive compilation. This prevents infinite recursions when callingrcompile
from aniet_pass
.