Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler: Introduce gpu-create parameter for buffers initialised on device #2107

Merged
merged 11 commits into from
May 3, 2023

Conversation

ccuetom
Copy link
Contributor

@ccuetom ccuetom commented Apr 14, 2023

Introduces a parameter devicecreate that defines a list of buffers to be created on the device and initialised to zero instead of copied from host to device.

For a buffer u to be created on the device, an operator can be created using op(..., devicecreate=u, ...).

For a series of buffers u and v the operator can be created using op(..., devicecreate=(u, v), ...).

The PR also marks recursive compilations with the flag is_rcompile and changes the iet_pass decorator to accept an optional parameter iet_pass(skipif_rcompile=True) that skips passes if part of a recursive compilation. This prevents infinite recursions when calling rcompile from an iet_pass.

Copy link
Contributor

@FabioLuporini FabioLuporini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome !

I just left a few minor comments

@@ -73,6 +73,7 @@ def _normalize_kwargs(cls, **kwargs):
o['par-nested'] = np.inf # Never use nested parallelism
o['par-disabled'] = oo.pop('par-disabled', True) # No host parallelism by default
o['gpu-fit'] = as_tuple(oo.pop('gpu-fit', cls._normalize_gpu_fit(**kwargs)))
o['devicecreate'] = as_tuple(oo.pop('devicecreate', ()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, in retrospect, maybe gpu-create, is a better candidate. Or perhaps gpu-zero?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to gpu-create, I think gpu-zero gives the impression that it is only zeroed within the device and gives no information about where the buffer is generated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

@@ -1086,10 +1090,14 @@ def parse_kwargs(**kwargs):
else:
openmp = kwopenmp

# `opt`, devicecreate
devicecreate = kwargs.get('devicecreate', options.get('devicecreate', ()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know where to write this comment, so I'll write it here.

Currently the API is:

Operator(... devicecreate=... )

But devicecreate is a so called opt-option (optimization option), hence it should rather be passed as such:

Operator(..., opt=('advanced', {'devicecreate': ...}  # or rather gpu-zero, see above

So you can drop this change here, and rather add it here
https://github.com/devitocodes/devito/blob/master/devito/core/cpu.py#L72
and here
https://github.com/devitocodes/devito/blob/master/devito/core/gpu.py#L75
(in fact, you've already started editing at the second link...)

EDIT: uops, sorry, just saw that you indeed have performed edits at both links already 👍

@@ -39,6 +39,36 @@ def is_on_device(obj, gpu_fit):
return all(f in gpu_fit for f in fsave)


def is_device_created(obj, devicecreate):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_gpu_zero?

Worth mentioning in the doc that the Functions will be zero-initialized

@@ -393,7 +393,7 @@ def place_casts(self, iet, **kwargs):

return iet, {}

def process(self, graph):
def process(self, graph, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're stashing devicecreate (gpu-zero or whatever) in __init__ -- correctly! Hence you don't need to add **kwargs, neither here nor at the caller sites

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These **kwargs are needed to have is_rcompile reach the iet_pass wrapper

mmap = (mmap, init)
else:
mmap = self.lang._map_to(obj)
efunc = ()

if read_only is False:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the case ìs_device_created+ read_only make sense? ie, both ifs are true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not now that I think of it

if is_device_created(obj, self.devicecreate):
mmap = self.lang._map_alloc(obj)

cdims = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move this entire segment of code -- from cdims = [] to the end of the for loop (line 465) to a separate utility function? Something at the bottom of this file would do. Something like make_zero_init or whatever. I could then use it from other places as well, thus avoiding redundant code.

@@ -1128,6 +1128,44 @@ def test_xcor_from_saved(self, opt, opt_options, gpu_fit):

assert np.all(g.data == 30)

def test_devicecreate_forward(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on second thought, I think we need to check codegen. Something very minimal -- eg to make sure that there's indeed a create(...) and not a mapto().

if openacc:
    assert 'create(u)' in str(...)
elif openmp
    assert '...'

(it's target-language specific; you can pull the target language from configuration)

@@ -179,7 +180,7 @@ def _specialize_iet(cls, graph, **kwargs):
hoist_prodders(graph)

# Symbol definitions
cls._Target.DataManager(**kwargs).process(graph)
cls._Target.DataManager(**kwargs).process(graph, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments below -- but basically you don't need to pass kwargs here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still needed to pass is_rcompile to the iet_pass wrapper

Copy link
Contributor

@mloubout mloubout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments, looks nice

f = i
fcreate.append(f)

return all(f in gpu_create for f in fcreate)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpicking in case list is "long". You don't actually need other fcreate you just need to return false at the first i not in gpu_create and return true at the end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's true, hadn't thought of it

devito/passes/iet/definitions.py Show resolved Hide resolved
devito/passes/iet/definitions.py Show resolved Hide resolved

assert np.all(u.data[0] == 36)
assert np.all(u.data[1] == 35)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should place-transfers be tested out as well separatrly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that. Should I add the test to test_gpu_common.py::TestStreaming?

@FabioLuporini FabioLuporini changed the title compiler: Introduce devicecreate parameter for buffers initialised on device compiler: Introduce gpu-create parameter for buffers initialised on device Apr 18, 2023
Copy link
Contributor

@FabioLuporini FabioLuporini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left tiny comments, but imho we can land this

gpu_create : list of Function
The Function's which are expected to be created in device memory. This
information is given directly by the user through the compiler option
`devicecreate` and is propagated down here through the various stages of lowering.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gpu-create

@@ -39,6 +39,37 @@ def is_on_device(obj, gpu_fit):
return all(f in gpu_fit for f in fsave)


def is_gpu_create(obj, gpu_create):
"""
True if the given object is created and not copied in the device memory,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:%s/object is/objects are

?

devito/passes/iet/definitions.py Show resolved Hide resolved
devito/passes/iet/definitions.py Show resolved Hide resolved
Copy link
Contributor

@FabioLuporini FabioLuporini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, running tests

@codecov
Copy link

codecov bot commented Apr 19, 2023

Codecov Report

Merging #2107 (89e33fc) into master (468e493) will decrease coverage by 0.13%.
The diff coverage is 23.76%.

@@            Coverage Diff             @@
##           master    #2107      +/-   ##
==========================================
- Coverage   87.73%   87.60%   -0.13%     
==========================================
  Files         221      221              
  Lines       38961    39100     +139     
  Branches     5064     5080      +16     
==========================================
+ Hits        34181    34253      +72     
- Misses       4213     4280      +67     
  Partials      567      567              
Impacted Files Coverage Δ
tests/test_gpu_common.py 1.32% <0.00%> (-0.08%) ⬇️
devito/passes/iet/definitions.py 82.31% <31.25%> (-6.03%) ⬇️
devito/passes/__init__.py 66.66% <46.15%> (-11.60%) ⬇️
devito/core/gpu.py 73.75% <71.42%> (-0.08%) ⬇️
devito/core/cpu.py 100.00% <100.00%> (ø)
devito/operator/operator.py 89.15% <100.00%> (+0.02%) ⬆️

... and 9 files with indirect coverage changes

Copy link
Contributor

@FabioLuporini FabioLuporini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GTG. Next in the merge pipeline... Thanks!

@FabioLuporini FabioLuporini merged commit b858fb1 into devitocodes:master May 3, 2023
@ccuetom ccuetom deleted the openacc-create branch May 19, 2023 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants