Refactor argument passing
- Refactor argument passing so that instead of implicitly-spawned
  `ImplementedDataInfo` objects, there are actual arguments (for
  automatic offsets and strides, base storage, and `sep`-tagged arrays).
  It also centralizes the logic for what goes into argument lists,
  instead of having various "filtered" versions scattered about.
- Get started on type-annotating a bit of loopy.
- Switch a not-small number of data structures to be dataclasses, notably `LoopKernel`.
- Drop OCCA support from the ISPC target. (I'm not aware of any users, ever.)
- Drop the Numba target outright. (I'm not aware of any users, ever.)
- Drop `LoopKernel.local_sizes`, which was usable to directly set the
  workgroup size. (I'm not aware of any users, ever.)
- Expire the deprecation for `iname_to_tags`.
- Bump the Python compatibility target to 3.8, for `from __future__
  import annotations` and `cached_property` (mypy does not support
  nested decorators).
- Bug fix: `tags` was not part of `LoopKernel.hash_fields`
- Bug fix: `InstructionBase.get_write_dependency_names()` was used to
  find written variables; `InstructionBase.assignee_var_names()` is the
  correct method.
- Bug fix: `KernelExecutorBase` now uses `linearize()` so as to not bypass
  pre-linearization checks (cf. gh-639)
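
The move to dataclasses and `cached_property` described in the list above can be illustrated with a small, self-contained sketch. This is not loopy's `LoopKernel`: `ToyKernel`, its fields, and its `copy` method are hypothetical names chosen only to show how a frozen dataclass, a cached derived attribute, and copy-with-changes fit together on Python 3.8.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from functools import cached_property
from typing import Tuple


@dataclass(frozen=True)
class ToyKernel:
    """Hypothetical stand-in for a kernel-like record (not loopy's LoopKernel)."""

    name: str
    arg_names: Tuple[str, ...] = ()

    @cached_property
    def arg_count(self) -> int:
        # Computed on first access, then stored in the instance __dict__,
        # which works even though the dataclass itself is frozen.
        return len(self.arg_names)

    def copy(self, **kwargs) -> ToyKernel:
        # Build a new instance with some fields replaced; the cache of the
        # original instance is not carried over.
        return replace(self, **kwargs)


knl = ToyKernel("rotate", ("arr", "n"))
knl2 = knl.copy(arg_names=knl.arg_names + ("tmp_save_slot",))
```

Because every field is immutable, the original `knl` is unaffected by the copy, and each instance computes its own `arg_count` once.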
inducer committed Jun 28, 2022
1 parent ffa29ab commit 34c7344
Showing 60 changed files with 2,971 additions and 2,653 deletions.
15 changes: 14 additions & 1 deletion .github/workflows/ci.yml
@@ -17,7 +17,7 @@ jobs:
uses: actions/setup-python@v1
with:
# matches compat target in setup.py
python-version: '3.6'
python-version: '3.8'
- name: "Main Script"
run: |
curl -L -O https://gitlab.tiker.net/inducer/ci-support/raw/main/prepare-and-run-flake8.sh
@@ -35,6 +35,19 @@ jobs:
curl -L -O https://gitlab.tiker.net/inducer/ci-support/raw/main/prepare-and-run-pylint.sh
. ./prepare-and-run-pylint.sh "$(basename $GITHUB_REPOSITORY)" test/test_*.py
mypy:
name: Mypy
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: "Main Script"
run: |
curl -L -O https://tiker.net/ci-support-v0
. ./ci-support-v0
build_py_project_in_conda_env
python -m pip install mypy
./run-mypy.sh
pytest:
name: Conda Pytest
runs-on: ubuntu-latest
12 changes: 12 additions & 0 deletions .gitlab-ci.yml
@@ -168,6 +168,18 @@ Flake8:
except:
- tags

Mypy:
script: |
curl -L -O https://tiker.net/ci-support-v0
. ./ci-support-v0
build_py_project_in_venv
python -m pip install mypy
./run-mypy.sh
tags:
- python3
except:
- tags

Benchmarks:
stage: test
script:
19 changes: 19 additions & 0 deletions doc/conf.py
@@ -35,6 +35,25 @@
"https://pyrsistent.readthedocs.io/en/latest/": None,
}

# Some modules need to import things just so that sphinx can resolve symbols in
# type annotations. Often, we do not want these imports (e.g. of PyOpenCL) when
# in normal use (because they would introduce unintended side effects or hard
# dependencies). This flag exists so that these imports only occur during doc
# build. Since sphinx appears to resolve type hints lexically (as it should),
# this needs to be cross-module (since, e.g. an inherited arraycontext
# docstring can be read by sphinx when building meshmode, a dependent package),
# this needs a setting of the same name across all packages involved, that's
# why this name is as global-sounding as it is.
import sys
sys._BUILDING_SPHINX_DOCS = True

nitpick_ignore_regex = [
["py:class", r"typing_extensions\.(.+)"],
["py:class", r"numpy\.u?int[0-9]+"],
["py:class", r"numpy\.float[0-9]+"],
["py:class", r"numpy\.complex[0-9]+"],

# As of 2022-06-22, it doesn't look like there's sphinx documentation
# available.
["py:class", r"immutables\.(.+)"],
]
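
The comment in `doc/conf.py` above describes a flag that other modules are meant to check before performing imports needed only for type annotations. A minimal sketch of the consuming side follows, assuming the flag is read with `getattr` on `sys`. The names `want_annotation_imports` and `describe` are hypothetical, and `decimal` merely stands in for an optional dependency such as PyOpenCL so that the sketch runs anywhere.

```python
from __future__ import annotations

import sys
from typing import TYPE_CHECKING


def want_annotation_imports() -> bool:
    """Return True when annotation-only imports should actually run."""
    # The attribute is absent in normal use, so the default is False.
    return bool(TYPE_CHECKING or getattr(sys, "_BUILDING_SPHINX_DOCS", False))


if want_annotation_imports():
    # Stand-in for a heavy optional dependency; only imported for type
    # checkers and documentation builds.
    import decimal  # noqa: F401


def describe(value: decimal.Decimal) -> str:
    # With 'from __future__ import annotations', the annotation above is not
    # evaluated at runtime, so the skipped import is harmless.
    return f"value={value}"
```

In normal use the import is skipped and `describe` still works, since annotations stay unevaluated strings; setting the flag before import makes the name resolvable for Sphinx.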
10 changes: 0 additions & 10 deletions doc/ref_kernel.rst
@@ -515,24 +515,14 @@ Arguments
^^^^^^^^^

.. autoclass:: KernelArgument
:members:
:undoc-members:

.. autoclass:: ValueArg
:members:
:undoc-members:

.. autoclass:: ArrayArg
:members:
:undoc-members:

.. autoclass:: ConstantArg
:members:
:undoc-members:

.. autoclass:: ImageArg
:members:
:undoc-members:

.. _temporaries:

2 changes: 2 additions & 0 deletions doc/ref_transform.rst
@@ -52,6 +52,8 @@ Influencing data access

.. automodule:: loopy.transform.privatize

.. autofunction:: allocate_temporaries_for_base_storage

Padding Data
------------

18 changes: 9 additions & 9 deletions doc/tutorial.rst
@@ -235,7 +235,7 @@ inspect that code, too, using :attr:`loopy.Options.write_wrapper`:
if allocator is None:
allocator = _lpy_cl_tools.DeferredAllocator(queue.context)
<BLANKLINE>
# {{{ find integer arguments from shapes
# {{{ find integer arguments from array data
<BLANKLINE>
if n is None:
if a is not None:
@@ -1228,11 +1228,11 @@ should call :func:`loopy.get_one_linearized_kernel`:
...
---------------------------------------------------------------------------
LINEARIZATION:
0: CALL KERNEL rotate_v2(extra_args=[], extra_inames=[])
0: CALL KERNEL rotate_v2
1: tmp = arr[i_inner + i_outer*16] {id=maketmp}
2: RETURN FROM KERNEL rotate_v2
3: ... gbarrier
4: CALL KERNEL rotate_v2_0(extra_args=[], extra_inames=[])
4: CALL KERNEL rotate_v2_0
5: arr[(1 + i_inner + i_outer*16) % n] = tmp {id=rotate}
6: RETURN FROM KERNEL rotate_v2_0
---------------------------------------------------------------------------
@@ -1260,18 +1260,18 @@ put those instructions into the schedule.
...
---------------------------------------------------------------------------
TEMPORARIES:
tmp: type: np:dtype('int32'), shape: () aspace:private
tmp_save_slot: type: np:dtype('int32'), shape: (n // 16, 16), dim_tags: (N1:stride:16, N0:stride:1) aspace:global
tmp: type: np:dtype('int32'), shape: () aspace: private
tmp_save_slot: type: np:dtype('int32'), shape: (n // 16, 16), dim_tags: (N1:stride:16, N0:stride:1) aspace: global
---------------------------------------------------------------------------
...
---------------------------------------------------------------------------
LINEARIZATION:
0: CALL KERNEL rotate_v2(extra_args=['tmp_save_slot'], extra_inames=[])
0: CALL KERNEL rotate_v2
1: tmp = arr[i_inner + i_outer*16] {id=maketmp}
2: tmp_save_slot[tmp_save_hw_dim_0_rotate_v2, tmp_save_hw_dim_1_rotate_v2] = tmp {id=tmp.save}
3: RETURN FROM KERNEL rotate_v2
4: ... gbarrier
5: CALL KERNEL rotate_v2_0(extra_args=['tmp_save_slot'], extra_inames=[])
5: CALL KERNEL rotate_v2_0
6: tmp = tmp_save_slot[tmp_reload_hw_dim_0_rotate_v2_0, tmp_reload_hw_dim_1_rotate_v2_0] {id=tmp.reload}
7: arr[(1 + i_inner + i_outer*16) % n] = tmp {id=rotate}
8: RETURN FROM KERNEL rotate_v2_0
@@ -1297,15 +1297,15 @@ The kernel translates into two OpenCL kernels.
#define lid(N) ((int) get_local_id(N))
#define gid(N) ((int) get_group_id(N))
<BLANKLINE>
__kernel void __attribute__ ((reqd_work_group_size(16, 1, 1))) rotate_v2(__global int *__restrict__ arr, int const n, __global int *__restrict__ tmp_save_slot)
__kernel void __attribute__ ((reqd_work_group_size(16, 1, 1))) rotate_v2(__global int const *__restrict__ arr, int const n, __global int *__restrict__ tmp_save_slot)
{
int tmp;
<BLANKLINE>
tmp = arr[16 * gid(0) + lid(0)];
tmp_save_slot[16 * gid(0) + lid(0)] = tmp;
}
<BLANKLINE>
__kernel void __attribute__ ((reqd_work_group_size(16, 1, 1))) rotate_v2_0(__global int *__restrict__ arr, int const n, __global int *__restrict__ tmp_save_slot)
__kernel void __attribute__ ((reqd_work_group_size(16, 1, 1))) rotate_v2_0(__global int *__restrict__ arr, int const n, __global int const *__restrict__ tmp_save_slot)
{
int tmp;
<BLANKLINE>
20 changes: 11 additions & 9 deletions loopy/__init__.py
@@ -95,7 +95,8 @@
alias_temporaries, set_argument_order,
rename_argument,
set_temporary_scope,
set_temporary_address_space)
set_temporary_address_space,
allocate_temporaries_for_base_storage)

from loopy.transform.subst import (extract_subst,
assignment_to_subst, expand_subst, find_rules_matching,
@@ -157,7 +158,6 @@
from loopy.target.opencl import OpenCLTarget
from loopy.target.pyopencl import PyOpenCLTarget
from loopy.target.ispc import ISPCTarget
from loopy.target.numba import NumbaTarget, NumbaCudaTarget

from loopy.tools import Optional, t_unit_to_python, memoize_on_disk

@@ -216,6 +216,7 @@
"remove_unused_arguments",
"alias_temporaries", "set_argument_order",
"rename_argument", "set_temporary_scope", "set_temporary_address_space",
"allocate_temporaries_for_base_storage",

"find_instructions", "map_instructions",
"set_instruction_priority", "add_dependency",
@@ -302,7 +303,6 @@
"CWithGNULibcTarget", "ExecutableCWithGNULibcTarget",
"CudaTarget", "OpenCLTarget",
"PyOpenCLTarget", "ISPCTarget",
"NumbaTarget", "NumbaCudaTarget",
"ASTBuilderBase",

"Optional", "memoize_on_disk",
@@ -366,7 +366,7 @@ def set_options(kernel, *args, **kwargs):
# {{{ library registration

@for_each_kernel
def register_preamble_generators(kernel, preamble_generators):
def register_preamble_generators(kernel: LoopKernel, preamble_generators):
"""
:arg manglers: list of functions of signature ``(preamble_info)``
generating tuples ``(sortable_str_identifier, code)``,
@@ -376,7 +376,8 @@ def register_preamble_generators(kernel, preamble_generators):
"""
from loopy.tools import unpickles_equally

new_pgens = kernel.preamble_generators[:]
new_pgens = tuple(kernel.preamble_generators)

for pgen in preamble_generators:
if pgen not in new_pgens:
if not unpickles_equally(pgen):
@@ -385,7 +386,7 @@ def register_preamble_generators(kernel, preamble_generators):
"and would thus disrupt loopy's caches"
% pgen)

new_pgens.insert(0, pgen)
new_pgens = (pgen,) + new_pgens

return kernel.copy(preamble_generators=new_pgens)

@@ -394,7 +395,7 @@ def register_preamble_generators(kernel, preamble_generators):
def register_symbol_manglers(kernel, manglers):
from loopy.tools import unpickles_equally

new_manglers = kernel.symbol_manglers[:]
new_manglers = kernel.symbol_manglers
for m in manglers:
if m not in new_manglers:
if not unpickles_equally(m):
@@ -403,7 +404,7 @@ def register_symbol_manglers(kernel, manglers):
"and would disrupt loopy's caches"
% m)

new_manglers.insert(0, m)
new_manglers = (m,) + new_manglers

return kernel.copy(symbol_manglers=new_manglers)

@@ -484,7 +485,8 @@ def make_copy_kernel(new_dim_tags, old_dim_tags=None):
result = make_kernel(set_str,
"output[%s] = input[%s]"
% (commad_indices, commad_indices),
lang_version=MOST_RECENT_LANGUAGE_VERSION)
lang_version=MOST_RECENT_LANGUAGE_VERSION,
default_offset=auto)

result = tag_array_axes(result, "input", old_dim_tags)
result = tag_array_axes(result, "output", new_dim_tags)
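
The registration hunks in `loopy/__init__.py` above replace in-place list insertion with rebuilding a tuple. A minimal sketch of that pattern is shown here in isolation; `prepend_unique` is a hypothetical helper name and not part of loopy, and the pickling check from the real functions is deliberately left out.

```python
from typing import Callable, Iterable, Tuple


def prepend_unique(
        existing: Tuple[Callable, ...],
        new_items: Iterable[Callable]) -> Tuple[Callable, ...]:
    """Return a new tuple with each not-yet-present item placed at the front."""
    result = tuple(existing)
    for item in new_items:
        if item not in result:
            # Tuples are immutable, so build a new one instead of calling
            # list.insert(0, item) as the pre-refactoring code did.
            result = (item,) + result
    return result
```

Since the input tuple is never mutated, the value stored on the original kernel stays valid after `kernel.copy(...)` returns a modified kernel.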