Vectorisation sprint #654

sv2518 · 2022-03-01T10:20:07Z

Implements automatic cross-element vectorisation. This builds on work by TJ and Kaushik.

See firedrakeproject/firedrake#2365 for firedrake CI runs.
The corresponding loopy PR is inducer/loopy#557.

Big thanks to @kaushikcfd for working hard on an update of this so that we can get it merged.

The mechanism in PyOP2

1. Check whether the kernel is vectorisable (see "Kernels which cannot be vectorised" below).
2. Change the target of the kernel to CVectorExtensionsTarget.
3. Inline all inner kernels into the wrapper kernel.
4. Align all temporaries.
5. Decide which instructions cannot be vectorised with vector extensions (see "Single instructions which cannot be vectorised" below).
6. Shift the bounds of the loop index (iname) to vectorise over so that it starts from 0.
7. Split the iname according to the SIMD length of the architecture, which is determined with py-cpuinfo.
8. Break the temporaries (add a new axis to each temporary) and index the new axis with the provided iname.
9. Tag the axes to vectorise over.
10. Tag the iname to vectorise with lp.VectorizeTag(lp.OpenMPSIMDTag()). VectorizeTag indicates that vector extensions are tried first; if an instruction cannot be vectorised, the fallback OpenMPSIMDTag wraps it in OpenMP SIMD pragmas.
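
The shift, split, and temporary-expansion steps above can be pictured with a small plain-Python sketch. This is not PyOP2 or loopy code: the function names, the `batch` parameter (standing in for the SIMD width), and the toy computation are all hypothetical. How the real implementation treats a final partial batch is not stated in this description, so the guard used here is an assumption.

```python
def scalar_kernel(x, start, end):
    """Reference loop: one element at a time, with a scalar temporary."""
    out = [0.0] * len(x)
    for e in range(start, end):
        t = 2.0 * x[e]              # scalar temporary
        out[e] = t + 1.0
    return out


def batched_kernel(x, start, end, batch=4):
    """Same computation after shifting the loop to start at 0 and splitting it.

    The inner loops over `e_inner` are the ones that would be mapped onto
    vector lanes; `batch` is a hypothetical stand-in for the SIMD width.
    """
    out = [0.0] * len(x)
    n = end - start                         # shifted loop runs over 0..n-1
    n_outer = (n + batch - 1) // batch      # ceil(n / batch)
    for e_outer in range(n_outer):
        t = [0.0] * batch                   # temporary gains a batch axis
        # Assumption: guard the ragged final batch by limiting the lanes.
        lanes = min(batch, n - e_outer * batch)
        for e_inner in range(lanes):
            e = start + e_outer * batch + e_inner
            t[e_inner] = 2.0 * x[e]         # temporary indexed by inner iname
        for e_inner in range(lanes):
            e = start + e_outer * batch + e_inner
            out[e] = t[e_inner] + 1.0
    return out
```

Both functions should give identical results for any `start`, `end`, and positive `batch`; the batched form only changes the loop structure and the shape of the temporary.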

Kernels which cannot be vectorised

- Kernels which assemble matrices
- Kernels which use complex types
- Kernels with read-write access arguments
- Kernels which generate the extrusion coordinates
- Kernels with conditionals
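
As a rough illustration of how such an eligibility check might be organised, the sketch below collects the reasons a kernel is excluded. `KernelInfo` and `vectorisation_blockers` are hypothetical names invented for this example and are not part of PyOP2; the real check inspects the kernel itself rather than a set of boolean flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelInfo:
    """Hypothetical summary of a kernel's properties (not a PyOP2 class)."""
    assembles_matrix: bool = False
    uses_complex: bool = False
    has_read_write_args: bool = False
    generates_extrusion_coords: bool = False
    has_conditionals: bool = False


def vectorisation_blockers(info):
    """Return the reasons, in list order, why a kernel would not be vectorised."""
    checks = [
        (info.assembles_matrix, "assembles a matrix"),
        (info.uses_complex, "uses complex types"),
        (info.has_read_write_args, "has read-write access arguments"),
        (info.generates_extrusion_coords, "generates extrusion coordinates"),
        (info.has_conditionals, "contains conditionals"),
    ]
    return [reason for flagged, reason in checks if flagged]


def is_vectorisable(info):
    """A kernel is eligible only if none of the exclusions apply."""
    return not vectorisation_blockers(info)
```

Returning the list of reasons rather than a bare boolean makes it easier to log why a particular kernel fell back to the scalar path.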

Single instructions which cannot be vectorised

- Instructions outside the loop that was split (because they do not depend on the loop index we vectorise over)
- Constant literal temporaries on the RHS (because we cannot simply index into them)
- Instructions with calls to Slate inverse and solve (GCC could do this, since Kaushik extended the solve and inverse callables in PyOP2 with strided versions for that purpose, but Clang can't)


connorjward

I think there are going to be a few naming errors from my refactoring. I've labelled where they are.

pyop2/global_kernel.py


sv2518 · 2022-05-24T16:00:21Z

OK Connor, I think I've addressed all your comments now :)

connorjward

Only one comment. Otherwise looks good AFAICT. You should definitely run the Firedrake test suite to make sure you haven't broken anything by accident.

pyop2/configuration.py


tj-sun and others added 21 commits July 22, 2020 15:22

codegen: Implement SIMD vectorisation

2a8d17c

Only works when kernel is a Loopy kernel.

add omp simd vectorization mode

fbc6e4a

add openmp flag and by pass workaround flag

5ae780d

DROP BEFORE MERGE: test with correct loopy branch

ba693dc

Turn of tree vectorize for certain gcc compilers. We might not need the tree vectorisation flag for our vectorisation anyways.

4ec0769

Add simd compiler flags.

f9e60fd

Remove time configuration.

00e073d

Default SIMD width.

1cf7698

Generate CVec Target with batch size infomation and move typedef into loopy codebase.

3e66946

Move zero declaration to loopy code base to be more robust in naming the variable.

1238ce8

Added conditionals when to vectorise:

1d54777

Don't vectorise, if complex arguments. Check if vect strategy specified, otw dont vectorise.

Drop omp vectorisation.

b369213

Add -march=native everywhere.

1c6346e

Silence warnings.

856b6aa

Change vector tag.

5e52ce1

Give more control over vectorisation to PyOP2.

537c14c

Naming adaption.

9317654

Realize ilp first.

6723b6a

Jenkins.

38ebc8a

Merge branch 'master' into vectorisation-restructure-checks

32b2910

DBM: run against new loopy branch

944c6cf

connorjward reviewed Mar 1, 2022


sv2518 added 8 commits March 1, 2022 12:58

Lint

3a1eb24

More adapations to new PyOP2

681e315

More adapations to new PyOP2

48d6142

DBM take the correct branch

792c8f0

Adapt to new PyOP2 and vectorisation

2469870

Adapt to new PyOP2 and vectorisation

4bbcde5

Fix return wrapper with kernel not kernel

a5c0455

We do need to inline bc Implementing transforms that apply cleanly across caller-callee is a bit involved and loopy can't deal with it yet.

c374031

Put vectorisation strategy only in cache key of the global kernel.

e5fe4d2

sv2518 requested a review from connorjward May 24, 2022 16:00

connorjward reviewed May 25, 2022


sv2518 and others added 18 commits May 25, 2022 15:31

lint

0eff9d6

Fix docs

22ce06e

Fix config error

bdefbfa

Fix config error

2a459e5

Don't add py-cpuinfo

56c65da

Add nbytes property

ca5c51b

N.B. This is currently set to use PYOP2_TIME as the configuration option. This is misleading and should be changed.

Drop unused args

dc5f3bc

Time->extra_info

ac36708

Merge branch 'vectorisation-sprint' into connorjward/add-nbytes

89c9dec

Merge pull request #666 from OP2/connorjward/add-nbytes

e2af4c7

Connorjward/add nbytes

Merge branch 'vectorisation-sprint' into JDBetteridge/vectorisation-sprint

4de6f06

Merge pull request #665 from OP2/JDBetteridge/vectorisation-sprint

2840f28

Don't add py-cpuinfo

Fix bandwidth calculation

89feb72

Add simd compiler flag also to LinuxGNU compiler

0857145

Add vectorisation flag to linux clang compiler too

662241e

account for changed in loopy's vectorization syntax

203223c

run CI with py3.8

fae323f

Loopy now requires py3.8

Fallback for stopping criterium

030cae5

sv2518 mentioned this pull request Jul 7, 2022

Vectorisation #589

Closed

sv2518 and others added 7 commits July 7, 2022 18:44

Fallback for stopping criterium

ece0e62

Reduce inames to untag

934e147

Reduce inames to untag

bd95ba3

Fallback for stopping criterium

fd6650d

unroll (not vectorize) loops surrounding CInstructions

f69755d

get rid of noop insns

e72f316

Fix merge leftovers for vectorisation in chapter 3

09bf629
