1010:encoding: utf-8
1111:lang: en
1212:dpcpp: pass:[DPC++]
13- :stem: asciimath
13+ :sectnums:
14+ :sectnumlevels: 4
1415
1516// Set the default source code type in this document to C++,
1617// for syntax highlighting purposes. This is needed because
1718// docbook uses c++ and html5 uses cpp.
1819:language: {basebackend@docbook:c++:cpp}
1920
2021
21- == 1. Notice
22+ == Notice
2223
2324[%hardbreaks]
2425Copyright (C) Codeplay Software Limited. All rights reserved.
@@ -28,14 +29,14 @@ of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
2829permission by Khronos.
2930
3031
31- == 2. Contact
32+ == Contact
3233
3334To report problems with this extension, please open a new issue at:
3435
3536https://github.com/intel/llvm/issues
3637
3738
38- == 3. Dependencies
39+ == Dependencies
3940
4041This extension is written against the SYCL 2020 revision 6 specification. All
4142references below to the "core SYCL specification" or to section numbers in the
@@ -45,15 +46,15 @@ This extension builds on top of the proposed SYCL graphs
4546https://github.com/reble/llvm/blob/sycl-graph-update/sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc[extension
4647proposal]. All references to the "graphs proposal" refer to this proposal.
4748
48- == 4. Status
49+ == Status
4950
5051This is a proposed extension specification, intended to gather community
5152feedback. Interfaces defined in this specification may not be implemented yet
5253or may be in a preliminary state. The specification itself may also change in
5354incompatible ways before it is finalized. *Shipping software products should
5455not rely on APIs defined in this specification.*
5556
56- == 5. Overview
57+ == Overview
5758
5859The SYCL graph
5960https://github.com/reble/llvm/blob/sycl-graph-update/sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc[extension
@@ -87,9 +88,9 @@ fusion of two or more kernels in a SYCL graph into a single kernel **at
8788runtime**. This requires the extension of the runtime with some sort of JIT
8889compiler to allow for the fusion of kernel functions at runtime.
8990
90- == 6. Specification
91+ == Specification
9192
92- === 6.1. Feature test macro
93+ === Feature test macro
9394
9495This extension provides a feature-test macro as described in the core SYCL
9596specification. An implementation supporting this extension must predefine the
@@ -108,11 +109,11 @@ supports.
108109|Initial version of this extension.
109110|===
110111
111- === 6.2. API modifications
112+ === API modifications
112113
113- ==== 6.2.1. Properties
114+ ==== Properties
114115
115- ===== 6.2.1.1. Graph Fusion Property
116+ ===== Graph Fusion Property
116117
117118The API for `command_graph<graph_state::modifiable>::finalize()` includes a
118119`property_list` parameter. The following property, defined by this extension,
@@ -124,7 +125,7 @@ sycl::ext::oneapi::experimental::property::command_graph::perform_fusion
124125```
125126
126127The property is not prescriptive. Implementations are free to not perform fusion
127- if it is not possible (see below section <<_6_5_limitations >>), fusion is not
128+ if it is not possible (see below section <<_limitations >>), fusion is not
128129supported by the implementation, or the implementation decides not to perform
129130fusion for other reasons. It is not an error if an implementation does not
130131perform fusion even though the property is passed.
@@ -133,7 +134,7 @@ Implementations can provide a diagnostic message in case fusion was not
133134performed through an implementation-specified mechanism, but are not required to
134135do so.
135136
136- ===== 6.2.1.2. Barrier property
137+ ===== Barrier property
137138
138139The following property can be added to the `property_list` of the
139140`command_graph<graph_state::modifiable>::finalize()` API.
@@ -150,7 +151,7 @@ The property only takes effect if the
150151property is also part of the `property_list` of the same invocation of
151152`command_graph<...>::finalize()`.
152153
153- ===== 6.2.1.3. Local internalization property
154+ ===== Local internalization property
154155
155156The following property can be passed to three different APIs, namely:
156157
@@ -187,7 +188,7 @@ no error if they do not perform internalization. Implementations can provide a
187188diagnostic message in case internalization was not performed through an
188189implementation-specified mechanism, but are not required to do so.
189190
190- ===== 6.2.1.4. Private internalization property
191+ ===== Private internalization property
191192
192193The following property can be passed to three different APIs, namely:
193194
@@ -224,7 +225,7 @@ no error if they do not perform internalization. Implementations can provide a
224225diagnostic message in case internalization was not performed through an
225226implementation-specified mechanism, but are not required to do so.
226227
227- ==== 6.2.2. Device information descriptors
228+ ==== Device information descriptors
228229
229230To support querying whether a SYCL device and the underlying platform support
230231kernel fusion for graphs, the following device information descriptor is added
@@ -238,7 +239,7 @@ When passed to `device::get_info<...>()`, the function returns `true` if the
238239SYCL `device` and the underlying `platform` support kernel fusion for graphs.
239240
240241
241- === 6.3. Linearization
242+ === Linearization
242243
243244In order to be able to perform kernel fusion, the commands in a graph must be
244245arranged in a valid sequential order.
@@ -252,7 +253,7 @@ partial order) is implementation defined. The linearization should be
252253deterministic, i.e. it should yield the same sequence when presented with the
253254same DAG.
254255
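The determinism requirement above can be illustrated with a small, self-contained sketch. It is not part of the extension and does not use any SYCL API: the function name `linearize`, the representation of graph nodes as indices `0..num_nodes-1`, and the smallest-index tie-break are all assumptions made for illustration. The sketch uses Kahn's topological sort with a min-heap, so that the same DAG always yields the same sequence, regardless of the order in which the edges were recorded.

```c++
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the extension: returns one valid
// sequential order for a DAG with nodes 0..num_nodes-1, where each edge
// (a, b) means "a must execute before b".
std::vector<std::size_t> linearize(
    std::size_t num_nodes,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
  std::vector<std::vector<std::size_t>> successors(num_nodes);
  std::vector<std::size_t> pending_preds(num_nodes, 0);
  for (const auto& [from, to] : edges) {
    successors[from].push_back(to);
    ++pending_preds[to];
  }

  // Min-heap: among all ready nodes, always pick the smallest index.
  // This tie-break is what makes the result deterministic.
  std::priority_queue<std::size_t, std::vector<std::size_t>,
                      std::greater<std::size_t>>
      ready;
  for (std::size_t n = 0; n < num_nodes; ++n)
    if (pending_preds[n] == 0) ready.push(n);

  std::vector<std::size_t> order;
  order.reserve(num_nodes);
  while (!ready.empty()) {
    const std::size_t n = ready.top();
    ready.pop();
    order.push_back(n);
    // A successor becomes ready once all of its predecessors are placed.
    for (std::size_t s : successors[n])
      if (--pending_preds[s] == 0) ready.push(s);
  }
  return order;  // Shorter than num_nodes if the input contained a cycle.
}
```

For a diamond-shaped graph with edges `0->1`, `0->2`, `1->3` and `2->3`, this sketch places node 0 first and node 3 last, and orders the two independent nodes by index. A real implementation is free to choose any other tie-break, as long as it is applied consistently.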
255- === 6.4. Synchronization in kernels
256+ === Synchronization in kernels
256257
257258Group barrier semantics do not change in the fused kernel and barriers already
258259in the unfused kernels are preserved in the fused kernel. Despite this, it is
@@ -261,7 +262,7 @@ same work-group executing a fused kernel, a barrier is added between each of the
261262kernels being fused. This automatic insertion of additional barriers can be
262263deactivated through the property defined above.
263264
264- === 6.5. Limitations
265+ === Limitations
265266
266267Fusion might need to be cancelled when certain undesired scenarios
267268arise.
@@ -278,29 +279,29 @@ The following sections describe a number of scenarios that might require to
278279cancel fusion. Note that some implementations might be more capable/permissive
279280and might not abort fusion in all of these cases.
280281
281- ==== 6.5.1. Hierarchical Parallelism
282+ ==== Hierarchical Parallelism
282283
283284The extension does not support kernels using hierarchical parallelism, although
284285some implementations might choose to add support for this kind of kernel.
285286
286- ==== 6.5.2. Incompatible ND-ranges of the kernels to fuse
287+ ==== Incompatible ND-ranges of the kernels to fuse
287288
288289Incompatibility of ND-ranges will be determined by the kernel fusion
289290implementation. All implementations should support fusing kernels with the exact
290291same ND-ranges, but implementations might cancel fusion as soon as a kernel with
291292a different ND-range is submitted.
292293
293- ==== 6.5.3. Kernels with different dimensions
294+ ==== Kernels with different dimensions
294295
295296Similar to the previous limitation, it is implementation-defined whether
296297fusing kernels with different dimensionality is supported.
297298
298- ==== 6.5.4. No intermediate representation
299+ ==== No intermediate representation
299300
300301If any of the kernels to be fused does not come with a suitable, accessible
301302intermediate representation, kernel fusion is cancelled.
302303
303- ==== 6.5.5. Explicit memory operations and host tasks
304+ ==== Explicit memory operations and host tasks
304305
305306The graph proposal allows graphs to contain, next to device kernels, explicit
306307memory operations and host tasks. As both of these types of commands cannot be
@@ -311,13 +312,13 @@ It is valid to execute some memory operations and host tasks before all device
311312kernels and some after all device kernels, as long as that sequence is a valid
312313linearization.
313314
314- ==== 6.5.6. Multi-device graph
315+ ==== Multi-device graph
315316
316317Attempting to fuse a graph containing device kernels for more than one device
317318may lead to fusion being cancelled, as kernel fusion across multiple devices
318319and/or backends is generally not possible.
319320
320- === 6.6. Internalization
321+ === Internalization
321322
322323While avoiding repeated kernel launch overheads will most likely already improve
323324application performance, kernel fusion can deliver even higher performance gains
@@ -368,7 +369,7 @@ no internalization were to happen.
368369In sum, this allows users to trigger internalization of a buffer or allocated
369370device memory by just specifying a single property.
370371
371- ==== 6.6.1. Buffer internalization
372+ ==== Buffer internalization
372373
373374In some cases, the user will specify different internalization targets for a
374375buffer and accessors to such buffer. When incompatible combinations are used, an
@@ -440,9 +441,9 @@ performed. If there is a mismatch between the two accessors (access
440441range, access offset, number of dimensions, data type), no
441442internalization is performed.
442443
443- == 7. Examples
444+ == Examples
444445
445- === 7.1. Buffer-based example
446+ === Buffer-based example
446447
447448```c++
448449#include <sycl/sycl.hpp>
@@ -530,7 +531,7 @@ int main() {
530531}
531532```
532533
533- === 7.2. USM-based example
534+ === USM-based example
534535
535536```c++
536537#include <sycl/sycl.hpp>
@@ -617,11 +618,12 @@ int main() {
617618}
618619```
619620
620- == 8. Contributors
621+ == Contributors
621622
622623Lukas Sommer, Codeplay +
623624Victor Lomüller, Codeplay +
624625Victor Perez, Codeplay +
626+ Ewan Crawford, Codeplay +
625627
626628== Revision History
627629