From f1b4f49089f67d20cc59273f2c1e7554ea933df9 Mon Sep 17 00:00:00 2001 From: Pablo Reble Date: Wed, 19 Jul 2023 13:11:07 -0700 Subject: [PATCH 01/12] [SYCL][Graph] Move extension to experimental folder --- .../{proposed => experimental}/sycl_ext_oneapi_graph.asciidoc | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename sycl/doc/extensions/{proposed => experimental}/sycl_ext_oneapi_graph.asciidoc (100%) diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc similarity index 100% rename from sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc rename to sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc From b884341a86747f7b7660dfb47831317c4f5539cf Mon Sep 17 00:00:00 2001 From: Pablo Reble Date: Wed, 19 Jul 2023 13:30:41 -0700 Subject: [PATCH 02/12] [SYCL][Graph] Update spec status --- .../experimental/sycl_ext_oneapi_graph.asciidoc | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc index 178dfa9a4315e..727d3e952da6a 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc @@ -62,11 +62,12 @@ SYCL specification refer to that revision. == Status -This is a proposed extension specification, intended to gather community -feedback. Interfaces defined in this specification may not be implemented yet -or may be in a preliminary state. The specification itself may also change in -incompatible ways before it is finalized. *Shipping software products should -not rely on APIs defined in this specification.* +This is an experimental extension specification, intended to provide early +access to features and gather community feedback. Interfaces defined in this +specification are implemented in {dpcpp}, but they are not finalized and may +change incompatibly in future versions of {dpcpp} without prior notice. +*Shipping software products should not rely on APIs defined in this +specification.* == Introduction From a6e7c3b8a5636200c63cf0e4222432d1453cee41 Mon Sep 17 00:00:00 2001 From: Pablo Reble Date: Wed, 19 Jul 2023 13:52:07 -0700 Subject: [PATCH 03/12] [SYCL][Graph] Adding section on unimplemented features --- .../sycl_ext_oneapi_graph.asciidoc | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc index 727d3e952da6a..34279d51463b6 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc @@ -1653,6 +1653,24 @@ block size. the finalize call either extending the basic command graph proposal or layered as a separate extension proposal. +== Non-implemented features +The following features are not yet or only partially implemented: + +. Extending lifetime of buffers used in a graph. +. Buffer taking a copy of underlying host data when buffer is used in a graph. +. Executable graph `update()`. +. Using `handler::host_task` in a graph node. +. Using `handler::fill` in a graph node implemented for USM only. +. Using `handler::memset` in a graph node. +. Using `handler::prefetch` in a graph node. +. Using handler::memadvise in a graph node. +. Using specialization constants in a graph node. +. Using reductions in a graph node. +. Using sycl streams in a graph node. +. Thread safety of new methods. +. Profiling an event returned from graph submission with `event::get_profiling_info()`. +. Throwing exceptions for invalid usage is only partially implemented. + == Revision History [cols="5,15,15,70"] From ff5456133954a2f233f797148adcf1ec5d993779 Mon Sep 17 00:00:00 2001 From: Pablo Reble Date: Tue, 1 Aug 2023 15:35:42 -0500 Subject: [PATCH 04/12] [SYCL][Graph] Moving non-implemented features to extra section (#282) * [SYCL][Graph] Fix broken link * [SYCL][Graph] Addressing feedback from review * [SYCL][Graph] Bump version and add missing known issue * [SYCL][Graph] Add new exception throwing for unsupported backend to the Spec (#281) --------- Co-authored-by: Maxime France-Pillois Co-authored-by: Ben Tracy --- .../sycl_ext_oneapi_graph.asciidoc | 351 +++++++++--------- 1 file changed, 178 insertions(+), 173 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc index 34279d51463b6..73d1b653d6d5b 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc @@ -315,10 +315,6 @@ class no_cycle_check { no_cycle_check() = default; }; -class no_host_copy { -public: - no_host_copy() = default; -}; } // namespace graph namespace node { @@ -387,7 +383,6 @@ template<> class command_graph { public: command_graph() = delete; - void update(const command_graph& graph); }; } // namespace ext::oneapi::experimental @@ -445,8 +440,8 @@ support using this graph extension. :crs: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:reference-semantics -Node is a class that encapsulates tasks like SYCL kernel functions, memory -operations, or host tasks for deferred execution. A graph must +Node is a class that encapsulates tasks like SYCL kernel functions, or memory +operations for deferred execution. A graph must be created first, the structure of a graph is defined second by adding nodes and edges. @@ -523,19 +518,6 @@ graph LR ==== Graph Properties [[graph-properties]] -===== No-Host-Copy Property - -The `no_host_copy` property is defined by this extension and can be passed to -either the `command_graph` constructor or the `command_graph::begin_recording()` -member function. This property will disable the host data copy that may -occur as detailed in the <> section of -this specification. - -Passing this property represents a promise from the user that host data -associated with a buffer that was created using a host data pointer will -outlive any executable graphs created from a modifiable graph which uses -that buffer. - ===== No-Cycle-Check Property The `property::graph::no_cycle_check` property disables any checks if a newly @@ -548,16 +530,6 @@ puts that `command_graph` into an undefined state. Any further operations performed on a `command_graph` in this state will result in undefined behavior. -==== Executable Graph Update - -A graph in the executable state can have each nodes inputs & outputs updated -using the `command_graph::update()` method. This takes a graph in the -modifiable state and updates the executable graph to use the node input & -outputs of the modifiable graph, a technique called _Whole Graph Update_. The -modifiable graph must have the same topology as the graph originally used to -create the executable graphs, with the nodes targeting the same devices and -added in the same order. - ==== Graph Member Functions Table {counter: tableNumber}. Constructor of the `command_graph` class. @@ -601,6 +573,9 @@ associated with `syclContext`. * Throws synchronously with error code `invalid` if `syclDevice` <>. +* Throws synchronously with error code `invalid` if the backend associated +with `syclDevice` is not supported. + |=== Table {counter: tableNumber}. Member functions of the `command_graph` class. @@ -647,7 +622,7 @@ group function passed to `queue::submit` unless explicitly stated otherwise in <>. Code in the function is executed synchronously, before the function returns back to `command_graph::add`, with the exception of any SYCL commands (e.g. kernels, -host tasks, or explicit memory copy operations). These commands are captured +or explicit memory copy operations). These commands are captured into the graph and executed asynchronously when the graph is submitted to a queue. The requisites of `cgf` will be used to identify any dependent nodes in the graph to form edges with. @@ -858,72 +833,6 @@ Exceptions: |=== -:sycl-kernel-function: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sycl-kernel-function - -Table {counter: tableNumber}. Member functions of the `command_graph` class (executable graph update). -[cols="2a,a"] -|=== -|Member function|Description - -| -[source, c++] ----- -void -update(const command_graph& graph); ----- - - -|Updates the executable graph node inputs & outputs from a topologically -identical modifiable graph. A topologically identical graph is one with the -same structure of nodes and edges, and the nodes added in the same order to -both graphs. Equivalent nodes in topologically identical graphs each have the -same command, targeting the same device. There is the additional limitation that -to update an executable graph, every node in the graph must be either a kernel -command or a host task. - -The only characteristic that can differ between two topologically identical -graphs during an update are the arguments to kernel nodes. For example, -the graph may capture different values for the USM pointers or accessors used -in the graph. It is these kernels arguments in `graph` that constitute the -inputs & outputs to update to. - -Differences in the following characteristics between two graphs during an -update results in undefined behavior: - -* Modifying the native C++ callable of a `host task` node. -* Modifying the {sycl-kernel-function}[kernel function] of a kernel node. - -The effects of the update will be visible on the next submission of the -executable graph without the need for additional user synchronization. - -Preconditions: - -* This member function is only available when the `command_graph` state is - `graph_state::executable`. - -Parameters: - -* `graph` - Modifiable graph object to update graph node inputs & outputs with. - This graph must have the same topology as the original graph used on - executable graph creation. - -Exceptions: - -* Throws synchronously with error code `invalid` if the topology of `graph` is - not the same as the existing graph topology, or if the nodes were not added in - the same order. - -:handler-copy-functions: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#table.members.handler.copy - -* Throws synchronously with error code `invalid` if `graph` contains any node - which is not a kernel command or host task, e.g. - {handler-copy-functions}[memory operations]. - -* Throws synchronously with error code `invalid` if the context or device - associated with `graph` does not match that of the `command_graph` being - updated. -|=== - === Queue Class Modifications :queue-class: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:interface.queue.class @@ -1118,77 +1027,13 @@ As a result, users don't need to manually wrap queue recording code in a back to the executing state. Instead, an uncaught exception destroying the modifiable graph will perform this action, useful in RAII pattern usage. -=== Storage Lifetimes [[storage-lifetimes]] - -The lifetime of any buffer recorded as part of a submission -to a command graph will be extended in keeping with the common reference -semantics and buffer synchronization rules in the SYCL specification. It will be -extended either for the lifetime of the graph (including both modifiable graphs -and the executable graphs created from them) or until the buffer is no longer -required by the graph (such as after being replaced through executable graph update). - -If a buffer created with a host data pointer is recorded as part of a submission to -a command graph, the lifetime of that host data will also be extended by taking a -copy of that data inside the buffer. To illustrate, consider the following example: - -[source,c++] ----- -void foo(queue q /* queue in recording mode */ ) { - float data[NUM]; - buffer buf{data, range{NUM}}; - q.submit([&](handler &cgh) { - accessor acc{buf, cgh, read_only}; - cgh.single_task([] { - // use "acc" - }); - }); - // "data" goes out of scope -} ----- - -In this example, the implementation extends the lifetime of the buffer because -it is used in the recorded graph. Because the buffer uses the host memory data, -the implementation also makes an internal copy of that host data. As illustrated -above, that host memory might go out of scope before the recorded graph goes out -of scope, or before the data has been copied to the device. - -The default behavior is to always copy the host data in a case like this, but -this is not necessary if the user knows that the lifetime of the host data -outlives the lifetime of the recorded graph. If the user knows this is the -case, they may use the `graph::no_host_copy` property to avoid the internal -copy. Passing the property to `begin_recording()` will prevent host copies only -for commands recorded before `end_recording()` is called for a given queue. -Passing the property to the `command_graph` constructor will prevent host copies -for all commands recorded to the graph. - -The implementation guarantees that the host memory will not be copied internally -if all the commands accessing this buffer use `access_mode::write` or the -`no_init` property because the host memory is not needed in these cases. -Note, however, that these cases require the application to disable copy-back -as described in <>. - === Host Tasks :host-task: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:interfaces.hosttasks -A {host-task}[host task] is a native C++ callable, scheduled according to SYCL -dependency rules. It is valid to record a host task as part of graph, though it -may lead to sub-optimal graph performance because a host task node may prevent -the SYCL runtime from submitting the entire executable `command_graph` to the -device at once. - -Host tasks can be updated as part of <> -by replacing the whole node with the new callable. - -[source,c++] ----- -auto node = graph.add([&](sycl::handler& cgh){ - // Host code here is evaluated during the call to add() - cgh.host_task([=](){ - // Code here is evaluated as part of executing the command graph node - }); -}); ----- +It is not yet supported to have a host task inside a `command_graph`. +Support will be added subsequently as detailed in the <> +part from the <> section of this specification. === Queue Behavior In Recording Mode @@ -1349,7 +1194,7 @@ Recorded commands are not counted as submitted for the purposes of its operation ==== sycl_ext_oneapi_device_global The new handler methods, and queue shortcuts, defined by -link:../proposed/sycl_ext_oneapi_device_global.asciidoc[sycl_ext_oneapi_device_global]. +link:../experimental/sycl_ext_oneapi_device_global.asciidoc[sycl_ext_oneapi_device_global]. cannot be used in graph nodes. A synchronous exception will be thrown with error code `invalid` if a user tries to add them to a graph. @@ -1536,7 +1381,7 @@ submitted in its entirety for execution via ---- -== Future Direction +== Future Direction [[future-direction]] === Memory Allocation Nodes @@ -1595,6 +1440,168 @@ problem this extension currently aims to solve, it is the responsibility of the user to decide the device each command will be processed for, not the SYCL runtime. +=== Storage Lifetimes [[storage-lifetimes]] + +The lifetime of any buffer recorded as part of a submission +to a command graph will be extended in keeping with the common reference +semantics and buffer synchronization rules in the SYCL specification. It will be +extended either for the lifetime of the graph (including both modifiable graphs +and the executable graphs created from them) or until the buffer is no longer +required by the graph (such as after being replaced through executable graph update). + +If a buffer created with a host data pointer is recorded as part of a submission to +a command graph, the lifetime of that host data will also be extended by taking a +copy of that data inside the buffer. To illustrate, consider the following example: + +[source,c++] +---- +void foo(queue q /* queue in recording mode */ ) { + float data[NUM]; + buffer buf{data, range{NUM}}; + q.submit([&](handler &cgh) { + accessor acc{buf, cgh, read_only}; + cgh.single_task([] { + // use "acc" + }); + }); + // "data" goes out of scope +} +---- + +In this example, the implementation extends the lifetime of the buffer because +it is used in the recorded graph. Because the buffer uses the host memory data, +the implementation also makes an internal copy of that host data. As illustrated +above, that host memory might go out of scope before the recorded graph goes out +of scope, or before the data has been copied to the device. + +The default behavior is to always copy the host data in a case like this, but +this is not necessary if the user knows that the lifetime of the host data +outlives the lifetime of the recorded graph. If the user knows this is the +case, they may use the `graph::no_host_copy` property to avoid the internal +copy. Passing the property to `begin_recording()` will prevent host copies only +for commands recorded before `end_recording()` is called for a given queue. +Passing the property to the `command_graph` constructor will prevent host copies +for all commands recorded to the graph. + +The implementation guarantees that the host memory will not be copied internally +if all the commands accessing this buffer use `access_mode::write` or the +`no_init` property because the host memory is not needed in these cases. +Note, however, that these cases require the application to disable copy-back +as described in <>. + +===== No-Host-Copy Property + +The `no_host_copy` property is defined by this extension and can be passed to +either the `command_graph` constructor or the `command_graph::begin_recording()` +member function. This property will disable the host data copy that may +occur as detailed in the <> section of +this specification. + +Passing this property represents a promise from the user that host data +associated with a buffer that was created using a host data pointer will +outlive any executable graphs created from a modifiable graph which uses +that buffer. + +=== Host Tasks [[host-tasks]] + +A {host-task}[host task] is a native C++ callable, scheduled according to SYCL +dependency rules. It is valid to record a host task as part of graph, though it +may lead to sub-optimal graph performance because a host task node may prevent +the SYCL runtime from submitting the entire executable `command_graph` to the +device at once. + +Host tasks can be updated as part of <> +by replacing the whole node with the new callable. + +[source,c++] +---- +auto node = graph.add([&](sycl::handler& cgh){ + // Host code here is evaluated during the call to add() + cgh.host_task([=](){ + // Code here is evaluated as part of executing the command graph node + }); +}); +---- + +=== Graph Update + +==== Executable Graph Update + +A graph in the executable state can have each nodes inputs & outputs updated +using the `command_graph::update()` method. This takes a graph in the +modifiable state and updates the executable graph to use the node input & +outputs of the modifiable graph, a technique called _Whole Graph Update_. The +modifiable graph must have the same topology as the graph originally used to +create the executable graphs, with the nodes targeting the same devices and +added in the same order. + +:sycl-kernel-function: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sycl-kernel-function + +Table {counter: tableNumber}. Member functions of the `command_graph` class (executable graph update). +[cols="2a,a"] +|=== +|Member function|Description + +| +[source, c++] +---- +void +update(const command_graph& graph); +---- + + +|Updates the executable graph node inputs & outputs from a topologically +identical modifiable graph. A topologically identical graph is one with the +same structure of nodes and edges, and the nodes added in the same order to +both graphs. Equivalent nodes in topologically identical graphs each have the +same command, targeting the same device. There is the additional limitation that +to update an executable graph, every node in the graph must be either a kernel +command or a host task. + +The only characteristic that can differ between two topologically identical +graphs during an update are the arguments to kernel nodes. For example, +the graph may capture different values for the USM pointers or accessors used +in the graph. It is these kernels arguments in `graph` that constitute the +inputs & outputs to update to. + +Differences in the following characteristics between two graphs during an +update results in undefined behavior: + +* Modifying the native C++ callable of a `host task` node. +* Modifying the {sycl-kernel-function}[kernel function] of a kernel node. + +The effects of the update will be visible on the next submission of the +executable graph without the need for additional user synchronization. + +Preconditions: + +* This member function is only available when the `command_graph` state is + `graph_state::executable`. + +Parameters: + +* `graph` - Modifiable graph object to update graph node inputs & outputs with. + This graph must have the same topology as the original graph used on + executable graph creation. + +Exceptions: + +* Throws synchronously with error code `invalid` if the topology of `graph` is + not the same as the existing graph topology, or if the nodes were not added in + the same order. + +:handler-copy-functions: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#table.members.handler.copy + +* Throws synchronously with error code `invalid` if `graph` contains any node + which is not a kernel command or host task, e.g. + {handler-copy-functions}[memory operations]. + +* Throws synchronously with error code `invalid` if the context or device + associated with `graph` does not match that of the `command_graph` being + updated. + +|=== + == Issues === Simultaneous Graph Submission @@ -1653,13 +1660,9 @@ block size. the finalize call either extending the basic command graph proposal or layered as a separate extension proposal. -== Non-implemented features -The following features are not yet or only partially implemented: +== Non-implemented features and known issues +The following features are not yet supported: -. Extending lifetime of buffers used in a graph. -. Buffer taking a copy of underlying host data when buffer is used in a graph. -. Executable graph `update()`. -. Using `handler::host_task` in a graph node. . Using `handler::fill` in a graph node implemented for USM only. . Using `handler::memset` in a graph node. . Using `handler::prefetch` in a graph node. @@ -1669,7 +1672,7 @@ The following features are not yet or only partially implemented: . Using sycl streams in a graph node. . Thread safety of new methods. . Profiling an event returned from graph submission with `event::get_profiling_info()`. -. Throwing exceptions for invalid usage is only partially implemented. +. Subgraph can only be added as a node to any parent graph once, and will not correctly execute by itself after being added as a sub-graph. == Revision History @@ -1681,5 +1684,7 @@ The following features are not yet or only partially implemented: |1|2023-03-23|Pablo Reble, Ewan Crawford, Ben Tracy, Julian Miller |Initial public working draft +|2|2023-08-01|Pablo Reble, Ewan Crawford, Ben Tracy, Julian Miller +|Promote status to experimental |======================================== From 748c8e2060df8b8596a33c51c537123a53cf284c Mon Sep 17 00:00:00 2001 From: Ewan Crawford Date: Wed, 2 Aug 2023 13:11:19 +0100 Subject: [PATCH 05/12] [SYCL][Docs] Updates to experimental status spec (#284) * Add `sycl_ext_oneapi_bindless_images` as an unsupported extension. See https://github.com/intel/llvm/pull/10609 * Fix host-tasks link, as we now have two sections with this heading. * Add Maxime as contributor to revision 2. * Minor formatting. --- .../sycl_ext_oneapi_graph.asciidoc | 25 ++++++++++++++----- 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc index 73d1b653d6d5b..46e0c88ef446d 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc @@ -1032,7 +1032,7 @@ modifiable graph will perform this action, useful in RAII pattern usage. :host-task: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:interfaces.hosttasks It is not yet supported to have a host task inside a `command_graph`. -Support will be added subsequently as detailed in the <> +Support will be added subsequently as detailed in the <> part from the <> section of this specification. === Queue Behavior In Recording Mode @@ -1201,6 +1201,16 @@ code `invalid` if a user tries to add them to a graph. Removing this restriction is something we may look at for future revisions of `sycl_ext_oneapi_graph`. +=== sycl_ext_oneapi_bindless_images + +The new handler methods, and queue shortcuts, defined by +link:../experimental/sycl_ext_oneapi_bindless_images.asciidoc[sycl_ext_oneapi_bindless_images] +cannot be used in graph nodes. A synchronous exception will be thrown with error +code `invalid` if a user tries to add them to a graph + +Removing this restriction is something we may look at for future revisions of +`sycl_ext_oneapi_graph`. + == Examples [NOTE] @@ -1502,7 +1512,7 @@ associated with a buffer that was created using a host data pointer will outlive any executable graphs created from a modifiable graph which uses that buffer. -=== Host Tasks [[host-tasks]] +=== Host Tasks [[future-host-tasks]] A {host-task}[host task] is a native C++ callable, scheduled according to SYCL dependency rules. It is valid to record a host task as part of graph, though it @@ -1666,13 +1676,15 @@ The following features are not yet supported: . Using `handler::fill` in a graph node implemented for USM only. . Using `handler::memset` in a graph node. . Using `handler::prefetch` in a graph node. -. Using handler::memadvise in a graph node. +. Using `handler::memadvise` in a graph node. . Using specialization constants in a graph node. . Using reductions in a graph node. . Using sycl streams in a graph node. . Thread safety of new methods. -. Profiling an event returned from graph submission with `event::get_profiling_info()`. -. Subgraph can only be added as a node to any parent graph once, and will not correctly execute by itself after being added as a sub-graph. +. Profiling an event returned from graph submission with + `event::get_profiling_info()`. +. A sub-graph can only be added as a node to any parent graph once, and will not + correctly execute by itself after being added as a sub-graph. == Revision History @@ -1684,7 +1696,8 @@ The following features are not yet supported: |1|2023-03-23|Pablo Reble, Ewan Crawford, Ben Tracy, Julian Miller |Initial public working draft -|2|2023-08-01|Pablo Reble, Ewan Crawford, Ben Tracy, Julian Miller +|2|2023-08-01|Pablo Reble, Ewan Crawford, Ben Tracy, Julian Miller, +Maxime France-Pillois |Promote status to experimental |======================================== From 65f522c64bbb873b11760b80cf2c368d4112fb61 Mon Sep 17 00:00:00 2001 From: Ben Tracy Date: Wed, 2 Aug 2023 15:04:27 +0100 Subject: [PATCH 06/12] [SYCL][Graph] Provide context for future direction section (#285) - Move unimplemented features to top of section - Add some text explaining the contents of the section - Separate fully and partially developed features into subsections for readability --- .../sycl_ext_oneapi_graph.asciidoc | 125 ++++++++++-------- 1 file changed, 67 insertions(+), 58 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc index 46e0c88ef446d..27236498f331b 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc @@ -1393,64 +1393,16 @@ submitted in its entirety for execution via == Future Direction [[future-direction]] -=== Memory Allocation Nodes +This section contains both features of the specification which have been +fully developed, but are not yet implemented, as well as features which are +still in development. -There is no provided interface for users to define a USM allocation/free -operation belonging to the scope of the graph. It would be error prone and -non-performant to allocate or free memory as a node executed during graph -submission. Instead, such a memory allocation API needs to provide a way to -return a pointer which won't be valid until the allocation is made on graph -finalization, as allocating at finalization is the only way to benefit from -the known graph scope for optimal memory allocation, and even optimize to -eliminate some allocations entirely. - -Such a deferred allocation strategy presents challenges however, and as a result -we recommend instead that prior to graph construction users perform core SYCL -USM allocations to be used in the graph submission. Before to coming to this -recommendation we considered the following explicit graph building interfaces -for adding a memory allocation owned by the graph: +Fully developed features will be moved to the main specification once they +have been implemented. -1. Allocation function returning a reference to the raw pointer, i.e. `void*&`, - which will be instantiated on graph finalization with the location of the - allocated USM memory. - -2. Allocation function returning a handle to the allocation. Applications use - the handle in node command-group functions to access memory when allocated. - -3. Allocation function returning a pointer to a virtual allocation, only backed - with an actual allocation when graph is finalized or submitted. +=== Features Awaiting Implementation -Design 1) has the drawback of forcing users to keep the user pointer variable -alive so that the reference is valid, which is unintuitive and is likely to -result in bugs. - -Design 2) introduces a handle object which has the advantages of being a less -error prone way to provide the pointer to the deferred allocation. However, it -requires kernel changes and introduces an overhead above the raw pointers that -are the advantage of USM. - -Design 3) needs specific backend support for deferred allocation. - -=== Device Specific Graph - -A modifiable state `command_graph` contains nodes targeting specific devices, -rather than being a device agnostic representation only tied to devices on -finalization. This allows the implementation to process nodes which require -device information when the command group function is evaluated. For example, -a SYCL reduction implementation may desire the work-group/sub-group size, which -is normally gathered by the runtime from the device associated with the queue. - -This design also enables the future capability for a user to compose a graph -with nodes targeting different devices, allowing the benefits of defining an -execution graph ahead of submission to be extended to multi-device platforms. -Without this capability a user currently has to submit individual single-device -graphs and use events for dependencies, which is a usage model this extension is -aiming to optimize. Automatic load balancing of commands across devices is not a -problem this extension currently aims to solve, it is the responsibility of the -user to decide the device each command will be processed for, not the SYCL -runtime. - -=== Storage Lifetimes [[storage-lifetimes]] +==== Storage Lifetimes [[storage-lifetimes]] The lifetime of any buffer recorded as part of a submission to a command graph will be extended in keeping with the common reference @@ -1512,7 +1464,7 @@ associated with a buffer that was created using a host data pointer will outlive any executable graphs created from a modifiable graph which uses that buffer. -=== Host Tasks [[future-host-tasks]] +==== Host Tasks [[future-host-tasks]] A {host-task}[host task] is a native C++ callable, scheduled according to SYCL dependency rules. It is valid to record a host task as part of graph, though it @@ -1533,8 +1485,6 @@ auto node = graph.add([&](sycl::handler& cgh){ }); ---- -=== Graph Update - ==== Executable Graph Update A graph in the executable state can have each nodes inputs & outputs updated @@ -1612,6 +1562,65 @@ Exceptions: |=== +=== Features Still in Development + +==== Memory Allocation Nodes + +There is no provided interface for users to define a USM allocation/free +operation belonging to the scope of the graph. It would be error prone and +non-performant to allocate or free memory as a node executed during graph +submission. Instead, such a memory allocation API needs to provide a way to +return a pointer which won't be valid until the allocation is made on graph +finalization, as allocating at finalization is the only way to benefit from +the known graph scope for optimal memory allocation, and even optimize to +eliminate some allocations entirely. + +Such a deferred allocation strategy presents challenges however, and as a result +we recommend instead that prior to graph construction users perform core SYCL +USM allocations to be used in the graph submission. Before to coming to this +recommendation we considered the following explicit graph building interfaces +for adding a memory allocation owned by the graph: + +1. Allocation function returning a reference to the raw pointer, i.e. `void*&`, + which will be instantiated on graph finalization with the location of the + allocated USM memory. + +2. Allocation function returning a handle to the allocation. Applications use + the handle in node command-group functions to access memory when allocated. + +3. Allocation function returning a pointer to a virtual allocation, only backed + with an actual allocation when graph is finalized or submitted. + +Design 1) has the drawback of forcing users to keep the user pointer variable +alive so that the reference is valid, which is unintuitive and is likely to +result in bugs. + +Design 2) introduces a handle object which has the advantages of being a less +error prone way to provide the pointer to the deferred allocation. However, it +requires kernel changes and introduces an overhead above the raw pointers that +are the advantage of USM. + +Design 3) needs specific backend support for deferred allocation. + +==== Device Specific Graph + +A modifiable state `command_graph` contains nodes targeting specific devices, +rather than being a device agnostic representation only tied to devices on +finalization. This allows the implementation to process nodes which require +device information when the command group function is evaluated. For example, +a SYCL reduction implementation may desire the work-group/sub-group size, which +is normally gathered by the runtime from the device associated with the queue. + +This design also enables the future capability for a user to compose a graph +with nodes targeting different devices, allowing the benefits of defining an +execution graph ahead of submission to be extended to multi-device platforms. +Without this capability a user currently has to submit individual single-device +graphs and use events for dependencies, which is a usage model this extension is +aiming to optimize. Automatic load balancing of commands across devices is not a +problem this extension currently aims to solve, it is the responsibility of the +user to decide the device each command will be processed for, not the SYCL +runtime. + == Issues === Simultaneous Graph Submission From 525b331f9de9fdb43dc10e5ad25f910862b2cf4a Mon Sep 17 00:00:00 2001 From: Pablo Reble Date: Fri, 4 Aug 2023 05:09:59 -0500 Subject: [PATCH 07/12] [SYCL][Graph] Incorporating Greg's feedback (#286) Addressing https://github.com/intel/llvm/pull/10473#discussion_r1272795906 --- sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc | 1 + 1 file changed, 1 insertion(+) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc index 27236498f331b..6b8defbc0664f 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc @@ -9,6 +9,7 @@ :encoding: utf-8 :lang: en :sectnums: +:dpcpp: pass:[DPC++] :blank: pass:[ +] From b6c5c42c376fe0ca4c3a2605ed9c7a9a421cbf75 Mon Sep 17 00:00:00 2001 From: Ben Tracy Date: Wed, 9 Aug 2023 10:21:20 +0100 Subject: [PATCH 08/12] [SYCL][Graph] Add property for allowing host data created buffers (#289) * [SYCL][Graph] Add property for allowing host data created buffers - Add wording which restricts buffers created with host data in graphs - Add a property which disables those checks * [SYCL][Graph] Remove no_host_copy property * [SYCL][Graph] Add property for assuming buffer lfietimes outlive graph - Add new property for allowing buffers with a lifetime promise - Rework buffer limitations wording for this and general improvement * [SYCL][Graph] Addressing PR feedback - Add missing API modifications for properties - Add punctuation - Fix new property links * Apply suggestions from code review Co-authored-by: Pablo Reble * Link to "Buffer Limitations" section --------- Co-authored-by: Ewan Crawford Co-authored-by: Pablo Reble --- .../sycl_ext_oneapi_graph.asciidoc | 68 +++++++++++++++---- 1 file changed, 54 insertions(+), 14 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc index 6b8defbc0664f..9aede1db22b50 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc @@ -316,6 +316,16 @@ class no_cycle_check { no_cycle_check() = default; }; +class assume_buffer_outlives_graph { + public: + assume_buffer_outlives_graph() = default; +}; + +class assume_data_outlives_buffer { + public: + assume_data_outlives_buffer() = default; +}; + } // namespace graph namespace node { @@ -531,6 +541,27 @@ puts that `command_graph` into an undefined state. Any further operations performed on a `command_graph` in this state will result in undefined behavior. +===== Assume-Buffer-Outlives-Graph Property [[assume-buffer-outlives-graph-property]] + +The `property::graph::assume_buffer_outlives_graph` property disables +<> in a `command_graph` and +can be passed to a `command_graph` on construction via the property list +parameter. This property represents a promise from the user that any buffer +which is used in a graph will be kept alive on the host for the lifetime of the +graph. Destroying that buffer during the lifetime of a `command_graph` +constructed with this property results in undefined behavior. + +===== Assume-Data-Outlives-Buffer Property [[assume-data-outlives-buffer-property]] + +The `property::graph::assume_data_outlives_buffer` property disables +<> which have been created +with a host pointer in a `command_graph`, and can be passed to a `command_graph` +on construction via the property list parameter. This property represents a +promise from the user that any host data passed to a buffer's constructor will +outlive the buffer itself, and by extension any graph in which that buffer is +used. Deleting or otherwise modifying this host data during the lifetime of the +buffer or graph results in undefined behavior when using this property. + ==== Graph Member Functions Table {counter: tableNumber}. Constructor of the `command_graph` class. @@ -648,6 +679,11 @@ Exceptions: * Throws synchronously with error code `invalid` if a queue is recording commands to the graph. +* Throws synchronously with error code `invalid` if this command uses a buffer + which was created with a host data pointer. See the + <> + property for more information. + | [source,c++] ---- @@ -1065,6 +1101,23 @@ synchronously with error code `invalid`. ==== Buffer Limitations +The use of buffers inside a `command_graph` is restricted unless the user +creates the graph with the <> +property. Buffer lifetimes are not extended by a `command_graph` in which they +are used and so the user must ensure that their lifetimes exceed that of the +`command_graph`. Attempting to use a buffer in a `command_graph` without this +property will result in a synchronous error being throw with error code +`invalid`. + +There are also restrictions on using a buffer which has been created with a +host data pointer in commands recorded to a `command_graph`. Because of the +delayed execution of a `command_graph`, data may not be copied to the device +immediately when commands using these buffers are submitted to the graph, +therefore the host data must also outlive the graph to ensure correct behavior. +Users can pass the <> +property to the graph constructor to provide a promise that this will not occur +and that it is safe to use this buffer in the graph. + Because of the delayed execution of a recorded graph, it is not possible to support captured code which relies on the copy-back on destruction behavior of buffers. Typically, applications would rely on this behavior to do work on the host which @@ -1440,7 +1493,7 @@ of scope, or before the data has been copied to the device. The default behavior is to always copy the host data in a case like this, but this is not necessary if the user knows that the lifetime of the host data outlives the lifetime of the recorded graph. If the user knows this is the -case, they may use the `graph::no_host_copy` property to avoid the internal +case, they may use the `graph::assume_data_outlives_buffer` property to avoid the internal copy. Passing the property to `begin_recording()` will prevent host copies only for commands recorded before `end_recording()` is called for a given queue. Passing the property to the `command_graph` constructor will prevent host copies @@ -1452,19 +1505,6 @@ if all the commands accessing this buffer use `access_mode::write` or the Note, however, that these cases require the application to disable copy-back as described in <>. -===== No-Host-Copy Property - -The `no_host_copy` property is defined by this extension and can be passed to -either the `command_graph` constructor or the `command_graph::begin_recording()` -member function. This property will disable the host data copy that may -occur as detailed in the <> section of -this specification. - -Passing this property represents a promise from the user that host data -associated with a buffer that was created using a host data pointer will -outlive any executable graphs created from a modifiable graph which uses -that buffer. - ==== Host Tasks [[future-host-tasks]] A {host-task}[host task] is a native C++ callable, scheduled according to SYCL From 80656d5c6d4217636a27a621ba17109d8171e369 Mon Sep 17 00:00:00 2001 From: Ewan Crawford Date: Thu, 10 Aug 2023 12:44:08 +0100 Subject: [PATCH 09/12] [SYCL][Doc] Fix `graph_support_level` namespace in table (#297) We removed the `device` namespace from `graph_support_level` in https://github.com/reble/llvm/pull/255 However, I forgot to update this table entry. --- sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc index 9aede1db22b50..ce6f5f2340c8a 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc @@ -436,7 +436,7 @@ Table {counter: tableNumber}. Device Info Queries. | Device Descriptors | Return Type | Description |`info::device::graph_support` -|`info::device::graph_support_level` +|`info::graph_support_level` |When passed to `device::get_info<...>()`, the function returns `native` if there is an underlying SYCL backend command-buffer construct which is used to propagate the graph to the backend. If no backend construct exists, or From 4f4ac7b60cf7dd2a653aecc97f8b878d56c4120d Mon Sep 17 00:00:00 2001 From: Ewan Crawford Date: Mon, 14 Aug 2023 06:09:56 +0100 Subject: [PATCH 10/12] [SYCL] Update graphs spec unimplemented features list. (#296) Update the list of unimplemented features in the spec to reflect what we have fixes for (or in-progress) in the bugfix window. Also make it explicit that for unimplemented features we now throw an exception, rather than it being undefined behaviour. --- .../experimental/sycl_ext_oneapi_graph.asciidoc | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc index ce6f5f2340c8a..f50498409d445 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc @@ -1068,9 +1068,10 @@ modifiable graph will perform this action, useful in RAII pattern usage. :host-task: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#subsec:interfaces.hosttasks -It is not yet supported to have a host task inside a `command_graph`. -Support will be added subsequently as detailed in the <> -part from the <> section of this specification. +It is not yet supported to have a host task inside a `command_graph`, and an +exception will be thrown if used by application code. Support will be added +subsequently as detailed in the <> part from the +<> section of this specification. === Queue Behavior In Recording Mode @@ -1721,7 +1722,9 @@ the finalize call either extending the basic command graph proposal or layered as a separate extension proposal. == Non-implemented features and known issues -The following features are not yet supported: + +The following features are not yet supported, and an exception will be thrown +if used in application code. . Using `handler::fill` in a graph node implemented for USM only. . Using `handler::memset` in a graph node. @@ -1730,11 +1733,9 @@ The following features are not yet supported: . Using specialization constants in a graph node. . Using reductions in a graph node. . Using sycl streams in a graph node. -. Thread safety of new methods. +. Using a kernel bundle in a graph node. . Profiling an event returned from graph submission with `event::get_profiling_info()`. -. A sub-graph can only be added as a node to any parent graph once, and will not - correctly execute by itself after being added as a sub-graph. == Revision History From aa502d50f1821aae4b73e5b2dc63af66a7be207e Mon Sep 17 00:00:00 2001 From: Ewan Crawford Date: Tue, 22 Aug 2023 11:28:58 +0100 Subject: [PATCH 11/12] [SYCL][Doc] Update spec example (#304) Update the buffer example in the spec to respect the buffer restrictions we currently have. * Use lifetime properties in constructor * Use `set_write_back(false)` on buffers used, and use host accessors to check the result. * Make buffer lifetimes exceed that of the graph object. Addresses Lukas feedback on upstream PR https://github.com/intel/llvm/pull/10473#issuecomment-1682160365 --- .../sycl_ext_oneapi_graph.asciidoc | 44 +++++++++++++------ 1 file changed, 30 insertions(+), 14 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc index f50498409d445..296c27b825cb3 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc @@ -476,7 +476,7 @@ node. [source,c++] ---- -namespace sycl::ext::oneapi::experimental::property::node +namespace sycl::ext::oneapi::experimental::property::node { class depends_on { public: template @@ -1376,14 +1376,24 @@ submitted in its entirety for execution via [source, c++] ---- using namespace sycl; + namespace sycl_ext = sycl::ext::oneapi::experimental; + queue q{default_selector{}}; - // New object representing graph of command-groups - ext::oneapi::experimental::command_graph graph(q.get_context(), q.get_device()); + // Lifetime of buffers must exceed the lifetime of graphs they are used in. + buffer bufferA{dataA.data(), range<1>{elements}}; + bufferA.set_write_back(false); + buffer bufferB{dataB.data(), range<1>{elements}}; + bufferB.set_write_back(false); + buffer bufferC{dataC.data(), range<1>{elements}}; + bufferC.set_write_back(false); + { - buffer bufferA{dataA.data(), range<1>{elements}}; - buffer bufferB{dataB.data(), range<1>{elements}}; - buffer bufferC{dataC.data(), range<1>{elements}}; + // New object representing graph of command-groups + sycl_ext::command_graph graph(q.get_context(), q.get_device(), + {sycl_ext::property::graph::assume_buffer_outlives_graph{}, + sycl_ext::property::graph::assume_data_outlives_buffer{}}); + // `q` will be put in the recording state where commands are recorded to // `graph` rather than submitted for execution immediately. @@ -1433,17 +1443,23 @@ submitted in its entirety for execution via // queue `q` will be returned to the executing state where commands are // submitted immediately for extension. graph.end_recording(); - } - // Finalize the modifiable graph to create an executable graph that can be - // submitted for execution. - auto exec_graph = graph.finalize(); + // Finalize the modifiable graph to create an executable graph that can be + // submitted for execution. + auto exec_graph = graph.finalize(); - // Execute graph - q.submit([&](handler& cgh) { - cgh.ext_oneapi_graph(exec_graph); - }); + // Execute graph + q.submit([&](handler& cgh) { + cgh.ext_oneapi_graph(exec_graph); + }).wait(); + } + + // Check output using host accessors + host_accessor hostAccA(bufferA); + host_accessor hostAccB(bufferB); + host_accessor hostAccC(bufferC); + ... ---- == Future Direction [[future-direction]] From 25b036f40adc440a737030145d75deb180b56151 Mon Sep 17 00:00:00 2001 From: Ewan Crawford Date: Mon, 28 Aug 2023 11:44:11 +0100 Subject: [PATCH 12/12] [SYCL][Doc] Remove graph property for buffer host lifetime (#311) Addressed Greg's most recent feedback points: * [Don't need the assume_data_outlives_buffer property](https://github.com/intel/llvm/pull/10473#discussion_r1306080913) * [Move graph_support_level namespace](https://github.com/intel/llvm/pull/10473#discussion_r1306043382) --- .../sycl_ext_oneapi_graph.asciidoc | 48 ++++++------------- 1 file changed, 14 insertions(+), 34 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc index 296c27b825cb3..bb4402881166d 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc @@ -307,6 +307,12 @@ enum class queue_state { recording }; +enum class graph_support_level { + unsupported, + native, + emulated +}; + namespace property { namespace graph { @@ -320,12 +326,6 @@ class assume_buffer_outlives_graph { public: assume_buffer_outlives_graph() = default; }; - -class assume_data_outlives_buffer { - public: - assume_data_outlives_buffer() = default; -}; - } // namespace graph namespace node { @@ -346,12 +346,6 @@ namespace device { struct graphs_support; } // namespace device - -enum class graph_support_level { - unsupported, - native, - emulated -}; } // namespace info class node {}; @@ -426,17 +420,17 @@ public: === Device Info Query Due to the experimental nature of the extension, support is not available across -all devices. The following device support query is added to report devices which +all devices. The following device support query is added to the +`sycl::ext::oneapi::experimental` namespace for reporting devices which are are currently supported, and how that support is implemented. - Table {counter: tableNumber}. Device Info Queries. [%header] |=== | Device Descriptors | Return Type | Description |`info::device::graph_support` -|`info::graph_support_level` +|`graph_support_level` |When passed to `device::get_info<...>()`, the function returns `native` if there is an underlying SYCL backend command-buffer construct which is used to propagate the graph to the backend. If no backend construct exists, or @@ -551,17 +545,6 @@ which is used in a graph will be kept alive on the host for the lifetime of the graph. Destroying that buffer during the lifetime of a `command_graph` constructed with this property results in undefined behavior. -===== Assume-Data-Outlives-Buffer Property [[assume-data-outlives-buffer-property]] - -The `property::graph::assume_data_outlives_buffer` property disables -<> which have been created -with a host pointer in a `command_graph`, and can be passed to a `command_graph` -on construction via the property list parameter. This property represents a -promise from the user that any host data passed to a buffer's constructor will -outlive the buffer itself, and by extension any graph in which that buffer is -used. Deleting or otherwise modifying this host data during the lifetime of the -buffer or graph results in undefined behavior when using this property. - ==== Graph Member Functions Table {counter: tableNumber}. Constructor of the `command_graph` class. @@ -679,9 +662,10 @@ Exceptions: * Throws synchronously with error code `invalid` if a queue is recording commands to the graph. -* Throws synchronously with error code `invalid` if this command uses a buffer - which was created with a host data pointer. See the - <> +* Throws synchronously with error code `invalid` if the graph wasn't created with + the `property::graph::assume_buffer_outlives_graph` property and this command + uses a buffer. See the + <> property for more information. | @@ -1115,9 +1099,6 @@ host data pointer in commands recorded to a `command_graph`. Because of the delayed execution of a `command_graph`, data may not be copied to the device immediately when commands using these buffers are submitted to the graph, therefore the host data must also outlive the graph to ensure correct behavior. -Users can pass the <> -property to the graph constructor to provide a promise that this will not occur -and that it is safe to use this buffer in the graph. Because of the delayed execution of a recorded graph, it is not possible to support captured code which relies on the copy-back on destruction behavior of buffers. @@ -1391,8 +1372,7 @@ submitted in its entirety for execution via { // New object representing graph of command-groups sycl_ext::command_graph graph(q.get_context(), q.get_device(), - {sycl_ext::property::graph::assume_buffer_outlives_graph{}, - sycl_ext::property::graph::assume_data_outlives_buffer{}}); + {sycl_ext::property::graph::assume_buffer_outlives_graph{}}); // `q` will be put in the recording state where commands are recorded to