diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_forward_progress.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_forward_progress.asciidoc new file mode 100644 index 0000000000000..4c1df1640d021 --- /dev/null +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_forward_progress.asciidoc @@ -0,0 +1,503 @@ += sycl_ext_oneapi_forward_progress + +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en +:dpcpp: pass:[DPC++] + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + + +== Notice + +[%hardbreaks] +Copyright (C) 2022-2023 Intel Corporation. All rights reserved. + +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by +permission by Khronos. + + +== Contact + +To report problems with this extension, please open a new issue at: + +https://github.com/intel/llvm/issues + + +== Dependencies + +This extension is written against the SYCL 2020 revision 6 specification. All +references below to the "core SYCL specification" or to section numbers in the +SYCL specification refer to that revision. + +This extension also depends on the following other SYCL extensions: + +* link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[ + sycl_ext_oneapi_kernel_properties] +* link:../proposed/sycl_ext_oneapi_launch_queries.asciidoc[ + sycl_ext_oneapi_launch_queries] + +[NOTE] +==== +This extension uses the concept of a "root group" to represent the set of all +work-items in a kernel, in keeping with the +link:../proposed/sycl_ext_oneapi_root_group.asciidoc[sycl_ext_oneapi_root_group] +extension. However, there is no dependency on functionality introduced by that +extension. +==== + + +== Status + +This is a proposed extension specification, intended to gather community +feedback. Interfaces defined in this specification may not be implemented yet +or may be in a preliminary state. The specification itself may also change in +incompatible ways before it is finalized. *Shipping software products should +not rely on APIs defined in this specification.* + + +== Overview + +SYCL does not define any way to query the hardware and software mapping +selected by an implementation, and thus no way for application developers to +reason about forward progress guarantees. Developers must therefore choose +between writing highly performant code that makes unsafe assumptions and may +fail when moved to other implementations/backends/devices, or writing highly +portable code that does not make such assumptions but fails to take full +advantage of hardware capabilities. + +This extension strengthens the forward progress guarantees provided by a SYCL +implementation, and exposes the guarantees of particular +implementations/backends/devices via new device queries and kernel properties. + +Note that when the phrases below appear in this extension, their meaning is +identical to their meaning in the {cpp} core language specification: + +- _Makes progress_ +- _Thread of execution_ +- _Concurrent forward progress guarantees_ +- _Parallel forward progress guarantees_ +- _Weakly parallel forward progress guarantees_ +- _Block with forward progress guarantee delegation_ + + +== Specification + +=== Feature test macro + +This extension provides a feature-test macro as described in the core SYCL +specification. An implementation supporting this extension must predefine the +macro `SYCL_EXT_ONEAPI_FORWARD_PROGRESS` to one of the values defined in the +table below. Applications can test for the existence of this macro to +determine if the implementation supports this feature, or applications can test +the macro's value to determine which of the extension's features the +implementation supports. + +[%header,cols="1,5"] +|=== +|Value +|Description + +|1 +|The APIs of this experimental extension are not versioned, so the + feature-test macro always has this value. +|=== + + +=== Execution model + +The work-items in a SYCL ND-range kernel are arranged into a hierarchy of +groups and associated scopes. Work-items can query their position within each +group, and can coordinate with other members of the same group using group +functions and group algorithms. + +The execution model for work-items in an ND-range kernel behaves as if the +work-items are invoked in the following hypothetical implementation: + +- Execution of the entire ND-range kernel starts with a single root-group +thread of execution. +- The root-group thread of execution logically spawns one additional thread of +execution for each work-group, and blocks with forward progress guarantee +delegation upon those threads of execution. +- Each of the work-group threads of execution logically spawns one additional +thread of execution for each sub-group, and then the work-group thread of +execution blocks with forward progress guarantee delegation upon +those sub-group threads of execution. +- Each of the sub-group threads of execution logically spawns an additional +thread of execution for each work-item, and then the sub-group thread of +execution blocks with forward progress guarantee delegation upon those +work-item threads of execution. +- Finally, each work-item thread of execution executes one instance of the +kernel function. + +```mermaid +graph TD; + subgraph root-group + root(Thread); + end + subgraph work-groups + root-->wg1(Thread); + root-->wg2(...); + end + subgraph sub-groups + wg1-->sg1(Thread); + wg1-->sg2(...); + end + subgraph work-items + sg1-->wi1(Thread); + sg1-->wi2(Thread); + sg1-->wi3(Thread); + sg1-->wi4(Thread); + sg1-->wi5(Thread); + sg1-->wi6(Thread); + sg1-->wi7(Thread); + sg1-->wi8(Thread); + end +``` + +[NOTE] +==== +Blocking with forward progress guarantee delegation ensures that the forward +progress guarantees provided by at least one thread of execution in a group +are at least as strong as the forward progress guarantees provided by that +group. However, it is unspecified which thread(s) of execution will be +strengthened and for how many steps. +==== + +Safe coordination of work-items via memory requires careful attention to this +hierarchy, since the ability of work-items to make progress depends on a +combination of the forward progress guarantees provided by the work-item +threads of execution and the forward progress guarantees provided by the +sub-group, work-group and root-group threads of execution. Often, correct +execution of a kernel requires knowledge of the forward progress guarantees +of a certain subset of these threads of execution. For example, kernels using +atomic operations to coordinate sub-groups in the same work-group only require +the threads of execution associated with sub-groups to provide concurrent or +parallel forward progress guarantees---there is no requirement for the +work-item, work-group or root-group threads of execution to provide any +specific forward progress guarantees. + +To facilitate reasoning about work-item coordination, the forward progress +guarantees of threads of execution in SYCL can be qualified by a _coordination +scope_. This extension defines the following coordination scopes: + +- All threads of execution spawned directly or indirectly from the ND-range + kernel's initial root thread of execution are considered part of the same + root-group scope. +- All threads of execution spawned directly or indirectly from the same + work-group thread of execution are considered part of the same work-group + scope. +- All threads of execution spawned from the same sub-group thread of execution + are considered part of the same sub-group scope. +- Each individual work-item thread of execution is the sole member of its own + work-item scope. + +The forward progress guarantee of a thread of execution at scope `S`, relative +to coordination scope `C`, is the weakest of the forward progress guarantees +provided by threads of execution at all scopes between `S` and `C` inclusive. + +[NOTE] +==== +Qualifying a thread of execution's forward progress guarantees with a coordination +scope can be thought of as a shorthand for combining the forward progress +guarantees provided by each thread of execution in the hierarchy. + +For example, consider a hypothetical implementation that creates a separate +`std::thread` (providing concurrent forward progress guarantees) to execute +each sub-group, but which permits an unbounded number of work-groups executed +as tasks (providing parallel forward progress guarantees). With work-group +coordination scope, each sub-group provides concurrent forward progress +guarantees: this reflects the guarantees associated with each `std::thread` +upon creation. With root-group coordination scope, each sub-group provides +only parallel forward progress guarantees: each sub-group's progress guarantees +are weakened to reflect that the creation of a sub-group depends upon a +specific work-group (task) being scheduled and making progress. +==== + +[NOTE] +==== +Choosing a coordination scope is similar to choosing a memory scope for atomic +operations, in that it describes the potential set of threads of execution +that a thread _may_ attempt to coordinate with, and must be chosen to reflect +a kernel's dynamic communication pattern rather than static properties of its +source code. + +For example, consider a kernel using locks/mutexes. Attempting to acquire a +lock is a blocking operation, and so if two threads of execution providing +weakly parallel forward progress guarantees attempt to acquire the lock +simultaneously this may result in a deadlock. To correctly avoid a deadlock, +the developer must correctly identify the coordination scope: if any work-item +may try to acquire the lock, then work-items require stronger forward progress +guarantees at root-group coordination scope; if each work-group has its own +private data structure(s), and only work-items within the same work-group may +try to acquire the same lock, then work-items require stronger forward progress +guarantees only at work-group coordination scope. + +For performance reasons, a coordination scope will typically be the narrowest +scope containing all threads that may coordinate with one another. +==== + +=== Forward progress guarantees + +The `forward_progress_guarantee` `enum` is used to represent the three classes +of forward progress guarantee defined in the {cpp} core language specification. + +[source,c++] +---- +namespace ext::oneapi::experimental::sycl { + +enum class forward_progress_guarantee { + concurrent, + parallel, + weakly_parallel +}; + +} +---- + +[NOTE] +==== +This `enum` is defined similarly to the one defined in +https://wg21.link/p2300[P2300: `std::execution`]. +==== + + +=== Execution scopes + +The `execution_scope` `enum` is used to represent the hierarchy of threads of +execution within a SYCL implementation. + +[source,c++] +---- +namespace ext::oneapi::experimental::sycl { + +enum class execution_scope { + work_item, + sub_group, + work_group, + root_group, +}; + +} +---- + +[NOTE] +==== +The existing `memory_scope` `enum` is not used here to provide greater +flexibility in modifying the forward progress extension in future. +`memory_scope` is lacking values required by this extension (i.e. `root_group`) +and has additional values that are not required here (e.g. `system`). +==== + + +=== Device queries + +New device queries are introduced to allow developers to reason about the +progress guarantees that can be provided by each device. + +The results of these queries represent the forward progress guarantees that a +device can satisfy for each execution scope, and therefore the forward progress +guarantees that can be requested by a kernel's <>. These queries (and their results) must be interpreted relative to +a coordination scope, as described previously. + +[NOTE] +==== +Note that these queries cannot be used to reason about the thread of execution +associated with the root-group, because it is currently the top of the +hierarchy and it is therefore not possible to specify a broader coordination +scope. +==== + +[NOTE] +==== +Explicitly specifying the coordination scope ensures that the interpretation of +forward progress queries is fixed and independent of the number of execution +scopes in the hierarchy. This extension therefore does not prevent SYCL or +other extensions from introducing new execution scopes. +==== + + +[%header,cols="1,5,5"] +|=== +|Device Descriptor +|Return Type +|Description + +|`template + ext::oneapi::info::device::work_group_progress_capabilities` +|`std::vector` +|Return the set of forward progress guarantees that can be requested by + the `work_group_progress` kernel property, for the specified value of + `CoordinationScope`. Specializations of + `ext_oneapi_work_group_progress_capabilities` are only available for + `execution_scope::root_group`. + +|`template + ext::oneapi::info::device::sub_group_progress_capabilities` +|`std::vector` +|Return the set of forward progress guarantees that can be requested by + the `sub_group_progress` kernel property, for the specified value of + `CoordinationScope`. Specializations of + `ext_oneapi_sub_group_progress_capabilities` are only available for + `execution_scope::root_group` and `execution_scope::work_group`. + +|`template + ext::oneapi::info::device::work_item_progress_capabilities` +|`std::vector` +|Return the set of forward progress guarantees that can be requested by + the `work_item_progress` kernel property, for the specified value of + `CoordinationScope`. Specializations of + `ext_oneapi_work_item_progress_capabilities` are only available for + `execution_scope::root_group`, `execution_scope::work_group`, and + `execution_scope::sub_group`. + +|=== + + +=== Kernel properties [[properties]] + +New kernel properties are introduced to allow developers to declare that a +given kernel requires specific forward progress guarantees for correctness. +If a kernel is submitted to a device that cannot satisfy the request for +specific progress guarantees, the implementation must throw an `exception` +with the `errc::feature_not_supported` error code. + +Each property must appear in a property list at most once, and it is therefore +not possible to specify different guarantees and/or coordination scopes for the +same execution scope directly. However, it remains possible for requests with +different execution scopes to have overlapping coordination scopes +(e.g. a request for work-items to provide concurrent forward progress +guarantees at root-group coordination scope implicitly requests for sub-groups +and work-groups to also provide concurrent forward progress guarantees). In +such a case, an implementation must satisfy the strongest request(s). + +Devices may not be able to provide the requested forward progress guarantees +for all launch configurations. The <> defined in a +later section allow developers to identify valid launch configurations for +specific combinations of properties. + +[NOTE] +==== +The mechanism used to provide specific forward progress guarantees is +implementation-defined. +==== + +[source,c++] +---- +namespace sycl::ext::oneapi::experimental { + +struct work_group_progress_key { + template + using value_t = property_value, std::integral_constant>; +}; + +struct sub_group_progress_key { + template + using value_t = property_value, std::integral_constant>; +}; + +struct work_item_progress_key { + template + using value_t = property_value, std::integral_constant>; +}; + +template +inline constexpr work_group_progress_key::value_t work_group_progress; + +template +inline constexpr sub_group_progress_key::value_t sub_group_progress; + +template +inline constexpr work_item_progress_key::value_t work_item_progress; + +template <> struct is_property_key : std::true_type {}; +template <> struct is_property_key : std::true_type {}; +template <> struct is_property_key : std::true_type {}; + +} +---- + +|=== +|Property|Description + +|`work_group_progress` +|The `work_group_progress` property adds the requirement that the kernel must be + launched with the specified forward progress guarantees for threads of execution + created at `execution_scope::work_group` scope, with the specified + coordination scope. `CoordinationScope` must be broader than + `execution_scope::work_group`. + +|`sub_group_progress` +|The `sub_group_progress` property adds the requirement that the kernel must be + launched with the specified forward progress guarantees for threads of execution + created at `execution_scope::sub_group` scope, with the specified coordination + scope. `CoordinationScope` must be broader than `execution_scope::sub_group`. + +|`work_item_progress` +|The `work_item_progress` property adds the requirement that the kernel must be + launched with the specified forward progress guarantees for threads of execution + created at `execution_scope::work_item` scope, with the specified coordination + scope. `CoordinationScope` must be broader than `execution_scope::work_item`. + +|=== + + +== Implementation notes + +This non-normative section provides information about one possible +implementation of this extension. It is not part of the specification of the +extension's API. + +The simplest valid implementation of this extension returns `weakly_parallel` +for all queries, and throws an exception upon submission for any kernel +requesting stronger guarantees. Such an implementation is compatible with +any backend currently capable of supporting SYCL 2020. + +Exposing stronger guarantees requires an understanding of both a device's +capabilities and the functionality exposed by specific backends. For example, +it may only be possible to provide `concurrent` forward progress guarantees +for backends that submit kernels eagerly and which support some notion of a +"cooperative" kernel launch. + +The table below shows the expected mapping for CPUs and GPUs using the current +OpenCL and Level Zero backends, respectively. + +[%header,cols="1,5,5,5"] +|=== +|Scope +|Coordination Scope +|CPU & OpenCL +|GPU & Level Zero + +|`work_group` +|`root_group` +|`parallel` +|`concurrent` + +|`sub_group` +|`work_group` +|`weakly_parallel` +|`concurrent` + +|`work_item` +|`sub_group` +|`weakly_parallel` +|`weakly_parallel` + +|=== + +== Issues + +None. + diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_launch_queries.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_launch_queries.asciidoc new file mode 100644 index 0000000000000..a1b9e670c19b6 --- /dev/null +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_launch_queries.asciidoc @@ -0,0 +1,259 @@ += sycl_ext_oneapi_launch_queries + +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en +:dpcpp: pass:[DPC++] + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + + +== Notice + +[%hardbreaks] +Copyright (C) 2022-2023 Intel Corporation. All rights reserved. + +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by +permission by Khronos. + + +== Contact + +To report problems with this extension, please open a new issue at: + +https://github.com/intel/llvm/issues + + +== Dependencies + +This extension is written against the SYCL 2020 revision 6 specification. All +references below to the "core SYCL specification" or to section numbers in the +SYCL specification refer to that revision. + +This extension also depends on the following other SYCL extensions: + +* link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[ + sycl_ext_oneapi_kernel_properties] + + +== Status + +This is a proposed extension specification, intended to gather community +feedback. Interfaces defined in this specification may not be implemented yet +or may be in a preliminary state. The specification itself may also change in +incompatible ways before it is finalized. *Shipping software products should +not rely on APIs defined in this specification.* + + +== Overview + +This extension introduces a new mechanism for querying any limitations that a +developer must respect when launching a specific kernel on a specific queue. +Such limitations may exist when a kernel is decorated with one or more +properties that require an implementation to enable additional features +(such as providing certain forward progress guarantees, or enabling +cross-work-group synchronization routines), or when a kernel uses certain +features (such as static work-group local memory or group algorithms). + + +== Specification + +=== Feature test macro + +This extension provides a feature-test macro as described in the core SYCL +specification. An implementation supporting this extension must predefine the +macro `SYCL_EXT_ONEAPI_LAUNCH_QUERIES` to one of the values defined in the +table below. Applications can test for the existence of this macro to +determine if the implementation supports this feature, or applications can test +the macro's value to determine which of the extension's features the +implementation supports. + +[%header,cols="1,5"] +|=== +|Value +|Description + +|1 +|The APIs of this experimental extension are not versioned, so the + feature-test macro always has this value. +|=== + + +=== Launch queries + +An implementation's ability to satisify requests for specific behaviors +(such as strong forward progress guarantees and/or cross-work-group +synchronization) for a kernel may be dependent upon: the kernel itself, and the +features that it uses; the queue (and underlying device) to which the kernel is +submitted; and the kernel's launch configuration. + +It is a user's responsibility to ensure that a kernel requesting specific +behaviors uses a compatible launch configuration, using the +`ext_oneapi_get_info` function and descriptors from the `info::kernel` +namespace. + +If the `sycl::nd_range` parameter used to launch a kernel is incompatible with +the results of a kernel's launch queries, an implementation must throw a +synchronous exception with the `errc::nd_range` error code. + +[NOTE] +==== +The values returned by `ext_oneapi_get_info` account for all properties +attached to a kernel (via the mechanisms defined in the +sycl_ext_oneapi_kernel_properties extension), as well as the usage of features +like group algorithms and work-group local memory. Developers should assume +that the values will differ across kernels. +==== + +[source,c++] +---- +namespace sycl { + +class kernel { + public: + template + /*return-type*/ ext_oneapi_get_info(T... args) const; +}; + +} +---- + +[source,c++] +---- +template +/*return-type*/ ext_oneapi_get_info(T... args) const; +---- +_Constraints_: Available only when the types `+T...+` described by the parameter +pack match the types defined in the table below. + +_Preconditions_: `Param` must be one of the `info::kernel` descriptors defined +in this extension. + +_Returns_: Information about the kernel that applies when the kernel is +submitted with the configuration described by the parameter pack `+T...+`. +The return type is defined in the table below. + +This extension adds several new queries to this interface, many of which +already have equivalents in the `kernel_device_specific` or `device` +namespaces. + +NOTE: These queries are queue- and not device-specific because it is +anticipated that implementations will introduce finer-grained queue +controls that impact the scheduling of kernels. + +NOTE: Allowing devices to return a value of 1 for these queries maximizes the +chances that code written to use certain extension remains portable. However, +the performance of kernels using only one work-group, sub-group or work-item +may be limited on some (highly parallel) devices. If certain properties (e.g. +forward progress guarantees, cross-work-group synchronization) are being used +as part of a performance optimization, developers should check that the values +returned by these queries is not 1. + +[source, c++] +---- +namespace ext::oneapi::experimental::info::kernel { + +template +struct max_work_item_sizes; + +struct max_work_group_size; +struct max_num_work_groups; + +} +---- + +[%header,cols="1,5,5,5"] +|=== +|Kernel Descriptor +|Argument Types +|Return Type +|Description + +|`template + max_work_item_sizes` +|`sycl::queue` +|`id` +|Returns the maximum number of work-items that are permitted in each dimension + of a work-group, when the kernel is submitted to the specified queue, + accounting for any kernel properties or features. If the kernel can be + submitted to the specified queue without an error, the minimum value returned + by this query is 1, otherwise it is 0. + +|`max_work_group_size` +|`sycl::queue` +|`size_t` +|Returns the maximum number of work-items that are permitted in a work-group, +when the kernel is submitted to the specified queue, accounting for any +kernel properties or features. If the kernel can be submitted to the specified +queue without an error, the minimum value returned by this query is 1, +otherwise it is 0. + +|`max_num_work_groups` +|`sycl::queue`, `sycl::range`, `size_t` +|`size_t` +|Returns the maximum number of work-groups, when the kernel is submitted to the +specified queue with the specified work-group size and the specified amount of +dynamic work-group local memory (in bytes), accounting for any kernel +properties or features. If the kernel can be submitted to the specified queue +without an error, the minimum value returned by this query is 1, otherwise it +is 0. + +|=== + +A separate set of launch queries can be used to reason about how an +implementation will launch a kernel on the specified queue. The values of these +queries should also be checked if a kernel is expected to be launched in a +specific way (e.g., if the kernel requires two sub-groups for correctness). + +[source, c++] +---- +namespace ext::oneapi::experimental::info::kernel { + +struct max_sub_group_size; +struct num_sub_groups; + +} +---- + +[%header,cols="1,5,5,5"] +|=== +|Kernel Descriptor +|Argument Types +|Return Type +|Description + +|`max_sub_group_size` +|`sycl::queue`, `sycl::range` +|`uint32_t` +|Returns the maximum sub-group size, when the kernel is submitted to the +specified queue with the specified work-group size, accounting for any kernel +properties or features. The return value of this query must match the value +returned by `sub_group::get_max_local_range()` inside the kernel. If the kernel +can be submitted to the specified queue without an error, the minimum value +returned by this query is 1, otherwise it is 0. + +|`num_sub_groups` +|`sycl::queue`, `sycl::range` +|`uint32_t` +|Returns the number of sub-groups per work-group, when the kernel is submitted +to the specified queue with the specified work-group size, accounting for any +kernel properties or features. If the kernel can be submitted to the specified +queue without an error, the minimum value returned by this query is 1, +otherwise it is 0. + +|=== + +== Issues + +None. + diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_root_group.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_root_group.asciidoc index b2dfa639b3f75..73e1e5c4078d3 100644 --- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_root_group.asciidoc +++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_root_group.asciidoc @@ -20,7 +20,7 @@ == Notice [%hardbreaks] -Copyright (C) 2022-2022 Intel Corporation. All rights reserved. +Copyright (C) 2022-2023 Intel Corporation. All rights reserved. Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by @@ -44,8 +44,10 @@ This extension also depends on the following other SYCL extensions: * link:../experimental/sycl_ext_oneapi_properties.asciidoc[ sycl_ext_oneapi_properties] -* link:../proposed/sycl_ext_oneapi_kernel_properties.asciidoc[ +* link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[ sycl_ext_oneapi_kernel_properties] +* link:../proposed/sycl_ext_oneapi_launch_queries.asciidoc[ + sycl_ext_oneapi_launch_queries] == Status @@ -143,72 +145,6 @@ supports. feature-test macro always has this value. |=== -=== Queries - -A `sycl::kernel` can be queried for queue-specific information using -the `ext_oneapi_get_info` function and descriptors from the -`info::kernel_queue_specific` namespace. - -[source,c++] ----- -namespace sycl { - -class kernel { - public: - template - typename Param::return_type ext_oneapi_get_info(const queue &q) const; -}; - -namespace ext { -namespace oneapi { -namespace experimental { -namespace info { -namespace kernel_queue_specific { - -struct max_num_work_group_sync; - -} // namespace kernel_queue_specific -} // namespace info -} // namespace experimental -} // namespace oneapi -} // namespace ext -} // namespace sycl ----- - -NOTE: These queries are queue- and not device-specific because it is -anticipated that implementations will introduce finer-grained queue -controls that impact the scheduling of kernels using the root-group. - -[source,c++] ----- -template -typename Param::return_type ext_oneapi_get_info(const queue &q) const; ----- -_Preconditions_: `Param` must be one of the `info::kernel_queue_specific` -descriptors defined in this extension, and the type alias `Param::return_type` -must be defined in accordance with the table below. - -_Returns_: Information about the kernel that applies when the kernel is -submitted to the queue specified by _q_. - -[%header,cols="1,5,5"] -|=== -|Kernel Descriptor -|Return Type -|Description - -|`kernel_queue_specific::max_num_work_group_sync` -|`size_t` -|Return the maximum number of work-groups that can be synchronized by passing -a `root_group` to group functions and algorithms. Must be at least 1. -|=== - -NOTE: Requiring all devices to return a value of at least 1 ensures that code -written to use this extension remains portable. However, the performance of -kernels using only one work-group may be limited on some (highly parallel) -devices. If root-group synchronization is being used as part of a performance -optimization, developers should check that the value returned by this query -is not 1. === Kernel properties @@ -237,9 +173,9 @@ inline constexpr use_root_sync_key::value_t use_root_sync; |The `use_root_sync` property adds the requirement that the kernel must be launched in a manner that is compatible with using a root-group in group functions and algorithms. If the `sycl::nd_range` parameter used to launch the - kernel does not meet the requirements described in the Queries section of this - extension, an implementation must throw a synchronous exception with the - `errc::nd_range` error code. + kernel is incompatible with the results of the launch queries described in the + sycl_ext_oneapi_launch_queries extension, an implementation must throw a + synchronous exception with the `errc::nd_range` error code. |===