Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions sycl/doc/extensions/GroupAlgorithms/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# SYCL_INTEL_group_algorithms

A library of group functions, including common parallel algorithms such as reductions and scans.
Original file line number Diff line number Diff line change
@@ -0,0 +1,268 @@
= SYCL_INTEL_group_algorithms
:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en

:blank: pass:[ +]

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}

== Introduction
IMPORTANT: This specification is a draft.

NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by permission by Khronos.

NOTE: This document is better viewed when rendered as html with asciidoctor. GitHub does not render image icons.

This document describes an extension which introduces a library of group functions, including common parallel algorithms such as reductions and scans.

== Name Strings

+SYCL_INTEL_group_algorithms+

== Notice

Copyright (c) 2020 Intel Corporation. All rights reserved.

== Status

Working Draft

This is a preview extension specification, intended to provide early access to a feature for review and community feedback. When the feature matures, this specification may be released as a formal extension.

Because the interfaces defined by this specification are not final and are subject to change they are not intended to be used by shipping software products.

== Version

Built On: {docdate} +
Revision: 1

== Contact
John Pennycook, Intel (john 'dot' pennycook 'at' intel 'dot' com)

== Dependencies

This extension is written against the SYCL 1.2.1 specification, Revision v1.2.1-6.

== Overview

This extension introduces a library of group algorithms, providing a simple way for developers to apply common parallel algorithms across data held by different work-items in the same group, or to use the work-items of a group to apply common parallel algorithms across data stored in memory.

The extension introduces the following functions:

- +any_of+
- +all_of+
- +none_of+
- +broadcast+
- +reduce+
- +exclusive_scan+
- +inclusive_scan+

=== Alignment with OpenCL vs C++

Where a feature is common to both OpenCL and {cpp}, this proposal opts for {cpp}-like naming:

- Collective operators are named as in +<functional>+ (e.g. +plus+ instead of +sum+) and to avoid clashes with names in +<algorithm>+ (e.g. +minimum+ instead of +min+).
- Algorithms are named as in +<algorithm>+ (e.g. +inclusive_scan+ instead of +scan_inclusive+).

=== Towards a Generic Group Interface

This extension adds a number of static members to the +group+ class to simplify the interfaces of the functions in this library:

- +id_type+: The type used to represent work-item IDs within the group
- +range_type+: The type used to represent the range of the group
- +linear_id_type+: The type used to represent linear work-item IDs within the group
- +dimensions+: An integral value representing the dimensionality of the group

[source, c++]
----
template <int Dimensions = 1>
class group
{
public:
using id_type = id<Dimensions>;
using range_type = range<Dimensions>;
using linear_id_type = size_t;
static constexpr int dimensions = Dimensions;
};
----

=== Data Types

All functions are supported for the fundamental scalar types supported by SYCL and instances of the SYCL +vec+ class. The fundamental scalar types (as defined in Section 6.5 of the SYCL 1.2.1 specification) are: +bool+, +char+, +signed char+, +unsigned char+, +short int+, +unsigned short int+, +int+, +unsigned int+, +long int+, +unsigned long int+, +long long int+, +unsigned long long int+, +size_t+, +float+, +double+, +half+.

Functions with arguments of type +vec<T,N>+ are applied component-wise: they are semantically equivalent to N calls to a scalar function of type +T+.

=== Function Objects

A number of function objects are provided in the +cl::sycl::intel+ namespace. These function objects are used for all interfaces requiring an operator to be specified. All function objects obey C++ conversion and promotion rules.

Transparent function objects are provided if using a {cpp}14 compiler -- the parameter types and return type for transparent function objects will be deduced if +T+ is not specified.

The following function objects alias objects in the +<functional>+ header from the {cpp} standard library:

- +cl::sycl::intel::plus+
- +cl::sycl::intel::multiplies+
- +cl::sycl::intel::bit_and+
- +cl::sycl::intel::bit_or+
- +cl::sycl::intel::bit_xor+
- +cl::sycl::intel::logical_and+
- +cl::sycl::intel::logical_or+

New function objects without {cpp} standard library equivalents are defined in the table below:

|===
|Function Object|Description

|+template <typename T=void> struct minimum;+
|+T operator(const T&, const T&) const+ applies +std::less+ to its arguments, in the same order, then returns the lesser argument unchanged.

|+template <typename T=void> struct maximum;+
|+T operator(const T&, const T&) const+ applies +std::greater+ to its arguments, in the same order, then returns the greater argument unchanged.
|===

Function objects supported by the group algorithms library can be identified using the +cl::sycl::intel::is_native_function_object+ and +cl::sycl::intel::is_native_function_object_v+ traits classes.

=== Functions

The group algorithms library is based on the algorithms library described in Section 28 of the {cpp}17 standard. The syntax and restrictions are aligned, with two notable differences: the first argument to each function is a group of work-items, in place of an execution policy; and pointers are accepted in place of iterators in order to guarantee that address space information is visible to the compiler.

Group algorithms are performed collaboratively by the work-items in a group. All functions therefore act as synchronization points and must be encountered in converged control flow by all work-items in the group -- if one work-item in the group reaches the function, then all work-items in the group must reach the function. Additionally, restrictions may be placed on the arguments passed to each function in order to ensure that all work-items in the group agree on the operation that is being performed. Any such restrictions on the arguments passed to a function are defined within the descriptions of those functions.

Many functions provide at least two overloads: one operating directly on data produced by the work-items in the group, and another operating on a range of data in memory specified by a pair of pointers. If the pointers passed to such a group function are not the same for all work-items in the group, their behavior is undefined. How the elements of a range are processed by the work-items in a group is undefined.

Using functions from the group algorithms library inside of a kernel may introduce additional limits on the resources available to user code inside the same kernel (e.g. private memory, work-group local memory). The behavior of these limits is implementation-defined, but must be reflected by calls to kernel querying functions such as +kernel::get_work_group_info+.

It is undefined behavior for any of these functions to be invoked within a +parallel_for_work_group+ or +parallel_for_work_item+ context, but this restriction may be lifted in a future version of the proposal.

==== Vote

|===
|Function|Description

|+template <typename Group> bool any_of(Group g, bool pred);+
|Return +true+ if _pred_ is +true+ for any work-item in the group.

|+template <typename Group> bool all_of(Group g, bool pred);+
|Return +true+ if _pred_ is +true+ for all work-items in the group.

|+template <typename Group> bool none_of(Group g, bool pred);+
|Return +true+ if _pred_ is +true+ for no work-items in the group.

|+template <typename Group, typename T, class Predicate> bool any_of(Group g, T x, Predicate pred);+
|Return +true+ if _pred(x)_ is +true+ for any work-item in the group. _pred_ must be the same for all work-items in the group.

|+template <typename Group, typename T, class Predicate> bool all_of(Group g, T x, Predicate pred);+
|Return +true+ if _pred(x)_ is +true+ for all work-items in the group. _pred_ must be the same for all work-items in the group.

|+template <typename Group, typename T, class Predicate> bool none_of(Group g, T x, Predicate pred);+
|Return +true+ if _pred(x)_ is +true+ for no work-items in the group. _pred_ must be the same for all work-items in the group.
|===

|===
|Function|Description

|+template <typename Group, typename Ptr, class Predicate> bool any_of(Group g, Ptr first, Ptr last, Predicate pred);+
|Return +true+ if _pred_ returns +true+ for any element in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group.

|+template <typename Group, typename Ptr, class Predicate> bool all_of(Group g, Ptr first, Ptr last, Predicate pred);+
|Return +true+ if _pred_ returns +true+ for all elements in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group.

|+template <typename Group, typename Ptr, class Predicate> bool none_of(Group g, Ptr first, Ptr last, Predicate pred);+
|Return +true+ if _pred_ returns +true+ for no element in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group.
|===

==== Collectives

In this section, the meaning of "exclusive scan" and "inclusive scan" are as defined in Sections 29.8.7 and 29.8.8 of the {cpp}17 specification, respectively.

|===
|Function|Description

|+template <typename Group, typename T> T broadcast(Group g, T x);+
|Broadcast the value of _x_ from the work-item with the lowest id to all work-items within the group.

|+template <typename Group, typename T> T broadcast(Group g, T x, Group::linear_id_type local_linear_id);+
|Broadcast the value of _x_ from the work-item with the specified linear id to all work-items within the group. The value of _local_linear_id_ must be the same for all work-items in the group.

|+template <typename Group, typename T> T broadcast(Group g, T x, Group::id_type local_id);+
|Broadcast the value of _x_ from the work-item with the specified id to all work-items within the group. The value of _local_id_ must be the same for all work-items in the group, and its dimensionality must match the dimensionality of the group.

|+template <typename Group, typename T, class BinaryOperation> T reduce(Group g, T x, BinaryOperation binary_op);+
|Combine the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. _binary_op_ must be the same for all work-items in the group.

|+template <typename Group, typename V, typename T, class BinaryOperation> T reduce(Group g, V x, T init, BinaryOperation binary_op);+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to deduce V and T separately and use T as return type? Also it doesn't say what the accumulator type is. I suggest we keep it simple and don't allow mixed type to be automatic. Meaning only deduce once (like e.g. std::max) and also require that is_same_v<decltype(binary_op(x,x)), T>. If we keep it simple like that then we should make it easy to specify the type and should make T the first template argument.

Copy link
Contributor Author

@Pennycook Pennycook Jan 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've attempted to address this in f09a7c9. Is the new text clear?

|Combine the values of _x_ from all work-items in the group using an initial value of _init_ and the operator _binary_op_, which must be one of the group algorithms library function objects. _binary_op_ must be the same for all work-items in the group.

|+template <typename Group, typename T, class BinaryOperation> T exclusive_scan(Group g, T x, BinaryOperation binary_op);+
|Perform an exclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the exclusive scan of the first +i+ work-items in the group and the identity value of _binary_op_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ must be the same for all work-items in the group.

|+template <typename Group, typename V, typename T, class BinaryOperation> T exclusive_scan(Group g, V x, T init, BinaryOperation binary_op);+
|Perform an exclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the exclusive scan of the first +i+ work items in the group and an initial value specified by _init_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _init_ and _binary_op_ must be the same for all work-items in the group.

|+template <typename Group, typename T, class BinaryOperation> T inclusive_scan(Group g, T x, BinaryOperation binary_op);+
|Perform an inclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the inclusive scan of the first +i+ work items in the group. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ must be the same for all work-items in the group.

|+template <typename Group, typename V, class BinaryOperation, typename T> T inclusive_scan(Group g, V x, BinaryOperation binary_op, T init);+
|Perform an inclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the inclusive scan of the first +i+ work items in the group and an initial value specified by _init_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ and _init_ must be the same for all work-items in the group.
|===

|===
|Function|Description

|+template <typename Group, typename Ptr, class BinaryOperation> Ptr::element_type reduce(Group g, Ptr first, Ptr last, BinaryOperation binary_op);+
|Combine the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. _first_, _last_ and _binary_op_ must be the same for all work-items in the group.

|+template <typename Group, typename Ptr, typename T, class BinaryOperation> T reduce(Group g, Ptr first, Ptr last, T init, BinaryOperation binary_op);+
|Combine the values in the range [_first_, _last_) using an initial value of _init_ and the operator _binary_op_, which must be one of the group algorithms library function objects. _first_, _last_, _init__ and _binary_op_ must be the same for all work-items in the group.

|+template <typename Group, typename InPtr, typename OutPtr, class BinaryOperation> OutPtr exclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op);+
|Perform an exclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the exclusive scan of the first +i+ values in the range and the identity value of _binary_op_. Returns a pointer to the end of the output range. _first_, _last_, _result_ and _binary_op_ must be the same for all work-items in the group.

|+template <typename Group, typename InPtr, typename OutPtr, typename T, class BinaryOperation> OutPtr exclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, T init, BinaryOperation binary_op);+
|Perform an exclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the exclusive scan of the first +i+ values in the range and an initial value specified by _init_. Returns a pointer to the end of the output range. _first_, _last_, _result_, _init_ and _binary_op_ must be the same for all work-items in the group.

|+template <typename Group, typename InPtr, typename OutPtr, class BinaryOperation> OutPtr inclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op);+
|Perform an inclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the inclusive scan of the first +i+ values in the range. Returns a pointer to the end of the output range. _first_, _last_, _result_ and _binary_op_ must be the same for all work-items in the group.

|+template <typename Group, typename InPtr, typename OutPtr, class BinaryOperation, typename T> OutrPtr inclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op, T init);+
|Perform an inclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the inclusive scan of the first +i+ values in the range and an initial value specified by _init_. Returns a pointer to the end of the output range. _first_, _last_, _result_, _binary_op_ and _init_ must be the same for all work-items in the group.
|===

== Issues

None.

//. asd
//+
//--
//*RESOLUTION*: Not resolved.
//--

== Revision History

[cols="5,15,15,70"]
[grid="rows"]
[options="header"]
|========================================
|Rev|Date|Author|Changes
|1|2020-01-30|John Pennycook|*Initial public working draft*
|========================================

//************************************************************************
//Other formatting suggestions:
//
//* Use *bold* text for host APIs, or [source] syntax highlighting.
//* Use +mono+ text for device APIs, or [source] syntax highlighting.
//* Use +mono+ text for extension names, types, or enum values.
//* Use _italics_ for parameters.
//************************************************************************
Loading