From e07b429bc7d40fcea29164849511acb1fd82d34b Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Thu, 26 Aug 2021 17:21:23 -0400 Subject: [PATCH 01/25] First public review of device global design spec --- sycl/doc/DeviceGlobal.md | 406 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 406 insertions(+) create mode 100644 sycl/doc/DeviceGlobal.md diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md new file mode 100644 index 0000000000000..d19b83edb89b0 --- /dev/null +++ b/sycl/doc/DeviceGlobal.md @@ -0,0 +1,406 @@ +# Implementation design for "device\_global" + +This document describes the implementation design for the DPC++ extension +[SYCL\_EXT\_ONEAPI\_DEVICE\_GLOBAL][1], which allows applications to declare global +variables in device code. + +[1]: + + +## Requirements + +The extension specification document referenced above contains the full set of +requirements for this feature, but some requirements that are particularly +relevant to the design are called out here. + +The first issue relates to the mechanism for integrating host and device code. +Like specialization constants, device global variables are referenced in both +host and device code, so they require some mechanism to correlate the variable +instance in device code with the variable name in host code. The API for +reading a device global variable from device code, however, is different from +the API for specialization constants. Whereas specialization constants are +read through a templated member function: + +``` +sycl::specialization_id spec_var; + +void func(sycl::kernel_handler kh) { + int i = kh.get_specialization_constant(); +} +``` + +Device global variables are read directly: + +``` +sycl::ext::oneapi::device_global dev_var; + +void func() { + int i = dev_var; +} +``` + +As we will see later, this difference has a ramification on the integration +mechanism. + +The second issue relates to the semantics of device global variables relative +to SPIR-V module scope global variables. The semantics are similar, but not +quite the same. In particular, a device global variable must retain its value +even if a module is recompiled (e.g. to change the value of a specialization +constant), whereas a SPIR-V module scope variable would not retain its value +in this case. Therefore, device global variables cannot be implemented solely +via SPIR-V module scope global variables. Instead, the design uses a +combination of SPIR-V module scope variables and also USM device allocated +memory. + + +## Design + +### Changes to DPC++ headers + +The headers must be changed, of course, to declare the new `device_global` +class, which is described in the [extension specification][1]. However, +there is no special magic required in the headers beyond the declaration of +this type. + +### New LLVM IR attributes + +As described below, the device compiler front-end decorates each +`device_global` variable with two attributes which convey information to the +`sycl-post-link` tool. + +The `sycl-unique-id` attribute contains a string which uniquely identifies the +variable instance. If the variable has external linkage, the string must be +the same for all translation units that define the variable (e.g. if the +variable is defined as `inline`). If the variable has internal linkage, the +string must not be shared by any other `device_global` variable, even in +another translation unit. (These rules for the identifying string are the +same as the rules we have for `specialization_id` variables.) + +The `sycl-device-global-size` attribute contains the size in bytes of the +underlying data type `T` of the `device_global` variable. As with all +attributes, the value is a string, so the size is represented as a string in +decimal format. + +Note that language rules ensure that `device_global` variables are always +declared at namespace scope (i.e. a global variable), and LLVM IR [allows +attributes to be attached to global variables][2]. + +[2]: + +### Changes to the DPC++ front-end + +The device compiler front-end must be changed in two ways: it must generate new +content in the integration footer and it must add the `sycl-unique-id` and +`sycl-device-global-size` attributes to the IR definition of of any +`device_global` variable. These two tasks are related because the integration +footer contains the same string that is stored in the `sycl-unique-id` +attribute. + +**TODO**: See also the "Unresolved issues" section at the bottom of this +document for another change that is needed in the front-end. + +#### New content in the integration footer + +New content in the integration footer provides a mapping from a host instance +of a `device_global` variable to its unique ID string. This is done through +partial specialization of a template function in much the same way that we do +for `specialization_id` variables. To illustrate, consider a translation unit +that defines two `device_global` variables: + +``` +#include + +sycl::device_global Foo; +static sycl::device_global Bar; + +// ... +``` + +The corresponding integration footer looks like this: + +``` +inline namespace cl { +namespace sycl::detail { + +template<> +inline const char *get_device_global_symbolic_ID_impl<::Foo>() { + return /* unique string for "Foo" */; +} + +template<> +inline const char *get_device_global_symbolic_ID_impl<::Bar>() { + return /* globally unique string because "Bar" has internal linkage */ +} + +} +} + +#include +``` + +As with the integration footer for `specialization_id` variables, the generated +code is more complex when `device_global` variables are defined in an unnamed +namespace. See the [specialization constant][3] specification for details. + +[3]: + +The `` file contains the definition +of the wrapper function which calls the partial specializations. This must be +last in the translation unit to satisfy the C++ requirement that references to +the template function must occur after all partial specializations are defined. + +``` +inline namespace cl { +namespace sycl::detail { + +template const char *get_device_global_symbolic_ID() { + return get_device_global_symbolic_ID_impl(); +} + +} +} +``` + +#### Decorating the IR with new attributes + +The device compiler front-end also adds the new `sycl-unique-id` and +`sycl-device-global-size` attribute to the IR definition of any `device_global` +variables. The `sycl-unique-id` attribute must contain the same string that is +emitted in the integration footer. + +### Changes to the `sycl-post-link` tool + +The `sycl-post-link` tool performs its normal algorithm to identify the set of +kernels and device functions that are bundled together into each module. Once +it identifies the functions in each module, it scans those functions looking +for references to global variables of type `device_global`. The +`sycl-post-link` tool then includes the following additional IR into each +module: + +1. The IR definition of each `device_global` variable that is referenced by + that module. + +2. If the module references at least one `device_global` variable, the IR + definition of a synthesized kernel function that initializes each of those + `device_global` variables. The following example shows the structure of + this kernel function, where `Foo` and `Bar` match the code example above: + + ``` + void __sycl_detail_UNIQUE_STRING(void *p1, void *p2) { + Foo.usmptr = p1; + Bar.usmptr = p2; + } + ``` + + The kernel takes one argument for each `device_global` variable and assigns + the `usmptr` field of each of those variables to its corresponding argument. + Note that the name of the kernel must be some unique string. Otherwise, + there is a danger that it will conflict with the name of another synthesized + initialization function if this module is online-linked with device code in + a shared library. For example, the implementation can construct a name using + a GUID. + +The `sycl-post-link` tool also emits new property set information as described +below. + +### New property in "SYCL/misc properties" + +If a device code module has one or more device global variables, a new property +is added to the "SYCL/misc properties" set named "device-global-initializer". +The value of this property has property type `PI_PROPERTY_TYPE_STRING` +containing the name of the synthesized kernel that initializes the device +global variables. + +### New "SYCL/device globals" property set + +Each device code module that references one or more device global variables +must have an associated "SYCL/device globals" property set. The name of each +property in this set is the `sycl-unique-id` string of a `device_global` +variable that is referenced in the module. The value of each property has +property type `PI_PROPERTY_TYPE_UINT32` which tells the size (in bytes) from +the `sycl-device-global-size` attribute for the `device_global` variable. + +The order of the properties in this set is important. The order matches the +order of the parameters accepted by the `__sycl_detail_UNIQUE_STRING` kernel +that is synthesized by the `sycl-post-link` tool. + +### Changes to the DPC++ runtime + +Several changes are needed to the DPC++ runtime + +* The runtime must allocate a buffer from USM device memory for each + `device_global` variable for each device that accesses that variable. As + noted in the requirements, the value of a device global variable must be + shared even across different device code modules that are loaded onto the + same device. As a result, we can't store the value in a SPIR-V module + scope global variable, which isn't shared across different modules. All + modules that access the same variable on a given device must share the same + USM buffer for that variable. + +* We need to call the synthesized `__sycl_detail_UNIQUE_STRING` kernel for each + device code module to initialize the `device_global` variables. + +* We need to implement the new `copy` and `memcpy` functions in the `queue` and + `handler` classes which copy to or from `device_global` variables. + +### Initializing the device global variables in device code + +When a DPC++ application submits a kernel, the runtime constructs a +`pi_program` containing this kernel that is compiled for the target device, if +such a `pi_program` does not yet exist. If the kernel resides in a device code +module that calls into a shared library, the runtime identifies a set of device +code modules that need to be online-linked together in order to construct the +`pi_program`. + +After creating a `pi_program` and before invoking any kernel it contains, the +runtime must do the following: + +* Scan the entries in the "SYCL/device globals" property sets for each device + code module that contributes to the `pi_program` to get the full set of + device global variables used by the `pi_program`. For each of the device + global variables, the runtime checks to see if a USM buffer has already been + created for that variable on this target device. If not, the runtime + allocates the buffer from USM device memory, using the size from the + "SYCL/device globals" property set. The runtime maintains a mapping from + the device global's unique string and this USM pointer. + +* Scan the "SYCL/misc properties" property set for "device-global-initializer" + properties. Each such property names a kernel in the `pi_program` which the + runtime must call to initialize the device global variables it contains. The + runtime uses the contents of the "SYCL/device globals" property set to + determine the number and order of USM device pointers to pass as arguments to + this kernel. The runtime waits for these kernel calls to complete before + submitting any application kernels from this `pi_program`. + +### Implementing the `copy` and `memcpy` functions in `queue` and `handler` + +Each of these functions is templated on a reference to a device global variable +like so: + +``` +template +event queue::copyto(/*...*/) {/*...*/} +``` + +The implementation can use the template parameter to obtain the variable's +unique string by calling the mapping function from the integration footer: + +``` +const char *name = detail::get_device_global_symbolic_ID(); +``` + +Once the runtime has this name, it is a simple matter to check if a USM buffer +has already been allocated for this device global variable on this device. +If it has not yet been allocated, this means that the application has not yet +submitted any kernels to this device that come from a module that defines this +device global variable. In this case, the runtime must allocate a buffer from +USM device memory using the size from the template parameter. The runtime +maintains a mapping from the unique string to this new USM pointer. + +``` +size_t numBytes = sizeof(decltype(DeviceGlobal)::type); +void *usmptr = malloc_device(numBytes, dev, ctxt); +``` + +The runtime can now copy to / from this USM buffer using any of the standard +USM explicit copy functions in the `queue` or `handler` class. + +Note that the runtime can avoid the cost of subsequent lookups of this +variable's unique string by caching the variable's USM pointer in the host +instance of the `device_global` variable: + +``` +template +event queue::copyto(/*...*/) { + if (!DeviceGlobal.usmptr) { + const char *name = detail::get_device_global_symbolic_ID(); + /* etc. */ + DeviceGlobal.usmptr = usmptr; + } + /* copy to / from the USM pointer */ +} +``` + +### Accessing the device global from device code + +Accessing the value of a `device_global` variable from device code is a simple +matter of accessing the memory from the USM pointer, which is available in the +variable's `usmptr` member. For example, the implementation of +`device_global::get()` might look like this: + +``` +T& get() noexcept { + return *usmptr; +} +``` + + +## Unresolved issues + +### Need some way to avoid errors referencing `device_global` variables + +The device compiler front-end currently diagnoses an error if device code +references a global variable, unless it is `constexpr` or `const` and constant +initialized. This is consistent with the SYCL 2020 specification, but the new +device global feature is an exception to this rule. Device code, of course, +can reference a `device_global` variable even if it is not declared `constexpr` +or `const`. We need some way to avoid the error diagnostic in this case. + +The [newly added][4] `sycl_global_var` attribute is almost what we need, +however that attribute is only allowed to decorate a variable declaration. +This doesn't help us because we don't want to force users to add an attribute +to each declaration of a `device_global` variable. Instead, we want to +decorate the class declaration of `device_global` with some attribute which +allows any variables of that type to be accessible from device code. + +[4]: + +Since the `sycl_global_var` attribute is currently used only as an +implementation detail for [device-side asserts][5], one options is to repurpose +this attribute. Rather than applying it to a variable declaration, we could +allow it only on a type declaration. The implementation of device-side asserts +could be changed to use the attribute on a new type, rather than on a variable +declaration. + +[5]: + +### Need some way to force `device_global` variables into global address space + +Although the underlying `T` type of a device global variable is stored in a USM +buffer, the `device_global` variable itself is a module scope global variable. +Unless we decorate these variables in some special way, the current behavior of +the `llvm-spirv` tool is to generate these variables in the private address +space, even though they are declared at module scope. + +The [existing OpenCL attribute][6] `[[clang::opencl_global]]` is almost what we +need, but again this attribute can only be applied to a variable declaration. +Instead, we want some attribute that can be applied to the type declaration of +`class device_global`. We could invent some new attribute with this semantic, +but there is another problem. + +[6]: + +Applying `[[clang::opencl_global]]` to a variable of class type currently +raises an error message saying there is no candidate "global" constructor for +the type. Apparently, the compiler expects a constructor to be defined with +the `__global` keyword: + +``` +class device_global { + public: + device_global() __global; +}; +``` + +We could add a default constructor like that, but the compiler only recognizes +this syntax when it is in OpenCL C++ mode, which is not the case when compiling +SYCL applications. Therefore, if we invented a new attribute that added +"global address space" semantics to a type, we would probably want that +attribute to cause any constructors to behave as though they were implicitly +declared with the `__global` keyword. + +Another option entirely is to change the default behavior of the SYCL device +compiler so that namespace scope variables are implicitly treated as though +they are in the global address space (as opposed to the private address space +as is currently the case). This behavior would be consistent with the way the +compiler works in OpenCL C 2.0 mode. From 78c301d73639287b991859837b7fc35040e7725b Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Wed, 22 Sep 2021 17:23:21 -0400 Subject: [PATCH 02/25] Add implementation for "device_image_life" --- sycl/doc/DeviceGlobal.md | 654 +++++++++++++++++++++++++-------------- 1 file changed, 426 insertions(+), 228 deletions(-) diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index d19b83edb89b0..7639dea0bf95e 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -1,172 +1,311 @@ # Implementation design for "device\_global" This document describes the implementation design for the DPC++ extension -[SYCL\_EXT\_ONEAPI\_DEVICE\_GLOBAL][1], which allows applications to declare global -variables in device code. +SYCL\_EXT\_ONEAPI\_DEVICE\_GLOBAL, which allows applications to declare global +variables in device code. There are two API specifications for this feature, +the [main specification][1] and a specification for [properties that are +primarily used on FPGA][2]. This design document covers the APIs in both of +those specifications. [1]: +[2]: ## Requirements -The extension specification document referenced above contains the full set of +The extension specification documents referenced above contain the full set of requirements for this feature, but some requirements that are particularly relevant to the design are called out here. The first issue relates to the mechanism for integrating host and device code. Like specialization constants, device global variables are referenced in both host and device code, so they require some mechanism to correlate the variable -instance in device code with the variable name in host code. The API for -reading a device global variable from device code, however, is different from -the API for specialization constants. Whereas specialization constants are -read through a templated member function: +instance in device code with the variable instance in host code. The API for +referencing a device global variable, however, is different from the API for +specialization constants. Whereas specialization constants are referenced +through a templated member function: ``` sycl::specialization_id spec_var; -void func(sycl::kernel_handler kh) { - int i = kh.get_specialization_constant(); +void func(sycl::queue q) { + q.submit([&](sycl::handler &cgh) { + cgh.set_specialization_constant(42); + cgh.single_task([=](sycl::kernel_handler kh) { + int i = kh.get_specialization_constant(); + }); + }); } ``` -Device global variables are read directly: +Device global variables, by contrast, are referenced by their address: ``` sycl::ext::oneapi::device_global dev_var; -void func() { - int i = dev_var; +void func(sycl::queue q) { + int val = 42; + q.copy(&val, dev_var).wait(); // The 'dev_var' parameter is by reference + q.submit([&](sycl::handler &cgh) { + cgh.single_task([=] { + int i = dev_var; + }); + }); } ``` -As we will see later, this difference has a ramification on the integration -mechanism. - -The second issue relates to the semantics of device global variables relative -to SPIR-V module scope global variables. The semantics are similar, but not -quite the same. In particular, a device global variable must retain its value -even if a module is recompiled (e.g. to change the value of a specialization -constant), whereas a SPIR-V module scope variable would not retain its value -in this case. Therefore, device global variables cannot be implemented solely -via SPIR-V module scope global variables. Instead, the design uses a -combination of SPIR-V module scope variables and also USM device allocated -memory. +This is a key difference because the compiler does not statically know which +device global variable is being referenced; we only know the address at +runtime. As we will see later, this has a ramification on the integration +headers and on the mechanism that connects instances of device global variables +in host code with their corresponding instances in device code. + +Another issue relates to the `device_image_life` property which can be applied +to a device global variable declaration. The intent of this property is to +allow a device global variable to be implemented directly on top of a SPIR-V +module scope global variable. When this property is **not** present, an +instance of a device global variable is shared across all device images that +are loaded onto a particular device. This makes it easy for the user to reason +about the scope of a variable because the user need not understand which device +image contains each kernel. However, this semantic makes the implementation +less efficient, especially on FPGA targets. + +By contrast, the `device_image_life` property changes the semantic of a device +global variable such that the user must understand which device image contains +each kernel, which is difficult to reason about. For example, changing the +value of a specialization constant may cause a kernel to be recompiled into a +separate device image on some targets. As a result, a device global variable +referenced in a kernel may actually have several disjoint instances if the +kernel uses specialization constants. This problem is more tractable on FPGA +targets because specialization constants are not implemented via separate +device images on those targets, however, there are other factors that FPGA +users need to be aware of when using the `device_image_life` property. These +are documented more throughly in the extension specification. + +The important impact on the design, though, is that device global variables +declared with the `device_image_life` property have an implementation that is +quite different from device global variables that are not declared with this +property. The sections below describe both implementations. ## Design ### Changes to DPC++ headers -The headers must be changed, of course, to declare the new `device_global` -class, which is described in the [extension specification][1]. However, -there is no special magic required in the headers beyond the declaration of -this type. +The headers, of course, include the declaration of the new `device_global` +class, which is described in the [extension specification][1]. The declaration +of this class uses partial specialization to define the class differently +depending on whether is has the `device_image_life` property. When the +property is not present, the class has a member variable which is a pointer to +the underlying type. Member functions which return a reference to the value +(e.g. `get`) return the value of this pointer: + +``` +template +class device_global { + T *usmptr; + public: + T& get() noexcept { return *usmptr; } + /* other member functions */ +}; +``` + +However, when the property is present, it has a member variable which is the +type itself, and member functions return the address of this value. + +``` +template +class device_global { + T val; + public: + T& get() noexcept { return &val; } + /* other member functions */ +}; +``` + +In both cases the member variable (either `usmptr` or `val`) must be the first +member variable in the class. As we will see later, the runtime assumes that +the address of the `device_global` variable itself is the same as the address +of this member variable. ### New LLVM IR attributes As described below, the device compiler front-end decorates each -`device_global` variable with two attributes which convey information to the -`sycl-post-link` tool. - -The `sycl-unique-id` attribute contains a string which uniquely identifies the -variable instance. If the variable has external linkage, the string must be -the same for all translation units that define the variable (e.g. if the -variable is defined as `inline`). If the variable has internal linkage, the -string must not be shared by any other `device_global` variable, even in -another translation unit. (These rules for the identifying string are the -same as the rules we have for `specialization_id` variables.) - -The `sycl-device-global-size` attribute contains the size in bytes of the -underlying data type `T` of the `device_global` variable. As with all -attributes, the value is a string, so the size is represented as a string in -decimal format. +`device_global` variable with the `sycl-unique-id` attribute, which provides a +unique string identifier for each device global variable. + +This string will also be used to name the variable in SPIR-V, so it's better +for debuggability if the string matches the mangled name for variables with +external linkage. This is not possible, though, for variables with internal +linkage because the mangled name is not unique in this case. For these +variables, we use the mangled name and append a unique suffix. Note that language rules ensure that `device_global` variables are always declared at namespace scope (i.e. a global variable), and LLVM IR [allows -attributes to be attached to global variables][2]. +attributes to be attached to global variables][3]. -[2]: +[3]: ### Changes to the DPC++ front-end -The device compiler front-end must be changed in two ways: it must generate new -content in the integration footer and it must add the `sycl-unique-id` and -`sycl-device-global-size` attributes to the IR definition of of any -`device_global` variable. These two tasks are related because the integration -footer contains the same string that is stored in the `sycl-unique-id` -attribute. +The device compiler front-end is changed in two ways: it generates new content +in both the integration header and the integration footer, and it adds the +`sycl-unique-id` attribute to the IR definition of any `device_global` +variable. These two tasks are related because the integration footer contains +the same string that is stored in the `sycl-unique-id` attribute. -**TODO**: See also the "Unresolved issues" section at the bottom of this -document for another change that is needed in the front-end. +**NOTE**: See also the "Unresolved issues" section at the bottom of this +document for other changes that are needed in the front-end. -#### New content in the integration footer +#### New content in the integration header and footer -New content in the integration footer provides a mapping from a host instance -of a `device_global` variable to its unique ID string. This is done through -partial specialization of a template function in much the same way that we do -for `specialization_id` variables. To illustrate, consider a translation unit -that defines two `device_global` variables: +New content in the integration header and footer provides a mapping from the +host address of each device global variable to the unique string for that +variable. To illustrate, consider a translation unit that defines two +`device_global` variables: ``` #include -sycl::device_global Foo; -static sycl::device_global Bar; +static sycl::device_global Foo; +namespace inner { + sycl::device_global Bar; +} // namespace inner // ... ``` -The corresponding integration footer looks like this: +The corresponding integration header defines a namespace scope variable of +class type whose sole purpose is to run its constructor before the +application's `main()` function: ``` -inline namespace cl { namespace sycl::detail { +namespace { -template<> -inline const char *get_device_global_symbolic_ID_impl<::Foo>() { - return /* unique string for "Foo" */; -} +class __sycl_device_global_registration { + public: + __sycl_device_global_registration() noexcept; +}; +__sycl_device_global_registration __sycl_device_global_registerer; -template<> -inline const char *get_device_global_symbolic_ID_impl<::Bar>() { - return /* globally unique string because "Bar" has internal linkage */ -} +} // namepsace (unnamed) +} // namespace sycl::detail +``` -} +The integration footer contains the definition of the constructor, which calls +a function in the DPC++ runtime with the following information for each device +global variable that is defined in the translation unit: + +* The (host) address of the variable. +* The variable's string from the `sycl-unique-id` attribute. +* The size (in bytes) of the underlying `T` type for the variable. +* A boolean telling whether the variable is decorated with the + `device_image_life` property. + +``` +namespace sycl::detail { +namespace { + +__sycl_device_global_registration::__sycl_device_global_registration() noexcept { + device_global_map::add(&::Foo, + /* mangled name of '::Foo' with unique suffix appended */, + /* size of underlying 'T' type */, + /* bool telling whether variable has 'device_image_life` property */); + device_global_map::add(&::inner::Bar, + /* mangled name of '::inner::Bar' */, + /* size of underlying 'T' type */, + /* bool telling whether variable has 'device_image_life` property */); } -#include +} // namepsace (unnamed) +} // namespace sycl::detail ``` -As with the integration footer for `specialization_id` variables, the generated -code is more complex when `device_global` variables are defined in an unnamed -namespace. See the [specialization constant][3] specification for details. +Note that a SYCL application can legally call SYCL APIs even before `main()` by +calling them from a global constructor. However, the integration headers have +been designed to ensure that the address of each device global variable is +registered with the DPC++ runtime before the user's application could legally +use the variable, even if that use occurs before `main()` executes. + +The user's application cannot legally use a device global variable until the +variable's constructor has been called, otherwise the application would be +using an unconstructed object which has undefined behavior by C++ rules. Since +all device globals must be defined at namespace scope, the C++ rules for the +order of global constructors only guarantee that the device global will be +constructed before subsequent global variables in the same translation unit. +Therefore, a user application could reference a device global from another +global constructor only if that global constructor is for an object defined +*after* the device global in the same translation unit. However, the +integration header defines `__sycl_device_global_registerer` *before* all +device globals in the user's translation unit. Therefore, the address of all +device global variables in the translation unit will be registered with the +DPC++ runtime before any user code could legally use them. + +#### Handling shadowed device global variables + +The example above shows a simple case where the user's device global variables +can all be uniquely referenced via fully qualified lookup (e.g. +`::inner::Bar`). However, it is possible for users to construct applications +where this is not the case, for example: + +``` +sycl::device_global FuBar; +namespace { + sycl::device_global FuBar; +} +``` -[3]: +In this example, the `FuBar` variable in the global namespace shadows a +variable with the same name in the unnamed namespace. The integration footer +could reference the variable in the global namespace as `::FuBar`, but there is +no way to reference the variable in the unnamed namespace using fully qualified +lookup. -The `` file contains the definition -of the wrapper function which calls the partial specializations. This must be -last in the translation unit to satisfy the C++ requirement that references to -the template function must occur after all partial specializations are defined. +Such programs are still legal, though. The integration footer can support +cases like this by defining a temporary variable that holds the address of the +shadowed device global: ``` -inline namespace cl { -namespace sycl::detail { - -template const char *get_device_global_symbolic_ID() { - return get_device_global_symbolic_ID_impl(); +namespace { +const void *__sycl_UNIQUE_STRING = &FuBar; // References 'FuBar' in the + // unnamed namespace } +namespace sycl::detail { +namespace { + +__sycl_device_global_registration::__sycl_device_global_registration() noexcept { + device_global_map::add(&::FuBar, + /* mangled name of '::FuBar' */, + /* size of underlying 'T' type */, + /* bool telling whether variable has 'device_image_life` property */); + device_global_map::add(::__sycl_UNIQUE_STRING, + /* mangled name of '::(unnamed)::FuBar' with unique suffix appended */, + /* size of underlying 'T' type */, + /* bool telling whether variable has 'device_image_life` property */); } -} + +} // namepsace (unnamed) +} // namespace sycl::detail ``` +The `__sycl_UNIQUE_STRING` variable is defined in the same namespace as the +second `FuBar` device global, so it can reference the variable through +unqualified name lookup. Furthermore, the name of the temporary variable +(`__sycl_UNIQUE_STRING`) is globally unique, so it is guaranteed not to be +shadowed by any other name in the translation unit. This problem with variable +shadowing is also a problem for the integration footer we use for +specialization constants. See the [specialization constant design document][4] +for more details on this topic. + +[4]: + #### Decorating the IR with new attributes -The device compiler front-end also adds the new `sycl-unique-id` and -`sycl-device-global-size` attribute to the IR definition of any `device_global` -variables. The `sycl-unique-id` attribute must contain the same string that is -emitted in the integration footer. +The device compiler front-end also adds the new `sycl-unique-id` attribute to +the IR definition of any device global variables. The value of this attribute +is the same string that is emitted in the integration footer. ### Changes to the `sycl-post-link` tool @@ -174,76 +313,70 @@ The `sycl-post-link` tool performs its normal algorithm to identify the set of kernels and device functions that are bundled together into each module. Once it identifies the functions in each module, it scans those functions looking for references to global variables of type `device_global`. The -`sycl-post-link` tool then includes the following additional IR into each -module: - -1. The IR definition of each `device_global` variable that is referenced by - that module. - -2. If the module references at least one `device_global` variable, the IR - definition of a synthesized kernel function that initializes each of those - `device_global` variables. The following example shows the structure of - this kernel function, where `Foo` and `Bar` match the code example above: - - ``` - void __sycl_detail_UNIQUE_STRING(void *p1, void *p2) { - Foo.usmptr = p1; - Bar.usmptr = p2; - } - ``` - - The kernel takes one argument for each `device_global` variable and assigns - the `usmptr` field of each of those variables to its corresponding argument. - Note that the name of the kernel must be some unique string. Otherwise, - there is a danger that it will conflict with the name of another synthesized - initialization function if this module is online-linked with device code in - a shared library. For example, the implementation can construct a name using - a GUID. - -The `sycl-post-link` tool also emits new property set information as described -below. +`sycl-post-link` tool then includes the IR definition of each of these +`device_global` variables in the module. -### New property in "SYCL/misc properties" +The [backend functions described below][5] that allow the host to copy to or +from a device global require the variable to have "export" linkage in SPIR-V. +Therefore, the `sycl-post-link` tool needs to make the following IR +transformations for any `device_global` variable that has internal linkage: -If a device code module has one or more device global variables, a new property -is added to the "SYCL/misc properties" set named "device-global-initializer". -The value of this property has property type `PI_PROPERTY_TYPE_STRING` -containing the name of the synthesized kernel that initializes the device -global variables. +[5]: <#back-end-specific-function-to-copy-to--from-a-device-symbol> + +* The linkage is changed to be external. +* The name of the variable is changed to be the string from the + `sycl-unique-id` attribute. -### New "SYCL/device globals" property set +**NOTE**: It seems likely that changing the name of internal linkage variables +will be bad for debuggability of the code. The user may attempt to print the +value of a variable in the debugger, but the debugger won't know the variable +by that name. See the "Unresolved issues" section below for more discussion +on this. -Each device code module that references one or more device global variables -must have an associated "SYCL/device globals" property set. The name of each -property in this set is the `sycl-unique-id` string of a `device_global` -variable that is referenced in the module. The value of each property has -property type `PI_PROPERTY_TYPE_UINT32` which tells the size (in bytes) from -the `sycl-device-global-size` attribute for the `device_global` variable. +The `sycl-post-link` tool also adds the new "device-globals" property to the +"SYCL/misc properties" set, as described below. -The order of the properties in this set is important. The order matches the -order of the parameters accepted by the `__sycl_detail_UNIQUE_STRING` kernel -that is synthesized by the `sycl-post-link` tool. +### New property in "SYCL/misc properties" + +If a device code module has one or more device global variables, a new property +is added to the "SYCL/misc properties" set named "device-globals". The value +of this property has property type `PI_PROPERTY_TYPE_BYTE_ARRAY` and contains +the `sycl-unique-id` strings for each device global variable that the module +contains. The value of the property is the concatenation of all these +strings, where each string ends with a null character (`\0`). ### Changes to the DPC++ runtime Several changes are needed to the DPC++ runtime -* The runtime must allocate a buffer from USM device memory for each - `device_global` variable for each device that accesses that variable. As - noted in the requirements, the value of a device global variable must be - shared even across different device code modules that are loaded onto the - same device. As a result, we can't store the value in a SPIR-V module - scope global variable, which isn't shared across different modules. All - modules that access the same variable on a given device must share the same - USM buffer for that variable. +* As noted in the requirements section, an instance of a device global variable + that does not have the `device_image_life` property is shared by all device + images on a device. To satisfy this requirement, the device global variable + contains a pointer to a buffer allocated from USM device memory, and the + content of the variable is stored in this buffer. All device images point to + the same buffer, so the variable's state is shared. The runtime, therefore, + must allocate this USM buffer for each such device global variable. + +* As we noted above, the front-end generates new content in the integration + footer which calls the function `sycl::detail::device_global_map::add()`. + The runtime defines this function and maintains information about all the + device global variables in the application. This information includes: -* We need to call the synthesized `__sycl_detail_UNIQUE_STRING` kernel for each - device code module to initialize the `device_global` variables. + - The host address of the variable. + - The string which uniquely identifies the variable. + - The size (in bytes) of the underlying `T` type for the variable. + - A boolean telling whether the variable is decorated with the + `device_image_life` property. + - The associated per-device USM buffer pointer, if this variable does not + have the `device_image_life` property. -* We need to implement the new `copy` and `memcpy` functions in the `queue` and - `handler` classes which copy to or from `device_global` variables. + We refer to this information as the "device global database" below. -### Initializing the device global variables in device code +* The runtime also implements the new `copy` and `memcpy` functions in the + `queue` and `handler` classes which copy to or from `device_global` + variables. + +#### Initializing the device global variables in device code When a DPC++ application submits a kernel, the runtime constructs a `pi_program` containing this kernel that is compiled for the target device, if @@ -253,86 +386,87 @@ code modules that need to be online-linked together in order to construct the `pi_program`. After creating a `pi_program` and before invoking any kernel it contains, the -runtime must do the following: - -* Scan the entries in the "SYCL/device globals" property sets for each device - code module that contributes to the `pi_program` to get the full set of - device global variables used by the `pi_program`. For each of the device - global variables, the runtime checks to see if a USM buffer has already been - created for that variable on this target device. If not, the runtime - allocates the buffer from USM device memory, using the size from the - "SYCL/device globals" property set. The runtime maintains a mapping from - the device global's unique string and this USM pointer. - -* Scan the "SYCL/misc properties" property set for "device-global-initializer" - properties. Each such property names a kernel in the `pi_program` which the - runtime must call to initialize the device global variables it contains. The - runtime uses the contents of the "SYCL/device globals" property set to - determine the number and order of USM device pointers to pass as arguments to - this kernel. The runtime waits for these kernel calls to complete before - submitting any application kernels from this `pi_program`. - -### Implementing the `copy` and `memcpy` functions in `queue` and `handler` - -Each of these functions is templated on a reference to a device global variable -like so: +runtime does the following: + +* Scan the strings in the "device-globals" properties of the + "SYCL/misc properties" sets of each device code module that contributes to + the `pi_program` to get the unique string associated with each device global + variable that is used by the `pi_program`. For each of these strings, the + runtime uses the device global database to see if the variable was decorated + with `device_image_life`. If it was not so decorated and if a USM buffer has + not already been created for the variable on this target device, the runtime + allocates the buffer from USM device memory using the size from the database. + The pointer to this buffer is saved in the database for future reuse. + +* For each device global variable that is not decorated with + `device_image_life`, the runtime initializes the `usmptr` member in the + *device instance* of the variable by using a backend-specific function which + copies data from the host to a device variable. It is a simple matter to use + this function to overwrite the `usmptr` member with the address of the USM + buffer. The details of this device-specific function are described below. + +#### Implementing the `copy` and `memcpy` functions in `queue` and `handler` + +Each of these functions accepts a (host) pointer to a device global variable as +one of its parameters, and the runtime uses this pointer to find the associated +information for this variable in the device global database. The remaining +behavior depends on whether the variable is decorated with `device_image_life`. + +If the variable is not decorated with this property, the runtime uses the +database to determine if a USM buffer has been allocated yet for this variable +on this device. If not, the runtime allocates the buffer using the size from +the database. Regardless, the runtime implements the `copy` / `memcpy` by +copying to or from this USM buffer, using the normal mechanism for copying +to / from a USM pointer. + +The runtime avoids the future cost of looking up the variable in the database +by caching the USM pointer in the host instance of the variable's `usmptr` +member. + +If the variable is decorated with the `device_image_life` property, the runtime +gets the unique string identifier for the variable from the database and uses +a backend-specific function to copy to or from the variable with that +identifier. Again, the details of this function are described below. + +In all cases, the runtime diagnoses invalid calls that write beyond the device +global variable's size by using the size in the database. + +#### Back-end specific function to copy to / from a device symbol + +As noted above, we need a backend-specific function copy to / from the device +instance of a variable. All backends provide this functionality, which is +abstracted with these new PI interfaces: ``` -template -event queue::copyto(/*...*/) {/*...*/} -``` +pi_result piextCopyToDeviceVariable(pi_device Device, const char *name, + const void *src, size_t count, size_t offset); -The implementation can use the template parameter to obtain the variable's -unique string by calling the mapping function from the integration footer: - -``` -const char *name = detail::get_device_global_symbolic_ID(); +pi_result piextCopyFromDeviceVariable(pi_device Device, const char *name, + void *dst, size_t count, size_t offset); ``` -Once the runtime has this name, it is a simple matter to check if a USM buffer -has already been allocated for this device global variable on this device. -If it has not yet been allocated, this means that the application has not yet -submitted any kernels to this device that come from a module that defines this -device global variable. In this case, the runtime must allocate a buffer from -USM device memory using the size from the template parameter. The runtime -maintains a mapping from the unique string to this new USM pointer. +In both cases the `name` parameter is the same as the "unique string +identifier" for the device global variable. -``` -size_t numBytes = sizeof(decltype(DeviceGlobal)::type); -void *usmptr = malloc_device(numBytes, dev, ctxt); -``` +On the Level Zero backend, these PI interfaces are implemented by first calling +[`zeModuleGetGlobalPointer()`][6] to get a device pointer for the variable and +then calling [`zeCommandListAppendMemoryCopy()`][7] to copy to or from that +pointer. -The runtime can now copy to / from this USM buffer using any of the standard -USM explicit copy functions in the `queue` or `handler` class. +[6]: +[7]: -Note that the runtime can avoid the cost of subsequent lookups of this -variable's unique string by caching the variable's USM pointer in the host -instance of the `device_global` variable: +On the OpenCL backend, these PI interfaces are implemented by first calling +`clGetDeviceGlobalVariablePointerINTEL()` to get a device pointer for the +variable. This function is provided by the +[`cl_intel_global_variable_pointers`][8] extension which is not yet +productized. Once we get a pointer, the PI layer calls +`clEnqueueMemcpyINTEL()` to copy to or from that pointer. -``` -template -event queue::copyto(/*...*/) { - if (!DeviceGlobal.usmptr) { - const char *name = detail::get_device_global_symbolic_ID(); - /* etc. */ - DeviceGlobal.usmptr = usmptr; - } - /* copy to / from the USM pointer */ -} -``` +[8]: -### Accessing the device global from device code - -Accessing the value of a `device_global` variable from device code is a simple -matter of accessing the memory from the USM pointer, which is available in the -variable's `usmptr` member. For example, the implementation of -`device_global::get()` might look like this: - -``` -T& get() noexcept { - return *usmptr; -} -``` +On the CUDA backend, these PI interfaces are implemented on top of +`cudaMemcpyToSymbol()` and `cudaMemcpyFromSymbol()`. ## Unresolved issues @@ -346,23 +480,42 @@ device global feature is an exception to this rule. Device code, of course, can reference a `device_global` variable even if it is not declared `constexpr` or `const`. We need some way to avoid the error diagnostic in this case. -The [newly added][4] `sycl_global_var` attribute is almost what we need, +The [newly added][9] `sycl_global_var` attribute is almost what we need, however that attribute is only allowed to decorate a variable declaration. This doesn't help us because we don't want to force users to add an attribute to each declaration of a `device_global` variable. Instead, we want to decorate the class declaration of `device_global` with some attribute which allows any variables of that type to be accessible from device code. -[4]: +[9]: Since the `sycl_global_var` attribute is currently used only as an -implementation detail for [device-side asserts][5], one options is to repurpose +implementation detail for [device-side asserts][10], one option is to repurpose this attribute. Rather than applying it to a variable declaration, we could allow it only on a type declaration. The implementation of device-side asserts could be changed to use the attribute on a new type, rather than on a variable declaration. -[5]: +[10]: + +### Need to diagnose invalid declarations of `device_global` variables + +The device global extension specification places restrictions on where a +`device_global` variable can be declared. These restrictions are similar to +ones we have already for variables of type `specialization_id`: + +* A `device_global` variable can be declared at namespace scope. +* A `device_global` variable can be declared as a static member variable in + class scope, but only if the declaration has public visibility from namespace + scope. +* No other declarations are allowed for a variable of type `device_global`. + +The device compiler front-end needs to emit a diagnostic if a `device_global` +variable is declared in a way that violates these restrictions. We do not have +agreement yet, though, on how this should be done. For example, should the +front-end recognize these variable declarations by the name of their type, or +should we decorate the type with some C++ attribute that helps the front-end +recognize them? ### Need some way to force `device_global` variables into global address space @@ -372,13 +525,13 @@ Unless we decorate these variables in some special way, the current behavior of the `llvm-spirv` tool is to generate these variables in the private address space, even though they are declared at module scope. -The [existing OpenCL attribute][6] `[[clang::opencl_global]]` is almost what we -need, but again this attribute can only be applied to a variable declaration. -Instead, we want some attribute that can be applied to the type declaration of -`class device_global`. We could invent some new attribute with this semantic, -but there is another problem. +The [existing OpenCL attribute][11] `[[clang::opencl_global]]` is almost what +we need, but again this attribute can only be applied to a variable +declaration. Instead, we want some attribute that can be applied to the type +declaration of `class device_global`. We could invent some new attribute with +this semantic, but there is another problem. -[6]: +[11]: Applying `[[clang::opencl_global]]` to a variable of class type currently raises an error message saying there is no candidate "global" constructor for @@ -404,3 +557,48 @@ compiler so that namespace scope variables are implicitly treated as though they are in the global address space (as opposed to the private address space as is currently the case). This behavior would be consistent with the way the compiler works in OpenCL C 2.0 mode. + +### Need some way to propagate properties to SPIR-V + +The [specification of properties normally used on FPGA][2] includes three +properties that must be propagated from DPC++ source code, through LLVM IR, and +into SPIR-V where they are represented as SPIR-V decorations: + +* `copy_access` +* `init_via` +* `implement_in_csr` + +It's not clear how this should work. One of the goals of the new property +mechanism is to make it easy to propagate information like this through the +compiler toolchain, so hopefully we can leverage some common infrastructure +rather than hard-coding support for these three properties. However, there is +not yet a design document for the new properties mechanism, so it's not yet +clear what this infrastructure will be. + +### Will changing the name of internal symbols be bad for debugging? + +The [backend functions for copying to / from a device symbol][5] currently +require the symbol to have export linkage in SPIR-V. (This is the case for the +Level Zero and OpenCL functions. We are not sure about the CUDA functions, but +it seems likely they have the same limitation.) However, the device global +extension allows these variables to also have internal linkage, and this seems +like a useful feature. The current strategy is to convert internal linkage +variables to external linkage at the IR level and also rename the symbol in a +way that is globally unique. + +This should result in functionally correct code, but it seems likely to make +debugging more difficult. If the debugger uses the name from SPIR-V, this name +will not match what the user expects. We attempt to mitigate this somewhat by +preserving the user's name and appending a unique suffix, but this seems like a +weak mitigation. + +Do we think the debugging experience will be so bad that we should change the +strategy? The fundamental requirement is that we need some unique way to +identify each device code variable when using these backend functions. +Currently, we use the variable's mangled name, but this could be changed. +An alternative solution would be to augment the SPIR-V with some new decoration +that gives a unique name to each `OpVariable` that needs to be accessed from +the host. We could then use that name with the backend functions, and avoid +renaming variables with internal linkage. This would be more effort, though, +because we would need a new SPIR-V extension, and we would need to change the +implementation of the Level Zero and OpenCL backends. From ea929140fe59697c599c6dfe0fdb93ab716c4bc2 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 27 Sep 2021 14:40:32 -0400 Subject: [PATCH 03/25] Address review comments Address review comments from @GarveyJoe. --- sycl/doc/DeviceGlobal.md | 45 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 43 insertions(+), 2 deletions(-) diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index 7639dea0bf95e..5495738ad2731 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -111,14 +111,14 @@ class device_global { ``` However, when the property is present, it has a member variable which is the -type itself, and member functions return the address of this value. +type itself, and member functions return a reference to this value. ``` template class device_global { T val; public: - T& get() noexcept { return &val; } + T& get() noexcept { return val; } /* other member functions */ }; ``` @@ -602,3 +602,44 @@ the host. We could then use that name with the backend functions, and avoid renaming variables with internal linkage. This would be more effort, though, because we would need a new SPIR-V extension, and we would need to change the implementation of the Level Zero and OpenCL backends. + +### Does compiler need to be deterministic? + +The compiler is normally deterministic. If you compile the exact same source +file twice specifying the same command line options each time, you get exactly +the same object file. However, this will no longer be the case. + +The design in this document generates a GUID and uses that GUID to rename +device global variable with internal linkage. Since the GUID is different each +time the compiler is executed, the resulting object file is different even if +the source file did not change. The existing design for specialization +constants has exactly the same issue because it also uses a GUID to generate a +unique string for `specialization_id` variables that have internal linkage. + +Is this a problem? If we want to preserve determinism, we could generate +a unique ID as a hash (e.g. SHA-256) from the content of the source file +**and** the command line arguments passed to the compiler. However, this would +require reading the content of the source file, which would have an impact on +compilation time. It's not clear how significant this impact would be, though. + +Note that the non-determinism will cause a problem with the FPGA `-reuse-exe` +compiler option. That option uses the result of a previous compilation to +avoid regenerating FPGA native code if the device code in a translation unit +did not change. (For example, this option avoids regenerating device native +code if the only change in the translation unit was to the host code.) The +option is implemented by comparing device IR from the previous compilation with +the IR in the new compilation. Native code is regenerated only if the IR is +different. This logic will break, though, if the compiler is +non-deterministic because the IR will always be different, so native code will +always be regenerated. This is a showstopper issue for FPGA because native +code generation takes a very long time. + +I see two ways to solve the problem with `-reuse-exe`: + +1. We could change the GUID to be a deterministic hash as outlined above. + +2. We could change SPIR-V as proposed above to give a unique name to each + `OpVariable` which needs to be referenced from the host. This would avoid + the need to change the exported variable name to be a GUID, thus the IR will + be deterministic. (It is also possible to generate the unique `OpVariable` + names in a deterministic way, so this won't cause a problem.) From c5cc57fd3085506936681191176a1002760cd839 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Thu, 30 Sep 2021 17:44:56 -0400 Subject: [PATCH 04/25] Reconcile design with latest spec Update the design to align with the latest version of the extension API spec. Also add specifications for underlying changes to SPIR-V and OpenCL. --- sycl/doc/DeviceGlobal.md | 106 +++++----- ...TEL_fpga_device_global_properties.asciidoc | 186 +++++++++++++++++ ...cl_intel_global_variable_pointers.asciidoc | 188 ++++++++++++++++++ 3 files changed, 434 insertions(+), 46 deletions(-) create mode 100644 sycl/doc/extensions/DeviceGlobal/SPV_INTEL_fpga_device_global_properties.asciidoc create mode 100644 sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index 5495738ad2731..e48cca11c7a70 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -1,19 +1,15 @@ # Implementation design for "device\_global" This document describes the implementation design for the DPC++ extension -SYCL\_EXT\_ONEAPI\_DEVICE\_GLOBAL, which allows applications to declare global -variables in device code. There are two API specifications for this feature, -the [main specification][1] and a specification for [properties that are -primarily used on FPGA][2]. This design document covers the APIs in both of -those specifications. +[SYCL\_EXT\_ONEAPI\_DEVICE\_GLOBAL][1], which allows applications to declare +global variables in device code. -[1]: -[2]: +[1]: ## Requirements -The extension specification documents referenced above contain the full set of +The extension specification document referenced above contains the full set of requirements for this feature, but some requirements that are particularly relevant to the design are called out here. @@ -128,31 +124,43 @@ member variable in the class. As we will see later, the runtime assumes that the address of the `device_global` variable itself is the same as the address of this member variable. +The headers are also updated to add the new `copy()` and `memcpy()` member +functions to `handler` and `queue` which copy data to or from a device global +variable. These declarations use SFINAE such that they are conditionally +available depending on the `copy_access` property. + ### New LLVM IR attributes -As described below, the device compiler front-end decorates each -`device_global` variable with the `sycl-unique-id` attribute, which provides a -unique string identifier for each device global variable. +Two new attributes are added to communicate information about device global +variable to the `sycl-post-link` tool: `sycl-unique-id` and +`sycl-device-global-image-life`. As described below, the device compiler +front-end is responsible for adding the attributes to the LLVM IR. + +Each device global variable is decorated with `sycl-unique-id`, which provides +a unique string identifier for each device global variable. This string will +also be used to name the variable in SPIR-V, so it's better for debuggability +if the string matches the mangled name for variables with external linkage. +This is not possible, though, for variables with internal linkage because the +mangled name is not unique in this case. For these variables, we use the +mangled name and append a unique suffix. -This string will also be used to name the variable in SPIR-V, so it's better -for debuggability if the string matches the mangled name for variables with -external linkage. This is not possible, though, for variables with internal -linkage because the mangled name is not unique in this case. For these -variables, we use the mangled name and append a unique suffix. +Each device global variable that has the `device_image_life` property is also +decorated with the `sycl-device-global-image-life` attribute. Note that language rules ensure that `device_global` variables are always declared at namespace scope (i.e. a global variable), and LLVM IR [allows -attributes to be attached to global variables][3]. +attributes to be attached to global variables][2]. -[3]: +[2]: ### Changes to the DPC++ front-end The device compiler front-end is changed in two ways: it generates new content in both the integration header and the integration footer, and it adds the -`sycl-unique-id` attribute to the IR definition of any `device_global` -variable. These two tasks are related because the integration footer contains -the same string that is stored in the `sycl-unique-id` attribute. +`sycl-unique-id` and `sycl-device-global-image-life` attributes to the IR +definitions of `device_global` variables as defined above. These two tasks are +related because the integration footer contains the same string that is stored +in the `sycl-unique-id` attribute. **NOTE**: See also the "Unresolved issues" section at the bottom of this document for other changes that are needed in the front-end. @@ -296,32 +304,35 @@ unqualified name lookup. Furthermore, the name of the temporary variable (`__sycl_UNIQUE_STRING`) is globally unique, so it is guaranteed not to be shadowed by any other name in the translation unit. This problem with variable shadowing is also a problem for the integration footer we use for -specialization constants. See the [specialization constant design document][4] +specialization constants. See the [specialization constant design document][3] for more details on this topic. -[4]: - -#### Decorating the IR with new attributes - -The device compiler front-end also adds the new `sycl-unique-id` attribute to -the IR definition of any device global variables. The value of this attribute -is the same string that is emitted in the integration footer. +[3]: ### Changes to the `sycl-post-link` tool The `sycl-post-link` tool performs its normal algorithm to identify the set of kernels and device functions that are bundled together into each module. Once it identifies the functions in each module, it scans those functions looking -for references to global variables of type `device_global`. The -`sycl-post-link` tool then includes the IR definition of each of these -`device_global` variables in the module. +for references to global variables of type `device_global`. If any device +global variable decorated with `sycl-device-global-image-life` appears in more +than one module, the `sycl-post-link` tool issues an error diagnostic: -The [backend functions described below][5] that allow the host to copy to or +``` +error: device_global variable with property "device_image_life" + is contained in more than one device image. +``` + +Assuming that no error diagnostic is issued, the `sycl-post-link` tool includes +the IR definition of each `device_global` variable in the modules that +reference that variable. + +The [backend functions described below][4] that allow the host to copy to or from a device global require the variable to have "export" linkage in SPIR-V. Therefore, the `sycl-post-link` tool needs to make the following IR transformations for any `device_global` variable that has internal linkage: -[5]: <#back-end-specific-function-to-copy-to--from-a-device-symbol> +[4]: <#back-end-specific-function-to-copy-to--from-a-device-symbol> * The linkage is changed to be external. * The name of the variable is changed to be the string from the @@ -449,21 +460,21 @@ In both cases the `name` parameter is the same as the "unique string identifier" for the device global variable. On the Level Zero backend, these PI interfaces are implemented by first calling -[`zeModuleGetGlobalPointer()`][6] to get a device pointer for the variable and -then calling [`zeCommandListAppendMemoryCopy()`][7] to copy to or from that +[`zeModuleGetGlobalPointer()`][5] to get a device pointer for the variable and +then calling [`zeCommandListAppendMemoryCopy()`][6] to copy to or from that pointer. -[6]: -[7]: +[5]: +[6]: On the OpenCL backend, these PI interfaces are implemented by first calling `clGetDeviceGlobalVariablePointerINTEL()` to get a device pointer for the variable. This function is provided by the -[`cl_intel_global_variable_pointers`][8] extension which is not yet +[`cl_intel_global_variable_pointers`][7] extension which is not yet productized. Once we get a pointer, the PI layer calls `clEnqueueMemcpyINTEL()` to copy to or from that pointer. -[8]: +[7]: On the CUDA backend, these PI interfaces are implemented on top of `cudaMemcpyToSymbol()` and `cudaMemcpyFromSymbol()`. @@ -480,14 +491,14 @@ device global feature is an exception to this rule. Device code, of course, can reference a `device_global` variable even if it is not declared `constexpr` or `const`. We need some way to avoid the error diagnostic in this case. -The [newly added][9] `sycl_global_var` attribute is almost what we need, +The [newly added][8] `sycl_global_var` attribute is almost what we need, however that attribute is only allowed to decorate a variable declaration. This doesn't help us because we don't want to force users to add an attribute to each declaration of a `device_global` variable. Instead, we want to decorate the class declaration of `device_global` with some attribute which allows any variables of that type to be accessible from device code. -[9]: +[8]: Since the `sycl_global_var` attribute is currently used only as an implementation detail for [device-side asserts][10], one option is to repurpose @@ -560,9 +571,12 @@ compiler works in OpenCL C 2.0 mode. ### Need some way to propagate properties to SPIR-V -The [specification of properties normally used on FPGA][2] includes three -properties that must be propagated from DPC++ source code, through LLVM IR, and -into SPIR-V where they are represented as SPIR-V decorations: +The following three device global properties must be propagated from DPC++ +source code, through LLVM IR, and into SPIR-V where they are represented as +SPIR-V decorations (defined in the +[SPV\_INTEL\_fpga\_device\_global\_properties][12] extension). + +[12]: * `copy_access` * `init_via` @@ -577,7 +591,7 @@ clear what this infrastructure will be. ### Will changing the name of internal symbols be bad for debugging? -The [backend functions for copying to / from a device symbol][5] currently +The [backend functions for copying to / from a device symbol][4] currently require the symbol to have export linkage in SPIR-V. (This is the case for the Level Zero and OpenCL functions. We are not sure about the CUDA functions, but it seems likely they have the same limitation.) However, the device global diff --git a/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_fpga_device_global_properties.asciidoc b/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_fpga_device_global_properties.asciidoc new file mode 100644 index 0000000000000..6ad5406b7c8f4 --- /dev/null +++ b/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_fpga_device_global_properties.asciidoc @@ -0,0 +1,186 @@ += SPV_INTEL_fpga_device_global_properties + +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en + +:blank: pass:[ +] + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + +// This is necessary for asciidoc, but not for asciidoctor +:cpp: C++ + +== Name Strings + +SPV_INTEL_fpga_device_global_properties + +== Contact + +To report problems with this extension, please open a new issue at: + +https://github.com/KhronosGroup/SPIRV-Headers + +== Contributors + +Artem Radzikhovskyy, Intel + +Michael Kinsner, Intel + +Joe Garvey, Intel + +Mohammad Fawaz, Intel + +== Notice + +Copyright (c) 2021 Intel Corporation. All rights reserved. + +== Status + +Final Draft + +== Version + +Built On: {docdate} + +Revision: 1 + +== Dependencies + +This extension is written against the SPIR-V Specification, +Version 1.5 Revision 5. + +This extension requires SPIR-V 1.0. + +== Overview + +This extension adds decorations that influence compiler generation of `device_global` structures on FPGA targets. + +== Extension Name +To use this extension within a SPIR-V module, the following *OpExtension* must be present in the module: + +---- +OpExtension "SPV_INTEL_fpga_device_global_properties" +---- + +== New Capabilities +This extension introduces a new capability: + +---- +FPGADeviceGlobalPropertiesINTEL +---- + +== Token Number Assignments + +-- +[width="40%"] +[cols="70%,30%"] +[grid="rows"] +|==== +|FPGADeviceGlobalPropertiesINTEL |6146 +|NoCopyINTEL |6147 +|InitViaINTEL |6148 +|ImplementInCSRINTEL |6149 +|==== +-- + +== Modifications to the SPIR-V Specification, Version 1.5 + +=== Decoration + +Modify Section 3.20, Decoration, adding these rows to the Decoration table: + +-- +[options="header"] +|==== +2+^| Decoration | Extra Operands | Enabling Capabilities +// --- ROW BREAK --- +| 6147 +| *CopyAccessINTEL* + +Only valid on global variables. + +Controls which memory ports are exposed for use by a generated IP core. + +Legal values of _Access_: + +0 [Read] - Only a read port is exposed. + +1 [Write] - Only a write port is exposed. + +2 [Read/Write] - A read/write port is exposed. + +3 [None] - No port is exposed. + +| Literal Number + +_Access_ +| *FPGADeviceGlobalPropertiesINTEL* + +// --- ROW BREAK --- +| 6148 +| *InitViaINTEL* + +Only valid on global variables. + +Controls the event which triggers memory initialization. + +Legal values of _Trigger_: + +0 [init on device reprogram] - initialization is performed when the device is programmed. + +1 [init on device reset] - initialization is performed when a reset signal is sent to the device. + + +| Literal Number + +_Trigger_ +| *FPGADeviceGlobalPropertiesINTEL* + +// --- ROW BREAK --- +| 6149 +| *ImplementInCSRINTEL* + +Only valid on global variables. + +Controls the interface of this global variable with hardware outside the boundary of the SPIR-V module. + +Legal values of _Value_: + +0 [False] - Access to this memory is through a dedicated interface. + +1 [True] - Access to this memory is through a CSR interface shared with the kernel arguments. + + +| Literal Number + +_Value_ +| *FPGADeviceGlobalPropertiesINTEL* + +|==== +-- + +=== Capability + +Modify Section 3.31, Capability, adding a row to the Capability table: +-- +[options="header"] +|==== +2+^| Capability ^| Implicitly Declares +| 6146 +| FPGADeviceGlobalPropertiesINTEL +| +|==== +-- + +=== Validation Rules + +None. + +== Issues + +None. + +//. Issue. +//+ +//-- +//*RESOLVED*: Resolution. +//-- + +== Revision History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|======================================== +|Rev|Date|Author|Changes +|1|2021-8-9|Artem Radzikhovskyy|*Initial draft* +|======================================== diff --git a/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc new file mode 100644 index 0000000000000..92bbf33ef4758 --- /dev/null +++ b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc @@ -0,0 +1,188 @@ += cl_intel_global_variable_pointers + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + +== Name Strings + +`cl_intel_global_variable_pointers` + +== Contact + +Ben Ashbaugh, Intel (ben 'dot' ashbaugh 'at' intel 'dot' com) + +== Contributors + +// spell-checker: disable +* Ben Ashbaugh, Intel +* Jaroslaw Chodor, Intel +* Vyacheslav Zakharin, Intel +// spell-checker: enable + +== Notice + +Copyright (c) 2021 Intel Corporation. All rights reserved. + +== Status + +Working Draft + +This is a preview extension specification, intended to provide early access to a feature for review and community feedback. +When the feature matures, this specification may be released as a formal extension. + +Because the interfaces defined by this specification are not final and are subject to change they are not intended to be used by shipping software products. +If you are interested in using this feature in your software product, please let us know! + +== Version + +Built On: {docdate} + +Revision: 0.9.0 + +== Dependencies + +This extension is written against the OpenCL API Specification Version 3.0.6. + +This extension requires support for the `cl_intel_unified_shared_memory` +extension. + +== Overview + +The extension allows programmers to obtain pointers to global variables (aka +program scope variables) of OpenCL Programs and read/write such global variables +using host APIs. + +== New API Functions + +[source] +---- +cl_int clGetDeviceGlobalVariablePointerINTEL( + cl_device_id device, + cl_program program, + const char *global_variable_name, + size_t *global_variable_size_ret, + void **global_variable_pointer_ret ); +---- + +== New API Enums + +None. + +== New OpenCL C Functions + +None. + +== Modifications to the OpenCL API Specification + +Modify Section 5.8.9, *Program Object Queries* :: ++ +-- + +The function +---- +cl_int clGetDeviceGlobalVariablePointerINTEL( + cl_device_id device, + cl_program program, + const char *global_variable_name, + size_t *global_variable_size_ret, + void **global_variable_pointer_ret ); +---- +is used to retrieve a pointer to a user-defined global variable +_global_variable_name_ which exists in built _program_. + +The returned pointer is a Unified Shared Memory pointer to the requested global +variable and is specific to given _device_. + +The lifetime of the returned pointer corresponds to the lifetime of _program_ +object which was queried. Subsequent calls to *clBuildProgram*, +*clCompileProgram* or *clLinkProgram* invalidates the result of +*clGetDeviceGlobalVariablePointerINTEL*. The returned value remains valid +between subsequent calls to *clEnqueueNDRangeKernel* or *clEnqueueTask*. + +The returned pointer can be stored to a global object inside a kernel. The +returned pointer cannot be used in program other than specified as _program_ +argument. + +* _device_ indicates which device should be used to retrieve the global variable pointer. + The Address of a global variable might be different for each device. + +* _program_ indicates the program with the _global_variable_name_ global variable. + +* _global_variable_name_ is the global variable name to query. Must be non-NULL. + +* _global_variable_size_ret_ if non-NULL, can be used to query the size of the requested global variable. + +* _global_variable_pointer_ret_ used to return the result from the function. Must be non-NULL. + +*clGetDeviceGlobalVariablePointerINTEL* returns `CL_SUCCESS` if the function executed +successfully. Otherwise, it returns one of the following errors: + +* `CL_INVALID_VALUE` if _global_variable_pointer_ret_ or _global_variable_name_ is NULL, + +* `CL_INVALID_DEVICE` if _device_ is not in the list of devices associated with _program_. + +* `CL_INVALID_PROGRAM` if _program_ is not a valid program object. + +* `CL_INVALID_PROGRAM_EXECUTABLE` if there is no successfully built executable + for _program_ for the specified _device_. + +* `CL_INVALID_ARG_VALUE` if _global_variable_name_ is not found in _program_ or it is + not possible to obtain address of _global_variable_name_. +-- + +== Modifications to the OpenCL SPIR-V Environment Specification + +*TODO* + +Modify sub-section *Extensions* :: ++ +-- +cl_intel_global_variable_pointers ... +-- + +== Issues + +. Can this extension be used to query the address of any global variable in the program? ++ +-- +*UNRESOLVED* + +It is not possible to obtain address of *any* arbitrary global variable using +`clGetDeviceGlobalVariablePointerINTEL`, and the mechanism to identify that the +global variable may have it's address taken is dependent on how program object +is created: + +* for programs created from source, *TODO* (extern?) + +* for programs created from SPIR-V IL, *TODO* (global variable needs to have *export* linkage?) + +* for programs created from a binary, the mechanism is implementation defined. +-- + +. Can a queried global variable pointer be used by a kernel from another program object? ++ +-- +*UNRESOLVED* + +Currently this is disallowed. +-- + + +== Revision History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|======================================== +|Rev|Date|Author|Changes +|0.9.0|2019-09-14|Jaroslaw Chodor|*Initial revision* +|0.9.0|2021-09-22|Ben Ashbaugh|Added preview disclaimer, general cleanup. +|======================================== From 6c07b1a4d54a052a1691ac95394b53defe0e215d Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 1 Oct 2021 11:22:25 -0400 Subject: [PATCH 05/25] Updates to device global SPIR-V spec Update names and wording to make it clear the the new SPIR-V decorations apply to all consumers, even though they are mostly useful for FPGA. --- sycl/doc/DeviceGlobal.md | 4 +- ...TEL_fpga_device_global_properties.asciidoc | 186 --------------- ...INTEL_global_variable_decorations.asciidoc | 220 ++++++++++++++++++ 3 files changed, 222 insertions(+), 188 deletions(-) delete mode 100644 sycl/doc/extensions/DeviceGlobal/SPV_INTEL_fpga_device_global_properties.asciidoc create mode 100644 sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index e48cca11c7a70..5b418f4f32559 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -574,9 +574,9 @@ compiler works in OpenCL C 2.0 mode. The following three device global properties must be propagated from DPC++ source code, through LLVM IR, and into SPIR-V where they are represented as SPIR-V decorations (defined in the -[SPV\_INTEL\_fpga\_device\_global\_properties][12] extension). +[SPV\_INTEL\_global\_variable\_decorations][12] extension). -[12]: +[12]: * `copy_access` * `init_via` diff --git a/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_fpga_device_global_properties.asciidoc b/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_fpga_device_global_properties.asciidoc deleted file mode 100644 index 6ad5406b7c8f4..0000000000000 --- a/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_fpga_device_global_properties.asciidoc +++ /dev/null @@ -1,186 +0,0 @@ -= SPV_INTEL_fpga_device_global_properties - -:source-highlighter: coderay -:coderay-linenums-mode: table - -// This section needs to be after the document title. -:doctype: book -:toc2: -:toc: left -:encoding: utf-8 -:lang: en - -:blank: pass:[ +] - -// Set the default source code type in this document to C++, -// for syntax highlighting purposes. This is needed because -// docbook uses c++ and html5 uses cpp. -:language: {basebackend@docbook:c++:cpp} - -// This is necessary for asciidoc, but not for asciidoctor -:cpp: C++ - -== Name Strings - -SPV_INTEL_fpga_device_global_properties - -== Contact - -To report problems with this extension, please open a new issue at: - -https://github.com/KhronosGroup/SPIRV-Headers - -== Contributors - -Artem Radzikhovskyy, Intel + -Michael Kinsner, Intel + -Joe Garvey, Intel + -Mohammad Fawaz, Intel - -== Notice - -Copyright (c) 2021 Intel Corporation. All rights reserved. - -== Status - -Final Draft - -== Version - -Built On: {docdate} + -Revision: 1 - -== Dependencies - -This extension is written against the SPIR-V Specification, -Version 1.5 Revision 5. - -This extension requires SPIR-V 1.0. - -== Overview - -This extension adds decorations that influence compiler generation of `device_global` structures on FPGA targets. - -== Extension Name -To use this extension within a SPIR-V module, the following *OpExtension* must be present in the module: - ----- -OpExtension "SPV_INTEL_fpga_device_global_properties" ----- - -== New Capabilities -This extension introduces a new capability: - ----- -FPGADeviceGlobalPropertiesINTEL ----- - -== Token Number Assignments - --- -[width="40%"] -[cols="70%,30%"] -[grid="rows"] -|==== -|FPGADeviceGlobalPropertiesINTEL |6146 -|NoCopyINTEL |6147 -|InitViaINTEL |6148 -|ImplementInCSRINTEL |6149 -|==== --- - -== Modifications to the SPIR-V Specification, Version 1.5 - -=== Decoration - -Modify Section 3.20, Decoration, adding these rows to the Decoration table: - --- -[options="header"] -|==== -2+^| Decoration | Extra Operands | Enabling Capabilities -// --- ROW BREAK --- -| 6147 -| *CopyAccessINTEL* + -Only valid on global variables. - -Controls which memory ports are exposed for use by a generated IP core. - -Legal values of _Access_: + -0 [Read] - Only a read port is exposed. + -1 [Write] - Only a write port is exposed. + -2 [Read/Write] - A read/write port is exposed. + -3 [None] - No port is exposed. + -| Literal Number + -_Access_ -| *FPGADeviceGlobalPropertiesINTEL* - -// --- ROW BREAK --- -| 6148 -| *InitViaINTEL* + -Only valid on global variables. - -Controls the event which triggers memory initialization. - -Legal values of _Trigger_: + -0 [init on device reprogram] - initialization is performed when the device is programmed. + -1 [init on device reset] - initialization is performed when a reset signal is sent to the device. + - -| Literal Number + -_Trigger_ -| *FPGADeviceGlobalPropertiesINTEL* - -// --- ROW BREAK --- -| 6149 -| *ImplementInCSRINTEL* + -Only valid on global variables. - -Controls the interface of this global variable with hardware outside the boundary of the SPIR-V module. - -Legal values of _Value_: + -0 [False] - Access to this memory is through a dedicated interface. + -1 [True] - Access to this memory is through a CSR interface shared with the kernel arguments. + - -| Literal Number + -_Value_ -| *FPGADeviceGlobalPropertiesINTEL* - -|==== --- - -=== Capability - -Modify Section 3.31, Capability, adding a row to the Capability table: --- -[options="header"] -|==== -2+^| Capability ^| Implicitly Declares -| 6146 -| FPGADeviceGlobalPropertiesINTEL -| -|==== --- - -=== Validation Rules - -None. - -== Issues - -None. - -//. Issue. -//+ -//-- -//*RESOLVED*: Resolution. -//-- - -== Revision History - -[cols="5,15,15,70"] -[grid="rows"] -[options="header"] -|======================================== -|Rev|Date|Author|Changes -|1|2021-8-9|Artem Radzikhovskyy|*Initial draft* -|======================================== diff --git a/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc b/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc new file mode 100644 index 0000000000000..188c9e842a6fe --- /dev/null +++ b/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc @@ -0,0 +1,220 @@ += SPV_INTEL_global_variable_decorations + +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en + +:blank: pass:[ +] + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + +// This is necessary for asciidoc, but not for asciidoctor +:cpp: C++ + +== Name Strings + +SPV_INTEL_global_variable_decorations + +== Contact + +To report problems with this extension, please open a new issue at: + +https://github.com/KhronosGroup/SPIRV-Headers + +== Contributors + +Artem Radzikhovskyy, Intel + +Michael Kinsner, Intel + +Joe Garvey, Intel + +Mohammad Fawaz, Intel + +Gregory Lueck, Intel + +== Notice + +Copyright (c) 2021 Intel Corporation. All rights reserved. + +== Status + +Final Draft + +== Version + +Built On: {docdate} + +Revision: 2 + +== Dependencies + +This extension is written against the SPIR-V Specification, +Version 1.5 Revision 5. + +This extension requires SPIR-V 1.0. + +== Overview + +This extension adds decorations that can be applied to global (module scope) +variables. These decorations are intended to help code generation for FPGA +devices, but they must be recognized by any consumer that supports this +extension. + +== Extension Name +To use this extension within a SPIR-V module, the following *OpExtension* must +be present in the module: + +---- +OpExtension "SPV_INTEL_global_variable_decorations" +---- + +== New Capabilities +This extension introduces a new capability: + +---- +GlobalVariableDecorationsINTEL +---- + +== Token Number Assignments + +-- +[width="40%"] +[cols="70%,30%"] +[grid="rows"] +|==== +|GlobalVariableDecorationsINTEL | 6146 +|HostAccessINTEL | 6147 +|InitModeINTEL | 6148 +|ImplementInCSRINTEL | 6149 +|==== +-- + +== Modifications to the SPIR-V Specification, Version 1.5 + +=== Decoration + +Modify Section 3.20, Decoration, adding these rows to the Decoration table: + +-- +[options="header"] +|==== +2+^| Decoration | Extra Operands | Enabling Capabilities + +// --- ROW BREAK --- +| 6147 +a| +*HostAccessINTEL* + +Only valid on global (module scope) *OpVariable*. + +The client API's execution environment may provide a way to access a global +variable's value from the host system. If it does, this decoration provides +an assertion by the producer about the types of these accesses. This may allow +the consumer to perform certain optimizations, especially for FPGA devices. + +Legal values of _Access_: + +* 0 [Read] - The execution environment may read the variable from the host but + will never write it. On an FPGA device, only a read memory port is exposed + by the generated IP core. +* 1 [Write] - The execution environment may write the variable from the host + but will never read it. On an FPGA device, only a write memory port is + exposed by the generated IP core. +* 2 [Read/Write] - The execution environment may read or write the variable + from the host. On an FPGA device, a read/write memory port is exposed by the + generated IP core. +* 3 [None] - The execution environment may neither read nor write the variable + from the host. On an FPGA device, no memory port is exposed by the generated + IP core. + +If a global *OpVariable* is not decorated with *HostAccessINTEL*, the default +behavior is [Read/Write]. +| Literal Number + +_Access_ +| *GlobalVariableDecorationsINTEL* + +// --- ROW BREAK --- +| 6148 +a| +*InitModeINTEL* + +Only valid on global (module scope) *OpVariable*. + +This decoration only has an effect when the consumer is an FPGA or similar +device. The _Trigger_ value tells how the global variable should be +initialized. + +Legal values of _Trigger_: + +* 0 [init on device reprogram] - Initialization is performed when the device is + programmed. +* 1 [init on device reset] - Initialization is performed when a reset signal is + sent to the device. + +| Literal Number + +_Trigger_ +| *GlobalVariableDecorationsINTEL* + +// --- ROW BREAK --- +| 6149 +a| +*ImplementInCSRINTEL* + +Only valid on global (module scope) *OpVariable*. + +This decoration only has an effect when the consumer is an FPGA or similar +device. The _Value_ value controls the interface of this global variable with +hardware outside the boundary of the SPIR-V module. + +Legal values of _Value_: + +* 0 [False] - Access to this memory is through a dedicated interface. +* 1 [True] - Access to this memory is through a CSR interface shared with the + kernel arguments. + +| Literal Number + +_Value_ +| *GlobalVariableDecorationsINTEL* + +|==== +-- + +=== Capability + +Modify Section 3.31, Capability, adding a row to the Capability table: +-- +[options="header"] +|==== +2+^| Capability ^| Implicitly Declares +| 6146 +| GlobalVariableDecorationsINTEL +| +|==== +-- + +=== Validation Rules + +None. + +== Issues + +None. + +//. Issue. +//+ +//-- +//*RESOLVED*: Resolution. +//-- + +== Revision History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|======================================== +|Rev|Date|Author|Changes +|1|2021-8-9|Artem Radzikhovskyy|*Initial draft* +|2|2021-10-1|Gregory Lueck|*Clarify wording and change names* +|======================================== From e54e376271b8855ebc868a02ceb43c34cc615771 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 1 Oct 2021 13:25:24 -0400 Subject: [PATCH 06/25] Add a few more tweaks Add some final clarifications to the design doc before creating PR. Rename properties to align with latest review comments against API spec. --- sycl/doc/DeviceGlobal.md | 77 +++++++++++++++++++++------------------- 1 file changed, 40 insertions(+), 37 deletions(-) diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index 5b418f4f32559..adff8a13a541b 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -56,7 +56,7 @@ runtime. As we will see later, this has a ramification on the integration headers and on the mechanism that connects instances of device global variables in host code with their corresponding instances in device code. -Another issue relates to the `device_image_life` property which can be applied +Another issue relates to the `device_image_scope` property which can be applied to a device global variable declaration. The intent of this property is to allow a device global variable to be implemented directly on top of a SPIR-V module scope global variable. When this property is **not** present, an @@ -66,7 +66,7 @@ about the scope of a variable because the user need not understand which device image contains each kernel. However, this semantic makes the implementation less efficient, especially on FPGA targets. -By contrast, the `device_image_life` property changes the semantic of a device +By contrast, the `device_image_scope` property changes the semantic of a device global variable such that the user must understand which device image contains each kernel, which is difficult to reason about. For example, changing the value of a specialization constant may cause a kernel to be recompiled into a @@ -75,11 +75,11 @@ referenced in a kernel may actually have several disjoint instances if the kernel uses specialization constants. This problem is more tractable on FPGA targets because specialization constants are not implemented via separate device images on those targets, however, there are other factors that FPGA -users need to be aware of when using the `device_image_life` property. These +users need to be aware of when using the `device_image_scope` property. These are documented more throughly in the extension specification. The important impact on the design, though, is that device global variables -declared with the `device_image_life` property have an implementation that is +declared with the `device_image_scope` property have an implementation that is quite different from device global variables that are not declared with this property. The sections below describe both implementations. @@ -91,7 +91,7 @@ property. The sections below describe both implementations. The headers, of course, include the declaration of the new `device_global` class, which is described in the [extension specification][1]. The declaration of this class uses partial specialization to define the class differently -depending on whether is has the `device_image_life` property. When the +depending on whether is has the `device_image_scope` property. When the property is not present, the class has a member variable which is a pointer to the underlying type. Member functions which return a reference to the value (e.g. `get`) return the value of this pointer: @@ -127,7 +127,7 @@ of this member variable. The headers are also updated to add the new `copy()` and `memcpy()` member functions to `handler` and `queue` which copy data to or from a device global variable. These declarations use SFINAE such that they are conditionally -available depending on the `copy_access` property. +available depending on the `host_access` property. ### New LLVM IR attributes @@ -144,7 +144,7 @@ This is not possible, though, for variables with internal linkage because the mangled name is not unique in this case. For these variables, we use the mangled name and append a unique suffix. -Each device global variable that has the `device_image_life` property is also +Each device global variable that has the `device_image_scope` property is also decorated with the `sycl-device-global-image-life` attribute. Note that language rules ensure that `device_global` variables are always @@ -209,7 +209,7 @@ global variable that is defined in the translation unit: * The variable's string from the `sycl-unique-id` attribute. * The size (in bytes) of the underlying `T` type for the variable. * A boolean telling whether the variable is decorated with the - `device_image_life` property. + `device_image_scope` property. ``` namespace sycl::detail { @@ -219,11 +219,11 @@ __sycl_device_global_registration::__sycl_device_global_registration() noexcept device_global_map::add(&::Foo, /* mangled name of '::Foo' with unique suffix appended */, /* size of underlying 'T' type */, - /* bool telling whether variable has 'device_image_life` property */); + /* bool telling whether variable has 'device_image_scope` property */); device_global_map::add(&::inner::Bar, /* mangled name of '::inner::Bar' */, /* size of underlying 'T' type */, - /* bool telling whether variable has 'device_image_life` property */); + /* bool telling whether variable has 'device_image_scope` property */); } } // namepsace (unnamed) @@ -287,11 +287,11 @@ __sycl_device_global_registration::__sycl_device_global_registration() noexcept device_global_map::add(&::FuBar, /* mangled name of '::FuBar' */, /* size of underlying 'T' type */, - /* bool telling whether variable has 'device_image_life` property */); + /* bool telling whether variable has 'device_image_scope` property */); device_global_map::add(::__sycl_UNIQUE_STRING, /* mangled name of '::(unnamed)::FuBar' with unique suffix appended */, /* size of underlying 'T' type */, - /* bool telling whether variable has 'device_image_life` property */); + /* bool telling whether variable has 'device_image_scope` property */); } } // namepsace (unnamed) @@ -319,7 +319,7 @@ global variable decorated with `sycl-device-global-image-life` appears in more than one module, the `sycl-post-link` tool issues an error diagnostic: ``` -error: device_global variable with property "device_image_life" +error: device_global variable with property "device_image_scope" is contained in more than one device image. ``` @@ -361,12 +361,13 @@ strings, where each string ends with a null character (`\0`). Several changes are needed to the DPC++ runtime * As noted in the requirements section, an instance of a device global variable - that does not have the `device_image_life` property is shared by all device + that does not have the `device_image_scope` property is shared by all device images on a device. To satisfy this requirement, the device global variable contains a pointer to a buffer allocated from USM device memory, and the - content of the variable is stored in this buffer. All device images point to - the same buffer, so the variable's state is shared. The runtime, therefore, - must allocate this USM buffer for each such device global variable. + content of the variable is stored in this buffer. All device images on a + particular device point to the same buffer, so the variable's state is + shared. The runtime, therefore, must allocate this USM buffer for each such + device global variable. * As we noted above, the front-end generates new content in the integration footer which calls the function `sycl::detail::device_global_map::add()`. @@ -377,9 +378,9 @@ Several changes are needed to the DPC++ runtime - The string which uniquely identifies the variable. - The size (in bytes) of the underlying `T` type for the variable. - A boolean telling whether the variable is decorated with the - `device_image_life` property. + `device_image_scope` property. - The associated per-device USM buffer pointer, if this variable does not - have the `device_image_life` property. + have the `device_image_scope` property. We refer to this information as the "device global database" below. @@ -404,13 +405,14 @@ runtime does the following: the `pi_program` to get the unique string associated with each device global variable that is used by the `pi_program`. For each of these strings, the runtime uses the device global database to see if the variable was decorated - with `device_image_life`. If it was not so decorated and if a USM buffer has - not already been created for the variable on this target device, the runtime - allocates the buffer from USM device memory using the size from the database. - The pointer to this buffer is saved in the database for future reuse. + with `device_image_scope`. If it was not so decorated and if a USM buffer + has not already been created for the variable on this target device, the + runtime allocates the buffer from USM device memory using the size from the + database and zero-initializes the content of the buffer. The pointer to this + buffer is saved in the database for future reuse. * For each device global variable that is not decorated with - `device_image_life`, the runtime initializes the `usmptr` member in the + `device_image_scope`, the runtime initializes the `usmptr` member in the *device instance* of the variable by using a backend-specific function which copies data from the host to a device variable. It is a simple matter to use this function to overwrite the `usmptr` member with the address of the USM @@ -421,22 +423,23 @@ runtime does the following: Each of these functions accepts a (host) pointer to a device global variable as one of its parameters, and the runtime uses this pointer to find the associated information for this variable in the device global database. The remaining -behavior depends on whether the variable is decorated with `device_image_life`. +behavior depends on whether the variable is decorated with +`device_image_scope`. If the variable is not decorated with this property, the runtime uses the database to determine if a USM buffer has been allocated yet for this variable on this device. If not, the runtime allocates the buffer using the size from -the database. Regardless, the runtime implements the `copy` / `memcpy` by -copying to or from this USM buffer, using the normal mechanism for copying -to / from a USM pointer. +the database and zero-initializes the buffer. Regardless, the runtime +implements the `copy` / `memcpy` by copying to or from this USM buffer, using +the normal mechanism for copying to / from a USM pointer. The runtime avoids the future cost of looking up the variable in the database by caching the USM pointer in the host instance of the variable's `usmptr` member. -If the variable is decorated with the `device_image_life` property, the runtime -gets the unique string identifier for the variable from the database and uses -a backend-specific function to copy to or from the variable with that +If the variable is decorated with the `device_image_scope` property, the +runtime gets the unique string identifier for the variable from the database +and uses a backend-specific function to copy to or from the variable with that identifier. Again, the details of this function are described below. In all cases, the runtime diagnoses invalid calls that write beyond the device @@ -578,8 +581,8 @@ SPIR-V decorations (defined in the [12]: -* `copy_access` -* `init_via` +* `host_access` +* `init_mode` * `implement_in_csr` It's not clear how this should work. One of the goals of the new property @@ -613,11 +616,11 @@ Currently, we use the variable's mangled name, but this could be changed. An alternative solution would be to augment the SPIR-V with some new decoration that gives a unique name to each `OpVariable` that needs to be accessed from the host. We could then use that name with the backend functions, and avoid -renaming variables with internal linkage. This would be more effort, though, -because we would need a new SPIR-V extension, and we would need to change the -implementation of the Level Zero and OpenCL backends. +renaming variables with internal linkage. This would be only a minor change to +the [SPV\_INTEL\_global\_variable\_decorations][12] extension, but it would +also require changes in the Level Zero and OpenCL backends. -### Does compiler need to be deterministic? +### Does the compiler need to be deterministic? The compiler is normally deterministic. If you compile the exact same source file twice specifying the same command line options each time, you get exactly From eda8c3ebab95bf348bf4781466656d2ca8d2c275 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 4 Oct 2021 11:37:30 -0400 Subject: [PATCH 07/25] Point to PR for device global var restrictions Update the unresolved issue about diagnosing invalid `device_global` declarations to reference the PR against the API spec that describes exactly what these restrictions are. --- sycl/doc/DeviceGlobal.md | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index adff8a13a541b..604b579c57215 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -515,16 +515,12 @@ declaration. ### Need to diagnose invalid declarations of `device_global` variables The device global extension specification places restrictions on where a -`device_global` variable can be declared. These restrictions are similar to -ones we have already for variables of type `specialization_id`: +`device_global` variable can be declared aas clarified in [this PR][11] against +the extension specification API. -* A `device_global` variable can be declared at namespace scope. -* A `device_global` variable can be declared as a static member variable in - class scope, but only if the declaration has public visibility from namespace - scope. -* No other declarations are allowed for a variable of type `device_global`. +[11]: -The device compiler front-end needs to emit a diagnostic if a `device_global` +The device compiler front-end should emit a diagnostic if a `device_global` variable is declared in a way that violates these restrictions. We do not have agreement yet, though, on how this should be done. For example, should the front-end recognize these variable declarations by the name of their type, or @@ -539,13 +535,13 @@ Unless we decorate these variables in some special way, the current behavior of the `llvm-spirv` tool is to generate these variables in the private address space, even though they are declared at module scope. -The [existing OpenCL attribute][11] `[[clang::opencl_global]]` is almost what +The [existing OpenCL attribute][12] `[[clang::opencl_global]]` is almost what we need, but again this attribute can only be applied to a variable declaration. Instead, we want some attribute that can be applied to the type declaration of `class device_global`. We could invent some new attribute with this semantic, but there is another problem. -[11]: +[12]: Applying `[[clang::opencl_global]]` to a variable of class type currently raises an error message saying there is no candidate "global" constructor for @@ -577,9 +573,9 @@ compiler works in OpenCL C 2.0 mode. The following three device global properties must be propagated from DPC++ source code, through LLVM IR, and into SPIR-V where they are represented as SPIR-V decorations (defined in the -[SPV\_INTEL\_global\_variable\_decorations][12] extension). +[SPV\_INTEL\_global\_variable\_decorations][13] extension). -[12]: +[13]: * `host_access` * `init_mode` @@ -617,7 +613,7 @@ An alternative solution would be to augment the SPIR-V with some new decoration that gives a unique name to each `OpVariable` that needs to be accessed from the host. We could then use that name with the backend functions, and avoid renaming variables with internal linkage. This would be only a minor change to -the [SPV\_INTEL\_global\_variable\_decorations][12] extension, but it would +the [SPV\_INTEL\_global\_variable\_decorations][13] extension, but it would also require changes in the Level Zero and OpenCL backends. ### Does the compiler need to be deterministic? From a0c4f635703c7c78be47f392ab6d967ba09facd4 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Tue, 19 Oct 2021 08:01:52 -0400 Subject: [PATCH 08/25] Update SPV and OCL specs w.r.t. variable name We decided that we need some alternate way for the host to reference a global variable in the kernel other than using the exported SPIR-V name. Extend the `HostAccessINTEL` decoration to take a new `Name` parameter. This allows the host to access even a non-exported `OpVariable`. Changes are still needed to the "DeviceGlobal.md" spec to explain how this new field will be set by the compiler. Those changes will be made in a separate commit. --- ...INTEL_global_variable_decorations.asciidoc | 18 +++++---- ...cl_intel_global_variable_pointers.asciidoc | 37 ++++++++++++------- 2 files changed, 35 insertions(+), 20 deletions(-) diff --git a/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc b/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc index 188c9e842a6fe..ce52704cdee00 100644 --- a/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc +++ b/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc @@ -61,8 +61,8 @@ This extension requires SPIR-V 1.0. == Overview This extension adds decorations that can be applied to global (module scope) -variables. These decorations are intended to help code generation for FPGA -devices, but they must be recognized by any consumer that supports this +variables. These decorations are mostly intended to help code generation for +FPGA devices, but they must be recognized by any consumer that supports this extension. == Extension Name @@ -103,7 +103,7 @@ Modify Section 3.20, Decoration, adding these rows to the Decoration table: -- [options="header"] |==== -2+^| Decoration | Extra Operands | Enabling Capabilities +2+^| Decoration 2+^| Extra Operands | Enabling Capabilities // --- ROW BREAK --- | 6147 @@ -113,8 +113,10 @@ Only valid on global (module scope) *OpVariable*. The client API's execution environment may provide a way to access a global variable's value from the host system. If it does, this decoration provides -an assertion by the producer about the types of these accesses. This may allow -the consumer to perform certain optimizations, especially for FPGA devices. +two pieces of information. _Access_ is an assertion by the producer about the +types of these accesses, which may allow the consumer to perform certain +optimizations, especially for FPGA devices. _Name_ is a name which the client +API's execution environment may use to identify this variable. Legal values of _Access_: @@ -135,6 +137,8 @@ If a global *OpVariable* is not decorated with *HostAccessINTEL*, the default behavior is [Read/Write]. | Literal Number + _Access_ +| Literal String + +_Name_ | *GlobalVariableDecorationsINTEL* // --- ROW BREAK --- @@ -154,7 +158,7 @@ Legal values of _Trigger_: * 1 [init on device reset] - Initialization is performed when a reset signal is sent to the device. -| Literal Number + +2+| Literal Number + _Trigger_ | *GlobalVariableDecorationsINTEL* @@ -174,7 +178,7 @@ Legal values of _Value_: * 1 [True] - Access to this memory is through a CSR interface shared with the kernel arguments. -| Literal Number + +2+| Literal Number + _Value_ | *GlobalVariableDecorationsINTEL* diff --git a/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc index 92bbf33ef4758..a91b5ec26bd0b 100644 --- a/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc +++ b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc @@ -96,7 +96,26 @@ cl_int clGetDeviceGlobalVariablePointerINTEL( void **global_variable_pointer_ret ); ---- is used to retrieve a pointer to a user-defined global variable -_global_variable_name_ which exists in built _program_. +_global_variable_name_ which exists in built _program_. The interpretation of +_global_variable_name_ depends on how the _program_ was created: + +* If the _program_ was created with *clCreateProgramWithSource* from OpenCL C + sources, *TODO* (see Issues section below). + +* If the _program_ was created with *clCreateProgramWithIL* from SPIR-V, there + are two cases: + + - If the SPIR-V module declares the *GlobalVariableDecorationsINTEL* + capability, the implementation looks first for an *OpVariable* that is + decorated with *HostAccessINTEL* where the _Name_ operand is the same as + _global_variable_name_. + + - The implementation next looks for an *OpVariable* that is decorated with + *LinkageAttributes* where the _Linkage Type_ is *Export* and the _Name_ + operand is the same as _global_variable_name_. + +* If the _program_ was created in any other way, the interpretation of + _global_variable_name_ is implementation-defined. The returned pointer is a Unified Shared Memory pointer to the requested global variable and is specific to given _device_. @@ -150,21 +169,12 @@ cl_intel_global_variable_pointers ... == Issues -. Can this extension be used to query the address of any global variable in the program? +. Exactly what is the interpretation of _global_variable_name_ for a _program_ + that was created from OpenCL C sources? Must the variable be declared + "extern"? Are "constant" variables allowed? + -- *UNRESOLVED* - -It is not possible to obtain address of *any* arbitrary global variable using -`clGetDeviceGlobalVariablePointerINTEL`, and the mechanism to identify that the -global variable may have it's address taken is dependent on how program object -is created: - -* for programs created from source, *TODO* (extern?) - -* for programs created from SPIR-V IL, *TODO* (global variable needs to have *export* linkage?) - -* for programs created from a binary, the mechanism is implementation defined. -- . Can a queried global variable pointer be used by a kernel from another program object? @@ -185,4 +195,5 @@ Currently this is disallowed. |Rev|Date|Author|Changes |0.9.0|2019-09-14|Jaroslaw Chodor|*Initial revision* |0.9.0|2021-09-22|Ben Ashbaugh|Added preview disclaimer, general cleanup. +|0.9.0|2021-10-18|Greg Lueck|Clarify variable name for SPIR-V case. |======================================== From da1124837b604f6941e0807ece7cd2292e66f136 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Tue, 19 Oct 2021 10:02:18 -0400 Subject: [PATCH 09/25] Describe handling of properties / address comments Resolve the remaining open issues and address review comments: * Describe how compile-time properties are handled, using the new compile-time property design document as a basis. * Address remaining open issues, mostly with the front-end. * Address review comments that have been made so far. --- sycl/doc/DeviceGlobal.md | 511 +++++++++++++++++---------------------- 1 file changed, 221 insertions(+), 290 deletions(-) diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index 604b579c57215..6b68089880cd4 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -4,7 +4,7 @@ This document describes the implementation design for the DPC++ extension [SYCL\_EXT\_ONEAPI\_DEVICE\_GLOBAL][1], which allows applications to declare global variables in device code. -[1]: +[1]: ## Requirements @@ -88,16 +88,18 @@ property. The sections below describe both implementations. ### Changes to DPC++ headers +#### Class specializations based on `device_image_scope` + The headers, of course, include the declaration of the new `device_global` class, which is described in the [extension specification][1]. The declaration of this class uses partial specialization to define the class differently -depending on whether is has the `device_image_scope` property. When the +depending on whether it has the `device_image_scope` property. When the property is not present, the class has a member variable which is a pointer to the underlying type. Member functions which return a reference to the value (e.g. `get`) return the value of this pointer: ``` -template +template class device_global { T *usmptr; public: @@ -110,60 +112,140 @@ However, when the property is present, it has a member variable which is the type itself, and member functions return a reference to this value. ``` -template +template class device_global { T val; public: + device_global() : val() {} T& get() noexcept { return val; } /* other member functions */ }; ``` +Note that the default constructor "value initializes" the `val` member +variable. Since the type `T` is limited to types that are trivially +constructible, this means that `val` will be zero initialized. + In both cases the member variable (either `usmptr` or `val`) must be the first member variable in the class. As we will see later, the runtime assumes that the address of the `device_global` variable itself is the same as the address of this member variable. +#### Attributes attached to the class + +The `device_global` class declaration contains three C++ attributes which +convey information to the front-end. These attributes are only needed for the +device compiler, and the `#ifdef __SYCL_DEVICE_ONLY__` allows the customer to +use another host compiler, even if it does not recognize these attributes. +Also note that these attributes are all in the `__sycl_detail__` namespace, so +they are considered implementation details of DPC++. We do not intend to +support them as general attributes that customer code can use. + +``` +template > +class device_global {/*...*/}; + +// Partial specialization to make PropertyListT visible as a parameter pack +// of properties. +template +class +#ifdef __SYCL_DEVICE_ONLY__ + [[__sycl_detail__::add_ir_global_variable_attributes( + "sycl-device-global-size", + Props::meta_name..., + sizeof(T), + Props::meta_value... + )]] + [[__sycl_detail__::global_variable_allowed]] + [[__sycl_detail__::restrictions(device_global)]] +#endif + device_global> {/*...*/}; +``` + +The `[[__sycl_detail__::add_ir_global_variable_attributes()]]` attribute is +used to convey information about the compile-time properties to the front-end, +and it is described more fully by the [compile-time properties][2] design +document. This attribute is also used for other classes that have properties, +so it is not specific to the `device_global` class. + +[2]: + +Note that the parameter list to +`[[__sycl_detail__::add_ir_global_variable_attributes()]]` contains one +additional property named `"sycl-device-global-size"`. The `sycl-post-link` +tool uses this property to distinguish device global variables from other +module scope variables, and the property tells the size of the underlying data +type of the device global variable. + +The `[[__sycl_detail__::global_variable_allowed]]` attribute informs the +front-end that global variables of this type are allowed to be referenced in +device code. By default, the front-end diagnoses an error if device code +references a global variable unless the variable is `constexpr` or `const` and +constant initialized. However, the presence of this attribute informs the +front-end that variables of this type are an exception to this rule, so the +front-end does not diagnose an error when device code references a +`device_global` variable. This attribute could also be used by other types, +so it is also not specific to the `device_global` class. + +**NOTE**: The implementation of [device-side asserts][3] recently introduced a +new C++ attribute [`sycl_global_var`][4] for a similar purpose. The design for +device global variables cannot use that attribute because `sycl_global_var` is +intended to be specified on the variable definition (not the type declaration), +and we do not want to force users to add an attribute to each definition of a +`device_global` variable. However, the implementation of device-side asserts +could be changed to use `[[__sycl_detail__::global_variable_allowed]]`. We +could then remove the support for `sycl_global_var`. + +[3]: +[4]: + +The last attribute `[[__sycl_detail__::restrictions(device_global)]]` controls +error reporting for variables declared of this type. The device global +extension specification places restrictions on where a `device_global` variable +can be declared as clarified in [this PR][5] against the extension +specification API. Rather than have the front-end recognize the name of the +`device_global` type, the front-end uses this attribute to know which +restrictions to enforce for this type. + +[5]: + +**NOTE**: The front-end does currently recognize the `specialization_id` class +by its name, and it has hard-coded knowledge that variables declared with this +type have certain restrictions. If we wanted to avoid having the front-end +recognize the `specialization_id` class by its name, we could decorate that +class with `[[__sycl_detail__::restrictions(spec_id)]]`. Note that we would +need a different parameter name because the restrictions for +`specialization_id` are different from those for `device_global`. + +#### Declaration of member functions to copy device global variables + The headers are also updated to add the new `copy()` and `memcpy()` member functions to `handler` and `queue` which copy data to or from a device global variable. These declarations use SFINAE such that they are conditionally available depending on the `host_access` property. -### New LLVM IR attributes - -Two new attributes are added to communicate information about device global -variable to the `sycl-post-link` tool: `sycl-unique-id` and -`sycl-device-global-image-life`. As described below, the device compiler -front-end is responsible for adding the attributes to the LLVM IR. - -Each device global variable is decorated with `sycl-unique-id`, which provides -a unique string identifier for each device global variable. This string will -also be used to name the variable in SPIR-V, so it's better for debuggability -if the string matches the mangled name for variables with external linkage. -This is not possible, though, for variables with internal linkage because the -mangled name is not unique in this case. For these variables, we use the -mangled name and append a unique suffix. - -Each device global variable that has the `device_image_scope` property is also -decorated with the `sycl-device-global-image-life` attribute. +### Changes to the DPC++ front-end -Note that language rules ensure that `device_global` variables are always -declared at namespace scope (i.e. a global variable), and LLVM IR [allows -attributes to be attached to global variables][2]. +There are several changes to the device compiler front-end: -[2]: +* The front-end adds a new LLVM IR attribute `sycl-unique-id` to the definition + of each `device_global` variable, which provides a unique string identifier + for each device global variable. The rules for creating this string are the + same as `__builtin_sycl_unique_stable_id`, so the front-end can use the same + algorithm when generating the string. -### Changes to the DPC++ front-end +* The front-end checks for restrictions on variable declarations using the + `device_global` type. As described above, the front-end uses the + `[[__sycl_detail__::restrictions()]]` attribute (rather than the class name) + to know which set of restrictions to check. The restrictions specific to + device global variables are documented in [this PR][5]. -The device compiler front-end is changed in two ways: it generates new content -in both the integration header and the integration footer, and it adds the -`sycl-unique-id` and `sycl-device-global-image-life` attributes to the IR -definitions of `device_global` variables as defined above. These two tasks are -related because the integration footer contains the same string that is stored -in the `sycl-unique-id` attribute. +* The front-end *avoids* diagnosing an error when variables of type + `device_global` are referenced in device code because the type is decorated + with the `[[__sycl_detail__::global_variable_allowed]]` attribute. -**NOTE**: See also the "Unresolved issues" section at the bottom of this -document for other changes that are needed in the front-end. +* The front-end generates new content in both the integration header and the + integration footer, which is described in more detail below. #### New content in the integration header and footer @@ -183,9 +265,9 @@ namespace inner { // ... ``` -The corresponding integration header defines a namespace scope variable of -class type whose sole purpose is to run its constructor before the -application's `main()` function: +The corresponding integration header defines a namespace scope variable of type +`__sycl_device_global_registration` whose sole purpose is to run its +constructor before the application's main() function: ``` namespace sycl::detail { @@ -207,9 +289,6 @@ global variable that is defined in the translation unit: * The (host) address of the variable. * The variable's string from the `sycl-unique-id` attribute. -* The size (in bytes) of the underlying `T` type for the variable. -* A boolean telling whether the variable is decorated with the - `device_image_scope` property. ``` namespace sycl::detail { @@ -217,13 +296,9 @@ namespace { __sycl_device_global_registration::__sycl_device_global_registration() noexcept { device_global_map::add(&::Foo, - /* mangled name of '::Foo' with unique suffix appended */, - /* size of underlying 'T' type */, - /* bool telling whether variable has 'device_image_scope` property */); + /* same string returned from __builtin_sycl_unique_stable_id(::Foo) */); device_global_map::add(&::inner::Bar, - /* mangled name of '::inner::Bar' */, - /* size of underlying 'T' type */, - /* bool telling whether variable has 'device_image_scope` property */); + /* same string returned from __builtin_sycl_unique_stable_id(::inner::Bar) */); } } // namepsace (unnamed) @@ -266,7 +341,7 @@ namespace { In this example, the `FuBar` variable in the global namespace shadows a variable with the same name in the unnamed namespace. The integration footer -could reference the variable in the global namespace as `::FuBar`, but there is +can reference the variable in the global namespace as `::FuBar`, but there is no way to reference the variable in the unnamed namespace using fully qualified lookup. @@ -285,13 +360,9 @@ namespace { __sycl_device_global_registration::__sycl_device_global_registration() noexcept { device_global_map::add(&::FuBar, - /* mangled name of '::FuBar' */, - /* size of underlying 'T' type */, - /* bool telling whether variable has 'device_image_scope` property */); + /* same string returned from __builtin_sycl_unique_stable_id(::FuBar) */); device_global_map::add(::__sycl_UNIQUE_STRING, - /* mangled name of '::(unnamed)::FuBar' with unique suffix appended */, - /* size of underlying 'T' type */, - /* bool telling whether variable has 'device_image_scope` property */); + /* same string returned from __builtin_sycl_unique_stable_id(::(unnamed)::FuBar) */); } } // namepsace (unnamed) @@ -304,18 +375,20 @@ unqualified name lookup. Furthermore, the name of the temporary variable (`__sycl_UNIQUE_STRING`) is globally unique, so it is guaranteed not to be shadowed by any other name in the translation unit. This problem with variable shadowing is also a problem for the integration footer we use for -specialization constants. See the [specialization constant design document][3] +specialization constants. See the [specialization constant design document][6] for more details on this topic. -[3]: +[6]: ### Changes to the `sycl-post-link` tool The `sycl-post-link` tool performs its normal algorithm to identify the set of kernels and device functions that are bundled together into each module. Once it identifies the functions in each module, it scans those functions looking -for references to global variables of type `device_global`. If any device -global variable decorated with `sycl-device-global-image-life` appears in more +for references to global variables that are decorated with the LLVM IR +attribute `"sycl-device-global-size"` (these are the variables of type +`device_global`). If any device global variable decorated with the LLVM IR +attribute corresponding to the `device_image_scope` property appears in more than one module, the `sycl-post-link` tool issues an error diagnostic: ``` @@ -327,34 +400,47 @@ Assuming that no error diagnostic is issued, the `sycl-post-link` tool includes the IR definition of each `device_global` variable in the modules that reference that variable. -The [backend functions described below][4] that allow the host to copy to or -from a device global require the variable to have "export" linkage in SPIR-V. -Therefore, the `sycl-post-link` tool needs to make the following IR -transformations for any `device_global` variable that has internal linkage: +As described in the design for [compile-time properties][2], the +`sycl-post-link` tool is responsible for generating idiomatic LLVM IR for any +compile-time properties that need to be generated in SPIR-V. The +`sycl-post-link` tool does this for the following properties on each device +global variable: -[4]: <#back-end-specific-function-to-copy-to--from-a-device-symbol> +* `host_access` +* `device_image_scope` +* `implement_in_csr` -* The linkage is changed to be external. -* The name of the variable is changed to be the string from the - `sycl-unique-id` attribute. +The `host_access` property is handled specially because the SPIR-V decoration +requires two "extra operands", but the SYCL property has only one operand. The +second SPIR-V operand is the "name" of the variable, and the `sycl-post-link` +tool passes the value from `sycl-unique-id` for this name. -**NOTE**: It seems likely that changing the name of internal linkage variables -will be bad for debuggability of the code. The user may attempt to print the -value of a variable in the debugger, but the debugger won't know the variable -by that name. See the "Unresolved issues" section below for more discussion -on this. +The `sycl-post-link` tool also create a "SYCL/device globals" property set for +each device code module that contains at least one device global variable. -The `sycl-post-link` tool also adds the new "device-globals" property to the -"SYCL/misc properties" set, as described below. +### New "SYCL/device globals" property set -### New property in "SYCL/misc properties" +Each device code module that references one or more device global variables +has an associated "SYCL/device globals" property set. The name of each +property in this set is the `sycl-unique-id` string of a `device_global` +variable that is contained by the module. The value of each property has +property type `PI_PROPERTY_TYPE_BYTE_ARRAY` and contains a structure with the +following fields: -If a device code module has one or more device global variables, a new property -is added to the "SYCL/misc properties" set named "device-globals". The value -of this property has property type `PI_PROPERTY_TYPE_BYTE_ARRAY` and contains -the `sycl-unique-id` strings for each device global variable that the module -contains. The value of the property is the concatenation of all these -strings, where each string ends with a null character (`\0`). +``` +struct { + uint32_t size; + uint8_t device_image_scope; +}; +``` + +The `size` field contains the size (in bytes) of the underlying type `T` of the +device global variable. The `sycl-post-link` tool gets this value from the +LLVM IR attribute `"sycl-device-global-size"`. + +The `device_image_scope` field is either `1` (true) or `0` (false), telling +whether the device global variable was declared with the `device_image_scope` +property. ### Changes to the DPC++ runtime @@ -400,56 +486,76 @@ code modules that need to be online-linked together in order to construct the After creating a `pi_program` and before invoking any kernel it contains, the runtime does the following: -* Scan the strings in the "device-globals" properties of the - "SYCL/misc properties" sets of each device code module that contributes to - the `pi_program` to get the unique string associated with each device global - variable that is used by the `pi_program`. For each of these strings, the - runtime uses the device global database to see if the variable was decorated - with `device_image_scope`. If it was not so decorated and if a USM buffer - has not already been created for the variable on this target device, the - runtime allocates the buffer from USM device memory using the size from the - database and zero-initializes the content of the buffer. The pointer to this - buffer is saved in the database for future reuse. - -* For each device global variable that is not decorated with - `device_image_scope`, the runtime initializes the `usmptr` member in the - *device instance* of the variable by using a backend-specific function which - copies data from the host to a device variable. It is a simple matter to use - this function to overwrite the `usmptr` member with the address of the USM - buffer. The details of this device-specific function are described below. +* Scan the entries of the "SYCL/device globals" property sets of each device + code module that contributes to the the `pi_program` to get information about + each device global variable that is used by the `pi_program`. This + information is added to device global database. + +* For each device global variable that is not decorated with the + `device_image_scope` property: + + - Check if a USM buffer has already been created for the variable on this + target device. If not, the runtime allocates the buffer from USM device + memory using the size from the database and zero-initializes the content + of the buffer. The pointer to this buffer is saved in the database for + future reuse. + + - Regardless of whether the USM buffer has already been created for the + variable, the runtime initializes the `usmptr` member in the *device + instance* of the variable by using a backend-specific function which + copies data from the host to a device variable. It is a simple matter to + use this function to overwrite the `usmptr` member with the address of + the USM buffer. The details of this device-specific function are + described below. + +Note that the runtime does not need to initialize the `val` member variable of +device global variables that are decorated with `device_image_scope` because +the constructor already guarantees that this member variable is zero +initialized. #### Implementing the `copy` and `memcpy` functions in `queue` and `handler` Each of these functions accepts a (host) pointer to a device global variable as one of its parameters, and the runtime uses this pointer to find the associated -information for this variable in the device global database. The remaining -behavior depends on whether the variable is decorated with -`device_image_scope`. +information for this variable in the device global database. In the common +case where a kernel using this device global has already been submitted to the +target device, the database will contain all of the information for this device +global variable. + +However, in the case when the application has not previously submitted a kernel +that uses this device global, the database will contain only the address of the +variable and its unique string. In this case, the runtime must scan all +"SYCL/device globals" property sets in the application searching for an entry +with that same unique string. The runtime can then add the remaining +information about the device global variable to the database. + +The remaining behavior depends on whether the variable is decorated with the +`device_image_scope` property. If the variable is not decorated with this property, the runtime uses the database to determine if a USM buffer has been allocated yet for this variable on this device. If not, the runtime allocates the buffer using the size from the database and zero-initializes the buffer. Regardless, the runtime -implements the `copy` / `memcpy` by copying to or from this USM buffer, using -the normal mechanism for copying to / from a USM pointer. +implements the `copy` / `memcpy` function by copying to or from this USM +buffer, using the normal mechanism for copying to / from a USM pointer. The runtime avoids the future cost of looking up the variable in the database by caching the USM pointer in the host instance of the variable's `usmptr` member. If the variable is decorated with the `device_image_scope` property, the -runtime gets the unique string identifier for the variable from the database -and uses a backend-specific function to copy to or from the variable with that -identifier. Again, the details of this function are described below. +runtime uses the unique string identifier for the variable to call a +backend-specific function that copies to or from the variable. Again, the +details of this function are described below. In all cases, the runtime diagnoses invalid calls that write beyond the device global variable's size by using the size in the database. #### Back-end specific function to copy to / from a device symbol -As noted above, we need a backend-specific function copy to / from the device -instance of a variable. All backends provide this functionality, which is -abstracted with these new PI interfaces: +As noted above, we need a backend-specific function to copy to / from the +device instance of a variable. All backends provide this functionality, which +is abstracted with these new PI interfaces: ``` pi_result piextCopyToDeviceVariable(pi_device Device, const char *name, @@ -459,200 +565,25 @@ pi_result piextCopyFromDeviceVariable(pi_device Device, const char *name, void *dst, size_t count, size_t offset); ``` -In both cases the `name` parameter is the same as the "unique string -identifier" for the device global variable. +In both cases the `name` parameter is the same as the `sycl-unique-id` string +that is associated with the device global variable. On the Level Zero backend, these PI interfaces are implemented by first calling -[`zeModuleGetGlobalPointer()`][5] to get a device pointer for the variable and -then calling [`zeCommandListAppendMemoryCopy()`][6] to copy to or from that +[`zeModuleGetGlobalPointer()`][7] to get a device pointer for the variable and +then calling [`zeCommandListAppendMemoryCopy()`][8] to copy to or from that pointer. -[5]: -[6]: +[7]: +[8]: On the OpenCL backend, these PI interfaces are implemented by first calling `clGetDeviceGlobalVariablePointerINTEL()` to get a device pointer for the variable. This function is provided by the -[`cl_intel_global_variable_pointers`][7] extension which is not yet +[`cl_intel_global_variable_pointers`][9] extension which is not yet productized. Once we get a pointer, the PI layer calls `clEnqueueMemcpyINTEL()` to copy to or from that pointer. -[7]: +[9]: On the CUDA backend, these PI interfaces are implemented on top of `cudaMemcpyToSymbol()` and `cudaMemcpyFromSymbol()`. - - -## Unresolved issues - -### Need some way to avoid errors referencing `device_global` variables - -The device compiler front-end currently diagnoses an error if device code -references a global variable, unless it is `constexpr` or `const` and constant -initialized. This is consistent with the SYCL 2020 specification, but the new -device global feature is an exception to this rule. Device code, of course, -can reference a `device_global` variable even if it is not declared `constexpr` -or `const`. We need some way to avoid the error diagnostic in this case. - -The [newly added][8] `sycl_global_var` attribute is almost what we need, -however that attribute is only allowed to decorate a variable declaration. -This doesn't help us because we don't want to force users to add an attribute -to each declaration of a `device_global` variable. Instead, we want to -decorate the class declaration of `device_global` with some attribute which -allows any variables of that type to be accessible from device code. - -[8]: - -Since the `sycl_global_var` attribute is currently used only as an -implementation detail for [device-side asserts][10], one option is to repurpose -this attribute. Rather than applying it to a variable declaration, we could -allow it only on a type declaration. The implementation of device-side asserts -could be changed to use the attribute on a new type, rather than on a variable -declaration. - -[10]: - -### Need to diagnose invalid declarations of `device_global` variables - -The device global extension specification places restrictions on where a -`device_global` variable can be declared aas clarified in [this PR][11] against -the extension specification API. - -[11]: - -The device compiler front-end should emit a diagnostic if a `device_global` -variable is declared in a way that violates these restrictions. We do not have -agreement yet, though, on how this should be done. For example, should the -front-end recognize these variable declarations by the name of their type, or -should we decorate the type with some C++ attribute that helps the front-end -recognize them? - -### Need some way to force `device_global` variables into global address space - -Although the underlying `T` type of a device global variable is stored in a USM -buffer, the `device_global` variable itself is a module scope global variable. -Unless we decorate these variables in some special way, the current behavior of -the `llvm-spirv` tool is to generate these variables in the private address -space, even though they are declared at module scope. - -The [existing OpenCL attribute][12] `[[clang::opencl_global]]` is almost what -we need, but again this attribute can only be applied to a variable -declaration. Instead, we want some attribute that can be applied to the type -declaration of `class device_global`. We could invent some new attribute with -this semantic, but there is another problem. - -[12]: - -Applying `[[clang::opencl_global]]` to a variable of class type currently -raises an error message saying there is no candidate "global" constructor for -the type. Apparently, the compiler expects a constructor to be defined with -the `__global` keyword: - -``` -class device_global { - public: - device_global() __global; -}; -``` - -We could add a default constructor like that, but the compiler only recognizes -this syntax when it is in OpenCL C++ mode, which is not the case when compiling -SYCL applications. Therefore, if we invented a new attribute that added -"global address space" semantics to a type, we would probably want that -attribute to cause any constructors to behave as though they were implicitly -declared with the `__global` keyword. - -Another option entirely is to change the default behavior of the SYCL device -compiler so that namespace scope variables are implicitly treated as though -they are in the global address space (as opposed to the private address space -as is currently the case). This behavior would be consistent with the way the -compiler works in OpenCL C 2.0 mode. - -### Need some way to propagate properties to SPIR-V - -The following three device global properties must be propagated from DPC++ -source code, through LLVM IR, and into SPIR-V where they are represented as -SPIR-V decorations (defined in the -[SPV\_INTEL\_global\_variable\_decorations][13] extension). - -[13]: - -* `host_access` -* `init_mode` -* `implement_in_csr` - -It's not clear how this should work. One of the goals of the new property -mechanism is to make it easy to propagate information like this through the -compiler toolchain, so hopefully we can leverage some common infrastructure -rather than hard-coding support for these three properties. However, there is -not yet a design document for the new properties mechanism, so it's not yet -clear what this infrastructure will be. - -### Will changing the name of internal symbols be bad for debugging? - -The [backend functions for copying to / from a device symbol][4] currently -require the symbol to have export linkage in SPIR-V. (This is the case for the -Level Zero and OpenCL functions. We are not sure about the CUDA functions, but -it seems likely they have the same limitation.) However, the device global -extension allows these variables to also have internal linkage, and this seems -like a useful feature. The current strategy is to convert internal linkage -variables to external linkage at the IR level and also rename the symbol in a -way that is globally unique. - -This should result in functionally correct code, but it seems likely to make -debugging more difficult. If the debugger uses the name from SPIR-V, this name -will not match what the user expects. We attempt to mitigate this somewhat by -preserving the user's name and appending a unique suffix, but this seems like a -weak mitigation. - -Do we think the debugging experience will be so bad that we should change the -strategy? The fundamental requirement is that we need some unique way to -identify each device code variable when using these backend functions. -Currently, we use the variable's mangled name, but this could be changed. -An alternative solution would be to augment the SPIR-V with some new decoration -that gives a unique name to each `OpVariable` that needs to be accessed from -the host. We could then use that name with the backend functions, and avoid -renaming variables with internal linkage. This would be only a minor change to -the [SPV\_INTEL\_global\_variable\_decorations][13] extension, but it would -also require changes in the Level Zero and OpenCL backends. - -### Does the compiler need to be deterministic? - -The compiler is normally deterministic. If you compile the exact same source -file twice specifying the same command line options each time, you get exactly -the same object file. However, this will no longer be the case. - -The design in this document generates a GUID and uses that GUID to rename -device global variable with internal linkage. Since the GUID is different each -time the compiler is executed, the resulting object file is different even if -the source file did not change. The existing design for specialization -constants has exactly the same issue because it also uses a GUID to generate a -unique string for `specialization_id` variables that have internal linkage. - -Is this a problem? If we want to preserve determinism, we could generate -a unique ID as a hash (e.g. SHA-256) from the content of the source file -**and** the command line arguments passed to the compiler. However, this would -require reading the content of the source file, which would have an impact on -compilation time. It's not clear how significant this impact would be, though. - -Note that the non-determinism will cause a problem with the FPGA `-reuse-exe` -compiler option. That option uses the result of a previous compilation to -avoid regenerating FPGA native code if the device code in a translation unit -did not change. (For example, this option avoids regenerating device native -code if the only change in the translation unit was to the host code.) The -option is implemented by comparing device IR from the previous compilation with -the IR in the new compilation. Native code is regenerated only if the IR is -different. This logic will break, though, if the compiler is -non-deterministic because the IR will always be different, so native code will -always be regenerated. This is a showstopper issue for FPGA because native -code generation takes a very long time. - -I see two ways to solve the problem with `-reuse-exe`: - -1. We could change the GUID to be a deterministic hash as outlined above. - -2. We could change SPIR-V as proposed above to give a unique name to each - `OpVariable` which needs to be referenced from the host. This would avoid - the need to change the exported variable name to be a GUID, thus the IR will - be deterministic. (It is also possible to generate the unique `OpVariable` - names in a deterministic way, so this won't cause a problem.) From 5461cda49d56b1b3620548d41b078e023b20fc9a Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 15 Nov 2021 13:14:19 -0500 Subject: [PATCH 10/25] Remove unnecessary device_image_scope wording Remove some extraneous commentary about the `device_image_scope` property. Details of that property are describe more thoroughly in the extension API specification. --- sycl/doc/DeviceGlobal.md | 21 +++++---------------- 1 file changed, 5 insertions(+), 16 deletions(-) diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index 6b68089880cd4..950742430b801 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -61,22 +61,11 @@ to a device global variable declaration. The intent of this property is to allow a device global variable to be implemented directly on top of a SPIR-V module scope global variable. When this property is **not** present, an instance of a device global variable is shared across all device images that -are loaded onto a particular device. This makes it easy for the user to reason -about the scope of a variable because the user need not understand which device -image contains each kernel. However, this semantic makes the implementation -less efficient, especially on FPGA targets. - -By contrast, the `device_image_scope` property changes the semantic of a device -global variable such that the user must understand which device image contains -each kernel, which is difficult to reason about. For example, changing the -value of a specialization constant may cause a kernel to be recompiled into a -separate device image on some targets. As a result, a device global variable -referenced in a kernel may actually have several disjoint instances if the -kernel uses specialization constants. This problem is more tractable on FPGA -targets because specialization constants are not implemented via separate -device images on those targets, however, there are other factors that FPGA -users need to be aware of when using the `device_image_scope` property. These -are documented more throughly in the extension specification. +are loaded onto a particular device. By contrast, when this property **is** +present, each device image has its own instance of the device global variable. +However, since multiple variable instances have confusing semantics, the API +requires the user to ensure that each such variable exists in exactly one +device image. The extension specification has more details on this property. The important impact on the design, though, is that device global variables declared with the `device_image_scope` property have an implementation that is From 2c5de3ff276ee62a6d5b846ea3e5a83b1493f284 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 15 Nov 2021 14:34:11 -0500 Subject: [PATCH 11/25] Clarify backend requirements Clarify the backend requirements for copying to / from a module scope (global) variable. --- sycl/doc/DeviceGlobal.md | 51 ++++++++++++++----- ...cl_intel_global_variable_pointers.asciidoc | 3 -- 2 files changed, 39 insertions(+), 15 deletions(-) diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index 950742430b801..b9b8afee174fe 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -557,22 +557,49 @@ pi_result piextCopyFromDeviceVariable(pi_device Device, const char *name, In both cases the `name` parameter is the same as the `sycl-unique-id` string that is associated with the device global variable. -On the Level Zero backend, these PI interfaces are implemented by first calling -[`zeModuleGetGlobalPointer()`][7] to get a device pointer for the variable and -then calling [`zeCommandListAppendMemoryCopy()`][8] to copy to or from that -pointer. +The Level Zero backend has existing APIs that can implement these PI +interfaces. DPC++ first calls [`zeModuleGetGlobalPointer()`][7] to get a +device pointer for the variable and then calls +[`zeCommandListAppendMemoryCopy()`][8] to copy to or from that pointer. +However, the documentation (and implementation) of `zeModuleGetGlobalPointer()` +needs to be extended slightly. The description currently says: + +> * The application may query global pointer from any module that either +> exports or imports it. +> +> * The application must dynamically link a module that imports a global before +> the global pointer can be queried from it. + +This must be changed to say something along these lines: + +> * The interpretation of `pGlobalName` depends on how the module was created. +> If the module was created from SPIR-V that declares the +> **GlobalVariableDecorationsINTEL** capability, the implementation looks +> first for an **OpVariable** that is decorated with **HostAccessINTEL** +> where the *Name* operand is the same as `pGlobalName`. If no such variable +> is found, the implementation then looks for an **OpVariable** that is +> decorated with **LinkageAttributes** where the *Name* operand is the same +> as `pGlobalName`. (The implementation considers both exported and imported +> variables as candidates.) +> +> If the module was created from native code that came from a previous call +> to `zeModuleGetNativeBinary` and that other module was created from SPIR-V, +> then the interpretation of `pGlobalName` is the same as the SPIR-V case. +> +> * If `pGlobalName` identifies an imported SPIR-V variable, the module must be +> dynamically linked before the variable's pointer may be queried. [7]: [8]: -On the OpenCL backend, these PI interfaces are implemented by first calling -`clGetDeviceGlobalVariablePointerINTEL()` to get a device pointer for the -variable. This function is provided by the -[`cl_intel_global_variable_pointers`][9] extension which is not yet -productized. Once we get a pointer, the PI layer calls -`clEnqueueMemcpyINTEL()` to copy to or from that pointer. +The OpenCL backend has a proposed extension +[`cl_intel_global_variable_pointers`][9] that can implement these PI +interfaces. DPC++ first calls `clGetDeviceGlobalVariablePointerINTEL()` to get +a device pointer for the variable and then calls `clEnqueueMemcpyINTEL()` to +copy to or from that pointer. This DPC++ design depends upon implementation of +that OpenCL extension. [9]: -On the CUDA backend, these PI interfaces are implemented on top of -`cudaMemcpyToSymbol()` and `cudaMemcpyFromSymbol()`. +The CUDA backend has existing APIs `cudaMemcpyToSymbol()` and +`cudaMemcpyFromSymbol()` which can be used to implement these PI interfaces. diff --git a/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc index a91b5ec26bd0b..1c6d7495f4d9d 100644 --- a/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc +++ b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc @@ -99,9 +99,6 @@ is used to retrieve a pointer to a user-defined global variable _global_variable_name_ which exists in built _program_. The interpretation of _global_variable_name_ depends on how the _program_ was created: -* If the _program_ was created with *clCreateProgramWithSource* from OpenCL C - sources, *TODO* (see Issues section below). - * If the _program_ was created with *clCreateProgramWithIL* from SPIR-V, there are two cases: From f6e971daa3b171bad5b6412864b523712b1fc57d Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 15 Nov 2021 16:12:16 -0500 Subject: [PATCH 12/25] Update link to device global spec Since PR #4697 was merged, there is no need to point to that PR. Instead, just point to the extension specification for device global. --- sycl/doc/DeviceGlobal.md | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index b9b8afee174fe..ab1195e254cf1 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -191,13 +191,10 @@ could then remove the support for `sycl_global_var`. The last attribute `[[__sycl_detail__::restrictions(device_global)]]` controls error reporting for variables declared of this type. The device global extension specification places restrictions on where a `device_global` variable -can be declared as clarified in [this PR][5] against the extension -specification API. Rather than have the front-end recognize the name of the +can be declared. Rather than have the front-end recognize the name of the `device_global` type, the front-end uses this attribute to know which restrictions to enforce for this type. -[5]: - **NOTE**: The front-end does currently recognize the `specialization_id` class by its name, and it has hard-coded knowledge that variables declared with this type have certain restrictions. If we wanted to avoid having the front-end @@ -227,7 +224,7 @@ There are several changes to the device compiler front-end: `device_global` type. As described above, the front-end uses the `[[__sycl_detail__::restrictions()]]` attribute (rather than the class name) to know which set of restrictions to check. The restrictions specific to - device global variables are documented in [this PR][5]. + device global variables are documented in the [extension specification][1]. * The front-end *avoids* diagnosing an error when variables of type `device_global` are referenced in device code because the type is decorated @@ -364,10 +361,10 @@ unqualified name lookup. Furthermore, the name of the temporary variable (`__sycl_UNIQUE_STRING`) is globally unique, so it is guaranteed not to be shadowed by any other name in the translation unit. This problem with variable shadowing is also a problem for the integration footer we use for -specialization constants. See the [specialization constant design document][6] +specialization constants. See the [specialization constant design document][5] for more details on this topic. -[6]: +[5]: ### Changes to the `sycl-post-link` tool @@ -558,9 +555,9 @@ In both cases the `name` parameter is the same as the `sycl-unique-id` string that is associated with the device global variable. The Level Zero backend has existing APIs that can implement these PI -interfaces. DPC++ first calls [`zeModuleGetGlobalPointer()`][7] to get a +interfaces. DPC++ first calls [`zeModuleGetGlobalPointer()`][6] to get a device pointer for the variable and then calls -[`zeCommandListAppendMemoryCopy()`][8] to copy to or from that pointer. +[`zeCommandListAppendMemoryCopy()`][7] to copy to or from that pointer. However, the documentation (and implementation) of `zeModuleGetGlobalPointer()` needs to be extended slightly. The description currently says: @@ -589,17 +586,17 @@ This must be changed to say something along these lines: > * If `pGlobalName` identifies an imported SPIR-V variable, the module must be > dynamically linked before the variable's pointer may be queried. -[7]: -[8]: +[6]: +[7]: The OpenCL backend has a proposed extension -[`cl_intel_global_variable_pointers`][9] that can implement these PI +[`cl_intel_global_variable_pointers`][8] that can implement these PI interfaces. DPC++ first calls `clGetDeviceGlobalVariablePointerINTEL()` to get a device pointer for the variable and then calls `clEnqueueMemcpyINTEL()` to copy to or from that pointer. This DPC++ design depends upon implementation of that OpenCL extension. -[9]: +[8]: The CUDA backend has existing APIs `cudaMemcpyToSymbol()` and `cudaMemcpyFromSymbol()` which can be used to implement these PI interfaces. From ba65c99a9ce27f32254acbc857b81eaeae1aab4d Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Tue, 16 Nov 2021 18:45:19 -0500 Subject: [PATCH 13/25] Fix design for "handle::copy,memcpy" Update design for the new `handler::copy()` and `handler::memcpy()` functions to address problems identified in review. --- sycl/doc/DeviceGlobal.md | 101 ++++++++++++++++++++++----------------- 1 file changed, 58 insertions(+), 43 deletions(-) diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index ab1195e254cf1..129be5f0599ef 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -488,11 +488,12 @@ runtime does the following: - Regardless of whether the USM buffer has already been created for the variable, the runtime initializes the `usmptr` member in the *device - instance* of the variable by using a backend-specific function which - copies data from the host to a device variable. It is a simple matter to - use this function to overwrite the `usmptr` member with the address of - the USM buffer. The details of this device-specific function are - described below. + instance* of the variable by using a new [PI interface][6] which copies + data from the host to a global variable in a `pi_program`. It is a + simple matter to use this interface to overwrite the `usmptr` member with + the address of the USM buffer. + +[6]: <#new-pi-interface-to-copy-to-or-from-a-module-scope-variable> Note that the runtime does not need to initialize the `val` member variable of device global variables that are decorated with `device_image_scope` because @@ -503,61 +504,75 @@ initialized. Each of these functions accepts a (host) pointer to a device global variable as one of its parameters, and the runtime uses this pointer to find the associated -information for this variable in the device global database. In the common -case where a kernel using this device global has already been submitted to the -target device, the database will contain all of the information for this device -global variable. - -However, in the case when the application has not previously submitted a kernel -that uses this device global, the database will contain only the address of the -variable and its unique string. In this case, the runtime must scan all -"SYCL/device globals" property sets in the application searching for an entry -with that same unique string. The runtime can then add the remaining -information about the device global variable to the database. +information for this variable in the device global database. The code in the +integration footer ensures that the database will at least contain the address +of the variable and its unique string, even if no kernel referencing this +variable has been submitted yet. + +Each of these functions is templated on the variable's underlying type `T`, so +it knows the size of this type. Each function is also templated on the +variable's property list, so it knows whether the variable has the +`device_image_scope` property. The remaining behavior depends on whether the variable is decorated with the `device_image_scope` property. If the variable is not decorated with this property, the runtime uses the database to determine if a USM buffer has been allocated yet for this variable -on this device. If not, the runtime allocates the buffer using the size from -the database and zero-initializes the buffer. Regardless, the runtime -implements the `copy` / `memcpy` function by copying to or from this USM -buffer, using the normal mechanism for copying to / from a USM pointer. +on this device. If not, the runtime allocates the buffer using `sizeof(T)` +and zero-initializes the buffer. Regardless, the runtime implements the `copy` +/ `memcpy` function by copying to or from this USM buffer, using the normal +mechanism for copying to / from a USM pointer. The runtime avoids the future cost of looking up the variable in the database by caching the USM pointer in the host instance of the variable's `usmptr` member. If the variable is decorated with the `device_image_scope` property, the -runtime uses the unique string identifier for the variable to call a -backend-specific function that copies to or from the variable. Again, the -details of this function are described below. +variable's value exists directly in the device code module, not in a USM +buffer. The runtime first uses the variable's unique string identifier to see +if there is a `pi_program` that contains the variable. If there is more than +one such `pi_program`, the runtime diagnoses an error by throwing +`errc::invalid`. If there is no such `pi_program`, the runtime scans all +"SYCL/device globals" property sets to find the device code module that +contains this variable and uses its normal mechanism for creating a +`pi_program` from this device code module. (The algorithm for creating device +code modules in the `sycl-post-link` tool ensures that there will be no more +than one module that contains the variable.) Finally, the runtime uses the +new [PI interface][6] to copy to or from the contents of the variable in this +`pi_program`. + +It is possible that a device global variable with `device_image_scope` is not +referenced by _any_ kernel, in which case the variable's unique string will not +exist in any property set. In this case, the runtime simply uses the host +instance of the `device_global` variable to hold the value and copies to or +from the `val` member. -In all cases, the runtime diagnoses invalid calls that write beyond the device -global variable's size by using the size in the database. +In all cases, the runtime uses `sizeof(T)` to determine if the copy operation +will read or write beyond the end of the device global variable's storage. If +so, the runtime diagnoses an error by throwing `errc::invalid`. -#### Back-end specific function to copy to / from a device symbol +#### New PI interface to copy to or from a module scope variable -As noted above, we need a backend-specific function to copy to / from the -device instance of a variable. All backends provide this functionality, which -is abstracted with these new PI interfaces: +As noted above, we need new PI interfaces that can copy data to or from an +instance of a device global variable in a `pi_program`. This functionality is +exposed as two new PI interfaces: ``` -pi_result piextCopyToDeviceVariable(pi_device Device, const char *name, - const void *src, size_t count, size_t offset); +pi_result piextCopyToDeviceVariable(pi_device Device, pi_program Program, + const char *name, const void *src, size_t count, size_t offset); -pi_result piextCopyFromDeviceVariable(pi_device Device, const char *name, - void *dst, size_t count, size_t offset); +pi_result piextCopyFromDeviceVariable(pi_device Device, pi_program Program, + const char *name, void *dst, size_t count, size_t offset); ``` In both cases the `name` parameter is the same as the `sycl-unique-id` string that is associated with the device global variable. The Level Zero backend has existing APIs that can implement these PI -interfaces. DPC++ first calls [`zeModuleGetGlobalPointer()`][6] to get a +interfaces. The plugin first calls [`zeModuleGetGlobalPointer()`][7] to get a device pointer for the variable and then calls -[`zeCommandListAppendMemoryCopy()`][7] to copy to or from that pointer. +[`zeCommandListAppendMemoryCopy()`][8] to copy to or from that pointer. However, the documentation (and implementation) of `zeModuleGetGlobalPointer()` needs to be extended slightly. The description currently says: @@ -586,17 +601,17 @@ This must be changed to say something along these lines: > * If `pGlobalName` identifies an imported SPIR-V variable, the module must be > dynamically linked before the variable's pointer may be queried. -[6]: -[7]: +[7]: +[8]: The OpenCL backend has a proposed extension -[`cl_intel_global_variable_pointers`][8] that can implement these PI -interfaces. DPC++ first calls `clGetDeviceGlobalVariablePointerINTEL()` to get -a device pointer for the variable and then calls `clEnqueueMemcpyINTEL()` to -copy to or from that pointer. This DPC++ design depends upon implementation of -that OpenCL extension. +[`cl_intel_global_variable_pointers`][9] that can implement these PI +interfaces. The plugin first calls `clGetDeviceGlobalVariablePointerINTEL()` +to get a device pointer for the variable and then calls +`clEnqueueMemcpyINTEL()` to copy to or from that pointer. This DPC++ design +depends upon implementation of that OpenCL extension. -[8]: +[9]: The CUDA backend has existing APIs `cudaMemcpyToSymbol()` and `cudaMemcpyFromSymbol()` which can be used to implement these PI interfaces. From 6a21bb27b6d4a159dbabb043fdd29a5a338c1289 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Wed, 17 Nov 2021 08:44:44 -0500 Subject: [PATCH 14/25] Fix spelling --- sycl/doc/DeviceGlobal.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index 129be5f0599ef..ebdbb0f591421 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -265,7 +265,7 @@ class __sycl_device_global_registration { }; __sycl_device_global_registration __sycl_device_global_registerer; -} // namepsace (unnamed) +} // namespace (unnamed) } // namespace sycl::detail ``` @@ -287,7 +287,7 @@ __sycl_device_global_registration::__sycl_device_global_registration() noexcept /* same string returned from __builtin_sycl_unique_stable_id(::inner::Bar) */); } -} // namepsace (unnamed) +} // namespace (unnamed) } // namespace sycl::detail ``` @@ -351,7 +351,7 @@ __sycl_device_global_registration::__sycl_device_global_registration() noexcept /* same string returned from __builtin_sycl_unique_stable_id(::(unnamed)::FuBar) */); } -} // namepsace (unnamed) +} // namespace (unnamed) } // namespace sycl::detail ``` From 230a2bcb5b862b13efa53d4eb82f5b0f837a167d Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Wed, 17 Nov 2021 09:17:39 -0500 Subject: [PATCH 15/25] Fix list of properties propagated to SPIR-V Fix a copy/paste error in the list of properties that the `sycl-post-link` tool propagates to SPIR-V. --- sycl/doc/DeviceGlobal.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index ebdbb0f591421..958d4f326e37c 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -393,7 +393,7 @@ compile-time properties that need to be generated in SPIR-V. The global variable: * `host_access` -* `device_image_scope` +* `init_mode` * `implement_in_csr` The `host_access` property is handled specially because the SPIR-V decoration From 59852035c566e4e9b75623aece866fd98b09c534 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 22 Nov 2021 10:14:21 -0500 Subject: [PATCH 16/25] Use default initializer for "val" Use a member variable default initializer to zero-initialize `val` rather than defining a default constructor. --- sycl/doc/DeviceGlobal.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index 958d4f326e37c..41713b7cc1609 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -103,16 +103,15 @@ type itself, and member functions return a reference to this value. ``` template class device_global { - T val; + T val{}; public: - device_global() : val() {} T& get() noexcept { return val; } /* other member functions */ }; ``` -Note that the default constructor "value initializes" the `val` member -variable. Since the type `T` is limited to types that are trivially +Note that the `val` member has a default initializer that causes it to be +"value initialized". Since the type `T` is limited to types that are trivially constructible, this means that `val` will be zero initialized. In both cases the member variable (either `usmptr` or `val`) must be the first @@ -497,8 +496,8 @@ runtime does the following: Note that the runtime does not need to initialize the `val` member variable of device global variables that are decorated with `device_image_scope` because -the constructor already guarantees that this member variable is zero -initialized. +the `val` default initializer already guarantees that this member variable is +zero initialized. #### Implementing the `copy` and `memcpy` functions in `queue` and `handler` From ff903a312d98d576cb3c443f26d71c8e942b7ec6 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 22 Nov 2021 10:53:15 -0500 Subject: [PATCH 17/25] Add design decisions section Add a section at the end of the document describing some of the issues that arose during review of this document. --- sycl/doc/DeviceGlobal.md | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index 41713b7cc1609..4a4bd978e410a 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -614,3 +614,38 @@ depends upon implementation of that OpenCL extension. The CUDA backend has existing APIs `cudaMemcpyToSymbol()` and `cudaMemcpyFromSymbol()` which can be used to implement these PI interfaces. + + +## Design choices + +This section captures some of the discussions about aspects of the design. + +### Should the value be zero-initialized + +There was some debate about whether the value in the `device_global` should +always be zero-initialized. We decided to require this in order to be +consistent with C++ rules for global variables. We want `device_global` to +model the normal rules for global variables. Since C++ guarantees that a +global variable with a trivial constructor is zero-initialized, we want that +behavior too. + +The downside is that some applications may allocate a very large storage for +the underlying type `T` of a device global variable, and they may not want to +pay the cost of zero initializing it. We agree that this is a theoretical +problem, but we aren't sure if this will be an issue for real applications. If +it turns out to be a real problem, we propose adding a new property that +prevents initialization of the device global value. For example, we could add +a new parameter to the `init_mode` property called `none`. + +### Why not include both `val` and `usmptr` member variables + +Rather than using partial specialization to define `device_global` differently +based on the `device_image_scope` property, we could instead define both member +variables regardless of the properties. This would make the header file +implementation easier, but it would lead to wasted space in the case when the +`device_image_scope` property was not specified since the `val` member is +unused in this case. Wasting space on the host may not be such a big problem, +but the space would also be wasted on every device that reference the device +global variable, and this seems like a bigger problem. We decided that the +extra header file complexity of partial specialization is worth avoiding this +wasted memory. From 4d730be9b71a80302507f3221cf4a5131c966a2a Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Tue, 7 Dec 2021 09:40:33 -0500 Subject: [PATCH 18/25] Change the OpenCL extension After discussions with @bashbaug, @GarveyJoe, and @artemrad; we decided to change the OpenCL extension to avoid returning a pointer to the global variable. Instead, we add two new APIs that can read or write the content of the variable. This avoids the need to precisely define the semantics of the returned pointer. --- sycl/doc/DeviceGlobal.md | 9 +- .../cl_intel_global_variable_access.asciidoc | 265 ++++++++++++++++++ ...cl_intel_global_variable_pointers.asciidoc | 196 ------------- 3 files changed, 269 insertions(+), 201 deletions(-) create mode 100644 sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc delete mode 100644 sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index 4a4bd978e410a..205940702612d 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -604,13 +604,12 @@ This must be changed to say something along these lines: [8]: The OpenCL backend has a proposed extension -[`cl_intel_global_variable_pointers`][9] that can implement these PI -interfaces. The plugin first calls `clGetDeviceGlobalVariablePointerINTEL()` -to get a device pointer for the variable and then calls -`clEnqueueMemcpyINTEL()` to copy to or from that pointer. This DPC++ design +[`cl_intel_global_variable_access`][9] that defines functions +`clEnqueueReadGlobalVariableINTEL()` and `clEnqueueWriteGlobalVariableINTEL()` +which can be easily used to implement these PI interfaces. This DPC++ design depends upon implementation of that OpenCL extension. -[9]: +[9]: The CUDA backend has existing APIs `cudaMemcpyToSymbol()` and `cudaMemcpyFromSymbol()` which can be used to implement these PI interfaces. diff --git a/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc new file mode 100644 index 0000000000000..acf89129a0a2b --- /dev/null +++ b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc @@ -0,0 +1,265 @@ += cl_intel_global_variable_access + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + +== Name Strings + +`cl_intel_global_variable_access` + +== Contact + +Ben Ashbaugh, Intel (ben 'dot' ashbaugh 'at' intel 'dot' com) + +== Contributors + +// spell-checker: disable +* Ben Ashbaugh, Intel +* Joe Garvey, Intel +* Greg Lueck, Intel +* Artem Radzikhovskyy, Intel +// spell-checker: enable + +== Notice + +Copyright (c) 2021 Intel Corporation. All rights reserved. + +== Status + +Working Draft + +This is a preview extension specification, intended to provide early access to +a feature for review and community feedback. When the feature matures, this +specification may be released as a formal extension. + +Because the interfaces defined by this specification are not final and are +subject to change they are not intended to be used by shipping software +products. If you are interested in using this feature in your software +product, please let us know! + +== Version + +Built On: {docdate} + +Revision: A + +== Dependencies + +This extension is written against the OpenCL API Specification Version 3.0.10. + +== Overview + +This extension allows host code to read or write data contained within a global +variable (aka program scope variables) that is defined in a program object. + +== New API Functions + +[source] +---- +cl_int clEnqueueReadGlobalVariableINTEL( + cl_command_queue command_queue, + cl_program program, + const char* name, + size_t size, + size_t offset, + void* ptr, + cl_uint num_events_in_wait_list, + const cl_event* event_wait_list, + cl_event* event); + +cl_int clEnqueueWriteGlobalVariableINTEL( + cl_command_queue command_queue, + cl_program program, + const char* name, + size_t size, + size_t offset, + const void* ptr, + cl_uint num_events_in_wait_list, + const cl_event* event_wait_list, + cl_event* event); +---- + +== New API Enums + +None. + +== New OpenCL C Functions + +None. + +== Modifications to the OpenCL API Specification + +Add a new subsection under Section 5, *The OpenCL Runtime* named +*Global Variable Access* with the following content: + +The following functions enqueue commands to read or write data contained within +a user-defined global variable. + +---- +cl_int clEnqueueReadGlobalVariableINTEL( + cl_command_queue command_queue, + cl_program program, + const char* name, + size_t size, + size_t offset, + void* ptr, + cl_uint num_events_in_wait_list, + const cl_event* event_wait_list, + cl_event* event); + +cl_int clEnqueueWriteGlobalVariableINTEL( + cl_command_queue command_queue, + cl_program program, + const char* name, + size_t size, + size_t offset, + const void* ptr, + cl_uint num_events_in_wait_list, + const cl_event* event_wait_list, + cl_event* event); +---- + +* _command_queue_ is a valid host command-queue in which the read / write + command will be queued. _command_queue_ and _program_ must be created with + the same OpenCL context. + +* _program_ is a program object with a successfully built executable. The + _program_ must define a global variable identified by _name_. + +* _name_ identifies the global variable to read or write. Must be non-NULL. + The interpretation depends on how _program_ was created: + +** If _program_ was created with *clCreateProgramWithIL* from SPIR-V, there are + two cases: + +*** If the SPIR-V module declares the *GlobalVariableDecorationsINTEL* + capability, the implementation looks first for an *OpVariable* that is + decorated with *HostAccessINTEL* where the _Name_ operand is the same as + _name_. + +*** The implementation next looks for an *OpVariable* that is decorated with + *LinkageAttributes* where the _Linkage Type_ is *Export* and the _Name_ + operand is the same as _name_. + +** If _program_ was created in any other way, the interpretation of + _name_ is implementation-defined. + +* _size_ tells the number of bytes to read or write. + +* _offset_ tells the offset (number of bytes) from the start of the global + variable where the data is read or written. + +* _ptr_ is a pointer to a buffer in host memory which receives the data that + is read or which contains the data that is written. + +* _event_wait_list_ and _num_events_in_wait_list_ specify events that need to + complete before this particular command can be executed. If + _event_wait_list_ is NULL, then this particular command does not wait on any + event to complete. If _event_wait_list_ is NULL, _num_events_in_wait_list_ + must be 0. If _event_wait_list_ is not NULL, the list of events pointed to + by _event_wait_list_ must be valid and _num_events_in_wait_list_ must be + greater than 0. The events specified in _event_wait_list_ act as + synchronization points. The context associated with events in + _event_wait_list_ and _command_queue_ must be the same. The memory + associated with _event_wait_list_ can be reused or freed after the function + returns. + +* _event_ returns an event object that identifies this read / write command and + can be used to query or queue a wait for this command to complete. If + _event_ is NULL or the enqueue is unsuccessful, no event will be created and + therefore it will not be possible to query the status of this command or to + wait for this command to complete. If _event_wait_list_ and _event_ are not + NULL, _event_ must not refer to an element of the _event_wait_list_ array. + +When a program object is created with more than one associated device, each +device has a distinct instance of any global variables contained by that +program object. These functions read or write the variable instance that +resides on the target device of the _command_queue_. + +These functions are non-blocking, so they may return even before data is read +or written to the global variable. The _event_ argument returns an event +object which the application can use to query the execution status of the +command. The application must not use the content of the _ptr_ buffer until +the command has completed, and the value read by +*clEnqueueReadGlobalVariableINTEL* is not available in the buffer referenced by +_ptr_ until after the command has completed. + +*clEnqueueReadGlobalVariableINTEL* and *clEnqueueWriteGlobalVariableINTEL* +return `CL_SUCCESS` if the function is executed successfully. Otherwise, they +return one of the following errors: + +* `CL_INVALID_COMMAND_QUEUE` if _command_queue_ is not a valid host + command-queue. + +* `CL_INVALID_CONTEXT` if the context associated with _command_queue_ and + _program_ are not the same or if the context associated with _command_queue_ + and events in _event_wait_list_ are not the same. + +* `CL_INVALID_PROGRAM` if _program_ is not a valid program object. + +* `CL_INVALID_PROGRAM_EXECUTABLE` if there is no successfully built program + executable available for device associated with _command_queue_. + +* `CL_INVALID_ARG_VALUE` if _name_ does not identify a global variable defined + in _program_. + +* `CL_INVALID_VALUE` if _name_ or _ptr_ are NULL. + +* `CL_INVALID_VALUE` if the region being read or written specified by + (_offset_, _size_) is not fully contained by the size of the global variable. + +* `CL_INVALID_EVENT_WAIT_LIST` if _event_wait_list_ is NULL and + _num_events_in_wait_list_ > 0, or _event_wait_list_ is not NULL and + _num_events_in_wait_list_ is 0, or if event objects in _event_wait_list_ are + not valid events. + +* `CL_INVALID_OPERATION` if *clEnqueueReadGlobalVariableINTEL* is called for a + global variable that is not readable from the host. This is the case when + _program_ is created from a SPIR-V module that declares the + *GlobalVariableDecorationsINTEL* capability and the *OpVariable* is decorated + with *HostAccessINTEL* and _Write_ or _None_ access mode. + +* `CL_INVALID_OPERATION` if *clEnqueueWriteGlobalVariableINTEL* is called for a + global variable that is not writable from the host. This is the case when + _program_ is created from a SPIR-V module that declares the + *GlobalVariableDecorationsINTEL* capability and the *OpVariable* is decorated + with *HostAccessINTEL* and _Read_ or _None_ access mode. + +* `CL_OUT_OF_RESOURCES` if there is a failure to allocate resources required by + the OpenCL implementation on the device. + +* `CL_OUT_OF_HOST_MEMORY` if there is a failure to allocate resources required + by the OpenCL implementation on the host. + +== Issues + +. We do not have a formal definition for the _name_ of a global variable when + the program object is created from OpenCL C source code. For example, must + the variable be declared "extern", and are "constant" variables allowed? If + we want OpenCL C sources to be able to set the SPIR-V *HostAccessINTEL* + decoration, then we will need to define some new syntax for that. For now, + this specification only defines the case when the program object is created + from SPIR-V and leaves the OpenCL C source case as "implementation defined". ++ +-- +*UNRESOLVED* +-- + + +== Revision History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|======================================== +|Rev|Date|Author|Changes +|A|2021-12-07|Greg Lueck|*Initial revision* +|======================================== diff --git a/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc deleted file mode 100644 index 1c6d7495f4d9d..0000000000000 --- a/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_pointers.asciidoc +++ /dev/null @@ -1,196 +0,0 @@ -= cl_intel_global_variable_pointers - -// This section needs to be after the document title. -:doctype: book -:toc2: -:toc: left -:encoding: utf-8 -:lang: en - -// Set the default source code type in this document to C++, -// for syntax highlighting purposes. This is needed because -// docbook uses c++ and html5 uses cpp. -:language: {basebackend@docbook:c++:cpp} - -== Name Strings - -`cl_intel_global_variable_pointers` - -== Contact - -Ben Ashbaugh, Intel (ben 'dot' ashbaugh 'at' intel 'dot' com) - -== Contributors - -// spell-checker: disable -* Ben Ashbaugh, Intel -* Jaroslaw Chodor, Intel -* Vyacheslav Zakharin, Intel -// spell-checker: enable - -== Notice - -Copyright (c) 2021 Intel Corporation. All rights reserved. - -== Status - -Working Draft - -This is a preview extension specification, intended to provide early access to a feature for review and community feedback. -When the feature matures, this specification may be released as a formal extension. - -Because the interfaces defined by this specification are not final and are subject to change they are not intended to be used by shipping software products. -If you are interested in using this feature in your software product, please let us know! - -== Version - -Built On: {docdate} + -Revision: 0.9.0 - -== Dependencies - -This extension is written against the OpenCL API Specification Version 3.0.6. - -This extension requires support for the `cl_intel_unified_shared_memory` -extension. - -== Overview - -The extension allows programmers to obtain pointers to global variables (aka -program scope variables) of OpenCL Programs and read/write such global variables -using host APIs. - -== New API Functions - -[source] ----- -cl_int clGetDeviceGlobalVariablePointerINTEL( - cl_device_id device, - cl_program program, - const char *global_variable_name, - size_t *global_variable_size_ret, - void **global_variable_pointer_ret ); ----- - -== New API Enums - -None. - -== New OpenCL C Functions - -None. - -== Modifications to the OpenCL API Specification - -Modify Section 5.8.9, *Program Object Queries* :: -+ --- - -The function ----- -cl_int clGetDeviceGlobalVariablePointerINTEL( - cl_device_id device, - cl_program program, - const char *global_variable_name, - size_t *global_variable_size_ret, - void **global_variable_pointer_ret ); ----- -is used to retrieve a pointer to a user-defined global variable -_global_variable_name_ which exists in built _program_. The interpretation of -_global_variable_name_ depends on how the _program_ was created: - -* If the _program_ was created with *clCreateProgramWithIL* from SPIR-V, there - are two cases: - - - If the SPIR-V module declares the *GlobalVariableDecorationsINTEL* - capability, the implementation looks first for an *OpVariable* that is - decorated with *HostAccessINTEL* where the _Name_ operand is the same as - _global_variable_name_. - - - The implementation next looks for an *OpVariable* that is decorated with - *LinkageAttributes* where the _Linkage Type_ is *Export* and the _Name_ - operand is the same as _global_variable_name_. - -* If the _program_ was created in any other way, the interpretation of - _global_variable_name_ is implementation-defined. - -The returned pointer is a Unified Shared Memory pointer to the requested global -variable and is specific to given _device_. - -The lifetime of the returned pointer corresponds to the lifetime of _program_ -object which was queried. Subsequent calls to *clBuildProgram*, -*clCompileProgram* or *clLinkProgram* invalidates the result of -*clGetDeviceGlobalVariablePointerINTEL*. The returned value remains valid -between subsequent calls to *clEnqueueNDRangeKernel* or *clEnqueueTask*. - -The returned pointer can be stored to a global object inside a kernel. The -returned pointer cannot be used in program other than specified as _program_ -argument. - -* _device_ indicates which device should be used to retrieve the global variable pointer. - The Address of a global variable might be different for each device. - -* _program_ indicates the program with the _global_variable_name_ global variable. - -* _global_variable_name_ is the global variable name to query. Must be non-NULL. - -* _global_variable_size_ret_ if non-NULL, can be used to query the size of the requested global variable. - -* _global_variable_pointer_ret_ used to return the result from the function. Must be non-NULL. - -*clGetDeviceGlobalVariablePointerINTEL* returns `CL_SUCCESS` if the function executed -successfully. Otherwise, it returns one of the following errors: - -* `CL_INVALID_VALUE` if _global_variable_pointer_ret_ or _global_variable_name_ is NULL, - -* `CL_INVALID_DEVICE` if _device_ is not in the list of devices associated with _program_. - -* `CL_INVALID_PROGRAM` if _program_ is not a valid program object. - -* `CL_INVALID_PROGRAM_EXECUTABLE` if there is no successfully built executable - for _program_ for the specified _device_. - -* `CL_INVALID_ARG_VALUE` if _global_variable_name_ is not found in _program_ or it is - not possible to obtain address of _global_variable_name_. --- - -== Modifications to the OpenCL SPIR-V Environment Specification - -*TODO* - -Modify sub-section *Extensions* :: -+ --- -cl_intel_global_variable_pointers ... --- - -== Issues - -. Exactly what is the interpretation of _global_variable_name_ for a _program_ - that was created from OpenCL C sources? Must the variable be declared - "extern"? Are "constant" variables allowed? -+ --- -*UNRESOLVED* --- - -. Can a queried global variable pointer be used by a kernel from another program object? -+ --- -*UNRESOLVED* - -Currently this is disallowed. --- - - -== Revision History - -[cols="5,15,15,70"] -[grid="rows"] -[options="header"] -|======================================== -|Rev|Date|Author|Changes -|0.9.0|2019-09-14|Jaroslaw Chodor|*Initial revision* -|0.9.0|2021-09-22|Ben Ashbaugh|Added preview disclaimer, general cleanup. -|0.9.0|2021-10-18|Greg Lueck|Clarify variable name for SPIR-V case. -|======================================== From df56b04a3386a939ac6d703a8426dcd69553df92 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Tue, 7 Dec 2021 11:17:16 -0500 Subject: [PATCH 19/25] Rename [[__sycl_detail__::restrictions()]] Rename the internal C++ attribute `[[__sycl_detail__::restrictions(device_global)]]` to `[[__sycl_detail__::device_global]]`. We decided this is easier for the front-end to parse and we do not need the extra generality that `[[__sycl_detail__::restrictions()]]` provides. --- sycl/doc/DeviceGlobal.md | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/sycl/doc/DeviceGlobal.md b/sycl/doc/DeviceGlobal.md index 205940702612d..5789b3d7995a1 100644 --- a/sycl/doc/DeviceGlobal.md +++ b/sycl/doc/DeviceGlobal.md @@ -145,7 +145,7 @@ class Props::meta_value... )]] [[__sycl_detail__::global_variable_allowed]] - [[__sycl_detail__::restrictions(device_global)]] + [[__sycl_detail__::device_global]] #endif device_global> {/*...*/}; ``` @@ -187,21 +187,13 @@ could then remove the support for `sycl_global_var`. [3]: [4]: -The last attribute `[[__sycl_detail__::restrictions(device_global)]]` controls -error reporting for variables declared of this type. The device global -extension specification places restrictions on where a `device_global` variable -can be declared. Rather than have the front-end recognize the name of the +The last attribute `[[__sycl_detail__::device_global]]` controls error +reporting for variables declared of this type. The device global extension +specification places restrictions on where a `device_global` variable can be +declared. Rather than have the front-end recognize the name of the `device_global` type, the front-end uses this attribute to know which restrictions to enforce for this type. -**NOTE**: The front-end does currently recognize the `specialization_id` class -by its name, and it has hard-coded knowledge that variables declared with this -type have certain restrictions. If we wanted to avoid having the front-end -recognize the `specialization_id` class by its name, we could decorate that -class with `[[__sycl_detail__::restrictions(spec_id)]]`. Note that we would -need a different parameter name because the restrictions for -`specialization_id` are different from those for `device_global`. - #### Declaration of member functions to copy device global variables The headers are also updated to add the new `copy()` and `memcpy()` member @@ -221,7 +213,7 @@ There are several changes to the device compiler front-end: * The front-end checks for restrictions on variable declarations using the `device_global` type. As described above, the front-end uses the - `[[__sycl_detail__::restrictions()]]` attribute (rather than the class name) + `[[__sycl_detail__::device_global]]` attribute (rather than the class name) to know which set of restrictions to check. The restrictions specific to device global variables are documented in the [extension specification][1]. From 05d1c30f0b9247c728be8a50ba6c7a2c57e94628 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 10 Dec 2021 10:10:46 -0500 Subject: [PATCH 20/25] Address review comments in OpenCL extension * Add flags to allow either blocking or non-blocking read/write operations. * Add new enumerated values to query these read/write commands via `CL_EVENT_COMMAND_TYPE`. * Update format of extension revision identifier. --- .../cl_intel_global_variable_access.asciidoc | 71 ++++++++++++++++--- 1 file changed, 61 insertions(+), 10 deletions(-) diff --git a/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc index acf89129a0a2b..eac35ec4df62d 100644 --- a/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc +++ b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc @@ -49,7 +49,7 @@ product, please let us know! == Version Built On: {docdate} + -Revision: A +Revision: 0.9.0 == Dependencies @@ -89,7 +89,13 @@ cl_int clEnqueueWriteGlobalVariableINTEL( == New API Enums -None. +New return values from *clGetEventInfo* when _param_name_ is +`CL_EVENT_COMMAND_TYPE`: + +``` +#define CL_COMMAND_READ_GLOBAL_VARIABLE_INTEL /*TODO*/ +#define CL_COMMAND_WRITE_GLOBAL_VARIABLE_INTEL /*TODO*/ +``` == New OpenCL C Functions @@ -97,6 +103,8 @@ None. == Modifications to the OpenCL API Specification +=== New Section "Global Variable Access" + Add a new subsection under Section 5, *The OpenCL Runtime* named *Global Variable Access* with the following content: @@ -108,6 +116,7 @@ cl_int clEnqueueReadGlobalVariableINTEL( cl_command_queue command_queue, cl_program program, const char* name, + cl_bool blocking_read, size_t size, size_t offset, void* ptr, @@ -119,6 +128,7 @@ cl_int clEnqueueWriteGlobalVariableINTEL( cl_command_queue command_queue, cl_program program, const char* name, + cl_bool blocking_write, size_t size, size_t offset, const void* ptr, @@ -152,6 +162,9 @@ cl_int clEnqueueWriteGlobalVariableINTEL( ** If _program_ was created in any other way, the interpretation of _name_ is implementation-defined. +* _blocking_read_ and _blocking_write_ indicate if the read and write + operations are _blocking_ or _non-blocking_ (see below). + * _size_ tells the number of bytes to read or write. * _offset_ tells the offset (number of bytes) from the start of the global @@ -184,13 +197,30 @@ device has a distinct instance of any global variables contained by that program object. These functions read or write the variable instance that resides on the target device of the _command_queue_. -These functions are non-blocking, so they may return even before data is read -or written to the global variable. The _event_ argument returns an event -object which the application can use to query the execution status of the -command. The application must not use the content of the _ptr_ buffer until -the command has completed, and the value read by -*clEnqueueReadGlobalVariableINTEL* is not available in the buffer referenced by -_ptr_ until after the command has completed. +If _blocking_read_ is `CL_TRUE` i.e. the read command is blocking, +*clEnqueueReadGlobalVariableINTEL* does not return until data is read from the +global variable and copied into memory pointed to by _ptr_. + +If _blocking_read_ is `CL_FALSE` i.e. the read command is non-blocking, +*clEnqueueReadGlobalVariableINTEL* queues a non-blocking read command and +returns. The contents of the buffer that _ptr_ points to cannot be used until +the read command has completed. The _event_ argument returns an event object +which can be used to query the execution status of the read command. When the +read command has completed, the contents of the buffer that _ptr_ points to can +be used by the application. + +If _blocking_write_ is `CL_TRUE`, the write command is blocking and does not +return until the command is complete, including transfer of the data. The +memory pointed to by _ptr_ can be reused by the application after the +*clEnqueueWriteGlobalVariableINTEL* call returns. + +If _blocking_write_ is `CL_FALSE`, the OpenCL implementation will use _ptr_ to +perform a non-blocking write. As the write is non-blocking the implementation +can return immediately. The memory pointed to by _ptr_ cannot be reused by the +application after the call returns. The _event_ argument returns an event +object which can be used to query the execution status of the write command. +When the write command has completed, the memory pointed to by _ptr_ can then +be reused by the application. *clEnqueueReadGlobalVariableINTEL* and *clEnqueueWriteGlobalVariableINTEL* return `CL_SUCCESS` if the function is executed successfully. Otherwise, they @@ -221,6 +251,10 @@ return one of the following errors: _num_events_in_wait_list_ is 0, or if event objects in _event_wait_list_ are not valid events. +* `CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST` if the read and write + operations are blocking and the execution status of any of the events in + _event_wait_list_ is a negative integer value. + * `CL_INVALID_OPERATION` if *clEnqueueReadGlobalVariableINTEL* is called for a global variable that is not readable from the host. This is the case when _program_ is created from a SPIR-V module that declares the @@ -239,6 +273,23 @@ return one of the following errors: * `CL_OUT_OF_HOST_MEMORY` if there is a failure to allocate resources required by the OpenCL implementation on the host. +=== Section 5.11: Event Objects + +Add two new rows to Table 37, *List of supported event command types*: + +[cols="1,1", options="header"] +|=== +|Events Created By +|Event Command Type + +|*clEnqueueReadGlobalVariableINTEL* +|`CL_COMMAND_READ_GLOBAL_VARIABLE_INTEL` + +|*clEnqueueWriteGlobalVariableINTEL* +|`CL_COMMAND_WRITE_GLOBAL_VARIABLE_INTEL` +|=== + + == Issues . We do not have a formal definition for the _name_ of a global variable when @@ -261,5 +312,5 @@ return one of the following errors: [options="header"] |======================================== |Rev|Date|Author|Changes -|A|2021-12-07|Greg Lueck|*Initial revision* +|0.9.0|2021-12-10|Greg Lueck|*Initial revision* |======================================== From da72888a51a297bf3598a8f81d9f2c15a59cb72a Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 10 Dec 2021 14:07:40 -0500 Subject: [PATCH 21/25] Add values for new command types These were allocated by @bashbaug. --- .../DeviceGlobal/cl_intel_global_variable_access.asciidoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc index eac35ec4df62d..6da1e28a0b929 100644 --- a/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc +++ b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc @@ -93,8 +93,8 @@ New return values from *clGetEventInfo* when _param_name_ is `CL_EVENT_COMMAND_TYPE`: ``` -#define CL_COMMAND_READ_GLOBAL_VARIABLE_INTEL /*TODO*/ -#define CL_COMMAND_WRITE_GLOBAL_VARIABLE_INTEL /*TODO*/ +#define CL_COMMAND_READ_GLOBAL_VARIABLE_INTEL 0x418E +#define CL_COMMAND_WRITE_GLOBAL_VARIABLE_INTEL 0x418F ``` == New OpenCL C Functions From 5d15892acff4e48148c867dbf00715c06726fdc1 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 10 Dec 2021 16:01:04 -0500 Subject: [PATCH 22/25] Update OCL synopsis to match description Add the `blocking_read` / `blocking_write` parameters also to the synopsis listing. --- .../DeviceGlobal/cl_intel_global_variable_access.asciidoc | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc index 6da1e28a0b929..e2b80c6ec9bb7 100644 --- a/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc +++ b/sycl/doc/extensions/DeviceGlobal/cl_intel_global_variable_access.asciidoc @@ -68,6 +68,7 @@ cl_int clEnqueueReadGlobalVariableINTEL( cl_command_queue command_queue, cl_program program, const char* name, + cl_bool blocking_read, size_t size, size_t offset, void* ptr, @@ -79,6 +80,7 @@ cl_int clEnqueueWriteGlobalVariableINTEL( cl_command_queue command_queue, cl_program program, const char* name, + cl_bool blocking_write, size_t size, size_t offset, const void* ptr, From f0d951288eaa4f2d2a0244b3907231fe77cc6540 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Thu, 16 Dec 2021 13:45:12 -0500 Subject: [PATCH 23/25] Tweak wording of decorations in SPIR-V spec This addresses comments from the FPGA team. --- ...INTEL_global_variable_decorations.asciidoc | 31 ++++++++++++------- 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc b/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc index ce52704cdee00..fbe9413a76203 100644 --- a/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc +++ b/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc @@ -121,17 +121,14 @@ API's execution environment may use to identify this variable. Legal values of _Access_: * 0 [Read] - The execution environment may read the variable from the host but - will never write it. On an FPGA device, only a read memory port is exposed - by the generated IP core. + will never write it. On an FPGA device, only a read memory port is exposed. * 1 [Write] - The execution environment may write the variable from the host but will never read it. On an FPGA device, only a write memory port is - exposed by the generated IP core. + exposed. * 2 [Read/Write] - The execution environment may read or write the variable - from the host. On an FPGA device, a read/write memory port is exposed by the - generated IP core. + from the host. On an FPGA device, a read/write memory port is exposed. * 3 [None] - The execution environment may neither read nor write the variable - from the host. On an FPGA device, no memory port is exposed by the generated - IP core. + from the host. On an FPGA device, no memory port is exposed. If a global *OpVariable* is not decorated with *HostAccessINTEL*, the default behavior is [Read/Write]. @@ -145,7 +142,8 @@ _Name_ | 6148 a| *InitModeINTEL* + -Only valid on global (module scope) *OpVariable*. +Only valid on global (module scope) *OpVariable* which has an _Initializer_ +operand. This decoration only has an effect when the consumer is an FPGA or similar device. The _Trigger_ value tells how the global variable should be @@ -153,10 +151,16 @@ initialized. Legal values of _Trigger_: -* 0 [init on device reprogram] - Initialization is performed when the device is - programmed. -* 1 [init on device reset] - Initialization is performed when a reset signal is - sent to the device. +* 0 [init on device reprogram] - Initialization is performed by reprogramming + the device. This may require more frequent reprogramming but may reduce + area. +* 1 [init on device reset] - Initialization is performed by sending a reset + signal to the device. This may increase area but may reduce reprogramming + frequency. + +If a global *OpVariable* with an _Initializer_ operand is not decorated with +*InitModeINTEL*, the method by which its value is initialized is implementation +defined. 2+| Literal Number + _Trigger_ @@ -178,6 +182,9 @@ Legal values of _Value_: * 1 [True] - Access to this memory is through a CSR interface shared with the kernel arguments. +If a global *OpVariable* is not decorated with *ImplementInCSRINTEL*, the +interface for the variable is implementation defined. + 2+| Literal Number + _Value_ | *GlobalVariableDecorationsINTEL* From 76b559cee5e222ef8a4e14a9a6a6923f65e06f82 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Thu, 16 Dec 2021 14:21:20 -0500 Subject: [PATCH 24/25] One more tweak to SPIR-V spec --- .../SPV_INTEL_global_variable_decorations.asciidoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc b/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc index fbe9413a76203..44aa92d34168a 100644 --- a/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc +++ b/sycl/doc/extensions/DeviceGlobal/SPV_INTEL_global_variable_decorations.asciidoc @@ -159,8 +159,8 @@ Legal values of _Trigger_: frequency. If a global *OpVariable* with an _Initializer_ operand is not decorated with -*InitModeINTEL*, the method by which its value is initialized is implementation -defined. +*InitModeINTEL*, the method by which the variable's value is initialized is +implementation defined. 2+| Literal Number + _Trigger_ From 2581f60d6ece455cdf28d52e26f2fbc0aa546eda Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Tue, 11 Jan 2022 11:23:08 -0500 Subject: [PATCH 25/25] Add device global design to doc index --- sycl/doc/index.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/sycl/doc/index.rst b/sycl/doc/index.rst index fc92cecf5bfe3..8945b41423953 100644 --- a/sycl/doc/index.rst +++ b/sycl/doc/index.rst @@ -40,6 +40,7 @@ Developing oneAPI DPC++ Compiler OptionalDeviceFeatures SYCLInstrumentationUsingXPTI ITTAnnotations + DeviceGlobal Development BKMs ~~~~~~~~~~~~~~~~