@@ -1393,64 +1393,16 @@ submitted in its entirety for execution via
13931393
13941394== Future Direction [[future-direction]]
13951395
1396- === Memory Allocation Nodes
1396+ This section contains both features of the specification which have been
1397+ fully developed but are not yet implemented, and features which are
1398+ still in development.
13971399
1398- There is no provided interface for users to define a USM allocation/free
1399- operation belonging to the scope of the graph. It would be error prone and
1400- non-performant to allocate or free memory as a node executed during graph
1401- submission. Instead, such a memory allocation API needs to provide a way to
1402- return a pointer which won't be valid until the allocation is made on graph
1403- finalization, as allocating at finalization is the only way to benefit from
1404- the known graph scope for optimal memory allocation, and even optimize to
1405- eliminate some allocations entirely.
1406-
1407- Such a deferred allocation strategy presents challenges however, and as a result
1408- we recommend instead that prior to graph construction users perform core SYCL
1409- USM allocations to be used in the graph submission. Before to coming to this
1410- recommendation we considered the following explicit graph building interfaces
1411- for adding a memory allocation owned by the graph:
1400+ Fully developed features will be moved to the main specification once they
1401+ have been implemented.
14121402
1413- 1. Allocation function returning a reference to the raw pointer, i.e. `void*&`,
1414- which will be instantiated on graph finalization with the location of the
1415- allocated USM memory.
1416-
1417- 2. Allocation function returning a handle to the allocation. Applications use
1418- the handle in node command-group functions to access memory when allocated.
1419-
1420- 3. Allocation function returning a pointer to a virtual allocation, only backed
1421- with an actual allocation when graph is finalized or submitted.
1403+ === Features Awaiting Implementation
14221404
1423- Design 1) has the drawback of forcing users to keep the user pointer variable
1424- alive so that the reference is valid, which is unintuitive and is likely to
1425- result in bugs.
1426-
1427- Design 2) introduces a handle object which has the advantages of being a less
1428- error prone way to provide the pointer to the deferred allocation. However, it
1429- requires kernel changes and introduces an overhead above the raw pointers that
1430- are the advantage of USM.
1431-
1432- Design 3) needs specific backend support for deferred allocation.
1433-
1434- === Device Specific Graph
1435-
1436- A modifiable state `command_graph` contains nodes targeting specific devices,
1437- rather than being a device agnostic representation only tied to devices on
1438- finalization. This allows the implementation to process nodes which require
1439- device information when the command group function is evaluated. For example,
1440- a SYCL reduction implementation may desire the work-group/sub-group size, which
1441- is normally gathered by the runtime from the device associated with the queue.
1442-
1443- This design also enables the future capability for a user to compose a graph
1444- with nodes targeting different devices, allowing the benefits of defining an
1445- execution graph ahead of submission to be extended to multi-device platforms.
1446- Without this capability a user currently has to submit individual single-device
1447- graphs and use events for dependencies, which is a usage model this extension is
1448- aiming to optimize. Automatic load balancing of commands across devices is not a
1449- problem this extension currently aims to solve, it is the responsibility of the
1450- user to decide the device each command will be processed for, not the SYCL
1451- runtime.
1452-
1453- === Storage Lifetimes [[storage-lifetimes]]
1405+ ==== Storage Lifetimes [[storage-lifetimes]]
14541406
14551407The lifetime of any buffer recorded as part of a submission
14561408to a command graph will be extended in keeping with the common reference
@@ -1512,7 +1464,7 @@ associated with a buffer that was created using a host data pointer will
15121464outlive any executable graphs created from a modifiable graph which uses
15131465that buffer.
15141466
1515- === Host Tasks [[future-host-tasks]]
1467+ ==== Host Tasks [[future-host-tasks]]
15161468
15171469A {host-task}[host task] is a native C++ callable, scheduled according to SYCL
15181470dependency rules. It is valid to record a host task as part of a graph, though it
@@ -1533,8 +1485,6 @@ auto node = graph.add([&](sycl::handler& cgh){
15331485});
15341486----
15351487
1536- === Graph Update
1537-
15381488==== Executable Graph Update
15391489
15401490A graph in the executable state can have each node's inputs & outputs updated
@@ -1612,6 +1562,65 @@ Exceptions:
16121562
16131563|===
16141564
1565+ === Features Still in Development
1566+
1567+ ==== Memory Allocation Nodes
1568+
1569+ There is no provided interface for users to define a USM allocation/free
1570+ operation belonging to the scope of the graph. It would be error prone and
1571+ non-performant to allocate or free memory as a node executed during graph
1572+ submission. Instead, such a memory allocation API needs to provide a way to
1573+ return a pointer which won't be valid until the allocation is made on graph
1574+ finalization, as allocating at finalization is the only way to benefit from
1575+ the known graph scope for optimal memory allocation, and even optimize to
1576+ eliminate some allocations entirely.
1577+
1578+ Such a deferred allocation strategy presents challenges however, and as a result
1579+ we recommend instead that prior to graph construction users perform core SYCL
1580+ USM allocations to be used in the graph submission. Before coming to this
1581+ recommendation we considered the following explicit graph building interfaces
1582+ for adding a memory allocation owned by the graph:
1583+
1584+ 1. Allocation function returning a reference to the raw pointer, i.e. `void*&`,
1585+ which will be instantiated on graph finalization with the location of the
1586+ allocated USM memory.
1587+
1588+ 2. Allocation function returning a handle to the allocation. Applications use
1589+ the handle in node command-group functions to access memory when allocated.
1590+
1591+ 3. Allocation function returning a pointer to a virtual allocation, only backed
1592+ with an actual allocation when graph is finalized or submitted.
1593+
1594+ Design 1) has the drawback of forcing users to keep the user pointer variable
1595+ alive so that the reference is valid, which is unintuitive and is likely to
1596+ result in bugs.
1597+
1598+ Design 2) introduces a handle object which has the advantage of being a less
1599+ error-prone way to provide the pointer to the deferred allocation. However, it
1600+ requires kernel changes and introduces an overhead above the raw pointers that
1601+ are the advantage of USM.
1602+
1603+ Design 3) needs specific backend support for deferred allocation.
1604+
1605+ ==== Device Specific Graph
1606+
1607+ A modifiable state `command_graph` contains nodes targeting specific devices,
1608+ rather than being a device agnostic representation only tied to devices on
1609+ finalization. This allows the implementation to process nodes which require
1610+ device information when the command group function is evaluated. For example,
1611+ a SYCL reduction implementation may desire the work-group/sub-group size, which
1612+ is normally gathered by the runtime from the device associated with the queue.
1613+
1614+ This design also enables the future capability for a user to compose a graph
1615+ with nodes targeting different devices, allowing the benefits of defining an
1616+ execution graph ahead of submission to be extended to multi-device platforms.
1617+ Without this capability a user currently has to submit individual single-device
1618+ graphs and use events for dependencies, which is a usage model this extension is
1619+ aiming to optimize. Automatic load balancing of commands across devices is not a
1620+ problem this extension currently aims to solve; it is the responsibility of the
1621+ user to decide the device each command will be processed for, not the SYCL
1622+ runtime.
1623+
16151624== Issues
16161625
16171626=== Simultaneous Graph Submission