Allocation functions, memory transfers and context #53
Labels: `memory` (Memory allocations/transfers/operations), `pi` (DPC++ PI requirement), `specification` (Changes or additions to the specification)
We've been investigating changing the PI interface for memory allocations, and to some extent for memory transfers, which in turn also changes some of the meaning of the PI context. A lot of the reasoning for these changes is based on how the DPC++ SYCL runtime currently works, but it would be good to consider them for the Unified Runtime.
The changes are:
- A `pi_device` argument to the buffer and image allocation entry points (`piMemBufferCreate`, `piMemImageCreate`). This doesn't necessarily mean that the allocation will only be usable on that device, but it helps backends that don't natively support context-style allocations. For the DPC++ SYCL runtime this makes a lot of sense because we already do lazy allocation, so by the time we call these functions we always know the exact device targeted and not just the context (the SYCL `context_bound` property is not currently implemented in DPC++). A rough sketch of the resulting signature is shown after this list.
- A new `piextGetMemoryConnection` entry point that takes two `(pi_device, pi_context)` pairs and returns information on how memory can or should be handled between the two pairs. It currently has three options:
  - `PI_MEMORY_CONNECTION_NONE`: memory in the first `(context, device)` pair cannot be used or migrated by the plugin into the second `(context, device)` pair; copies through host are necessary.
  - `PI_MEMORY_CONNECTION_MIGRATABLE`: memory in the first `(context, device)` pair cannot be used directly by the second `(context, device)` pair, but the plugin can handle migrating data between the two (`piEnqueueMemBufferCopy`).
  - `PI_MEMORY_CONNECTION_UNIFIED`: memory in the first `(context, device)` pair is directly usable in the second pair.
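To make the proposal more concrete, here is a rough header-level sketch of the two changes. The parameter order, the `pi_memory_connection` type name, and the exact enumerator spellings are illustrative assumptions based on the description above, not a finalized interface; the other PI types are the ones already declared by the PI header.

```c
#include <stddef.h>

// Sketch only: names and parameter order are illustrative, not final.
// pi_result, pi_context, pi_device, pi_mem_flags, pi_mem and
// pi_mem_properties are the existing PI types.

// How memory allocated for one (context, device) pair relates to another.
typedef enum {
  PI_MEMORY_CONNECTION_NONE,        // not usable, not migratable; copy through host
  PI_MEMORY_CONNECTION_MIGRATABLE,  // plugin can migrate it (piEnqueueMemBufferCopy)
  PI_MEMORY_CONNECTION_UNIFIED      // directly usable in the second pair
} pi_memory_connection;

// Change 1: the allocation entry points also receive the device the
// allocation is initially targeted at (piMemImageCreate would gain the
// same argument).
pi_result piMemBufferCreate(pi_context context, pi_device device,
                            pi_mem_flags flags, size_t size, void *host_ptr,
                            pi_mem *ret_mem,
                            const pi_mem_properties *properties);

// Change 2: query how memory from (context1, device1) can be used or
// moved to (context2, device2).
pi_result piextGetMemoryConnection(pi_context context1, pi_device device1,
                                   pi_context context2, pi_device device2,
                                   pi_memory_connection *result);
```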
With these two changes, a backend that doesn't natively support context-style allocations no longer has to emulate them: it can simply allocate for a specific device and report that the memory still needs to be migrated between devices in the same context. A backend that does support context-style allocations can ignore the `pi_device` passed to the allocation functions and simply report `PI_MEMORY_CONNECTION_UNIFIED` when the contexts are identical and `PI_MEMORY_CONNECTION_NONE` when they differ. In addition, plugins can tell us when they are able to optimize memory copies between different contexts by reporting `PI_MEMORY_CONNECTION_MIGRATABLE`, which means that `piEnqueueMemBufferCopy` is supported between the two contexts and may be more efficient than doing a copy through host.
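As an illustration, a plugin for a backend that only has per-device allocations might implement the query along these lines. This is purely hypothetical code: `backend_can_copy_between` is a placeholder for whatever mechanism the backend actually provides (for example a peer-access check), and a backend with true context-style allocations would instead just compare the two contexts and return `PI_MEMORY_CONNECTION_UNIFIED` or `PI_MEMORY_CONNECTION_NONE`.

```c
// Hypothetical sketch for a backend with device-only allocations.
pi_result piextGetMemoryConnection(pi_context context1, pi_device device1,
                                   pi_context context2, pi_device device2,
                                   pi_memory_connection *result) {
  if (device1 == device2) {
    // Same device: the allocation is directly usable.
    *result = PI_MEMORY_CONNECTION_UNIFIED;
  } else if (context1 == context2 ||
             backend_can_copy_between(device1, device2)) {
    // Different devices in the same context, or a cross-context pair the
    // backend can copy between itself: the plugin can migrate the data.
    *result = PI_MEMORY_CONNECTION_MIGRATABLE;
  } else {
    // Otherwise the runtime has to stage the copy through host memory.
    *result = PI_MEMORY_CONNECTION_NONE;
  }
  return PI_SUCCESS;
}
```

On the runtime side, the memory manager would consult this query before a cross-device or cross-context transfer and either enqueue `piEnqueueMemBufferCopy` (for `MIGRATABLE`) or fall back to staging the data through host (for `NONE`).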
To circle back to the initial motivation: CUDA doesn't have context-style memory allocations like OpenCL or PI, so to support having multiple CUDA devices in the same `pi_context` we would have to roll our own memory manager in the CUDA plugin (which I believe the Level Zero plugin also does). Since the SYCL runtime already has a memory manager, these PI plugin changes allow us to simply defer the management of memory allocations within the same context to the SYCL runtime for the CUDA plugin.

You can see more discussion and initial implementations of this in the following PR: