[RUNTIME][OPENCL] OpenCL host pointer support to achieve zero copy
OpenCL lets the host access device memory through memory mapping.
The OpenCL flag "CL_MEM_ALLOC_HOST_PTR" enables this when a memory object is created.
We enable this feature via the compilation setting "USE_OPENCL_ENABLE_HOST_PTR",
together with a new API "GetNativePtr" on OpenCLWorkspace.
This allows an application to directly use hardware-allocated memory while preparing the input.
On the user side, we allocate an NDArray of the same size as the graph input, access its native memory, and
finally call set_input_zero_copy to set the input.
Pseudocode looks like:
auto narr = tvm::runtime::NDArray::Empty(shape, {kDLFloat, 32, 1}, {kDLOpenCL, 0});
OpenCLWorkspace* workspace = OpenCLWorkspace::Global();
void *nptr = workspace->GetNativePtr(narr);
... access memory pointed by nptr up to the tensor size ...
tvm::runtime::PackedFunc set_input = mod.GetFunction("set_input_zero_copy");
set_input(i, narr);
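
The "access memory pointed by nptr up to the tensor size" step can be sketched as below. This is only an illustration: TensorBytes and FillInput are hypothetical helpers, not part of TVM or this change, and float32 data is assumed. The bound is derived from the shape and the {bits, lanes} of the data type, and writes are clamped to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bytes occupied by a dense tensor, i.e. the product of
// the shape times the element width (bits * lanes, rounded up to whole bytes).
std::size_t TensorBytes(const std::vector<std::int64_t>& shape, int bits, int lanes) {
  std::size_t count = 1;
  for (std::int64_t dim : shape) count *= static_cast<std::size_t>(dim);
  return count * static_cast<std::size_t>((bits * lanes + 7) / 8);
}

// Hypothetical helper: copy float32 input into the region starting at nptr,
// never writing past `bytes`. Returns the number of elements written.
std::size_t FillInput(void* nptr, std::size_t bytes, const std::vector<float>& src) {
  assert(nptr != nullptr);
  float* dst = static_cast<float*>(nptr);
  const std::size_t capacity = bytes / sizeof(float);
  const std::size_t n = src.size() < capacity ? src.size() : capacity;
  for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
  return n;
}
```

With a real workspace, the pointer returned by GetNativePtr(narr) would be passed as nptr and TensorBytes(shape, 32, 1) as the bound. For testing without a device, any host buffer of that size can stand in for it.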