Expose Low-Level Memory Management API (with Virtual Memory reservation) #357
Is this a gap for the current PI or a nice-to-have? In general, I agree that this makes sense, but it would require us to add a substantial amount of new APIs to manage pages. As Igor said, we are already working on those as part of UMA, but that won't be ready for the first release. So, if this isn't necessary for current work, my preference would be to wait until we are ready with the UMA abstractions and expose this functionality through that layer directly (which would probably be the next release).
Thanks @pbalcer. That's a good approach.
Another use case for this would be to support a deferred allocation scheme in SYCL. For example, the Command Graph Proposal (intel/llvm#5626) could benefit from such a feature. Because the user expresses the command graph up front, deferring the actual allocation and only reserving virtual memory would create optimization opportunities.
I agree with @pbalcer, but we can use this issue to track and capture related use cases.
Discussion in the WG call noted that a SYCL extension is being designed by @steffenlarsen to support this use case. It will introduce changes to PI, and UR will also need to incorporate these.
From the CUDA link above: "For now, the only supported type of memory is pinned device memory on the current device but there are more properties to come in future CUDA releases." Do I understand correctly that this means it's not possible (at this moment) to use this API for shared/host allocations? @jandres742 is the same true for Level Zero?
Correct, @igchor. The same applies to L0. In L0 the feature is actually called "Reserved Device Allocations".
@jandres742 Thanks for the clarification. Is there any plan to extend this API to host/shared memory? Is this even possible? I'm wondering what we should support at the UR level.
At least from the L0 point of view, I don't see why we couldn't support reserved host and shared allocations. If we added host and shared support in UR, it likely wouldn't be used at this moment, but it would be ready for whenever that support becomes available in the future. One way of doing this is to define it generically in UR for "USM allocations", with a parameter that indicates the type of allocation being targeted. CUDA even has that already: the flags are reserved for future use, so we could have HOST, DEVICE, SHARED here.

cuMemCreate(CUmemGenericAllocationHandle* handle, size_t size, const CUmemAllocationProp* prop, unsigned long long flags)

Creates a CUDA memory handle representing a memory allocation of a given size described by the given properties.

Parameters:
- handle: Value of handle returned. All operations on this allocation are to be performed using this handle.
- size: Size of the allocation requested.
- prop: Properties of the allocation to create.
- flags: Flags for future use; must be zero now.
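For context, here is a minimal sketch (untested) of the full CUDA virtual memory management flow that cuMemCreate belongs to: reserve a VA range, create a physical allocation, map it, and enable access. The device index and the single-page size are illustrative choices, not part of the discussion above.

```c
#include <cuda.h>
#include <stdio.h>

#define CHECK(call)                                                       \
    do {                                                                  \
        CUresult r = (call);                                              \
        if (r != CUDA_SUCCESS) {                                          \
            fprintf(stderr, "%s failed: %d\n", #call, (int)r);            \
            return 1;                                                     \
        }                                                                 \
    } while (0)

int main(void) {
    CHECK(cuInit(0));
    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx;
    CHECK(cuCtxCreate(&ctx, 0, dev));

    /* Describe the physical allocation: pinned device memory on device 0.
       Per the docs quoted above, this is currently the only supported type;
       host/shared variants would presumably be expressed via these fields
       or the reserved flags. */
    CUmemAllocationProp prop = {0};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;

    /* Sizes must be a multiple of the allocation granularity. */
    size_t granularity = 0;
    CHECK(cuMemGetAllocationGranularity(&granularity, &prop,
                                        CU_MEM_ALLOC_GRANULARITY_MINIMUM));
    size_t size = granularity;

    /* 1. Reserve a virtual address range (no physical backing yet). */
    CUdeviceptr va = 0;
    CHECK(cuMemAddressReserve(&va, size, 0 /*alignment*/, 0 /*fixed addr*/, 0));

    /* 2. Create the physical allocation. */
    CUmemGenericAllocationHandle handle;
    CHECK(cuMemCreate(&handle, size, &prop, 0 /*flags: must be zero*/));

    /* 3. Map the physical memory into the reserved range. */
    CHECK(cuMemMap(va, size, 0 /*offset*/, handle, 0));

    /* 4. Grant the device read/write access to the mapping. */
    CUmemAccessDesc access = {0};
    access.location = prop.location;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    CHECK(cuMemSetAccess(va, size, &access, 1));

    /* ... use (void*)va in kernels ... */

    /* Teardown: unmap, release the physical handle, free the VA range. */
    CHECK(cuMemUnmap(va, size));
    CHECK(cuMemRelease(handle));
    CHECK(cuMemAddressFree(va, size));
    CHECK(cuCtxDestroy(ctx));
    return 0;
}
```

Note how the VA reservation and the physical allocation have independent lifetimes: memory can be unmapped and released while the reservation is kept, which is what makes growable/shrinkable data structures possible.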
@igchor @jandres742, what is the reason to support it for Host and Shared allocations? For Device it is clear: we need memory coloring to manually distribute physical pages among tiles.
@vinser52 that is one use case, right. But the way I see these interfaces is to give the layers above finer control of memory management. With these interfaces, a specific VA can be reserved, memory can be allocated/deallocated while still keeping the VA, mapped/unmapped as needed, etc. All of that applies to any type of memory as I see it. The main use is definitely device (which is why it is currently defined and supported there), but I can see usage in host and shared (and I have received the same requests from workload owners). But definitely, if only device support is added, it would cover most of the bases.
@jandres742 Yeah, I am asking because I want to make sure we are use-case driven. It probably makes sense to postpone adding such a low-level API until we have a strong justification/use case. In general, we (@pbalcer, @igchor and I) have been discussing support for VA reservation and physical page allocation/deallocation in UMA, and I hope we will introduce it. But I am not sure that we need to expose it in the UR API (at least now). My point is that such a low-level API is required only when the underlying infrastructure cannot implicitly do what the user wants. E.g., on the CPU, the OS does implicit page placement/migration.
Agreed, this is not an urgent topic to address. And yes, you are correct about the CPU support available.
A SYCL extension is being proposed in https://github.com/intel/llvm/pull/8954/files
Fixes oneapi-src#357 by introducing a port of the PI changes from intel/llvm#8954, which add the ability to reserve virtual memory regions separately from physical memory backing allocations.
* [x] Define interfaces in the spec & header
* [ ] Add conformance tests exercising interfaces - tracked in oneapi-src#525
Rationale (from L0 doc):
If an application needs finer grained control of physical memory consumption for device allocations then it can reserve a range of the virtual address space and map this to physical memory as needed. This provides flexibility for applications to manage large dynamic data structures which can grow and shrink over time while maintaining optimal physical memory usage.
Such an API is present in CUDA and Level Zero:
https://developer.nvidia.com/blog/introducing-low-level-gpu-virtual-memory-management/
https://spec.oneapi.io/level-zero/latest/core/PROG.html#reserved-device-allocations
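For reference, a minimal sketch (untested) of the reserved-device-allocation flow from the Level Zero spec linked above. Error handling is omitted for brevity, and the context/device handles are assumed to come from the usual zeInit/zeContextCreate setup.

```c
#include <level_zero/ze_api.h>
#include <stddef.h>

void reserve_and_map(ze_context_handle_t hContext, ze_device_handle_t hDevice,
                     size_t requested) {
    /* Query the page size the driver would use for an allocation of this
       size, and round the request up to a page multiple. */
    size_t pageSize = 0;
    zeVirtualMemQueryPageSize(hContext, hDevice, requested, &pageSize);
    size_t size = ((requested + pageSize - 1) / pageSize) * pageSize;

    /* 1. Reserve a virtual address range with no physical backing. */
    void* ptr = NULL;
    zeVirtualMemReserve(hContext, NULL /*pStart: let the driver pick*/, size, &ptr);

    /* 2. Create a physical memory object on the device. */
    ze_physical_mem_desc_t desc = {0};
    desc.stype = ZE_STRUCTURE_TYPE_PHYSICAL_MEM_DESC;
    desc.size = size;
    ze_physical_mem_handle_t hPhysical = NULL;
    zePhysicalMemCreate(hContext, hDevice, &desc, &hPhysical);

    /* 3. Map the physical memory into the reserved range with
       read/write access. */
    zeVirtualMemMap(hContext, ptr, size, hPhysical, 0 /*offset*/,
                    ZE_MEMORY_ACCESS_ATTRIBUTE_READWRITE);

    /* ... use ptr; to shrink, unmap pages while keeping the VA range ... */

    /* Teardown: unmap, destroy the physical object, free the reservation. */
    zeVirtualMemUnmap(hContext, ptr, size);
    zePhysicalMemDestroy(hContext, hPhysical);
    zeVirtualMemFree(hContext, ptr, size);
}
```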
One option is to implement an API similar to the ones in CUDA/Level Zero; a hypothetical shape for it is sketched below. Alternatively, we could leverage the UMA memory provider abstraction; the memory provider is expected to support virtual memory reservations and physical allocations anyway.
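To make the "generic over allocation type" idea from the discussion concrete, one hypothetical shape such a UR-level API could take is shown below. All names and signatures here are illustrative assumptions for this issue, not an existing interface; the allocation-type enum follows @jandres742's suggestion of defining the API generically over USM allocation kinds.

```c
/* Hypothetical sketch only: these entry points do not exist in UR today. */

typedef enum {
    UR_VIRTUAL_MEM_TYPE_HOST,
    UR_VIRTUAL_MEM_TYPE_DEVICE, /* the only kind adapters could back today */
    UR_VIRTUAL_MEM_TYPE_SHARED
} ur_virtual_mem_type_t;

/* Reserve a virtual address range without any physical backing. */
ur_result_t urVirtualMemReserve(ur_context_handle_t hContext,
                                const void* pStart, size_t size,
                                void** ppStart);

/* Create a physical memory object of the given kind. */
ur_result_t urPhysicalMemCreate(ur_context_handle_t hContext,
                                ur_device_handle_t hDevice,
                                ur_virtual_mem_type_t type, size_t size,
                                ur_physical_mem_handle_t* phPhysicalMem);

/* Map physical memory into (part of) a previously reserved range. */
ur_result_t urVirtualMemMap(ur_context_handle_t hContext, const void* pStart,
                            size_t size, ur_physical_mem_handle_t hPhysicalMem,
                            size_t offset);
```

With this shape, adapters could initially reject HOST/SHARED with an unsupported-feature error and light them up later without an interface change.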