[ur] extend ur_usm_pool_limits_desc_t and rename members #324
Conversation
scripts/core/usm.yml
@@ -132,14 +132,17 @@ members:
    - type: "size_t"
      name: maxPoolSize
      desc: "[in] Maximum size of a memory pool"
    - type: "size_t"
      name: maxPoolSizePerDevice
      desc: "[in] Maximum size of a memory pool per device"
I would have expected maxPoolSize to be per device. Otherwise, how would an implementation enforce these max pool size limits? Fail to allocate on one device because some global limit has been reached due to allocations on another device? Is this useful behavior for some applications/use cases?
Well, in UR, pools are per-context, so limiting the amount of memory for the entire context might make sense (it is a set of 'somehow' related devices, after all). Yes, the implementation would fail to allocate from the pool and request the memory from the driver instead. I'm not entirely sure how useful this is, but for the L0 PI I believe there is a limit on the overall memory pool size (not even per context): https://github.com/intel/llvm/blob/sycl/sycl/plugins/unified_runtime/ur/usm_allocator_config.cpp#L76
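To make that fallback concrete, here is a minimal sketch of the behavior described above. `ContextPool`, its members, and the use of `std::malloc` as a stand-in for both the pooled and the driver allocation paths are all assumptions for illustration, not the actual implementation:

```cpp
#include <cstddef>
#include <cstdlib>

// Illustrative only: a per-context limit check with driver fallback.
class ContextPool {
public:
    explicit ContextPool(std::size_t maxContextPoolSize)
        : limit_(maxContextPoolSize) {}

    void *allocate(std::size_t size) {
        if (pooledBytes_ + size <= limit_) {
            pooledBytes_ += size;      // account against the context-wide limit
            return std::malloc(size);  // stand-in for serving from the pool
        }
        // Context-wide limit reached: bypass the pool and request the
        // memory directly from the driver instead of failing the call.
        return std::malloc(size);      // stand-in for a direct driver request
    }

private:
    std::size_t limit_;
    std::size_t pooledBytes_ = 0;
};
```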
Looking at the USM allocator implementation, it seems this value is used to determine whether to keep the memory around in the allocator or give it back to the driver. If I'm right, I think this needs a better name and description.
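If that reading is right, the member effectively caps how much freed memory the pool retains. A minimal sketch of that interpretation follows; the class and member names are hypothetical, and `std::free` stands in for a driver release:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Illustrative only: a freed chunk is kept in the pool's cache while the
// cached total stays under maxPoolSize; otherwise it goes back to the driver.
struct Chunk {
    void *ptr;
    std::size_t size;
};

class Pool {
public:
    explicit Pool(std::size_t maxPoolSize) : maxPoolSize_(maxPoolSize) {}

    ~Pool() {
        for (const Chunk &c : freeList_)
            std::free(c.ptr);
    }

    // Called when the application frees a USM allocation.
    void deallocate(void *ptr, std::size_t size) {
        if (cachedBytes_ + size <= maxPoolSize_) {
            cachedBytes_ += size;          // keep the memory around for reuse
            freeList_.push_back({ptr, size});
        } else {
            std::free(ptr);                // give it back to the driver
        }
    }

private:
    std::size_t maxPoolSize_;
    std::size_t cachedBytes_ = 0;
    std::vector<Chunk> freeList_;
};
```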
Yes, you are right; I'll try to think of a better name. One other thing: the USM allocator implementation has only one instance of this variable (common to all devices/contexts). Do you think that makes sense, or should we have this per device/context as I proposed? I'm not sure if we need to support the global setting for compatibility.
Yes, I think this should only ever be a per-physical-pool memory setting. From the user's perspective, it is intended to prevent situations where a single process hogs all the physical memory. A single global limit might hamper allocation performance for one device without necessarily helping to alleviate memory pressure in the memory pool that's actually consuming too much space.
I agree. However, I'm wondering whether we should specify what a 'physical pool' is. I'm not 100% sure, but I believe in some cases we could have a single physical pool per context (with multiple devices, if they have the appropriate P2P capabilities), and in other cases (most of the time) we would want a per-device pool. So perhaps leaving this as is (without specifying whether it's per device or per context) is the best option? (But then, does it make sense to have the same settings for those two cases...?)
Good point. The topology can get complex. Maybe we can define this in terms of 'local'/'device' memory and 'elsewhere'/'global' memory, meaning that a pool would hold on to at most X bytes of 'local' memory and Y bytes of 'elsewhere' memory.
Jemalloc defines the concept of 'decay': instead of specifying limits in terms of space, it defines them in terms of time - how long the pool is allowed to keep memory pages around. I think that's better because it wouldn't hamper performance if the pages are actually actively used, and it would be meaningful for any type of memory, since the 'timer' is associated with each page/extent rather than with a device in the pool. So if the allocator actively prefers local memory, any 'elsewhere' memory would naturally decay and get freed once it's no longer needed.
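For illustration, here is a rough sketch of such decay-based reclamation; the decay window, `DecayingPool`, and the use of `std::free` for the driver release are assumptions for this sketch, not jemalloc's actual implementation:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <deque>

using Clock = std::chrono::steady_clock;

// Each cached chunk records when it was freed; the 'timer' is per chunk,
// not per device.
struct CachedChunk {
    void *ptr;
    std::size_t size;
    Clock::time_point freedAt;
};

class DecayingPool {
public:
    explicit DecayingPool(std::chrono::milliseconds decayWindow)
        : decayWindow_(decayWindow) {}

    void deallocate(void *ptr, std::size_t size) {
        cache_.push_back({ptr, size, Clock::now()}); // always cache first
    }

    // Periodic sweep: return chunks that have sat idle longer than the
    // decay window to the driver.
    void sweep() {
        const auto now = Clock::now();
        while (!cache_.empty() && now - cache_.front().freedAt > decayWindow_) {
            std::free(cache_.front().ptr);           // stand-in for driver release
            cache_.pop_front();
        }
    }

private:
    std::chrono::milliseconds decayWindow_;
    std::deque<CachedChunk> cache_;                  // oldest chunks at the front
};
```

An allocator reusing chunks from the back of the deque would preferentially pick the most recently freed (still 'hot') memory, letting older chunks decay and get released.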
Remove the capacity field - having a single limit for all buckets does not make much sense, and it's hard to pick a correct value without any runtime allocation statistics. Rename maxPoolSize to maxUnfreedMemory to make it clearer.
Compare 9ae3b72 to 38f60b0
I've created an issue to discuss this in more detail and closed this PR for now: #369