
[ur] extend ur_usm_pool_limits_desc_t and rename members #324

Closed
igchor wants to merge 1 commit from the pool_limits branch

Conversation

igchor (Member) commented Mar 6, 2023:

No description provided.

```diff
@@ -132,14 +132,17 @@ members:
   - type: "size_t"
     name: maxPoolSize
     desc: "[in] Maximum size of a memory pool"
+  - type: "size_t"
+    name: maxPoolSizePerDevice
+    desc: "[in] Maximum size of a memory pool per device"
```
pbalcer (Contributor):
I would have expected maxPoolSize to be per device. Otherwise, how would an implementation enforce these max pool size limits? Fail to allocate on one device because some global limit has been reached due to allocations on another device? Is this useful behavior for some applications/use cases?

igchor (Member, author):

Well, in UR, pools are per-context, so limiting the amount of memory for the entire context might make sense (it is a set of 'somehow' related devices, after all). Yes, the implementation would fail to allocate from the pool and request the memory from the driver instead. I'm not entirely sure about the usefulness of this, but for the L0 PI I believe there is a limit on the overall memory pool size (not even per context): https://github.com/intel/llvm/blob/sycl/sycl/plugins/unified_runtime/ur/usm_allocator_config.cpp#L76
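
A minimal sketch of the fallback behavior described above, assuming a hypothetical ContextPool type with one context-wide limit. All names here (ContextPool, driverAlloc, usmAlloc) are illustrative, not from the actual implementation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Hypothetical stand-in for a direct driver allocation.
static void *driverAlloc(size_t size) { return std::malloc(size); }

// A context-wide pool with a single size limit shared by all devices.
class ContextPool {
public:
    explicit ContextPool(size_t maxPoolSize) : limit_(maxPoolSize) {}

    // Returns nullptr when serving the request would exceed the limit,
    // no matter which device in the context is asking.
    void *tryAllocate(size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (used_ + size > limit_)
            return nullptr;
        used_ += size;
        return driverAlloc(size); // a real pool would recycle cached chunks
    }

private:
    std::mutex mutex_;
    size_t used_ = 0;
    const size_t limit_;
};

// Allocation entry point: when the pool refuses, request the memory from
// the driver directly, which is the behavior described in the comment above.
void *usmAlloc(ContextPool &pool, size_t size) {
    if (void *ptr = pool.tryAllocate(size))
        return ptr;
    return driverAlloc(size);
}
```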

pbalcer (Contributor) commented Mar 9, 2023:

Looking at the USM allocator implementation, it seems this is used to determine whether to keep freed memory around in the allocator or give it back to the driver. If I'm right, I think this needs a better name and description.
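
If that reading is correct, the deallocation path would look roughly like this sketch. Pool, Chunk, and the field names are hypothetical; the real allocator's bucket logic is more involved.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

struct Chunk {
    void *ptr;
    size_t size;
};

// On free, a chunk is retained in the pool only while the cached total stays
// under maxPoolSize; otherwise it is returned to the driver right away.
class Pool {
public:
    explicit Pool(size_t maxPoolSize) : limit_(maxPoolSize) {}

    void deallocate(void *ptr, size_t size) {
        if (cached_ + size <= limit_) {
            cache_.push_back({ptr, size}); // keep the memory around for reuse
            cached_ += size;
        } else {
            std::free(ptr); // stand-in for giving the memory back to the driver
        }
    }

private:
    std::vector<Chunk> cache_;
    size_t cached_ = 0;
    const size_t limit_;
};
```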

igchor (Member, author):

Yes, you are right, I'll try to think of a better name. One other thing: the USM allocator implementation has only one instance of this variable (common to all devices/contexts). Do you think that makes sense, or should we have it per device/context as I proposed? I'm also not sure whether we need to keep supporting the global setting for compatibility.

pbalcer (Contributor):

Yes, I think this should only ever be a per-physical-pool memory setting. From the user's perspective, this is intended to prevent situations where a single process is hogging all the physical memory. Having a single global limit might hamper allocation performance for one device without necessarily helping to alleviate memory pressure in the pool that's actually consuming too much space.

igchor (Member, author):

I agree. However, I'm wondering whether we should specify what a 'physical pool' is. I'm not 100% sure, but I believe in some cases we could have a single physical pool per context (with multiple devices, if they have the appropriate P2P capabilities), while in other cases (most of the time) we would want a per-device pool. So perhaps leaving this as is (without specifying whether it's per-device or per-context) is the best option? But then, does it make sense to have the same settings for those two cases?

pbalcer (Contributor) commented Mar 14, 2023:

Good point. The topology can get complex. Maybe we can define this in terms of 'local'/'device' memory and 'elsewhere'/'global' memory, meaning a pool would hold on to at most X of 'local' memory and Y of 'elsewhere' memory.

Jemalloc defines the concept of 'decay', which specifies limits in terms of time rather than space: how long the pool is allowed to keep memory pages around. I think that's better because it wouldn't hamper performance when the pages are actually being used, and it would be meaningful for any type of memory, since the 'timer' is associated with each page/extent rather than with a device in the pool. So if the allocator actively prefers local memory, any 'elsewhere' memory would naturally decay and get freed once it's no longer needed.
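
A rough sketch of such a decay scheme, much simplified relative to jemalloc's actual implementation; DecayingPool and CachedExtent are hypothetical names.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <deque>

using Clock = std::chrono::steady_clock;

// Each cached extent is stamped when it returns to the pool.
struct CachedExtent {
    void *ptr;
    size_t size;
    Clock::time_point freedAt;
};

class DecayingPool {
public:
    explicit DecayingPool(std::chrono::milliseconds decay) : decay_(decay) {}

    void deallocate(void *ptr, size_t size) {
        cache_.push_back({ptr, size, Clock::now()});
    }

    // Called periodically or on the allocation path: the timer belongs to the
    // extent, not to a device, so 'elsewhere' memory decays the same way.
    void purge() {
        const auto now = Clock::now();
        while (!cache_.empty() && now - cache_.front().freedAt > decay_) {
            std::free(cache_.front().ptr); // stand-in for a driver free
            cache_.pop_front();
        }
    }

private:
    std::deque<CachedExtent> cache_; // oldest extents at the front
    const std::chrono::milliseconds decay_;
};
```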

Commit message:

Remove capacity field - having a single limit for all buckets does not
make much sense. It's also hard to pick a correct value without any
runtime allocation statistics.

Rename maxPoolSize to maxUnfreedMemory to make it clearer.
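
After the rename, the descriptor would presumably look something like the following. This is hypothetical: the description wording is assumed, and it's unclear from the commit message alone whether other members remain.

```cpp
// Hypothetical shape of the descriptor after this commit; the member name
// comes from the commit message, everything else is assumed.
typedef struct ur_usm_pool_limits_desc_t {
    ur_structure_type_t stype; ///< [in] structure type
    const void *pNext;         ///< [in][optional] pointer to extension-specific structure
    size_t maxUnfreedMemory;   ///< [in] maximum amount of memory the pool may
                               ///< retain after the application has freed it
} ur_usm_pool_limits_desc_t;
```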
igchor force-pushed the pool_limits branch 2 times, most recently from 9ae3b72 to 38f60b0 on March 13, 2023 at 20:55.
igchor (Member, author) commented Mar 21, 2023:

I've created an issue to discuss this in more detail and closed this PR for now: #369

igchor closed this Mar 21, 2023.