Reduce memory consumption for buffer pool #1150
Conversation
@lroberts36 I came up with a different approach (not using a user input variable and keeping the allocations in large chunks).
@pgrete: I think this will work, but it is maybe not ideal. First, I think this counts all buffers even for variables that are not allocated, so when running with pack size = -1 this will allocate buffers for every field whether or not it is allocated. Second, the number of buffers allocated will depend on the partition of the block list, since this routine operates per MeshData. One other thing I remembered is that the buffer sizes are not calculated as accurately as they could be: rather than calculating the actual index range, it just uses a heuristic for the size that is guaranteed to be larger but is possibly too large by a factor of a couple.
How about the updated version? Regarding the second point, I don't think this is going to be a huge issue in practice, because the extreme case (pack_size=few and many small blocks per rank) only occurs on host runs, for which memory allocations are much less of an issue than on device runs. Finally, I think this is generally a tricky issue to optimize, which is why I'm trying to go for a default that covers most use cases rather than introducing a parameter that an (inexperienced) end user might easily miss (or misconfigure). What do you think?
I don't think it's an issue. The differences should be smallish (especially for large block sizes, and for very small block sizes the overhead in terms of active to passive cells is huge anyway).

Okay, I need more eyes/input here, as I don't understand what's going on.
Alright, I think I figured out (and understand) more now. I updated the logic so that pool creation remains (with a default number of buffers to be dynamically added, scaled by the number of packs per rank) and added a routine that immediately adds objects to the pools (matching the required number of buffers for the given MeshData container). For the

In total, we have 25 separate allocation events (all reserving, non-dynamic) over the runtime for

How does that solution look?

Finally, a small related question: what's going on with this line? I understand the first part of the conditional, but not the second; more specifically, what are the circumstances where the second one is false?
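A minimal sketch of what that reservation step could look like, with hypothetical names throughout (only NumBuffersInPool, nbufs, buf_size, and new_buffers_req appear in the actual diff below; AddFreeObjectsToPool and the per-size pool map are assumptions):

```cpp
#include <cstdint>
#include <map>

// Hypothetical per-size pool; NumBuffersInPool() matches the diff snippet,
// AddFreeObjectsToPool() is an assumed name for the reservation routine.
struct Pool {
  std::int64_t n = 0;
  std::int64_t NumBuffersInPool() const { return n; }
  void AddFreeObjectsToPool(std::int64_t count) { n += count; }
};

// For each distinct buffer size, top the pool up in a single chunk so it holds
// at least the number of buffers the given MeshData container requires.
void ReserveRequiredBuffers(const std::map<std::size_t, std::int64_t> &nbufs,
                            std::map<std::size_t, Pool> &pools) {
  for (const auto &[buf_size, required] : nbufs) {
    auto &pool = pools[buf_size];
    const std::int64_t new_buffers_req = required - pool.NumBuffersInPool();
    if (new_buffers_req > 0) pool.AddFreeObjectsToPool(new_buffers_req);
  }
}
```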
This approach makes sense to me.
std::cerr << "Reserving " << new_buffers_req << " new buffers of size " << buf_size | ||
<< " to pool with " << pool.NumBuffersInPool() << " buffers because " | ||
<< nbufs.at(buf_size) << " are required in total.\n"; |
Maybe add a flag to possibly suppress the std::cerr outputs?
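A minimal sketch of such a guard, with `verbose_pool` as a hypothetical flag (e.g. read from the input parameters), wrapping the snippet quoted above:

```cpp
// Hypothetical verbosity flag; off by default so the diagnostic stays silent.
if (verbose_pool) {
  std::cerr << "Reserving " << new_buffers_req << " new buffers of size " << buf_size
            << " to pool with " << pool.NumBuffersInPool() << " buffers because "
            << nbufs.at(buf_size) << " are required in total.\n";
}
```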
Yes, this is correct.
This solution looks reasonable to me. Sorry, saw the comment about removing
Multigrid logic requires blocks sending messages to themselves (since the same block can show up on two multigrid levels). This doesn't require any data transfer, so the message size can be zero. It is essentially just a flag to show that the block is done being used on one level and can be used on the next level.
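A hedged illustration (not Parthenon's actual buffer class) of such a zero-size flag message, where the state transition alone carries the information:

```cpp
// Sketch of a size-zero "flag" buffer: Send() copies nothing; the receiver
// only observes the state change, which marks the block as done on one
// multigrid level and usable on the next.
enum class BufState { Stale, Sent, Received };

struct FlagBuffer {
  BufState state = BufState::Stale;
  void Send() { state = BufState::Sent; }  // message size is zero, no data transfer
  bool TryReceive() {
    if (state != BufState::Sent) return false;
    state = BufState::Received;
    return true;
  }
};
```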
I cleaned up the PR. This is ready for a second review now.
This seems reasonable to me. I appreciate the new detailed comments.
PR Summary
Tries to reduce the memory footprint of the object pool by precalculating the number of buffers required for each type.
Also fixes an overflow in the original code for very large buffer sizes (combined with the default of 200 buffers).
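A hedged illustration of the kind of overflow in question (the exact site in the original code is an assumption): with the default of 200 buffers, a 32-bit product of buffer count and a very large buffer size wraps around, so the total should be computed in 64 bits:

```cpp
#include <cstdint>
#include <iostream>

int main() {
  const std::uint32_t n_buffers = 200;        // the default number of buffers
  const std::uint32_t buf_size = 30'000'000;  // elements in a very large buffer
  // 200 * 30'000'000 = 6'000'000'000 exceeds 2^32 - 1, so the product wraps.
  const std::uint32_t wrapped = n_buffers * buf_size;
  const std::uint64_t correct =
      static_cast<std::uint64_t>(n_buffers) * buf_size;  // widen before multiplying
  std::cout << wrapped << " vs " << correct << "\n";     // 1705032704 vs 6000000000
  return 0;
}
```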
PR Checklist