Optionally use NumPy to allocate buffers #5750
Conversation
Force-pushed from 37f2ac3 to 4984dae
rerun tests
Cool! This is indeed a very simple change (and would be easy to make into a little module too). If it has real performance benefits, that would be great news.
Could be interesting to py-spy profile my script from #5258 (comment) with and without this change.
Yeah, we use the same code in UCX, so having one standard place for it to live makes sense.

Wondering if there is any value to including this in Dask as opposed to Distributed. Perhaps there are other allocations that could benefit?

Yep, that makes sense. Took this comment ( #5258 (comment) ) to mean a significant amount of time is spent in this allocation. Though maybe I'm missing things 😅
Interesting question. Nothing really comes to mind immediately?
That's correct, it's definitely what the profiles were showing. That was from a few months ago, but I'd be surprised if anything has changed since then. What I'd also be curious about is what those profiles look like with the new asyncio comms.
Where should I be looking in the existing profiles to see this? Trying to get a sense for comparative purposes. Does it show up as a …
Force-pushed from 4984dae to 8c11a31
Thoughts @dask/maintenance? 🙂
I like this: clean and simple. For just the two locations, I'm not bothered about making some compat central version. Maybe if it gets used in more places. I am curious why numpy should be faster than bytearray at all.
Sure, I did a short profile here ( #5258 (comment) ), which shows the difference. The gist is that `bytearray` zero-initializes its memory, while NumPy's `empty` skips that step. There are additional benefits of using NumPy for allocation (like using HUGEPAGES), which can also speed up allocations. Additionally, one can provide custom allocators for NumPy, but that is another discussion altogether (though it may be valuable in some contexts).
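For context, a quick microbenchmark along these lines (illustrative only; the sizes and numbers here are arbitrary, and results vary by machine and allocator):

```python
import timeit

n = 10_000_000  # 10 MB frames

# bytearray(n) zero-initializes its memory (effectively calloc),
# so every allocation pays a memory-clearing cost
t_bytearray = timeit.timeit(lambda: bytearray(n), number=10)

try:
    import numpy

    # numpy.empty skips initialization entirely
    t_numpy = timeit.timeit(lambda: numpy.empty((n,), dtype="u1"), number=10)
    print(f"bytearray: {t_bytearray:.4f}s, numpy.empty: {t_numpy:.4f}s")
except ImportError:
    print(f"bytearray: {t_bytearray:.4f}s (NumPy unavailable)")
```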
Being able to record the answer to this in a comment or docstring is reason enough to me to justify putting it in a centralized place. |
Force-pushed from 9f12b23 to 6321b97
Refactor `host_array` from `distributed.comm.ucx` to `distributed.comm.utils`. Also use `host_array` to perform all host memory allocations. Since this will use NumPy when available, it avoids the memory initialization cost that `bytearray` would otherwise pay (since `bytearray` uses `calloc` to zero-initialize memory). As a result this speeds up memory allocations used for buffers in communication.
Force-pushed from 6321b97 to 824d779
Have refactored it into `distributed.comm.utils`.
# Find the function, `host_array()`, to use when allocating new host arrays
try:
    # Use NumPy, when available, to avoid memory initialization cost
    import numpy

    host_array = lambda n: numpy.empty((n,), dtype="u1").data
except ImportError:
    host_array = lambda n: memoryview(bytearray(n))
We'd been hoping to avoid importing NumPy when it's not needed (#5729). This change feels like a fine reason to me to say "NumPy is a required import of distributed" and give up on that goal, but wanted to note it. I suppose we could defer the import into the `host_array` function, but that doesn't really gain us anything. cc @crusaderky
Deferring the import to a function would just ensure that line is run every time we create a buffer, which adds a (small) performance hit (though larger on the first read).
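A hypothetical illustration of the per-call cost being discussed, using `json` as a stand-in for NumPy (names here are illustrative, not from this PR):

```python
import sys
import timeit


def alloc_toplevel(n):
    # Relies on a module-level import, resolved once at import time
    return bytearray(n)


def alloc_deferred(n):
    # The import statement runs on every call; after the first call it
    # is a fast sys.modules lookup, but still nonzero work per call
    import json  # noqa: F401  (stand-in for numpy)

    return bytearray(n)


t_top = timeit.timeit(lambda: alloc_toplevel(64), number=50_000)
t_def = timeit.timeit(lambda: alloc_deferred(64), number=50_000)
print(f"top-level: {t_top:.4f}s, deferred: {t_def:.4f}s")
```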
Exactly. I think we should leave the import at the top-level, just wanted to point it out.
Co-authored-by: Gabe Joseph <[email protected]>
Any other thoughts on this? 🙂

+1
Looking forward to this!
Thank you both for the reviews 😄 Planning on merging EOD tomorrow if no comments.

+1

Thanks all! 😄
As NumPy can be considerably faster at allocating memory than `bytearray` ( #5258 (comment) ), in addition to other benefits ( #5258 (comment) ), try to use NumPy to allocate frames to fill, only falling back to `bytearray` if NumPy is not an option.

- `pre-commit run --all-files`
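A small sketch of how such a buffer would be filled (hypothetical usage, not code from this PR): comms typically allocate a frame up front and then read received bytes into it, so the buffer's initial contents never matter and zero-initialization is wasted work.

```python
import io

try:
    import numpy

    # Uninitialized, writable byte buffer (no zeroing cost)
    frame = numpy.empty((16,), dtype="u1").data
except ImportError:
    # Fallback: bytearray zero-initializes, but is always available
    frame = memoryview(bytearray(16))

# Fill the buffer as a comm would, e.g. from a stream of received bytes
stream = io.BytesIO(b"x" * 16)
n = stream.readinto(frame)
```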