
Optionally use NumPy to allocate buffers #5750

Merged · 8 commits · Feb 24, 2022

Conversation

@jakirkham (Member) commented Feb 3, 2022

As NumPy can be considerably faster at allocating memory than bytearray ( #5258 (comment) ), in addition to other benefits ( #5258 (comment) ), try to use NumPy to allocate the frames to fill, only falling back to bytearray if NumPy is not an option.
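The NumPy-backed buffer can stand in for a `bytearray` because `numpy.empty(...).data` yields a writable `memoryview`. A minimal sketch of the idea (the helper shape here is illustrative, not the PR's exact code):

```python
import numpy

def host_array(n):
    # Uninitialized byte buffer; skips the zeroing cost bytearray pays.
    return numpy.empty((n,), dtype="u1").data

buf = host_array(16)
buf[:4] = b"abcd"       # writable, so comms can read received data into it
assert isinstance(buf, memoryview)
print(bytes(buf[:4]))   # prints b'abcd'
```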


@jakirkham force-pushed the opt_use_np_alloc branch 4 times, most recently from 37f2ac3 to 4984dae, on February 3, 2022 01:24
@jakirkham (Member, Author)

rerun tests

@gjoseph92 (Collaborator) left a comment

Cool! This is indeed a very simple change (and would be easy to make into a little module too). If it has real performance benefits, that would be great news.

Could be interesting to py-spy profile my script from #5258 (comment) with and without this change.

@jakirkham (Member, Author)

Yeah we use the same code in UCX. So having one standard place for it to live makes sense.

Wondering if there is any value to including this in Dask as opposed to Distributed. Perhaps there are other allocations that could benefit?

Yep, that makes sense. I took this comment ( #5258 (comment) ) to mean a significant amount of time is spent in this allocation. Though maybe I'm missing something 😅

@gjoseph92 (Collaborator)

> Wondering if there is any value to including this in Dask as opposed to Distributed. Perhaps there are other allocations that could benefit?

Interesting question. Nothing really comes to mind immediately?

> Took this comment ( #5258 (comment) ) to mean a significant amount of time is spent in this allocation.

That's correct, it's definitely what the profiles were showing. That was from a few months ago, but I'd be surprised if anything has changed since then. What I'd also be curious about is what those profiles look like with the new asyncio comms.

@github-actions bot (Contributor) commented Feb 3, 2022

Unit Test Results

- 12 files (±0), 12 suites (±0), duration 6h 54m 59s (−5m 6s)
- 2,620 tests (±0): 2,540 passed (+1), 80 skipped (±0), 0 failed (−1)
- 15,644 runs (±0): 14,765 passed (+7), 879 skipped (−6), 0 failed (−1)

Results for commit 6dd506d. ± Comparison against base commit 577ef40.

♻️ This comment has been updated with latest results.

@jakirkham (Member, Author)

> Took this comment ( #5258 (comment) ) to mean a significant amount of time is spent in this allocation.

> That's correct, it's definitely what the profiles were showing. That was from a few months ago, but I'd be surprised if anything has changed since then. What I'd also be curious about is what those profiles look like with the new asyncio comms.

Where should I be looking in the existing profiles to see this? Trying to get a sense for comparative purposes. Does it show up as a bytearray call? Something else?

@jakirkham (Member, Author)

Thoughts @dask/maintenance? 🙂

@martindurant (Member)

I like this: clean and simple. For just the two locations, I'm not bothered about making some compat central version. Maybe if it gets used in more places.

I am curious why numpy should be faster than bytearray at all.

@jakirkham (Member, Author) commented Feb 22, 2022

Sure, I did a short profile here ( #5258 (comment) ), which shows the difference.

The gist is that `bytearray` zero-initializes its memory (using `calloc`), whereas `numpy.empty` doesn't. So this cuts out the memory-initialization cost.

There are additional benefits to using NumPy for allocation (like using hugepages), which can also speed up allocations.

Additionally one can provide custom allocators for NumPy, but that is another discussion altogether (though may be valuable in some contexts).
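The initialization cost is easy to observe with a quick microbenchmark (a sketch; absolute numbers and the size of the gap vary by platform, allocator, and buffer size):

```python
import timeit

import numpy

n = 16 * 1024 * 1024  # 16 MiB buffer

# bytearray(n) zero-initializes all n bytes, so allocation cost grows with n.
t_bytearray = timeit.timeit(lambda: bytearray(n), number=10)

# numpy.empty leaves memory uninitialized; only the raw allocation is paid.
t_numpy = timeit.timeit(lambda: numpy.empty((n,), dtype="u1"), number=10)

print(f"bytearray:   {t_bytearray:.4f}s for 10 allocations")
print(f"numpy.empty: {t_numpy:.4f}s for 10 allocations")
```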

@gjoseph92 (Collaborator)

> I am curious why numpy should be faster than bytearray at all.

Being able to record the answer to this in a comment or docstring is reason enough to me to justify putting it in a centralized place.

Refactor `host_array` from `distributed.comm.ucx` to
`distributed.comm.utils`. Also use `host_array` to perform allocations
of all host memory. Since this will use NumPy when available, it
avoids the memory-initialization cost that `bytearray` would otherwise
pay (since `bytearray` uses `calloc`, which zero-initializes memory).
As a result, this speeds up memory allocations used for buffers in
communication.
@jakirkham (Member, Author)

Have refactored it into `distributed.comm.utils` and added a comment to that effect.

Review thread on distributed/comm/utils.py:
# Find the function, `host_array()`, to use when allocating new host arrays
try:
    # Use NumPy, when available, to avoid memory initialization cost
    import numpy

    host_array = lambda n: numpy.empty((n,), dtype="u1").data
except ImportError:
    host_array = lambda n: bytearray(n)
Collaborator:
We'd been hoping to avoid importing NumPy when it's not needed (#5729). This change feels like a fine reason to me to say "NumPy is a required import of distributed" and give up on that goal, but wanted to note it. I suppose we could defer the import into the host_array function, but that doesn't really gain us anything. cc @crusaderky

Member (Author):

Deferring the import to a function would just ensure that line is run every time we create a buffer, which adds a (small) performance hit (though larger on the first call).

Collaborator:

Exactly. I think we should leave the import at the top-level, just wanted to point it out.

@jakirkham (Member, Author)

Any other thoughts on this? 🙂

@martindurant (Member)

+1
I don't much mind whether you keep it a lambda or make a documented function. It would be one way to defer importing numpy, though.
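The documented-function alternative mentioned here, which also defers the numpy import, might look roughly like this (a sketch of the variant under discussion, not the code that was merged, which kept the import at module level):

```python
def host_array(n):
    """Allocate ``n`` bytes of writable host memory.

    Uses NumPy when available, to skip zero-initialization, and falls
    back to ``bytearray`` otherwise. The import cost is paid on the
    first call; afterwards it is a cached ``sys.modules`` lookup.
    """
    try:
        import numpy
    except ImportError:
        return bytearray(n)
    return numpy.empty((n,), dtype="u1").data

buf = host_array(8)
buf[:2] = b"hi"  # writable on either code path
```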

@gjoseph92 (Collaborator) left a comment

Looking forward to this!

@jakirkham (Member, Author)

Thank you both for the reviews 😄

Planning on merging EOD tomorrow if there are no further comments.

@crusaderky (Collaborator)

+1

@jakirkham jakirkham merged commit 5553177 into dask:main Feb 24, 2022
@jakirkham jakirkham deleted the opt_use_np_alloc branch February 24, 2022 22:38
@jakirkham (Member, Author)

Thanks all! 😄

4 participants