
Optionally use NumPy to allocate buffers #5750

Merged · 8 commits · Feb 24, 2022

Conversation

@jakirkham (Member) commented Feb 3, 2022

As NumPy can be considerably faster at allocating memory than bytearray ( #5258 (comment) ), in addition to other benefits ( #5258 (comment) ), try to use NumPy to allocate the frames to fill, only falling back to bytearray if NumPy is not an option.
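The NumPy-backed buffer can stand in for a `bytearray` because `numpy.empty(...).data` yields a writable `memoryview`. A minimal sketch of the idea (the helper shape here is illustrative, not the PR's exact code):

```python
import numpy

def host_array(n):
    # Uninitialized byte buffer; skips the zeroing cost bytearray pays.
    return numpy.empty((n,), dtype="u1").data

buf = host_array(16)
buf[:4] = b"abcd"       # writable, so comms can read received data into it
assert isinstance(buf, memoryview)
print(bytes(buf[:4]))   # prints b'abcd'
```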


@jakirkham force-pushed the opt_use_np_alloc branch 4 times, most recently from 37f2ac3 to 4984dae, on February 3, 2022 01:24
@jakirkham (Member, Author)

rerun tests

@gjoseph92 (Collaborator) left a comment

Cool! This is indeed a very simple change (and would be easy to make into a little module too). If it has real performance benefits, that would be great news.

Could be interesting to py-spy profile my script from #5258 (comment) with and without this change.

@jakirkham (Member, Author)

Yeah we use the same code in UCX. So having one standard place for it to live makes sense.

Wondering if there is any value to including this in Dask as opposed to Distributed. Perhaps there are other allocations that could benefit?

Yep, that makes sense. I took this comment ( #5258 (comment) ) to mean a significant amount of time is spent in this allocation. Though maybe I'm missing something 😅

@gjoseph92 (Collaborator)

> Wondering if there is any value to including this in Dask as opposed to Distributed. Perhaps there are other allocations that could benefit?

Interesting question. Nothing really comes to mind immediately?

> Took this comment ( #5258 (comment) ) to mean a significant amount of time is spent in this allocation.

That's correct, it's definitely what the profiles were showing. That was from a few months ago, but I'd be surprised if anything has changed since then. What I'd also be curious about is what those profiles look like with the new asyncio comms.

@github-actions bot (Contributor) commented Feb 3, 2022

Unit Test Results

- 12 files (±0), 12 suites (±0), duration 6h 54m 59s (−5m 6s)
- 2,620 tests (±0): 2,540 passed (+1), 80 skipped (±0), 0 failed (−1)
- 15,644 runs (±0): 14,765 passed (+7), 879 skipped (−6), 0 failed (−1)

Results for commit 6dd506d. ± Comparison against base commit 577ef40.

♻️ This comment has been updated with latest results.

@jakirkham (Member, Author)

> Took this comment ( #5258 (comment) ) to mean a significant amount of time is spent in this allocation.

> That's correct, it's definitely what the profiles were showing. That was from a few months ago, but I'd be surprised if anything has changed since then. What I'd also be curious about is what those profiles look like with the new asyncio comms.

Where should I be looking in the existing profiles to see this? Trying to get a sense for comparative purposes. Does it show up as a bytearray call? Something else?

@jakirkham (Member, Author)

Thoughts @dask/maintenance? 🙂

@martindurant (Member)

I like this: clean and simple. For just the two locations, I'm not bothered about making some compat central version. Maybe if it gets used in more places.

I am curious why numpy should be faster than bytearray at all.

@jakirkham (Member, Author) commented Feb 22, 2022

Sure, I did a short profile here ( #5258 (comment) ), which shows the difference.

The gist is that `bytearray` zero-initializes its memory (using `calloc`), whereas `numpy.empty` doesn't. So this cuts out the memory-initialization cost.

There are additional benefits to using NumPy for allocation (like using hugepages), which can also speed up allocations.

Additionally one can provide custom allocators for NumPy, but that is another discussion altogether (though may be valuable in some contexts).
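The initialization cost is easy to observe with a quick microbenchmark (a sketch; absolute numbers and the size of the gap vary by platform, allocator, and buffer size):

```python
import timeit

import numpy

n = 16 * 1024 * 1024  # 16 MiB buffer

# bytearray(n) zero-initializes all n bytes, so allocation cost grows with n.
t_bytearray = timeit.timeit(lambda: bytearray(n), number=10)

# numpy.empty leaves memory uninitialized; only the raw allocation is paid.
t_numpy = timeit.timeit(lambda: numpy.empty((n,), dtype="u1"), number=10)

print(f"bytearray:   {t_bytearray:.4f}s for 10 allocations")
print(f"numpy.empty: {t_numpy:.4f}s for 10 allocations")
```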

@gjoseph92 (Collaborator)

> I am curious why numpy should be faster than bytearray at all.

Being able to record the answer to this in a comment or docstring is reason enough to me to justify putting it in a centralized place.

Refactor `host_array` from `distributed.comm.ucx` to
`distributed.comm.utils`. Also use `host_array` to perform allocations
of all host memory. Since this will use NumPy when available, it
avoids the memory-initialization cost that `bytearray` would otherwise
pay (since `bytearray` uses `calloc`, which zero-initializes memory).
As a result, this speeds up memory allocations used for buffers in
communication.
@jakirkham (Member, Author)

Have refactored it into `distributed.comm.utils` and added a comment to that effect.

Review thread on distributed/comm/utils.py:
# Find the function, `host_array()`, to use when allocating new host arrays
try:
    # Use NumPy, when available, to avoid memory initialization cost
    import numpy

    host_array = lambda n: numpy.empty((n,), dtype="u1").data
except ImportError:
    host_array = lambda n: bytearray(n)
Collaborator:
We'd been hoping to avoid importing NumPy when it's not needed (#5729). This change feels like a fine reason to me to say "NumPy is a required import of distributed" and give up on that goal, but wanted to note it. I suppose we could defer the import into the host_array function, but that doesn't really gain us anything. cc @crusaderky

Member (Author):

Deferring the import to a function would just ensure that line is run every time we create a buffer, which adds a (small) performance hit (though larger on the first call).

Collaborator:

Exactly. I think we should leave the import at the top-level, just wanted to point it out.

@jakirkham (Member, Author)

Any other thoughts on this? 🙂

@martindurant (Member)

+1
I don't much mind whether you keep it a lambda or make a documented function. It would be one way to defer importing numpy, though.
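The documented-function alternative mentioned here, which also defers the numpy import, might look roughly like this (a sketch of the variant under discussion, not the code that was merged, which kept the import at module level):

```python
def host_array(n):
    """Allocate ``n`` bytes of writable host memory.

    Uses NumPy when available, to skip zero-initialization, and falls
    back to ``bytearray`` otherwise. The import cost is paid on the
    first call; afterwards it is a cached ``sys.modules`` lookup.
    """
    try:
        import numpy
    except ImportError:
        return bytearray(n)
    return numpy.empty((n,), dtype="u1").data

buf = host_array(8)
buf[:2] = b"hi"  # writable on either code path
```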

@gjoseph92 (Collaborator) left a comment

Looking forward to this!

@jakirkham (Member, Author)

Thank you both for the reviews 😄

Planning on merging EOD tomorrow if there are no further comments.

@crusaderky (Collaborator)

+1

@jakirkham jakirkham merged commit 5553177 into dask:main Feb 24, 2022
@jakirkham jakirkham deleted the opt_use_np_alloc branch February 24, 2022 22:38
@jakirkham (Member, Author)

Thanks all! 😄

4 participants