ARROW-4468: [Rust] Implement BitAnd/BitOr for &Buffer (with SIMD) #3571
Conversation
@paddyhoran it looks great! I'm glad that you've started to implement the compute module with SIMD! However, providing operator kernels for buffers seems a bit odd to me. How about implementing the `std::ops` traits (`BitAnd`/`BitOr`) instead?
Yeah, I started to do this but decided not to. I plan for these functions to evolve over time as I add the rest of the kernels. One situation that came to mind was that when implementing kernels on arrays I will likely want to process the values buffer and bitmap buffer in the same loop, which would be made difficult by that approach. As SIMD is being introduced, I wanted to submit this PR because it's smaller in scope and isolated to buffers, so that people could comment on the SIMD approach. Most of what will be in the `boolean_kernels` file will be kernels for arrays, and these functions may just become "helper" functions. In short, I do plan to add this.
I just want to understand the approach here. Do we do the following?
SIMD_AND(left data, right data) -> result as bitmask
SIMD_AND(left validity, right validity) -> result as bitmask
SIMD_AND(result1, result2)
The first SIMD_AND will have performed the operation on garbage data as well (where the validity bit is 0), but that is fine because we avoid a branch and finally use the validity to get the end result. I am not sure how exactly we use the validity in the code below.
Yes, that's correct.
Our BooleanArray (bit packed) and Bitmask are both backed by a Buffer. For other kernels like ADD, etc. we will reuse the SIMD_AND on the Bitmask.
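To make the branch-free pattern above concrete, here is a minimal scalar sketch (a hypothetical helper, not code from this PR): the values of two bit-packed boolean columns are ANDed unconditionally, and the validity bitmaps are ANDed separately, so any "garbage" result bits at null slots are masked out by the result validity.

```rust
// Hypothetical sketch of the branch-free AND-with-validity pattern.
// Each u64 word holds 64 packed bits; both columns are assumed to be
// the same length and word-aligned.
fn and_with_validity(
    left_values: &[u64],
    left_validity: &[u64],
    right_values: &[u64],
    right_validity: &[u64],
) -> (Vec<u64>, Vec<u64>) {
    // Values are ANDed unconditionally; bits at null slots may be garbage,
    // which is harmless because the result validity masks them out.
    let values: Vec<u64> = left_values
        .iter()
        .zip(right_values)
        .map(|(l, r)| l & r)
        .collect();
    // A slot in the result is valid only where both inputs are valid.
    let validity: Vec<u64> = left_validity
        .iter()
        .zip(right_validity)
        .map(|(l, r)| l & r)
        .collect();
    (values, validity)
}
```

Because both loops are straight bitwise operations over contiguous words with no branches, the compiler is free to vectorise them.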
Thanks.
@paddyhoran, this is great. Are we adding SIMD kernels in a generic manner? In other words, can we leverage the SIMD acceleration done here for similar operations on Arrow vectors in other languages (C++, Java, etc.)?
This is one of the great value propositions of Arrow: writing computation in one language and reusing it across many. Right now, I plan to update all our compute in Rust to use SIMD by the 0.13.0 release. Once up and running we could talk about how we could expose it to other implementations. I really want to be able to write high-performance code in Rust and expose it to other languages.
LGTM - this can be merged if you are not planning to add more changes. Another question though: from my prior experience with SIMD (in C/C++ land), I leveraged Intel compiler intrinsics (platform dependent) to work with the underlying SIMD instructions. How are these instructions made available in Rust? Secondly, I see that in the code we check whether the target architecture (x86/x86_64) has SIMD support. I am not sure where we are checking whether to use SSE, AVX2, or AVX-512?
Actually, I do plan to refactor this a little tonight; @kszucs got me thinking. Intrinsics are in the standard library. We can look at adding runtime detection of instructions, but I plan to add this later behind a feature flag, as it's not always what you want for maximum performance. For instance, right now if I don't ask for any specific instructions it defaults to SSE. I have AVX2 available on my dev machine, but there's no difference, as the computation is completely memory bound. As others will likely be building higher-level libraries on Arrow, I thought it best to give all the options to developers (i.e. static detection plus optional runtime detection behind a feature flag).
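As an illustration of the two detection styles discussed above (this is a sketch, not the PR's code): static detection fixes the instruction set at compile time via `RUSTFLAGS="-C target-feature=+avx2"`, while runtime detection queries the running CPU with `std`'s `is_x86_feature_detected!` macro. The dispatch function and kernel names below are hypothetical, and the portable loop stands in for a real AVX2-specialised kernel.

```rust
// Portable scalar kernel; LLVM can auto-vectorise this loop, and it also
// serves as the fallback when no special instructions are detected.
fn and_chunks(left: &[u8], right: &[u8]) -> Vec<u8> {
    left.iter().zip(right).map(|(l, r)| l & r).collect()
}

// Hypothetical runtime-dispatch wrapper (the kind of thing that could sit
// behind a feature flag, as described above).
fn bitwise_and_dispatch(left: &[u8], right: &[u8]) -> Vec<u8> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // A real implementation would call an AVX2-specialised kernel
            // here (e.g. via core::arch intrinsics); this sketch falls
            // through to the portable loop either way.
            return and_chunks(left, right);
        }
    }
    and_chunks(left, right)
}
```

Note that runtime detection costs a branch per call, which is one reason to keep it optional when the caller can guarantee the target features at compile time.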
@kszucs @siddharthteotia @andygrove @sunchao could you please give this a review when you have a chance. @kszucs I re-worked the code to use traits instead, while extracting what I could for reuse in other kernel implementations (where I will want to operate on the values buffer and bitmap buffer in the same loop), and removed what was no longer needed. I left the benchmark in place, renamed, to illustrate the speed-up, but this will be replaced by benchmarks on the array kernels in time.
rust/arrow/src/buffer.rs
I should have made this generic over the operators (`&` and `|`) to clean this up; I'll follow up when I get a chance. Please ignore this duplication of code when reviewing.
rust/arrow/benches/bitwise_ops.rs
I wonder where the similar functions for bitwise OR are?
Eventually I plan to benchmark the array kernels, not the buffer-level operations. The current benchmark was just to demonstrate that the PR was in fact speeding things up, so I added only the AND version, almost as an example. I'll add the OR version also.
Got it.
(force-pushed from db732c1 to a8534b1)
Rebased.
LGTM. @paddyhoran, are you planning to add more changes, or is this good to go?
This is good to go, IMHO. I'd like to get it merged soon, as I'm almost ready to submit another PR that builds on it.
Merged. Thanks @paddyhoran |
This PR lays the groundwork for future PRs adding explicit SIMD to what was called "array_ops". My plan is to migrate "ops" into the compute sub-module as I add explicit SIMD.
This PR includes the following:
- the `packed_simd` crate
- `bitwise_and` and `bitwise_or` functions for `Buffer`s

I'm adding these first as they are needed when updating the bitmap in all array kernels. The functions above include compile-time SIMD, i.e. you have to use `RUSTFLAGS="-C target-feature=+avx2"` or similar. However, even without this you should see a speed-up, as most modern processors will use certain instructions even if you do not explicitly ask for them (in my case the computation is completely memory bound, and using larger registers would not speed it up further).
I have included a benchmark to illustrate the speed-up. However, as the compute sub-module evolves I expect that the benchmarks will be for array kernels and not buffer kernels (at which point `bitwise_bin_op_default` and `bitwise_bin_op_simd` will be made private again).
If interested, please also see this discussion with the maintainer of `packed_simd` for some background.
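As a rough sketch of what a generic scalar fallback could look like (the signature is assumed here, not taken from the actual Arrow source), a helper like `bitwise_bin_op_default` can be written once over a closure so that `&` and `|` share one implementation, addressing the duplication noted in review; LLVM will usually auto-vectorise the loop even without explicit SIMD.

```rust
// Assumed sketch of a generic scalar bitwise kernel over raw buffer bytes.
// The closure supplies the operator, so AND and OR share one loop.
fn bitwise_bin_op_default<F>(left: &[u8], right: &[u8], op: F) -> Vec<u8>
where
    F: Fn(u8, u8) -> u8,
{
    assert_eq!(left.len(), right.len(), "buffers must be the same length");
    left.iter().zip(right).map(|(&l, &r)| op(l, r)).collect()
}

fn bitwise_and(left: &[u8], right: &[u8]) -> Vec<u8> {
    bitwise_bin_op_default(left, right, |l, r| l & r)
}

fn bitwise_or(left: &[u8], right: &[u8]) -> Vec<u8> {
    bitwise_bin_op_default(left, right, |l, r| l | r)
}
```

To opt in to wider registers at compile time, as the description suggests, the crate can be built or benchmarked with `RUSTFLAGS="-C target-feature=+avx2" cargo bench`.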