@seibs
Contributor

@seibs seibs commented Dec 14, 2017

I took a swing at improving the CopyBitmap performance (benchmarks below). I'm a C/C++ novice, so I thought I'd get some feedback before I went too much further.

Starting Point

Run on (4 X 2208 MHz CPU s)
12/13/17 21:15:18
Benchmark                                        Time           CPU Iterations
------------------------------------------------------------------------------
BM_CopyBitmap/97.6563k/0/min_time:1.000       4779 us       4758 us        289   20.0445MB/s
BM_CopyBitmap/976.563k/0/min_time:1.000      47740 us      47476 us         26   20.0875MB/s
BM_CopyBitmap/97.6563k/4/min_time:1.000       4858 us       4866 us        289   19.5991MB/s
BM_CopyBitmap/976.563k/4/min_time:1.000      48117 us      47953 us         29   19.8879MB/s

Using Stanford bit hacks for SetBitTo

Run on (4 X 2208 MHz CPU s)
12/13/17 21:22:05
Benchmark                                        Time           CPU Iterations
------------------------------------------------------------------------------
BM_CopyBitmap/97.6563k/0/min_time:1.000       1647 us       1649 us        815   57.8415MB/s
BM_CopyBitmap/976.563k/0/min_time:1.000      16368 us      16397 us         81   58.1629MB/s
BM_CopyBitmap/97.6563k/4/min_time:1.000       1599 us       1610 us        815   59.2186MB/s
BM_CopyBitmap/976.563k/4/min_time:1.000      16026 us      16011 us         81   59.5644MB/s
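
For readers unfamiliar with the trick, here is a minimal sketch of what a branch-free SetBitTo could look like. It assumes the hack in question is the "conditionally set or clear bits without branching" formula, `w ^= (-f ^ w) & m`, and that bits are stored least-significant-bit first within each byte. The function names are hypothetical and this is not necessarily the code in this pull request.

```cpp
#include <cstdint>

// Hypothetical names for illustration only.

// Branching version: test the flag, then either set or clear the bit.
inline void SetBitToBranching(uint8_t* bits, int64_t i, bool bit_is_set) {
  const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
  if (bit_is_set) {
    bits[i / 8] |= mask;
  } else {
    bits[i / 8] &= static_cast<uint8_t>(~mask);
  }
}

// Branch-free version based on: w ^= (-f ^ w) & m
inline void SetBitToBranchless(uint8_t* bits, int64_t i, bool bit_is_set) {
  const uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
  // 0xFF when bit_is_set is true, 0x00 otherwise.
  const uint8_t all_ones_if_set =
      static_cast<uint8_t>(-static_cast<int>(bit_is_set));
  uint8_t& byte = bits[i / 8];
  // Flips the masked bit only when it differs from the requested value.
  byte ^= static_cast<uint8_t>((all_ones_if_set ^ byte) & mask);
}
```

Both versions produce the same result; the second simply avoids a data-dependent branch inside the copy loop, which is one plausible reason for a speedup like the one in the benchmark above.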

memcpy + shifting
This solution's performance varies depending on whether the bit offset is a multiple of 8.

Run on (4 X 2208 MHz CPU s)
12/13/17 21:23:44
Benchmark                                        Time           CPU Iterations
------------------------------------------------------------------------------
BM_CopyBitmap/97.6563k/0/min_time:1.000          5 us          5 us     280000   18.9651GB/s
BM_CopyBitmap/976.563k/0/min_time:1.000         62 us         61 us      22400   15.1721GB/s
BM_CopyBitmap/97.6563k/4/min_time:1.000        171 us        170 us       6892   560.872MB/s
BM_CopyBitmap/976.563k/4/min_time:1.000       1639 us       1639 us        896   581.782MB/s
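
To illustrate the idea, here is a rough sketch of a memcpy + shifting copy. It is written under assumptions (LSB-first bit order, a hypothetical `CopyBitmapSketch` that returns a `std::vector` rather than an Arrow buffer) and is not the code from this pull request. When the bit offset is a multiple of 8 the whole range is copied with one memcpy; otherwise each output byte is stitched together from two neighbouring source bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch: copies `length` bits starting at bit `offset` of `src`
// into a new byte vector. Bits past `length` in the last byte are zeroed.
std::vector<uint8_t> CopyBitmapSketch(const uint8_t* src, int64_t offset,
                                      int64_t length) {
  const int64_t num_bytes = (length + 7) / 8;
  std::vector<uint8_t> dst(static_cast<size_t>(num_bytes), 0);
  if (length == 0) return dst;

  const uint8_t* first = src + offset / 8;
  const int shift = static_cast<int>(offset % 8);

  if (shift == 0) {
    // Byte-aligned: a single memcpy covers every requested bit.
    std::memcpy(dst.data(), first, static_cast<size_t>(num_bytes));
  } else {
    // Index (relative to `first`) of the last source byte holding a requested bit.
    const int64_t last_src_byte = (offset + length - 1) / 8 - offset / 8;
    for (int64_t i = 0; i < num_bytes; ++i) {
      // Low bits of the output come from the high bits of first[i].
      const uint8_t lo = static_cast<uint8_t>(first[i] >> shift);
      // High bits come from the low bits of first[i + 1], if it is in range.
      const uint8_t hi = (i + 1 <= last_src_byte)
                             ? static_cast<uint8_t>(first[i + 1] << (8 - shift))
                             : static_cast<uint8_t>(0);
      dst[static_cast<size_t>(i)] = static_cast<uint8_t>(lo | hi);
    }
  }

  // Clear any bits beyond `length` in the final byte.
  const int trailing = static_cast<int>(length % 8);
  if (trailing != 0) {
    dst.back() &= static_cast<uint8_t>((1u << trailing) - 1);
  }
  return dst;
}
```

The unaligned path still touches every byte individually, which is consistent with the offset-4 rows being slower than the offset-0 rows while remaining well ahead of a bit-at-a-time loop.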

Member

@xhochy xhochy left a comment

+1, thank you for taking a look at this. There are two code formatting issues reported by Travis. Can you fix these two?

Member

Can you keep this an int64_t?

Member

Can you remove these reindentations?

@seibs seibs force-pushed the ARROW-764 branch 2 times, most recently from e5dd129 to 74e4219 December 22, 2017 20:11
@seibs
Contributor Author

seibs commented Dec 22, 2017

Thanks for taking a look @xhochy.

My latest push fixes the formatting issues and also adds a test+fix for a few corner cases my first upload didn't handle correctly.

Benchmark with recent changes

Run on (4 X 2208 MHz CPU s)
12/22/17 15:14:31
Benchmark                                        Time           CPU Iterations
------------------------------------------------------------------------------
BM_CopyBitmap/97.6563k/0/min_time:1.000          5 us          5 us     270310   17.9019GB/s
BM_CopyBitmap/976.563k/0/min_time:1.000         65 us         65 us      21854   14.3143GB/s
BM_CopyBitmap/97.6563k/4/min_time:1.000        153 us        152 us       8960   628.592MB/s
BM_CopyBitmap/976.563k/4/min_time:1.000       1534 us       1535 us        896   621.449MB/s

w.r.t. the comments above in CMakeLists.txt, I actually have a separate pull request open for them at #1406.

@xhochy
Member

xhochy commented Dec 30, 2017

@seibs Can you resolve the two comments in the other pull request? I would then merge both :)

@wesm wesm changed the title from "WIP ARROW-764: [C++] Improves performance of CopyBitmap and adds benchmarks" to "ARROW-764: [C++] Improves performance of CopyBitmap and adds benchmarks" Jan 2, 2018
Member

@wesm wesm left a comment

+1, rebased. Will merge once build passes

@wesm
Member

wesm commented Jan 5, 2018

Rebased again to see if build passes...

@siddharthteotia
Contributor

What are the general performance concerns we are addressing here? I am just trying to understand whether something similar should be done on the Java side for bitvector.

Member

We should write down explicit types for these autos

Contributor Author

Replaced the autos with types.

@wesm
Member

wesm commented Jan 7, 2018

@siddharthteotia there are a couple of places in the codebase where we copy bits out of a bitmap, e.g. after slicing vectors:

RETURN_NOT_OK(CopyBitmap(pool, input->data(), offset, length, buffer));

Depending on the slice offset (whether it's a multiple of 8) we have to either copy 1 bit at a time or do a memcpy.
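
As a rough illustration of that dispatch, the sketch below takes the memcpy fast path when the slice offset is a multiple of 8 and otherwise falls back to copying one bit at a time. The helper names, the LSB-first bit order, and the caller-allocated `dst` are assumptions made for the example; this is not the Arrow implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads bit `i` of an LSB-first bitmap.
inline bool GetBitSketch(const uint8_t* bits, int64_t i) {
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

// Hypothetical sketch: `dst` must have room for (length + 7) / 8 bytes.
void CopySliceBitsSketch(const uint8_t* src, int64_t offset, int64_t length,
                         uint8_t* dst) {
  std::memset(dst, 0, static_cast<size_t>((length + 7) / 8));
  int64_t copied = 0;
  if (offset % 8 == 0) {
    // Fast path: the slice starts on a byte boundary, so whole bytes
    // can be copied in one call.
    const int64_t whole_bytes = length / 8;
    std::memcpy(dst, src + offset / 8, static_cast<size_t>(whole_bytes));
    copied = whole_bytes * 8;
  }
  // Slow path (and the tail of the fast path): one bit at a time.
  for (int64_t i = copied; i < length; ++i) {
    if (GetBitSketch(src, offset + i)) {
      dst[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
}
```

The per-bit loop is the part that dominates when the offset is not byte-aligned, which is the case the benchmarks in this thread exercise with a bit offset of 4.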

Member

@wesm wesm left a comment

+1. Build failing due to some standard flakiness

@wesm wesm closed this in 9eae508 Jan 10, 2018