Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

64-bit Roaring Bitmaps in C #534

Merged
merged 11 commits into from
Jan 8, 2024
Merged

64-bit Roaring Bitmaps in C #534

merged 11 commits into from
Jan 8, 2024

Conversation

SLieve
Copy link
Contributor

@SLieve SLieve commented Dec 30, 2023

This copies the idea from the Java version of Roaring Bitmaps, using an Adaptive Radix Trie (ART) for the high 48 bits of Roaring Bitmap entries, and the existing 16-bit containers for the low 16 bits.

This PR includes an implementation of an ART slightly specialized for Roaring Bitmaps. The performance of roaring64 seems to be roughly twice as slow as the 32-bit implementation, according to the microbenchmarks on my machine, but there's likely plenty of room for optimization.

I normally write C++ code, so please point out any c-isms that I've missed!

TODO:

  • Make the keys used in the ART endian-agnostic and OS-agnostic. Right now I rely on byteswap.h, which assumes Linux + little endian.
  • Improve the public API of roaring64.h. Right now art_t, art_val_t, and leaf_t are exposed.
  • Error on failed allocations.
  • Iteration. I would like to create a container iterator which can be used across 32 and 64 bit bitmaps.
  • Iteration-based functions like roaring64_bitmap_intersect_with_range.
  • Serialization / deserialization.
  • Copy on write.

src/art/art.c Outdated Show resolved Hide resolved
src/art/art.c Outdated Show resolved Hide resolved
@lemire
Copy link
Member

lemire commented Jan 1, 2024

We have some support for big endian systems. As far as I can tell, we have no big-endian user, but we ensure that it runs nonetheless, except for serialization and deserialization. You could just declare that you don't support big endian systems for this particular feature, that's fine. My expectation is that the few people who need big endian support should chip in and help provide it.

This being said, there is no harm in supporting big endian systems from the get go. A macro is available...

#if CROARING_IS_BIG_ENDIAN

You can roll your own byte reversal routines if you need them...

uint16_t swap_bytes(const uint16_t word) {
return (word >> 8) | (word << 8);
}

(Same for any word size.) It will compile to efficient code.

I don't think you need system-specific functions.

@lemire
Copy link
Member

lemire commented Jan 1, 2024

This is great work. We will be happy to merge this!!!

@SLieve SLieve force-pushed the art branch 9 times, most recently from 5f340b5 to 4716345 Compare January 1, 2024 21:53
@lemire
Copy link
Member

lemire commented Jan 2, 2024

We are getting CI failures because the new tests take a very long time. More than 1500s for a single test sequence is quite long (25 minutes). I recommend making the tests faster. It is always possible to have optional parameters to allow longer tests for specialized purposes.

image

@SLieve
Copy link
Contributor Author

SLieve commented Jan 2, 2024

Removed the slightly larger tests. They only performed 300 iterations, which seemed like it would've been small enough! In the Ubuntu-Debug-Sanitized-CI workflow the test only took 0.1s for comparison, so likely something was off with the test.

A question before merging: how do you feel about art_t, art_val_t, and leaf_t being exposed in roaring64.h? In roaring_types.h there are some macros to hide the container type from the public API. This is slightly harder for roaring64.h as the relevant types are directly included as fields rather than through pointers.

@lemire lemire requested a review from Dr-Emann January 2, 2024 21:44
@lemire
Copy link
Member

lemire commented Jan 2, 2024

@SLieve I personally do not care very much as to whether the internal are visible though it can be documented more carefully if you'd like (indicate that it is not part of the public interface and subject to change?).

@Dr-Emann : Would you review?

@Dr-Emann
Copy link
Member

Dr-Emann commented Jan 2, 2024

I have a little worry about exposing names quite so generic, I think it's not impossible that someone wanting to use CRoaring might already have a leaf_t type for some other purpose already.

@SLieve SLieve force-pushed the art branch 2 times, most recently from 7c952c1 to fe5ebb1 Compare January 2, 2024 22:56
src/art/art.c Outdated Show resolved Hide resolved
src/art/art.c Outdated Show resolved Hide resolved
include/roaring/art/art.h Outdated Show resolved Hide resolved
src/art/art.c Outdated Show resolved Hide resolved
src/art/art.c Outdated Show resolved Hide resolved
src/art/art.c Outdated Show resolved Hide resolved
src/art/art.c Outdated Show resolved Hide resolved
src/art/art.c Outdated Show resolved Hide resolved
include/roaring/art/art.h Outdated Show resolved Hide resolved
@SLieve
Copy link
Contributor Author

SLieve commented Jan 3, 2024

Thanks for the review @Dr-Emann, the code is better for it. I also realized we can entirely hide roaring64_bitmap_t and therefore all ART types too, which fixes that issue.

Copy link
Member

@Dr-Emann Dr-Emann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we need to add this to the amalgamation script as well

include/roaring/roaring64.h Outdated Show resolved Hide resolved
src/roaring64.c Outdated Show resolved Hide resolved
src/roaring64.c Outdated Show resolved Hide resolved
@SLieve
Copy link
Contributor Author

SLieve commented Jan 4, 2024

I'll create a separate PR to add it to the amalgamation script.

@SLieve
Copy link
Contributor Author

SLieve commented Jan 4, 2024

Looks like the test added in #540 is causing failures in sanitized tests.

@lemire
Copy link
Member

lemire commented Jan 4, 2024

Yes. I introduced a memory leak.

@lemire
Copy link
Member

lemire commented Jan 4, 2024

See #541

This will be used to store the high 48 bits of 64 bit roaring bitmap entries.
For more details, see include/roaring/art/art.h
SLieve added 2 commits January 7, 2024 21:22
* Change art_compare_prefix to take a single length rather than two.
* Use art_compare_keys where possible.
* Use art_val_t directly (through an alias) rather than defining a separate
  type.
* Remove key_chunk from art_indexed_child_t.
* Use node->base where possible.
* Merge art_create_iterator into art_init_iterator.
@SLieve
Copy link
Contributor Author

SLieve commented Jan 8, 2024

Are we good to merge?

@Dr-Emann
Copy link
Member

Dr-Emann commented Jan 8, 2024

I think I'd be okay to merge, and add the todos in separate PRs (especially Error on failed allocations, Serialization / deserialization and adding to the amalgamation are the most important in my view), but that definitely is a question for @lemire more than me.

I'm excited to add this to the croaring-rs bindings for rust, as well.

@madscientist
Copy link
Contributor

Question: is there any thought to switching the existing 64bit C++ support to use this new facility? Today IIRC 64bit support in C++ is implemented very simply by having a std::map<> which maps the high-order 32 bits to a 32bit Roaring for the lower 32bits.

@lemire
Copy link
Member

lemire commented Jan 8, 2024

Question: is there any thought to switching the existing 64bit C++ support to use this new facility? Today IIRC 64bit support in C++ is implemented very simply by having a std::map<> which maps the high-order 32 bits to a 32bit Roaring for the lower 32bits.

That would be an obvious next step. It would also impact other wrappers. This could become more widely available.

@lemire
Copy link
Member

lemire commented Jan 8, 2024

Another obvious todo would be to add serialization and deserialization, ideally in a way that is compatible with the Java code.

See https://github.com/RoaringBitmap/RoaringFormatSpec#extension-for-64-bit-implementations

@lemire
Copy link
Member

lemire commented Jan 8, 2024

@SLieve There is a bunch of work remaining, @Dr-Emann and @madscientist pointed at some things, but I think that there is agreement that we can merge.

Can you confirm that you are fine with merging at this time? I want you to look at the last comments before we merge for real. :-)

When I do so, I will have to add a bunch of new issues to complete the work. (Not that you did not do a lot, you did... there is just more to do...)

@SLieve
Copy link
Contributor Author

SLieve commented Jan 8, 2024

Yes, I'm fine merging this code as it is functional as-is. I intend to continue work on this, so feel free to assign the new issues to me. C++ 64-bit support will have to wait until this implementation has feature parity with the existing one.

@lemire
Copy link
Member

lemire commented Jan 8, 2024

Merging.

@lemire lemire merged commit 0225f21 into RoaringBitmap:master Jan 8, 2024
25 checks passed
@lemire
Copy link
Member

lemire commented Jan 8, 2024

@SLieve Please check your email, as I have sent you an invite for the org. I have created two new issues but I can't assign them to you because you are not part of the org at this time.

@SLieve
Copy link
Contributor Author

SLieve commented Jan 8, 2024

I've joined the org now and assigned the issues to myself.

@baibaichen
Copy link

Hi, Is there any benchmark compare art version against std::map version?

@SLieve
Copy link
Contributor Author

SLieve commented Feb 4, 2024

The only comparative benchmark with the C++ 64-bit version is currently random_access64 and random_access64_cpp here. You can run these on the various benchmark cases as described here. Note that the benchmark data is made for 32-bit bitmaps, so some of the benchmark cases differ mostly or only in the lower 16 bits.

@lemire
Copy link
Member

lemire commented Feb 4, 2024

The following blog post could be relevant:

https://lemire.me/blog/2016/09/15/the-memory-usage-of-stl-containers-can-be-surprising/

An std::set used something like 32 bytes per entry while an std::unordered_set used 36 bytes per entry. This was for 4-byte payloads but the results could only be worse for 8-byte payloads.

@madscientist
Copy link
Contributor

I agree that "benchmarks" is too abstract a term: what are you measuring? Pure lookup speed? Insert speed? Delete speed? Size? All of these will be important in different ways to different workloads. @lemire (or anyone) have you done any experimentation with Roaring vs. specifically STL maps/sets/vectors of bool, rather than int, in C++? If I recall correctly at least some of the current STL implementations have specializations of the standard data structures for bool which are supposed to be more compact, at least. But I haven't looked into it lately. I will definitely be interested in comparisons once we get to a place where the new C 64bit support is reflected in the C++ implementation, both with the STL and also with the current C++ 64bit support in Roaring.

@lemire
Copy link
Member

lemire commented Feb 4, 2024

have you done any experimentation with Roaring vs. specifically STL

Yes. In the 32-bit case, we have written a paper that covers various alternatives way back in 2018...

It covers uncompressed bitsets, std::set, and so forth.

Since then, there has been great many other benchmarks published, including different implementations of 64-bit Roaring.

Of course, the newly contributed 64-bit Roaring bitmaps have not been benchmarked.

maps/sets/vectors of bool

A set of bool it can be empty, have true, fast, just true or false. So you can represent it using 2 bits.

You can use std::vector<bool> which should provide a specialized implementation which uses 1 bit per element in the universe but you are probably better off in our context with std::bitset. This should provide performance comparable to the uncompressed bitset implementations we have in CRoaring. The paper includes such alternatives in the 32-bit case.

Benchmarking thoroughly the new 64-bit bitmaps (using a rich set of metrics) would be important and I think that we will eagerly consider a pull request that provides these benchmarks. For tracking memory usage, what we did in the paper was to implement a custom allocator which allows us to track what is allocated... that's not perfect because it only measures virtual memory, not actual physical memory usage, but in our context, I think it is sufficient (calling shrink_to_fit() is useful).

I stress that it is not a simple endeavour to use microbenchmarks to assess such data structures. It requires some care.

Comment on lines +20 to +23
#elif defined(__has_include) && \
__has_include(<byteswap.h>) // CROARING_IS_BIG_ENDIAN
#include <byteswap.h>
#define htobe64(x) __bswap_64(x)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to break Android NDK r26 builds.

FAILED: src/CMakeFiles/roaring.dir/roaring64.c.o 
/vcpkg/android-ndk-r26b/toolchains/llvm/prebuilt/linux-x86_64/bin/clang --target=armv7-none-linux-androideabi21 --sysroot=/vcpkg/android-ndk-r26b/toolchains/llvm/prebuilt/linux-x86_64/sysroot  -I/mnt/vcpkg-ci/b/roaring/src/v3.0.0-d8b12d4628.clean/include -I/mnt/vcpkg-ci/b/roaring/src/v3.0.0-d8b12d4628.clean/include/roaring -I/mnt/vcpkg-ci/b/roaring/src/v3.0.0-d8b12d4628.clean/cpp -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -march=armv7-a -mthumb -Wformat -Werror=format-security  -fPIC      -Wall -Wmissing-braces -Wextra -Wsign-compare -Wshadow -Wwrite-strings -Wpointer-arith -Winit-self   -fno-limit-debug-info    -std=gnu11 -fPIC -MD -MT src/CMakeFiles/roaring.dir/roaring64.c.o -MF src/CMakeFiles/roaring.dir/roaring64.c.o.d -o src/CMakeFiles/roaring.dir/roaring64.c.o -c /mnt/vcpkg-ci/b/roaring/src/v3.0.0-d8b12d4628.clean/src/roaring64.c
/mnt/vcpkg-ci/b/roaring/src/v3.0.0-d8b12d4628.clean/src/roaring64.c:63:20: error: call to undeclared function '__bswap_64'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    uint64_t tmp = croaring_htobe64(key);
                   ^
/mnt/vcpkg-ci/b/roaring/src/v3.0.0-d8b12d4628.clean/include/roaring/portability.h:442:29: note: expanded from macro 'croaring_htobe64'
#define croaring_htobe64(x) __bswap_64(x)
                            ^

What I find in that header is:

#define bswap_64(x) __swap64(x)

(Does it need to use __ prefixed identifier?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is incorrect. Thanks for reporting the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants