Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support native serialization format #106

Merged
merged 6 commits into from
Jun 28, 2023

Conversation

andreas
Copy link
Contributor

@andreas andreas commented Jun 12, 2023

This PR adds support for the native serialization format from CRoaring. It builds on #105 to get support for roaring_bitmap_deserialize_safe. I'm admittedly an absolute newcomer to Rust, so I'd appreciate feedback on how to do this best. I tried a few approaches and found this struck the best balance in terms of aligning with the serialization approach used for TreeMap, not breaking backwards compatibility, and avoiding a proliferation of (de-)serialization functions on Bitmap. If this approach is acceptable, I'm happy to add tests.


use std::ffi::{c_char, c_void};

pub struct PortableSerializer {}
Copy link
Member

@Dr-Emann Dr-Emann Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend using a trait here, something like

pub trait Serializer {
    fn serialize_into<'a>(bitmap: &Bitmap, dst: &'a mut Vec<u8>) -> &'a [u8];
    fn get_serialized_size_in_bytes(bitmap: &Bitmap) -> usize;
    fn try_deserialize(buffer: &[u8]) -> Option<Bitmap>;
}

// enum rather than struct, make it impossible to construct a value
// and unsuffixed, expected to be used as e.g `serialization::Portable`
pub enum Portable {}

impl Serializer for Portable { ... }

Also, for @saulius, thoughts on using a breaking change here, and introduce something like

impl Bitmap {
    pub fn serialize_into<S: Serialize>(&self, dst: &'a mut Vec<u8>) -> &'a [u8] { ... }
    pub fn serialized_size<S: Serialize>(&self) -> usize { ... }
    pub fn try_deserialize<S: Serialize>(buffer: &[u8]) -> Option<Self> { ... }
}

We also could bring in BitmapView in a similar way

pub trait SerializeView {
    unsafe fn deserialize_view(data: &[u8]) -> BitmapView<'_>;
}

impl SerializeView for Portable { ... }
impl SerializeView for Frozen { ... }

impl<'a> BitmapView<'a> {
    pub unsafe fn new<S: SerializeView>(data: &'a [u8]) -> Self { ... }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dr-Emann, thanks for weighing in 🙏🏻

I've tried implementing your suggestion in 61e8433 -- is that what you had in mind? I decided to introduce three traits: Serializer, Deserializer and ViewDeserializer. This is to capture that Frozen only supports deserializing to a BitmapView and not to a Bitmap:

Format Serializer Deserializer ViewDeserializer
Portable
Native
Frozen

Note that tests and documentation is still missing, but I was hoping to get the implementation right first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's what I was thinking, yeah. @saulius, I personally think this is worth the breaking change (e.g. a call to bitmap.get_serialized_size_in_bytes() will no longer compile because it needs a Serializer type argument), but I'd like your opinion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer this way to the way serialization is done in treemap, but we should probably make them consistent.

Copy link
Member

@saulius saulius Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, indeed. I also agree with you that this is worth a breaking change.

So far my line of thinking is this. Release current master as 0.9.0 - the last release under saulius. Then we can can merge this, work on treemap rework and transfer over to RoaringBitmap org (or the other way around). Newly refactored serialization interfaces can be a part of 1.0.0 released from under RoaringBitmap. How does that sound? If there are any other breaking-worthy changes you both have in mind, that would also be a good time to incorporate.

@andreas
Copy link
Contributor Author

andreas commented Jun 13, 2023

I'll rebase on master when #107 is merged.

@saulius
Copy link
Member

saulius commented Jun 20, 2023

Sorry @andreas, really struggling to find proper time to review the code, but I'm here :)

Can you please rebase on #107?

@andreas
Copy link
Contributor Author

andreas commented Jun 20, 2023

Thanks for the update, @saulius!

I've rebased on master now. Tests are still missing, but I added/updated docs. I'm happy to add tests if the general approach is considered right 🙂

@saulius
Copy link
Member

saulius commented Jun 21, 2023

Thanks for a great change @andreas , I agree with @Dr-Emann that this looks like a much Rust-ier interface to serialization. Addition of tests for Native would be welcomed, of course.

@andreas
Copy link
Contributor Author

andreas commented Jun 24, 2023

Alright, I've fixed tests and added rudimentary tests for the native format, so I think the changes are ready to be reviewed.


/// The `Portable` format is meant to be compatible with other roaring bitmap libraries, such as Go or Java.
/// It's defined here: https://github.com/RoaringBitmap/RoaringFormatSpec
pub enum Portable {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it might make sense to pull Portable, Frozen, and Native out of crate::bitmap::serialize into a crate::serialize, since they're not bitmap specific (we use the same types for treemap serialization)

Just a thought, definitely not worth holding up the PR for.

ReadBitmapOp::GetSerializedSizeInBytes => {
b.get_serialized_size_in_bytes();
ReadBitmapOp::GetPortableSerializedSizeInBytes => {
b.get_serialized_size_in_bytes::<Portable>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to import Portable/Native/Frozen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks -- fixed in a455945. I wonder why make fuzz did not fail for me locally 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like make fuzz just runs the fuzz_ops fuzzer

@Dr-Emann Dr-Emann mentioned this pull request Jun 25, 2023
3 tasks
@saulius
Copy link
Member

saulius commented Jun 28, 2023

Thanks, @andreas, this looks great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants