
Introduce DecodeFrom and EncodeReader #3

Merged: 1 commit merged into linvon:main on Oct 9, 2021

Conversation

@EladGabay (Contributor)

The buckets byte slice is the biggest part of the memory used by the filter,
and can be several GB.

A common usage pattern is an environment whose RAM is sized to the filter:
load the filter into memory on startup and dump it to disk on teardown.
Currently the Encode and Decode methods duplicate the byte slice,
which makes the memory usage at load and dump time (at least) twice the
filter size.

This commit introduces a new method for dumping the filter through a reader
over the internal byte slice, and a method for loading the filter from
already fetched encoded bytes (from disk, network) that uses them
internally instead of making a copy.

  • formatting.
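As a rough, self-contained sketch of the idea (not the library's actual code: the toyFilter type, method names, and encoding layout below are invented for illustration), dumping goes through an io.Reader over the internal slice, and loading reuses the already-fetched bytes:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// toyFilter stands in for the real filter; buckets can be several GB.
type toyFilter struct {
	buckets []byte
}

// encodeReader exposes the internal slice through a reader plus its size,
// so callers can stream it to disk or the network without duplicating it.
func (f *toyFilter) encodeReader() (io.Reader, uint) {
	return bytes.NewReader(f.buckets), uint(len(f.buckets))
}

// decodeFrom keeps a reference to b instead of copying it into a new slice.
func decodeFrom(b []byte) *toyFilter {
	return &toyFilter{buckets: b}
}

func main() {
	src := &toyFilter{buckets: []byte{1, 2, 3, 4}}

	r, size := src.encodeReader()
	dump := make([]byte, size) // exactly-sized buffer, allocated once
	if _, err := io.ReadFull(r, dump); err != nil {
		panic(err)
	}

	dst := decodeFrom(dump) // reuses dump; peak memory stays ~1x filter size
	fmt.Println(dst.buckets)
}
```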

@linvon (Owner) commented Oct 6, 2021

Nice idea by the way

@EladGabay (Contributor, Author)

Nice idea by the way

Would you like to merge it :)?

@linvon (Owner) commented Oct 7, 2021

Nice idea by the way

Would you like to merge it :)?

I think metaDataSize should be removed from SizeInBytes, can you fix it?

@EladGabay (Contributor, Author) commented Oct 7, 2021

SizeInBytes should reflect the size of the encoded filter (metadata + data); this way the user can prepare the required memory for encoding/decoding, and it is the actual number of bytes used by the filter.
In addition, the exact byte count is needed to create the byte slice in Encode before ReadFull; otherwise we would need to use ReadAll and pay for it with re-allocations and copies.

Added a new commit that aligns this in the filter object.
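To illustrate the exact-size argument (a hedged sketch; the function and parameter names here are mine, not the library's): knowing the encoded size up front lets Encode allocate the output once and fill it with io.ReadFull, whereas io.ReadAll has to grow its buffer through repeated re-allocations and copies.

```go
package encdemo

import "io"

// encodeWithKnownSize allocates the output slice once, using the exact
// encoded size, and fills it in a single pass with io.ReadFull.
func encodeWithKnownSize(r io.Reader, encodedSize uint) ([]byte, error) {
	buf := make([]byte, encodedSize)
	if _, err := io.ReadFull(r, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// encodeWithoutSize has to fall back to io.ReadAll, which grows the buffer
// as it reads and so re-allocates and copies along the way.
func encodeWithoutSize(r io.Reader) ([]byte, error) {
	return io.ReadAll(r)
}
```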

@linvon (Owner) commented Oct 8, 2021

I'd like to merge the Reader part; we can discuss the Size part in the future. Can you split this into two MRs?

@EladGabay (Contributor, Author)

I suggest keeping SizeInBytes without the metadata part and introducing an EncodedSizeInBytes method that returns the size including the metadata. Sounds good?
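In code, the proposed split might look roughly like this (a sketch under assumptions: the struct and the metadata constant are placeholders, and the real metadata size will differ):

```go
package sizedemo

// metaDataSize is a placeholder for the fixed-size header written before
// the bucket bytes; the real value depends on the filter's encoding.
const metaDataSize = 3

type filter struct {
	buckets []byte
}

// SizeInBytes reports only the in-memory data size (the bucket bytes).
func (f *filter) SizeInBytes() uint {
	return uint(len(f.buckets))
}

// EncodedSizeInBytes reports the full serialized size, metadata included,
// so callers can pre-allocate buffers for Encode/Decode.
func (f *filter) EncodedSizeInBytes() uint {
	return f.SizeInBytes() + metaDataSize
}
```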

@linvon (Owner) commented Oct 9, 2021

EncodedSizeInBytes

This is okay too.

@EladGabay (Contributor, Author)

Now the reader also returns the size, so we can prepare the memory in advance.

linvon merged commit 92f5275 into linvon:main on Oct 9, 2021
@EladGabay (Contributor, Author)

Would you like to create a new tag?

@linvon (Owner) commented Oct 10, 2021

Would you like to create a new tag?

Sure.
