Enhance Support for Larger Datasets and Buckets in Encoding #11
This commit improves encoding by supporting item and bucket counts that exceed max(uint32). Previously, the encoding stored counts as uint32, even though the filter structure already held them as uint. As a result, larger datasets were only partially supported and not all buckets were utilized; see the changes in `generateIndexTagHash`, `altIndex`, and `indexHash`. All references to bucket indices and item counts now explicitly use uint64, and a new encoding format accommodates larger filters. To distinguish the legacy format (up to max(uint32) items) from the new one, a prefix marker is introduced.
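To illustrate why the index computations must stay in 64-bit width, here is a minimal sketch. The function names echo the ones changed in this PR, but the bodies are hypothetical simplifications, not the library's actual hashing: the point is only that truncating a hash to uint32 before masking would make any bucket at index >= 2^32 unreachable.

```go
package main

import "fmt"

// indexHash masks a 64-bit hash value down to a bucket index.
// numBuckets is assumed to be a power of two, as is typical for
// cuckoo filters. Masking in uint64 keeps buckets beyond 1<<32 reachable.
func indexHash(hv uint64, numBuckets uint64) uint64 {
	return hv & (numBuckets - 1)
}

// altIndex derives the alternate bucket by XOR-ing with the tag's hash,
// again entirely in uint64. Applied twice it returns the original index.
func altIndex(index uint64, tagHash uint64, numBuckets uint64) uint64 {
	return (index ^ tagHash) & (numBuckets - 1)
}

func main() {
	numBuckets := uint64(1) << 33 // more buckets than uint32 can address
	hv := uint64(0x123456789)     // hash value above max(uint32)
	i1 := indexHash(hv, numBuckets)
	i2 := altIndex(i1, 0xABCDEF, numBuckets)
	// altIndex is an involution: applying it again recovers i1.
	fmt.Println(i1, i2, altIndex(i2, 0xABCDEF, numBuckets) == i1)
}
```

With a uint32 intermediate, `indexHash` would silently wrap and the upper half of the bucket array would never be written to, which matches the partial-support symptom described above.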
Decoding seamlessly supports both formats.
The encode method takes a legacy boolean parameter for gradual adoption.
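The dual-format scheme can be sketched as follows. The marker byte, field layout, and function names here are illustrative assumptions, not the PR's actual wire format; the sketch only shows the mechanism: legacy writes uint32 counts, the new format writes a marker byte followed by uint64 counts, and the decoder dispatches on the first byte.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// newFormatMarker is a hypothetical prefix byte identifying the
// uint64-based encoding. (A real format must pick a value that cannot
// collide with the first byte of a legacy payload.)
const newFormatMarker byte = 0xFF

// encodeCounts serializes item and bucket counts. With legacy=true it
// emits the old uint32 layout, failing if the counts do not fit.
func encodeCounts(numItems, numBuckets uint64, legacy bool) ([]byte, error) {
	if legacy {
		if numItems > uint64(^uint32(0)) || numBuckets > uint64(^uint32(0)) {
			return nil, fmt.Errorf("counts exceed max(uint32): legacy format unavailable")
		}
		buf := make([]byte, 8)
		binary.LittleEndian.PutUint32(buf[0:4], uint32(numItems))
		binary.LittleEndian.PutUint32(buf[4:8], uint32(numBuckets))
		return buf, nil
	}
	buf := make([]byte, 1+16)
	buf[0] = newFormatMarker
	binary.LittleEndian.PutUint64(buf[1:9], numItems)
	binary.LittleEndian.PutUint64(buf[9:17], numBuckets)
	return buf, nil
}

// decodeCounts inspects the prefix and decodes either layout, so callers
// handle old and new payloads through one entry point.
func decodeCounts(data []byte) (numItems, numBuckets uint64, err error) {
	if len(data) >= 17 && data[0] == newFormatMarker {
		return binary.LittleEndian.Uint64(data[1:9]),
			binary.LittleEndian.Uint64(data[9:17]), nil
	}
	if len(data) < 8 {
		return 0, 0, fmt.Errorf("buffer too short for legacy format")
	}
	return uint64(binary.LittleEndian.Uint32(data[0:4])),
		uint64(binary.LittleEndian.Uint32(data[4:8])), nil
}

func main() {
	enc, _ := encodeCounts(5_000_000_000, 1_250_000_000, false) // > max(uint32)
	items, buckets, _ := decodeCounts(enc)
	fmt.Println(items, buckets)

	legacyEnc, _ := encodeCounts(1000, 256, true)
	items, buckets, _ = decodeCounts(legacyEnc)
	fmt.Println(items, buckets)
}
```

Keeping the legacy flag on the encoder while the decoder auto-detects both layouts is what makes gradual adoption possible: writers can switch formats independently of when their readers are upgraded.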