Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filters created with version 1.3.4 and exported to json do not work when imported to version 1.3.9 #49

Closed
stbrody opened this issue Dec 30, 2021 · 14 comments
Assignees
Labels

Comments

@stbrody
Copy link

stbrody commented Dec 30, 2021

We use this library in as part of the Ceramic Network. We are publishing batches of work done on the network along with a JSON-serialized bloom filter to make it easier to look up what happened in that batch. Our hosted service making these batches and publishing these bloom filters seems to be using version 1.3.4 of the bloom-filters package. Importing those bloom filters with the current 1.3.9 version of the library causes the .has() method of the reconstituted bloom filter to always return false, even for entries that are definitely present in the filter. Reverting to version 1.3.4 and rebuilding the bloom filter from the same input data causes it to behave correctly. It seems that there was a format change somewhere between version 1.3.4 and version 1.3.9 that causes filters created on the old version, exported with saveAsJSON(), and then imported with fromJSON() on the new version to not build the bloom filter correctly.

Code that builds the filter: https://github.com/ceramicnetwork/ceramic-anchor-service/blob/9ff9e1a20e46c65036fcbed600a0abf1b09d0bec/src/merkle/merkle-objects.ts#L247-L249

@stbrody
Copy link
Author

stbrody commented Dec 30, 2021

This is pretty severe and critical for us. We rely on the bloom filters we export to json to remain valid and usable for the forseeable future. If you are changing the internal representation of the filters in a way that makes them not backwards-compatible with the json serialization from old versions, please add a versioning system into the json serialization and continue to support old formats!!!

@folkvir
Copy link
Collaborator

folkvir commented Dec 31, 2021

Problem comes from fix #36, hashes are done in a different way and therefore backward compatibility is not maintained between the two versions. We should have majored the version. Sorry for the inconvenience.

@Callidon do you agree this analysis?

@folkvir folkvir added bug and removed investigating labels Dec 31, 2021
@folkvir
Copy link
Collaborator

folkvir commented Dec 31, 2021

Because of the hashing, getDistinctIndices could in your version cause infinite loop in some circonstance. Such behavior is removed since version 1.3.7.

As a replacement we could add a BloomFilterV136 class with all the old code in the class for backward compatibility. But such solution must be maintained and shared across all the new versions. Plus you have the bug risks.
Plus, since getDistinctIndices and double hashing has changed in the next versions, all the structures using it will have the same problem than you encounter and therefore we must share *VXX classes for any breaking changes. We surely won't do that.

For the moment I just recommend to use the 1.3.4 in your situation. If you really want to use the next versions then you will have to recreate all your filters from start.

I will add a note on the README for compatibility problem between versions before releasing the 2.0.0.

@stbrody
Copy link
Author

stbrody commented Dec 31, 2021

For the moment I just recommend to use the 1.3.4 in your situation

Then we're stuck on 1.3.4 forever. We want to upgrade to the new version and be able to take advantage of the bug fixes and perf improvements you all are making. We make new filters several times a day and would be happy for the newly created filters to always be on the most up-to-date version with all the latest fixes and improvements. The problem is we can't lose the ability to load the previously created versions. These bloom filters are being written to a blockchain, so the history is immutable - there's no way to recreate and upgrade old filters after they are made and exported.

The ideal solution I'd like to see is to add a _version field to the JSON that is exported when calling filter.saveAsJSON() (alongside the _filter, _length, _seed, etc fields) that is updated every time there is a breaking format change. Then when loading a filter with BloomFilter.fromJSON, it would automatically load the correct implementation to interpret that version. Yes this would mean keeping support for old versions around in the code base, but given they would never change I would expect the maintenance costs would be very low to nonexistent.

I'd be happy to put together a PR to do this if this approach is acceptable to you all.

@Callidon
Copy link
Owner

Callidon commented Jan 2, 2022

Hi 👋

After discussion with @folkvir, we've agreed that it was a mistake on our side to tag the changes to the hash algorithm as a minor version. It should have been tagged as a major version, due to the breaking changes regarding JSON serialization. I apologize for the error, and the inconvenience following it. Since the next release is going to be a major one, it is going to resolve itself on this matter.

However, I'm not in favor of implementing a backward compatibility, because the changes that we made to the hashing algorithm are important bug fixes that prevented some filters to properly work. So, while I agree that your case is problematic, I don't want to support a version of the library that use an incorrect implementation.

Therefore, and I'm sorry for that, I strongly disagree with your proposed solution. You can implement it in your application if you wish, and that's may be a good idea, but I will be against implementing it in the bloom-filters library. However, if you have another idea to resolve that issue, feel free to propose it!

@stbrody
Copy link
Author

stbrody commented Jan 4, 2022

Okay, we will work around this on our side then.

In the meantime, would you mind doing a 1.3.10 release that restores compatibility of importing data created with version 1.3.4 before you do the 2.0.0 release with the bug fix and breaking format change? So that way the old major version sequence continues to be backwards compatible and we don't risk upgrading to a version that breaks compatibility?

@folkvir
Copy link
Collaborator

folkvir commented Jan 5, 2022

Let me try something, then I'll ask you to try the solution on your side. If it is working we will discuss of a possible integration with @Callidon.

@folkvir
Copy link
Collaborator

folkvir commented Jan 5, 2022

@stbrody Back, I pushed something in the develop branch. The package will be the 2.0.0 with or without the fix. As I said we will discuss it with @Callidon to integrate it or not. Meanwhile, you can try the develop branch.

The new BloomFilter integrate a new structure to store the data with a Bit Set.

  • You should be able to import it from your version.
  • You won't be able to export it to your current version. Only to the new one.
const bl = BloomFilter.fromJSON(yourData)
bl._getIndexes = bl_._deprecated_getDistinctIndices
bl.has(...) 

See (https://github.com/Callidon/bloom-filters/blob/develop/test/compatibility-test.js) for a complete compatibility test.

@stbrody
Copy link
Author

stbrody commented Jan 5, 2022

Hmm... @folkvir, if I'm understanding this right, it looks like I still need to know whether or not my filter was created with the old version or not so I know whether or not to switch to the old implementation of getIndexes. If the library doesn't handle version detection and compatibility internally, then that means my application still needs to be responsible for keeping track of which version of the library was used every time a filter is created, and then at read-time select the appropriate implementation to use based on the version that was used when the filter was first created.

Having this fix you added does make things a little easier to work with, as I can switch filter versions on the fly without having to re-install the package which is nice, but the bigger problem is having to do the version detection and handling of old versions ourself.

@folkvir
Copy link
Collaborator

folkvir commented Jan 6, 2022

Yes exactly you need to make the versionning yourself. Here the problem is that changes made on how indexes are generated cannot be backward compatible. It does not affect only the BloomFilter but every single class using the old getDistinctIndices function in version <1.3.7. Plus, We have no clue of the version used by people in the exported filter. So the only way for people to continue to use the old system before 1.3.7 is to manually add information about the package version when the filter was created and then must use the _deprecated_getDistinctIndices instead of _getIndexes just after the import.

@Callidon and me disagree to include a versionning system because of this lack of clues and because of the futur code proliferation. But we are ok with a pseudo-backward compatibility of the BloomFilter import because of the optimizations made on the internal storage in the develop branch. Thus, with the old code added in the BaseFilter classe it should be enough for people to use the old index generation while using the new internal structure.

@stbrody
Copy link
Author

stbrody commented Jan 7, 2022

I wonder if a reasonable compromise would be to include the version used in the json metadata when exporting a filter, but not necessarily commit to automatically supporting all previous versions. But you could at least throw a useful error message when trying to re-create a filter from a version that is no longer supported, rather than just silently returning incorrect data.

That error could then be a single to a dev to swap in the _deprecated_getDistinctIndices

Callidon pushed a commit that referenced this issue Feb 18, 2022
Author: Arnausd Grall (@folkvir)

* update deps and add gts

* h64 only + fix #43 + gts

* mocha: parallel execution

* utils: rename *Indices to *Indexes, add better documentation

* tests: show difference when using getDistinctIndexes w/wo hashing at each iteration

* utils: rename *Indices to *Indexes

* update ref pdf of CMS

* double hashing: use the enhanced technique

* getDistinctIndexes: implement #43 suggestions + allow to switch to XXH 32/64 if needed

* update changelog

* add .DS_Store in the list of ignored files

* use typescript 4.X.X

* fix typo of #46

* fix error when setting 32/64 bits xxh functions, and allow for overriding the serialize function (#48)

* add code coverage

* lint

* update eslint ignore for faster parsing

* add branch update_outdated to the tested branches

* fix wrong github workflow branch name

* workflow: use 16.x and remove 10.x

* update README and remove useless deps

* tests: fix error with describe/it rules overriding data

* implemnting xor filter for the next release

* update all files to the new project standard, set v2.0.0

* xor filters are working

* testing the xor filter

* xor filter #29

* #29

* documentation

* code ql / tests on develop

* keywords

* fix workflow

* move all hash related functions to the BaseFilter class

* modify tests according to: 12417e7

* parallel mocha tests

* use lts/* with setup-node@v2

* update README.md

* add badge to the readme

* export BaseFilter in the entry

* badge refers to the last action build

* update compatibility readme

* update @types/node version to be the latest minor release

* prettier auto line endings

* update @types/node version to be the latest minor release

* #49 add compatibility with 1.3.4

* #49 compatibility for BloomFilters import only is working

* Use AutoExportable feature for simplfying the export process, bitsets size set to a multiple of bitwords

* create Hashing classes

* remove nyc, use eslint + prettier instead of gts, enforce type checking, make all propeties public, no readonly

* update yarn.lock

* ci: remove node 15

* fix missing dependency and fix eslint errors

* update readme

* fix conflicts with master, eslint is now a rule

* add typedoc-plugin-missing-exports and fix non-overrided doubleHashing function
@Callidon
Copy link
Owner

👋 Hi @stbrody
We've finally released the 2.0.0 version of the package. You should now be able to have backward compatibility for import/export. I invite you to check out the new version and more precisely this test case that should help you here.
Let me know if it improves the issue in any how.

@stbrody
Copy link
Author

stbrody commented Mar 11, 2022

Thanks for letting me know @Callidon. We've decided though to write down the exact package version number alongside every bloom filter we create so when we need to read them later we can always use the exact version they were created with. I don't think the approach implemented here really helps us as you still need to know what version the filter was created with so that you know whether or not to install the old index function. If we already have to do the work to keep track of what version was used to create the filter and change behavior based on that version, then we might as well go all the way and just use the exact version that matches how it was created.

@Callidon
Copy link
Owner

Okay, thanks for your feedback. I'm closing the issue, since we have both fixed it on your sides. Feel free to re-open it if you find any more issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants