Filters created with version 1.3.4 and exported to json do not work when imported to version 1.3.9 #49
This is pretty severe and critical for us. We rely on the bloom filters we export to JSON remaining valid and usable for the foreseeable future. If you are changing the internal representation of the filters in a way that makes them not backwards-compatible with the JSON serialization from old versions, please add a versioning system to the JSON serialization and continue to support old formats!
Because of the hashing changes, as a workaround we could add a BloomFilterV136 class containing all the old code for backward compatibility. But such a solution would have to be maintained and shared across all new versions, plus it carries bug risks. For the moment I just recommend using 1.3.4 in your situation. If you really want to use the next versions, you will have to recreate all your filters from scratch. I will add a note to the README about the compatibility problem between versions before releasing 2.0.0.
Then we're stuck on 1.3.4 forever. We want to upgrade to the new version and be able to take advantage of the bug fixes and perf improvements you all are making. We make new filters several times a day and would be happy for newly created filters to always be on the most up-to-date version with all the latest fixes and improvements. The problem is we can't lose the ability to load previously created filters. These bloom filters are being written to a blockchain, so the history is immutable: there's no way to recreate and upgrade old filters after they are made and exported. The ideal solution I'd like to see is a versioning system in the JSON serialization, with continued support for loading old formats. I'd be happy to put together a PR to do this if this approach is acceptable to you all.
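A minimal sketch of the versioning idea described above, assuming a hypothetical envelope format. The names `FORMAT_VERSION`, `exportWithVersion`, and `importWithVersion` are all illustrative, not part of the bloom-filters API:

```javascript
// Hypothetical versioned envelope around an exported filter.
const FORMAT_VERSION = '1.3.4'

// `filterJson` stands in for the object returned by `filter.saveAsJSON()`.
function exportWithVersion (filterJson) {
  return { formatVersion: FORMAT_VERSION, filter: filterJson }
}

function importWithVersion (envelope) {
  if (envelope.formatVersion !== FORMAT_VERSION) {
    throw new Error(
      `Unsupported filter format ${envelope.formatVersion}, expected ${FORMAT_VERSION}`
    )
  }
  // Real code would pass `envelope.filter` to `BloomFilter.fromJSON()` here.
  return envelope.filter
}

const envelope = exportWithVersion({ type: 'BloomFilter', size: 64 })
console.log(importWithVersion(envelope).size) // → 64
```

With an envelope like this, a future library version can dispatch on `formatVersion` instead of guessing at the internal layout.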
Hi 👋 After discussion with @folkvir, we've agreed that it was a mistake on our side to tag the changes to the hash algorithm as a minor version. It should have been tagged as a major version, due to the breaking changes regarding JSON serialization. I apologize for the error, and the inconvenience following it. Since the next release is going to be a major one, this matter will resolve itself. However, I'm not in favor of implementing backward compatibility, because the changes we made to the hashing algorithm are important bug fixes that prevented some filters from working properly. So, while I agree that your case is problematic, I don't want to support a version of the library that uses an incorrect implementation. Therefore, and I'm sorry for that, I strongly disagree with your proposed solution. You can implement it in your application if you wish, and that may be a good idea, but I will be against implementing it in the library.
Okay, we will work around this on our side then. In the meantime, would you mind doing a 1.3.10 release that restores compatibility with data exported by version 1.3.4, before you do the 2.0.0 release with the bug fix and breaking format change? That way the old major version line remains backwards compatible, and we don't risk upgrading to a version that breaks compatibility.
Let me try something, then I'll ask you to try the solution on your side. If it works, we will discuss a possible integration with @Callidon.
@stbrody Back, I pushed something to the develop branch. The package will be 2.0.0, with or without the fix. As I said, we will discuss with @Callidon whether to integrate it. Meanwhile, you can try the develop branch. The new BloomFilter integrates a new structure to store the data, backed by a bit set.

```js
const bl = BloomFilter.fromJSON(yourData)
bl._getIndexes = bl._deprecated_getDistinctIndices
bl.has(...)
```

See https://github.com/Callidon/bloom-filters/blob/develop/test/compatibility-test.js for a complete compatibility test.
Hmm... @folkvir, if I'm understanding this right, it looks like I still need to know whether my filter was created with the old version, so I know whether to switch to the old implementation of getIndexes. If the library doesn't handle version detection and compatibility internally, then my application still needs to be responsible for keeping track of which version of the library was used every time a filter is created, and then, at read time, selecting the appropriate implementation based on the version used when the filter was first created. The fix you added does make things a little easier to work with, since I can switch filter versions on the fly without having to re-install the package, which is nice. But the bigger problem is having to do the version detection and handling of old versions ourselves.
Yes, exactly: you need to do the versioning yourself. The problem here is that the changes to how indexes are generated cannot be made backward compatible. They affect not only the BloomFilter but every single class using the old index generation. @Callidon and I are against including a versioning system, because of this lack of version clues in the exported data and because of the future code proliferation. But we are okay with a pseudo-backward compatibility for the BloomFilter import, because of the optimizations made to the internal storage in the develop branch. Thus, with the old code added to the BaseFilter class, it should be enough for people to use the old index generation while using the new internal structure.
I wonder if a reasonable compromise would be to include the version used in the JSON metadata when exporting a filter, without necessarily committing to automatically supporting all previous versions. You could then at least throw a useful error message when trying to re-create a filter from a version that is no longer supported, rather than silently returning incorrect data. That error could then be a signal to a dev to swap in the deprecated index function.
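A sketch of that compromise, assuming the exported JSON carried a hypothetical `version` field. `SUPPORTED`, `loadFilter`, and the injected `fromJSON` parameter are illustrative names; `_deprecated_getDistinctIndices` is the shim from the develop branch mentioned earlier:

```javascript
// Fail loudly on unknown formats instead of silently returning wrong data.
const SUPPORTED = ['1.3.4', '2.0.0']

// `fromJSON` is injected so this sketch does not depend on the library itself;
// in practice it would be BloomFilter.fromJSON.
function loadFilter (json, fromJSON) {
  if (!SUPPORTED.includes(json.version)) {
    throw new Error(
      `Filter was exported by bloom-filters ${json.version}; ` +
        `supported formats: ${SUPPORTED.join(', ')}`
    )
  }
  const filter = fromJSON(json)
  if (json.version === '1.3.4') {
    // Old exports need the pre-2.0 index generation (develop-branch shim).
    filter._getIndexes = filter._deprecated_getDistinctIndices
  }
  return filter
}
```

The key point is the explicit error path: an unrecognized version surfaces immediately at import time instead of as wrong `.has()` results later.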
Author: Arnaud Grall (@folkvir)

* update deps and add gts
* h64 only + fix #43 + gts
* mocha: parallel execution
* utils: rename *Indices to *Indexes, add better documentation
* tests: show difference when using getDistinctIndexes w/wo hashing at each iteration
* utils: rename *Indices to *Indexes
* update ref pdf of CMS
* double hashing: use the enhanced technique
* getDistinctIndexes: implement #43 suggestions + allow to switch to XXH 32/64 if needed
* update changelog
* add .DS_Store to the list of ignored files
* use typescript 4.X.X
* fix typo of #46
* fix error when setting 32/64 bits xxh functions, and allow overriding the serialize function (#48)
* add code coverage
* lint
* update eslint ignore for faster parsing
* add branch update_outdated to the tested branches
* fix wrong github workflow branch name
* workflow: use 16.x and remove 10.x
* update README and remove useless deps
* tests: fix error with describe/it rules overriding data
* implementing xor filter for the next release
* update all files to the new project standard, set v2.0.0
* xor filters are working
* testing the xor filter
* xor filter #29
* #29
* documentation
* code ql / tests on develop
* keywords
* fix workflow
* move all hash related functions to the BaseFilter class
* modify tests according to: 12417e7
* parallel mocha tests
* use lts/* with setup-node@v2
* update README.md
* add badge to the readme
* export BaseFilter in the entry
* badge refers to the last action build
* update compatibility readme
* update @types/node version to be the latest minor release
* prettier auto line endings
* update @types/node version to be the latest minor release
* #49 add compatibility with 1.3.4
* #49 compatibility for BloomFilters import only is working
* use AutoExportable feature for simplifying the export process, bitset size set to a multiple of bitwords
* create Hashing classes
* remove nyc, use eslint + prettier instead of gts, enforce type checking, make all properties public, no readonly
* update yarn.lock
* ci: remove node 15
* fix missing dependency and fix eslint errors
* update readme
* fix conflicts with master, eslint is now a rule
* add typedoc-plugin-missing-exports and fix non-overrided doubleHashing function
👋 Hi @stbrody
Thanks for letting me know @Callidon. We've decided, though, to write down the exact package version number alongside every bloom filter we create, so when we need to read them later we can always use the exact version they were created with. I don't think the approach implemented here really helps us, as you still need to know what version the filter was created with to know whether to install the old index function. If we already have to do the work to keep track of what version was used to create each filter and change behavior based on that version, then we might as well go all the way and just use the exact version that matches how it was created.
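One way to sketch that pin-the-exact-version approach is with npm aliases, so several versions of the package can be installed side by side. The alias name and `readFilter` are hypothetical; the alias would be declared in package.json as e.g. `"bloom-filters-1-3-4": "npm:bloom-filters@1.3.4"`:

```javascript
// Map each recorded version to the matching installed implementation.
// The require() calls are lazy so unused versions are never loaded.
const IMPLEMENTATIONS = {
  '1.3.4': () => require('bloom-filters-1-3-4'), // hypothetical npm alias
  '2.0.0': () => require('bloom-filters'),
}

function readFilter (record) {
  const load = IMPLEMENTATIONS[record.version]
  if (!load) {
    throw new Error(`No bloom-filters implementation for version ${record.version}`)
  }
  // Each record stores the version it was created with plus the exported JSON.
  return load().BloomFilter.fromJSON(record.filterJson)
}
```

This trades a larger `node_modules` for the guarantee that every filter is read by exactly the code that wrote it.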
Okay, thanks for your feedback. I'm closing the issue, since we have both fixed it on our sides. Feel free to re-open it if you find any more issues.
We use this library as part of the Ceramic Network. We are publishing batches of work done on the network along with a JSON-serialized bloom filter, to make it easier to look up what happened in each batch. Our hosted service making these batches and publishing these bloom filters seems to be using version `1.3.4` of the `bloom-filters` package. Importing those bloom filters with the current `1.3.9` version of the library causes the `.has()` method of the reconstituted bloom filter to always return false, even for entries that are definitely present in the filter. Reverting to version `1.3.4` and rebuilding the bloom filter from the same input data causes it to behave correctly. It seems that there was a format change somewhere between version `1.3.4` and version `1.3.9` that causes filters created on the old version, exported with `saveAsJSON()`, and then imported with `fromJSON()` on the new version, to not be rebuilt correctly.

Code that builds the filter: https://github.com/ceramicnetwork/ceramic-anchor-service/blob/9ff9e1a20e46c65036fcbed600a0abf1b09d0bec/src/merkle/merkle-objects.ts#L247-L249