@alimanfoo
Member

@alimanfoo alimanfoo commented Nov 21, 2017

This PR adds a new LMDBStore class.

Also some other improvements to the storage module, including making ZipStore thread safe (resolves #194).
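
As a rough illustration of what making a zip-backed store thread safe can involve, here is a minimal sketch that guards a shared `zipfile.ZipFile` handle with a re-entrant lock. The class name `LockedZipStore` and the `demo` helper are hypothetical and use only the standard library; this is not the code in this PR, just one plausible approach under those assumptions.

```python
import os
import tempfile
import threading
import zipfile
from collections.abc import MutableMapping
from concurrent.futures import ThreadPoolExecutor


class LockedZipStore(MutableMapping):
    """Hypothetical key/value store over a zip file (not zarr's ZipStore)."""

    def __init__(self, path, mode="a"):
        # Mode "a" creates the archive if it does not exist yet.
        self._zf = zipfile.ZipFile(path, mode=mode, compression=zipfile.ZIP_STORED)
        # A single re-entrant lock serialises all access to the shared handle.
        self._mutex = threading.RLock()

    def __getitem__(self, key):
        with self._mutex:
            # ZipFile.read raises KeyError if the member is missing.
            return self._zf.read(key)

    def __setitem__(self, key, value):
        with self._mutex:
            self._zf.writestr(key, value)

    def __delitem__(self, key):
        raise NotImplementedError("members cannot be deleted from a zip archive")

    def __iter__(self):
        with self._mutex:
            return iter(self._zf.namelist())

    def __len__(self):
        with self._mutex:
            return len(self._zf.namelist())

    def close(self):
        with self._mutex:
            self._zf.close()


def demo():
    """Write 20 chunks from 4 threads, then read one back."""
    path = os.path.join(tempfile.mkdtemp(), "example.zip")
    store = LockedZipStore(path)

    def write(i):
        store["chunk/%d" % i] = bytes([i]) * 100

    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(write, range(20)))
    result = len(store), store["chunk/3"]
    store.close()
    return result
```

Without the lock, concurrent writers would share one file position inside the archive, which is the kind of problem a coarse mutex avoids at the cost of serialising I/O.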

@alimanfoo alimanfoo added the "in progress" label Nov 21, 2017
@alimanfoo alimanfoo added this to the v2.2 milestone Nov 21, 2017
@alimanfoo alimanfoo added the "enhancement" and "release notes done" labels Nov 21, 2017
@alimanfoo
Member Author

Some very simple benchmarks suggest this is worth adding: performance is better than DirectoryStore or DBMStore with BerkeleyDB, and close to, if not better than, the in-memory store (dict).

@alimanfoo
Member Author

Updated benchmarks with larger arrays and some simple dask workflows. The results generally confirm that lmdb performs better than bdb.

@alimanfoo
Member Author

I've updated the benchmark notebook again to include zip store and dask examples. It also demonstrates that zip store is thread safe.

@alimanfoo
Member Author

Another benchmark update including ndbm and bdb btree versus hash.

@alimanfoo
Member Author

cc @jeromekelleher, if you're already getting good performance with Berkeley DB then this probably won't make much difference, but FWIW it looks like LMDB is quite a bit faster, almost as fast as memory, probably because the whole database is memory-mapped.
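
To make the memory-mapping point concrete, here is a small sketch using Python's standard `mmap` module rather than LMDB itself. The record layout and the helper names (`write_records`, `read_record`, `demo`) are assumptions made up for the illustration. It only shows that reads from a mapped file are plain slices of memory, with no explicit read call per lookup.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 8  # assumed fixed-width records, for illustration only


def write_records(path, n):
    """Write n little-endian integers as fixed-width records."""
    with open(path, "wb") as f:
        for i in range(n):
            f.write(i.to_bytes(RECORD_SIZE, "little"))


def read_record(mm, i):
    """Slice record i straight out of the mapped region."""
    start = i * RECORD_SIZE
    return int.from_bytes(mm[start:start + RECORD_SIZE], "little")


def demo():
    path = os.path.join(tempfile.mkdtemp(), "records.bin")
    write_records(path, 1000)
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages data in on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return read_record(mm, 0), read_record(mm, 999), len(mm)
```

Once the pages are resident, lookups like these cost about as much as indexing a bytes object, which is consistent with the near-in-memory timings mentioned above.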

@alimanfoo
Member Author

FTR I've added some locks to DictStore and DBMStore as well. These don't seem to affect performance at all, even under parallel workloads. They may not be strictly necessary for DictStore because of the GIL, and may not be necessary for DBMStore because at least some of the DBM implementations (GDBM, Berkeley) claim to do their own locking, but it seemed worth the extra caution pending some deeper analysis.
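
For readers wondering what this kind of extra locking might look like, here is a minimal sketch that wraps a DBM database in a mutex. It assumes the pure-Python `dbm.dumb` backend so that it runs anywhere, and the class name `LockedDBMStore` is hypothetical. It is not the DBMStore code from this PR, and it locks reads as well as writes purely for simplicity.

```python
import dbm.dumb
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedDBMStore:
    """Hypothetical DBM-backed store with a coarse lock (not zarr's DBMStore)."""

    def __init__(self, path, flag="c"):
        # flag "c" opens the database for read/write, creating it if needed.
        self._db = dbm.dumb.open(path, flag)
        self._mutex = threading.Lock()

    def __getitem__(self, key):
        with self._mutex:
            return self._db[key]

    def __setitem__(self, key, value):
        with self._mutex:
            self._db[key] = value

    def __len__(self):
        with self._mutex:
            return len(self._db)

    def close(self):
        with self._mutex:
            self._db.close()


def demo():
    """Write 50 keys from 8 threads and read one back."""
    path = os.path.join(tempfile.mkdtemp(), "example_db")
    store = LockedDBMStore(path)

    def write(i):
        store["key%d" % i] = b"x" * i

    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(write, range(50)))
    result = len(store), store["key7"]
    store.close()
    return result
```

Because each operation holds the lock only for the duration of one database call, the overhead is small compared with the I/O itself, which would be consistent with locks having no measurable impact on the benchmarks.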

@alimanfoo
Member Author

I think this is ready to go.

@jeromekelleher
Member

jeromekelleher commented Nov 23, 2017

Read/write performance really isn't important to me, as the vast majority of the time is spent in de/compression. LMDB sounds good though. Definitely nice to hear about the extra locking, I was a little worried about letting it all happen at the DB layer.

@jakirkham
Member

Have you played with the number of threads Zarr’s Blosc uses for decompression? There may also be other tricks to try, such as filtering before compression.

@jeromekelleher
Member

I've played around with it a bit all right, but it's not a bottleneck for me. I'm doing compression/decompression in worker threads while the rest of the program is doing other stuff, so I'm not particularly sensitive to the performance.
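
The pattern described here, with compression handed to worker threads while the main thread carries on, can be sketched roughly as follows. This uses the standard library's `zlib` as a stand-in for Blosc, and the chunk sizes and the `demo` helper are made up for the illustration. It is not the actual code being discussed.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def demo():
    # Eight synthetic 100 kB chunks (assumed data, for illustration only).
    chunks = [bytes([i]) * 100_000 for i in range(8)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Hand compression to worker threads and return immediately.
        futures = [pool.submit(zlib.compress, c) for c in chunks]
        # The main thread is free to do unrelated work in the meantime.
        other_work = sum(range(1000))
        # Collect results only when they are actually needed.
        compressed = [f.result() for f in futures]
        restored = list(pool.map(zlib.decompress, compressed))
    return restored == chunks, other_work
```

With the codec off the critical path like this, store read/write speed and codec thread counts matter less to overall throughput, which matches the comment above.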

@alimanfoo alimanfoo merged commit e694497 into master Nov 24, 2017
@alimanfoo alimanfoo removed the "in progress" label Nov 24, 2017
@alimanfoo alimanfoo deleted the lmdb branch November 24, 2017 00:17

Development

Successfully merging this pull request may close these issues.

ZipStore is not thread-safe
