Replies: 1 comment
@vedantroy Sorry for the slow response; I somehow missed this earlier. I do believe this would work well with multi-terabyte datasets. I have consistently used it with 100GB+ datasets (not quite multi-terabyte, but still significantly larger than the RAM available on the servers it was running on) and had great results. Naturally, as with any other database, when the database is larger than available memory, many more operations (random lookups) will require slower disk access, but LMDB is still very efficient at this compared to other databases. Symas has published benchmarks on this as well (the read benchmarks are probably the most relevant, and lmdb-js has vastly better write performance than a write-per-transaction model due to batching). One change in strategy that may be appropriate for very large databases is to use asynchronous gets via the …
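To make that concrete, here is a minimal sketch of the batched-write and asynchronous-read pattern described above, using lmdb-js's `open`, `put`, and `getMany` calls. The path, keys, and record shapes are made up for illustration, and `getMany` is just one asynchronous read option, not necessarily the method the reply was going to name:

```js
import { open } from 'lmdb';

const db = open({
  path: './big-dataset.lmdb', // placeholder path
  compression: true,          // optional; shrinks the on-disk / page-cache footprint
});

async function loadRecords(records) {
  let lastPut;
  for (const [id, value] of records) {
    // put() queues the write; lmdb-js batches queued writes into a single
    // transaction instead of committing one transaction per put
    lastPut = db.put(id, value);
  }
  // the promise from the last put resolves once the whole batch is committed
  await lastPut;
}

async function readRecords(ids) {
  // get() is synchronous, so a cold page means a blocking page fault;
  // getMany() resolves asynchronously, which helps when the database is
  // much larger than RAM and many lookups have to hit disk
  return db.getMany(ids);
}
```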
I used this earlier for a web development project, and the experience was pretty smooth, so I was thinking of using it for an ML project!
Would this be good for handling multi-terabyte datasets?
E.g., a simple schema of:
`id => (img_bytes, img_caption)`
or something similar (a sketch of what I have in mind is below). Or do you think performance would degrade and random lookups would be very slow?
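Something like this minimal sketch is roughly what I am picturing; the store path and field names are just placeholders I am assuming:

```js
import { open } from 'lmdb';

// placeholder path; one record per image id
const images = open({ path: './images.lmdb' });

async function addImage(id, imgBytes, imgCaption) {
  // the default msgpack encoding can serialize an object holding a Buffer,
  // so the raw bytes and the caption can live in a single record
  await images.put(id, { imgBytes, imgCaption });
}

function getImage(id) {
  // returns { imgBytes, imgCaption } or undefined if the id is absent
  return images.get(id);
}
```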