Reduce memory footprint of a log instance & improve load time for orbit-db stores #136

mistakia · 2018-07-13T15:24:30Z

Currently, when a log is loaded, all the data is kept in memory on log._entryIndex. Also, constructing a log requires passing in all the entries, which in the case of OrbitDB Stores that means reading all the data from disk (or just the N latest entries).

This is not very ideal as the load operation is O(n) and the memory footprint is an issue in mobile environments.

What are your thoughts on making the following changes:

allow constructor to require Entries and/or Heads with a nextsIndex (aka a entryHashIndex)

The following methods would also change:

log.values would read from disk when entryIndex does not exist
log.get(hash) would read from disk when entryIndex does not exit
log.has(hash) would use nextsIndex when entryIndex does not exist
log.toJSON / log.toSnapshot would include nextsIndex

The following methods would be added:

log.audit (or similar) to go through and rebuild/validate the nextsIndex

This would allow using just a cache of the head and a nextsIndex to create a functional instance of Log. Also, using the nextsIndex one could navigate and load any portion of the log.

Happy to take this on and explore better ways to accomplish this but would like some feedback to make sure I'm not missing anything obvious.

The text was updated successfully, but these errors were encountered:

haadcode · 2018-07-14T06:08:31Z

@mistakia thank you for opening this discussion! This is indeed something we should look at and see how we can improve the current implementation. Great thinking here @mistakia as to how to approach improving it! 👍More than happy to help if you want to work on this, and it is much appreciated! ❤️

I'll try to provide some insight, so couple of things worth to emphasize before starting work on this:

There are trade-offs here that will effect reads and writes. In order to pull in a PR, we should understand what the trade-offs are and how they effect the performance (either as raw throughput or as differences in O-complexity)
Given ^, it would serve the implementor well to write benchmarks for each operation that this touches (.values, get(hash) , has(hash) etc) before changing the code as to have a clear comparison between old vs. new implementation. Further, given the O-complexities involved, I would highly recommend to write the benchmarks based on the existing ones: observing and measuring the throughput over time is hugely important to make sure you don't accidentally add complexity (or can prove that the algorithm is certain complexity).
In the future, we should create a design for introducing the concepts of cold/hot storage in OrbitDB more generally, but that should not hold back on exploring and implementing this! (ie. go ahead and work on this, that's awesome! 👍😄)

I would also recommend to look (and base the implementation on) https://github.com/orbitdb/ipfs-log/blob/local/refs-tests/src/log.js (note the branch) as this branch and code contains upcoming changes to the way the log is loaded, improving performance quite a bit. And I think you'll find the traverse() function helpful for the reads. There's also a little nifty tool to visualize the log loading in that branch, which I've found really useful for comparing between in-memory and on-disk performance as well as for general performance profiling and debugging (through chrome devtools).

Hope this helps to get started!

Also, thank you for all the PRs and work you're doing! I'm slowly going through all of them, bear with me.

mistakia · 2018-07-14T17:09:23Z

nice work on that branch - loading is indeed MUCH faster. 💪💨

Also, I'm now considering whether it might be better to implement these changes inside an OrbitDB store, where a store keeps a "light" oplog in memory containing just the entry hashes (head + nexts). Changes may still be needed to ipfs-log (potentially amending log.append to take nexts?)

What would be the current behavior with OrbitDB stores if you don't call load() when a local log exists and then append entries? Would the log entries be appended properly?

mistakia · 2018-07-14T22:17:29Z

From what I gathered, it appears that the Store will join the logs but the side effect will be the creation of another head.

mistakia mentioned this issue Jul 14, 2018

tests: add benchmarks for log.values, log.has(), log.get() #137

Closed

mistakia closed this as completed Jul 14, 2018

mistakia reopened this Jul 14, 2018

mistakia mentioned this issue Jul 14, 2018

test: add benchmark for log.fromEntryHash() #138

Merged

mistakia mentioned this issue Jul 16, 2018

Add benchmarks runner and incorporate performance testing with CI/PRs #140

Closed

aphelionz mentioned this issue Dec 13, 2018

Memory Footprint #203

Open

mistakia mentioned this issue Apr 22, 2020

Feat: improve memory usage / allow for in-memory storage of all entries to be optional (WIP) #301

Draft

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce memory footprint of a log instance & improve load time for orbit-db stores #136

Reduce memory footprint of a log instance & improve load time for orbit-db stores #136

mistakia commented Jul 13, 2018

haadcode commented Jul 14, 2018

mistakia commented Jul 14, 2018 •

edited

Loading

mistakia commented Jul 14, 2018

Reduce memory footprint of a log instance & improve load time for orbit-db stores #136

Reduce memory footprint of a log instance & improve load time for orbit-db stores #136

Comments

mistakia commented Jul 13, 2018

haadcode commented Jul 14, 2018

mistakia commented Jul 14, 2018 • edited Loading

mistakia commented Jul 14, 2018

mistakia commented Jul 14, 2018 •

edited

Loading