Reduce memory consumption for prune operation #329

Merged: 2 commits into gilbertchen:master on Mar 20, 2018

Conversation

@pdf (Contributor) commented Jan 14, 2018

This is a first pass at fixing prune memory consumption. I'm not as familiar with the codebase as I would like, but I think this is in a state that's okay for review. I couldn't get any of the tests to work locally though, so I'm not sure how to test further except to run against valid data (which I've done for the standard prune, but not for exhaustive/exclusive).

Implements non-caching variants of GetSnapshotChunks/LoadChunks. All existing call sites are migrated to use CacheSnapshotChunks and CacheChunks, which are the renamed (previously existing) caching implementations. This allows individual callers to be updated to use the non-caching version where that is deemed appropriate. This is a breaking change for any third-party tooling, as the return signature for LoadChunks has changed and GetSnapshotChunks no longer populates snapshot.ChunkHashes.
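To make the split concrete, here is a minimal, self-contained sketch of the two variants using simplified stand-in types; the bodies are illustrative only and do not reproduce the PR's actual implementation.

package main

import "fmt"

// Simplified stand-ins for the real types; illustrative only.
type Snapshot struct {
	ID          string
	ChunkHashes []string
}

type SnapshotManager struct{}

// listChunks stands in for the real storage walk that enumerates the chunks
// referenced by a snapshot (hypothetical helper).
func (m *SnapshotManager) listChunks(s *Snapshot) []string {
	return []string{"chunk-a", "chunk-b"}
}

// GetSnapshotChunks (non-caching variant): returns the chunk hashes without
// populating snapshot.ChunkHashes, so the list is not pinned on the snapshot.
func (m *SnapshotManager) GetSnapshotChunks(s *Snapshot) []string {
	return m.listChunks(s)
}

// CacheSnapshotChunks (renamed caching variant): also stores the hashes on the
// snapshot object for callers that need to revisit them.
func (m *SnapshotManager) CacheSnapshotChunks(s *Snapshot) []string {
	s.ChunkHashes = m.listChunks(s)
	return s.ChunkHashes
}

func main() {
	m := &SnapshotManager{}
	s := &Snapshot{ID: "host-1"}
	fmt.Println(m.GetSnapshotChunks(s), len(s.ChunkHashes))   // [chunk-a chunk-b] 0
	fmt.Println(m.CacheSnapshotChunks(s), len(s.ChunkHashes)) // [chunk-a chunk-b] 2
}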

Updates PruneSnapshots to use the non-caching chunk-fetching functions. For a non-exhaustive prune, it considers only the chunks referenced by the target snapshots instead of mapping all chunks in the repository into memory.
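As an illustration of that change (reusing the simplified types from the sketch above, with hypothetical names), a non-exhaustive prune would only visit the snapshots under consideration and discard each per-snapshot chunk list after use:

// collectReferencedChunks builds the set of chunks still referenced by the
// target snapshots, rather than mapping every chunk in the repository.
func collectReferencedChunks(m *SnapshotManager, targets []*Snapshot) map[string]bool {
	referenced := make(map[string]bool)
	for _, snapshot := range targets {
		// Non-caching call: the per-snapshot slice becomes garbage after this loop body.
		for _, hash := range m.GetSnapshotChunks(snapshot) {
			referenced[hash] = true
		}
	}
	return referenced
}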

Fixes #146

PS - I should also note that there are a lot of unhandled errors in the existing code-base, plus variable shadowing, etc. In the code that I've touched here, I've made sure to at least log errors where I wasn't 100% certain that failing was correct, and fixed lint where found. I'd highly recommend incorporating some linters into your workflow to catch these sorts of things.

@pdf force-pushed the prune_memory branch 4 times, most recently from 2291a28 to f66bb9d on January 14, 2018 02:00
@pdf (Contributor, Author) commented Jan 14, 2018

I apologise for the somewhat hard to follow git diff in the second commit - it comes mostly from refactoring out the actual prune operations into pruneSnapshots and pruneSnapshotsExhaustive, rather than keeping them inside the huge if/else statement (and making that conditional even larger, with the alternative snapshot traversals implemented here for each prune type).

@pdf (Contributor, Author) commented Jan 14, 2018

I've done some further testing and profiling. Looks like mostly by virtue of using the non-caching chunk methods, exhaustive prune memory usage tops out at around 80MB, and standard prune at around 65MB for my dataset. The latter at least is down from gigabytes on master.

If we determine that this code is correct, it's probably worth examining some of the other usages to determine where chunk caching is actually beneficial, as there are likely other places where significant savings could be made just by switching to the non-caching methods.

@gilbertchen (Owner) left a comment

I can make these changes if you want (and if I have the permission to update this PR).

// GetSnapshotChunks returns all chunks referenced by a given snapshot.
func (manager *SnapshotManager) GetSnapshotChunks(snapshot *Snapshot) (chunks []string) {
// CacheSnapshotChunks returns all chunks referenced by a given snapshot.
func (manager *SnapshotManager) CacheSnapshotChunks(snapshot *Snapshot) (chunks []string) {
@gilbertchen (Owner):

I think instead of keeping two versions of GetSnapshotChunks and LoadChunks, it is possible to add an argument to the original GetSnapshotChunks that indicates if snapshot.ChunkHashes needs to be cleared at the end of the call, like:

func (manager *SnapshotManager) GetSnapshotChunks(snapshot *Snapshot, clearChunkHashes bool) (chunks []string) {

The only place where clearChunkHashes should be false is inside ShowStatisticsTabular where GetSnapshotChunks is called twice on all snapshots. In all other places I believe it is safe to set clearChunkHashes to true.
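For comparison, a minimal sketch of this single-function approach (again using the simplified stand-in types from the earlier sketch; the body is illustrative, not the actual duplicacy code):

// GetSnapshotChunks with a clearChunkHashes flag: the hashes are loaded onto the
// snapshot during the call, then the snapshot's reference is dropped at the end
// unless the caller asks to keep it (e.g. ShowStatisticsTabular).
func (m *SnapshotManager) GetSnapshotChunks(s *Snapshot, clearChunkHashes bool) []string {
	s.ChunkHashes = m.listChunks(s) // the real code would call LoadChunks here
	chunks := s.ChunkHashes
	if clearChunkHashes {
		// The caller still holds the returned slice; only the long-lived
		// snapshot object stops pinning it.
		s.ChunkHashes = nil
	}
	return chunks
}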

@pdf (Contributor, Author) commented Jan 16, 2018

I considered both approaches (parameter vs separate methods).

  • The logic for GetSnapshotChunks using a parameter is not too complicated. I'd probably name the parameter something like cacheChunks and reverse the logic, though, since the zero value for bool is false and the default behaviour should arguably be not to cache the chunks in memory.
  • The problem for me is the usage of LoadChunks, since it is called directly in a few places. If it is retained in its original state, storing data on the snapshot object, it becomes the responsibility of every call site to nil that field if it doesn't want to cache the chunks, which is just asking for unexpected memory usage. LoadChunks also doesn't lend itself well to a caching parameter, since the non-caching version needs to return the chunks, while the caching version also has to store them on the snapshot.

If you would prefer not to retain two implementations, one option would be to replace the caching variant of LoadChunks with the implementation in this PR, which will return the chunks without storing them, then store them on the snapshot inside GetSnapshotChunks, based on a parameter. The reason I didn't consider this route is because I wasn't sure how LoadChunks was used in places like DownloadSnapshotSequence, and I didn't immediately have time to work back up the call chain(s) for these other methods.
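A rough sketch of that alternative, again using the simplified types from the earlier sketch and hypothetical bodies:

// LoadChunks returns the chunk hashes without storing them on the snapshot
// (a reshaping of the original caching behaviour, as described above).
func (m *SnapshotManager) LoadChunks(s *Snapshot) []string {
	return m.listChunks(s)
}

// GetSnapshotChunks decides, via a parameter, whether the result should also be
// kept on the snapshot object.
func (m *SnapshotManager) GetSnapshotChunks(s *Snapshot, keepChunkHashes bool) []string {
	chunks := m.LoadChunks(s)
	if keepChunkHashes {
		s.ChunkHashes = chunks
	}
	return chunks
}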

@gilbertchen (Owner):

What I thought was to keep LoadChunks as it is. Other than GetSnapshotChunks, it is only called by DownloadSnapshotSequence, which in turn is called by DownloadSnapshotContents. It should be okay to keep snapshot.ChunkHashes in `DownloadSnapshotContents`, I think.

Feel free to reverse the logic for clearChunkHashes and give it a different name; just avoid using "cache" in the name, as that would create some confusion with the snapshot cache.

logFile.Close()
cerr := logFile.Close()
if cerr != nil {
LOG_ERROR("LOG_FILE", "Could not close log file %s: %v", logFileName, cerr)
@gilbertchen (Owner):

This should be LOG_WARN, because this defer func can be called when a panic occurs and LOG_ERROR would cause a new panic.
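For illustration, the suggested shape of the deferred close (a fragment mirroring the quoted diff, not the final commit):

defer func() {
	// LOG_WARN instead of LOG_ERROR: if this deferred function runs while a
	// panic is already unwinding, LOG_ERROR would raise a second panic.
	if cerr := logFile.Close(); cerr != nil {
		LOG_WARN("LOG_FILE", "Could not close log file %s: %v", logFileName, cerr)
	}
}()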

@pdf (Contributor, Author):

Ack


referencedChunks := make(map[string]bool)
var success bool
@gilbertchen (Owner):

Thank you for splitting this into separate functions! I have been too lazy and too afraid of breaking something. Don't worry about the git diff results. I'm sure a manual diff of the relevant parts should be good.

@pdf (Contributor, Author):

No problem - it made particular sense to do it now, as the two paths now also differ in how they walk the snapshots.

@pdf (Contributor, Author) commented Jan 16, 2018

If you'd prefer to run with this yourself, you're quite welcome to; maintainer edits are enabled on the PR. However, I'm happy to keep working on it based on your feedback, as time allows.

@gilbertchen (Owner):

Ok, please keep working on this. I really appreciate your help!

@pdf (Contributor, Author) commented Jan 19, 2018

Updated based on feedback, ready for review again.

@pdf (Contributor, Author) commented Jan 19, 2018

I need to do some more profiling on this - I'm not certain that this results in the same level of memory reduction as the previous implementation.

EDIT: Seems to be close enough in my tests, considering the ratios we're talking about.

@pdf (Contributor, Author) commented Jan 20, 2018

This code is not sufficient: there are locations that assume stored hashes which are not handled here; I'll take another pass.

pdf added 2 commits on January 21, 2018 10:11:
For non-exhaustive prune, consider only target chunks instead of mapping all chunks in repository.

@pdf (Contributor, Author) commented Jan 21, 2018

Okay, I think it's just *BackupManager.Backup that requires keepChunkHashes enabled, but I've run out of time trying to back-track all the references to *Snapshot.ChunkHashes. It might be safer just to enable keeping chunks everywhere initially, until someone who understands all the code paths better than me can critically review and disable caching for each call to GetSnapshotChunks?

In the meantime, I've enabled keeping chunk hashes only for Backup and ShowStatisticsTabular.
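Concretely, the intended call-site pattern would look something like this fragment (names illustrative; parameter name taken from the discussion):

// Backup and ShowStatisticsTabular revisit the chunk list, so the hashes stay
// on the snapshot:
statsChunks := manager.GetSnapshotChunks(snapshot, true) // keepChunkHashes = true

// Prune and other one-off callers let the snapshot drop its reference:
pruneChunks := manager.GetSnapshotChunks(snapshot, false) // keepChunkHashes = false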

@leftytennis (Contributor):

I'm all for any improvements that can be made, obviously, but pruning is such an important function that I hope this is heavily tested before it's committed to master... :)

Has any testing been done on two different repositories to ensure that prune operations using the current master code and using these modifications produce exactly the same results?

Thanks

@gilbertchen (Owner):

@pdf Thank you for your contribution! This is a good piece of work and may have fixed a previously unknown bug!

@jt70471 I tested on my own repository and made sure they pruned the same sets of chunks.

@gilbertchen merged commit 13fffc2 into gilbertchen:master on Mar 20, 2018
@kairisku:

It seems the emptying of ChunkHashes causes backup statistics to show incorrect data (0 total):

File chunks: 0 total, 19,552M bytes; 102 new, 82,075K bytes, 49,195K bytes uploaded

@gilbertchen (Owner):

The 0 file chunk bug should be fixed by 5d2242d.

Successfully merging this pull request may close these issues: Prune excessive memory consumption.
4 participants