
don't cache entire mfs tree on add finalize #3027

Merged: 1 commit into master, Aug 3, 2016

Conversation

@whyrusleeping (Member)

This should prevent memory bloat during the finalize call of add

License: MIT
Signed-off-by: Jeromy [email protected]

@whyrusleeping merged commit 1c41cac into master on Aug 3, 2016
@whyrusleeping deleted the fix/add-finalize-mem branch on Aug 3, 2016 at 00:54
@Kubuxu (Member) commented Aug 3, 2016

LGTMed in person.

@schomatis (Contributor)

@Stebalien (or anyone with background on this PR) Could you expand on why there was memory bloat in Finalize in the first place, and how removing new/updated entries from the MFS helped improve that?

(Related to ipfs/go-mfs#36.)

@Stebalien (Member)

MFS caches directory entries on access. Without Uncache, there was no way to clear this cache, so Finalize (which uses MFS) would cache the entire directory tree in memory when finishing an add operation.
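
A minimal sketch of that idea (illustrative only, not the actual go-ipfs change; the Directory, Child, Uncache, and ListNames names below are stand-ins modeled on this thread): evict each child from its parent's entry cache once the finalize walk is done with it, so only the current path stays in memory instead of the whole tree.

```go
package main

import "fmt"

// Node is a stand-in for an on-disk/DAG node.
type Node struct {
	name     string
	children []*Node
}

// Directory mimics an MFS directory whose entry cache is populated on access.
type Directory struct {
	node  *Node
	cache map[string]*Directory // filled by Child, emptied per-entry by Uncache
}

func NewDirectory(n *Node) *Directory {
	return &Directory{node: n, cache: make(map[string]*Directory)}
}

func (d *Directory) ListNames() []string {
	names := make([]string, 0, len(d.node.children))
	for _, c := range d.node.children {
		names = append(names, c.name)
	}
	return names
}

// Child returns (and caches) the named entry, like the access behavior described above.
func (d *Directory) Child(name string) *Directory {
	if cached, ok := d.cache[name]; ok {
		return cached
	}
	for _, c := range d.node.children {
		if c.name == name {
			child := NewDirectory(c)
			d.cache[name] = child
			return child
		}
	}
	return nil
}

// Uncache drops a single cached entry so it can be garbage collected.
func (d *Directory) Uncache(name string) {
	delete(d.cache, name)
}

// finalize walks the tree depth-first, evicting each child once it is done with it.
// Without the Uncache call, the root would end up pinning the entire tree.
func finalize(d *Directory) {
	for _, name := range d.ListNames() {
		child := d.Child(name)
		if child != nil {
			finalize(child)
		}
		d.Uncache(name)
	}
}

func main() {
	root := &Node{name: "root", children: []*Node{
		{name: "a", children: []*Node{{name: "a1"}}},
		{name: "b"},
	}}
	dir := NewDirectory(root)
	finalize(dir)
	fmt.Println("entries still cached at root:", len(dir.cache)) // 0
}
```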

@schomatis (Contributor)

MFS caches directory entries on access.

Thanks! I wasn't aware of that (another thing that should be documented about the cache mechanics - ipfs/go-mfs#37).

Then we should provide a Child method that doesn't cache the entry, and maybe even just provide a method at the MFS layer that lists all the entries in a Directory (without modifying the cache), but I definitely don't want the user to access the cache directly. Does that sound fair to you?

@Stebalien (Member)

Ideally, we'd clear the cache when it reaches some size. I'm not sure about adding another method; that feels like pushing a concern to the user that they shouldn't have to think about (it's better than Uncache, but Uncache is at least sufficiently bad that nobody could argue for keeping it).

Although, for this case, an option when constructing the MFS instance would also suffice (for now).
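
As a sketch of the "clear the cache when it reaches some size" idea only (hypothetical code, not go-mfs), the entry cache could be bounded with a small LRU, and the bound could be the construction-time option mentioned above:

```go
package main

import (
	"container/list"
	"fmt"
)

// boundedCache is an LRU-bounded cache: once it holds max entries,
// adding another evicts the least recently used one.
type boundedCache struct {
	max   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // name -> element whose Value is *entry
}

type entry struct {
	name  string
	value interface{} // would be an FSNode in MFS
}

func newBoundedCache(max int) *boundedCache {
	return &boundedCache{max: max, order: list.New(), items: make(map[string]*list.Element)}
}

func (c *boundedCache) Get(name string) (interface{}, bool) {
	if el, ok := c.items[name]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).value, true
	}
	return nil, false
}

func (c *boundedCache) Put(name string, value interface{}) {
	if el, ok := c.items[name]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[name] = c.order.PushFront(&entry{name: name, value: value})
	if c.order.Len() > c.max {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).name)
	}
}

func main() {
	c := newBoundedCache(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Put("c", 3) // evicts "a", the least recently used entry
	_, ok := c.Get("a")
	fmt.Println("a still cached:", ok) // false
}
```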

@schomatis (Contributor) commented Dec 19, 2018

that feels like pushing a concern to the user that they shouldn't have to think about

Yes, that's why I suggested (though not very clearly, sorry) just providing a method like ListEntries that, instead of returning names as ListNames does, returns the FSNodes contained in that directory (sketched below). Internally we can do that without the cache (transparently to the user), which would also simplify the code in coreunix (it wouldn't have to convert names to nodes). WDYT?


I think I need to give the current cache some more thought; we are using it for (maybe too) many things. Ideally I want a cache with dirty states that is wiped out when I flush/sync the directory (something I think we're not doing at the moment), and, separately, a read cache with a maximum capacity of N and an LRU policy. That second cache doesn't need to be cleaned up, just assigned a maximum size to keep memory use in line.
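
A rough sketch of the ListEntries idea from the first paragraph (hypothetical types and names, not the real go-mfs API): the directory returns its FSNodes directly, without going through the per-entry cache, so callers never have to touch cache internals.

```go
package main

import "fmt"

// FSNode is a stand-in for the MFS node type handed back to callers.
type FSNode struct {
	Name string
	Size int
}

// Directory is a stand-in MFS directory with an internal entry cache.
type Directory struct {
	children map[string]*FSNode
	cache    map[string]*FSNode // populated by Child; ListEntries below skips it
}

// Child looks up one entry and caches it (the existing access pattern).
func (d *Directory) Child(name string) *FSNode {
	if n, ok := d.cache[name]; ok {
		return n
	}
	n := d.children[name]
	if n != nil {
		d.cache[name] = n
	}
	return n
}

// ListEntries returns every entry directly, bypassing the cache entirely,
// so listing a large directory does not pin all of its children in memory.
func (d *Directory) ListEntries() []*FSNode {
	out := make([]*FSNode, 0, len(d.children))
	for _, n := range d.children {
		out = append(out, n)
	}
	return out
}

func main() {
	d := &Directory{
		children: map[string]*FSNode{
			"a.txt": {Name: "a.txt", Size: 1},
			"b.txt": {Name: "b.txt", Size: 2},
		},
		cache: make(map[string]*FSNode),
	}
	for _, n := range d.ListEntries() {
		fmt.Println(n.Name, n.Size)
	}
	fmt.Println("cached after listing:", len(d.cache)) // 0: listing didn't touch the cache
}
```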

@Stebalien (Member)

WDYT?

It makes sense to have a method that returns both the names and the entries atomically. However, as a user, I'd expect that to cache the same way Child does. That is, I'd expect to be able to use both interfaces interchangeably (modulo the atomicity guarantees).

I think I need to give the current cache some more thought

Thinking about this a bit, the "cache" isn't really a cache; it's a list of open files. If we clear it, we invalidate any open directories. I think we might need to (a) track open files/directories separately from the "cache" and (b) add a Close function to FSNode.

If we do that, we'll probably want some kind of ListEntries method that returns directory entries instead of open file objects (so the user doesn't have to close each one).
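
A loose sketch of points (a) and (b) above (hypothetical types, not a concrete go-mfs proposal): open nodes live in their own registry, FSNode gets a Close that deregisters it, and clearing the cache can no longer invalidate anything that is still open.

```go
package main

import "fmt"

// FSNode is a stand-in for an open MFS file or directory.
type FSNode struct {
	name string
	dir  *Directory // owning directory, so Close knows where to deregister
}

// Close removes the node from its directory's open-node registry.
func (n *FSNode) Close() {
	delete(n.dir.open, n.name)
}

// Directory keeps open nodes and cached-but-closed entries separately:
// the cache can be cleared at any time without touching open nodes.
type Directory struct {
	open  map[string]*FSNode
	cache map[string]*FSNode
}

func NewDirectory() *Directory {
	return &Directory{open: make(map[string]*FSNode), cache: make(map[string]*FSNode)}
}

// Open returns the already-open node if there is one, otherwise opens it.
func (d *Directory) Open(name string) *FSNode {
	if n, ok := d.open[name]; ok {
		return n
	}
	n := &FSNode{name: name, dir: d}
	d.open[name] = n
	return n
}

// ClearCache drops closed, cached entries only; open nodes survive.
func (d *Directory) ClearCache() {
	d.cache = make(map[string]*FSNode)
}

func main() {
	d := NewDirectory()
	f := d.Open("notes.txt")
	d.ClearCache()              // safe: f is tracked in d.open, not d.cache
	same := d.Open("notes.txt") // still the same open node
	fmt.Println("same instance:", f == same)
	f.Close()
	fmt.Println("open after close:", len(d.open))
}
```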

@schomatis (Contributor)

That is, I'd expect to be able to use both interfaces interchangeably (modulo the atomicity guarantees).

Sorry, I'm lost, what atomicity are you referring to?

Thinking about this a bit, the "cache" isn't really a cache; it's a list of open files.

Why opened? I think it's any entry we access when retrieving a child or adding one.

If we clear it, we invalidate any open directories.

Why? If the information there is saved in the lower UnixFS layer when doing something like GetNode, I'm not following how clearing the cache list of persisted entries would invalidate open directories.

If we do that, we'll probably want some kind of ListEntries method that returns directory entries instead of open file objects (so the user doesn't have to close each one).

I meant to use ListEntries to avoid the logic

https://github.com/ipfs/go-ipfs/blob/3f6925dfce3df5bfa6d1fd3bc92b02d9d3c2fe62/core/coreunix/add.go#L251-L269

which attempts to list everything, not just opened file objects.
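
To make the contrast concrete, an illustrative sketch (the Directory below is a stand-in, and the linked coreunix code is more involved): today's shape lists names and resolves each one separately, while a ListEntries call would return the nodes in one step.

```go
package main

import "fmt"

// Directory is an illustrative stand-in; nodes are plain strings here,
// where the real code deals with FSNodes/IPLD nodes.
type Directory struct {
	entries map[string]string
}

func (d *Directory) ListNames() []string {
	names := make([]string, 0, len(d.entries))
	for name := range d.entries {
		names = append(names, name)
	}
	return names
}

func (d *Directory) Child(name string) string { return d.entries[name] }

// ListEntries returns the entries directly, replacing the name-to-node loop.
func (d *Directory) ListEntries() map[string]string {
	out := make(map[string]string, len(d.entries))
	for name, node := range d.entries {
		out[name] = node
	}
	return out
}

func main() {
	d := &Directory{entries: map[string]string{"a": "node-a", "b": "node-b"}}

	// Current shape: list names, then resolve each one to a node.
	for _, name := range d.ListNames() {
		fmt.Println(name, "->", d.Child(name))
	}

	// Proposed shape: one call that already yields the nodes.
	for name, node := range d.ListEntries() {
		fmt.Println(name, "->", node)
	}
}
```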

(So basically at this point I'm pretty sure that I missed some important point in the conversation; my confusion is mostly about the shift from avoiding Uncache to the subjects of atomicity and opened files in the cache.)

@Stebalien (Member)

Sorry I'm lost, what atomicity are referring to?

(this was a technically correct but unnecessary detail, you can ignore it)

I'd expect ListEntries to give me an atomic snapshot of the directory. If I use ListNames and then look up each name, the directory could have been edited under my feet (lookups may fail, etc.). Basically, I'd expect ListNames followed by multiple calls to Child to be identical except when the directory is being modified concurrently. But that difference isn't really relevant here.

Why opened? I think it's any entry we access when retrieving a child or adding one.

Because of how we lock files open for reading/writing. If we open a file for writing, clear the "cache", and then try to reopen the file for writing, it'll work because we'll be using a different FSNode.
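
A small sketch of that failure mode (hypothetical types): if the "cache" is really the registry of open nodes and each node carries its own lock, then clearing it makes a second open return a different instance with a different lock, and the exclusion is silently lost.

```go
package main

import (
	"fmt"
	"sync"
)

// FSNode carries its own lock, like the per-file write lock described above.
type FSNode struct {
	mu sync.Mutex
}

// Directory's "cache" doubles as the registry of open nodes.
type Directory struct {
	cache map[string]*FSNode
}

// Open returns the cached node if present, otherwise creates a new one.
func (d *Directory) Open(name string) *FSNode {
	if n, ok := d.cache[name]; ok {
		return n
	}
	n := &FSNode{}
	d.cache[name] = n
	return n
}

func main() {
	d := &Directory{cache: make(map[string]*FSNode)}

	first := d.Open("file")
	first.mu.Lock() // "open for writing": hold the node's lock

	// Clearing the cache forgets that "file" is already open...
	d.cache = make(map[string]*FSNode)

	// ...so a second open gets a different FSNode with a different lock,
	// and acquiring it succeeds even though the file is still "open for writing".
	second := d.Open("file")
	second.mu.Lock()

	fmt.Println("same node:", first == second) // false: exclusion was lost
}
```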

@schomatis (Contributor)

I'd expect ListEntries to give me an atomic snapshot of the directory. If I use ListNames and then look up each name, the directory could have been edited under my feet (lookups may fail, etc.).

Yes, exactly; providing a ListEntries method at the MFS layer would also have the added benefit that I'd be able to lock access to the directory while generating the list (something the user can't do). I'll prepare a PoC PR so we have something tangible to discuss further.
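
Not the PoC itself, just an illustration of the locking point (hypothetical types): since the directory's lock is internal, only an MFS-layer ListEntries can hold it for the whole listing and return a consistent snapshot, which a caller combining ListNames with per-name lookups cannot do from outside.

```go
package main

import (
	"fmt"
	"sync"
)

// Directory is a stand-in MFS directory; its lock is internal, so only
// methods on the directory itself can hold it across a whole operation.
type Directory struct {
	mu      sync.Mutex
	entries map[string]string // name -> node stand-in
}

// ListEntries snapshots all entries while holding the directory lock, so
// concurrent mutations cannot interleave with the listing.
func (d *Directory) ListEntries() map[string]string {
	d.mu.Lock()
	defer d.mu.Unlock()
	out := make(map[string]string, len(d.entries))
	for name, node := range d.entries {
		out[name] = node
	}
	return out
}

// AddChild is one of the mutating operations the snapshot is protected against.
func (d *Directory) AddChild(name, node string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.entries[name] = node
}

func main() {
	d := &Directory{entries: map[string]string{"a": "node-a"}}
	snapshot := d.ListEntries()
	d.AddChild("b", "node-b") // happens after the snapshot, not during it
	fmt.Println("snapshot size:", len(snapshot), "current size:", len(d.ListEntries()))
}
```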
