
don't cache entire mfs tree on add finalize #3027

Merged: 1 commit into master, Aug 3, 2016

Conversation

@whyrusleeping (Member)

This should prevent memory bloat during the finalize call of add

License: MIT
Signed-off-by: Jeromy [email protected]

@whyrusleeping merged commit 1c41cac into master on Aug 3, 2016
@whyrusleeping deleted the fix/add-finalize-mem branch on Aug 3, 2016 at 00:54
@Kubuxu (Member) commented Aug 3, 2016

LGTMed in person.

@schomatis (Contributor)

@Stebalien (or anyone with background on this PR) Could you expand on why there was memory bloat in Finalize in the first place, and how removing new/updated entries from the MFS helped improve that?

(Related to ipfs/go-mfs#36.)

@Stebalien (Member)

MFS caches directory entries on access. Without Uncache, there was no way to clear this cache, so Finalize (which uses MFS) would cache the entire directory tree in memory when finishing an add operation.
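
A minimal sketch of that idea (illustrative only, not the actual go-ipfs change; the Directory, Child, Uncache, and ListNames names below are stand-ins modeled on this thread): evict each child from its parent's entry cache once the finalize walk is done with it, so only the current path stays in memory instead of the whole tree.

```go
package main

import "fmt"

// Node is a stand-in for an on-disk/DAG node.
type Node struct {
	name     string
	children []*Node
}

// Directory mimics an MFS directory whose entry cache is populated on access.
type Directory struct {
	node  *Node
	cache map[string]*Directory // filled by Child, emptied per-entry by Uncache
}

func NewDirectory(n *Node) *Directory {
	return &Directory{node: n, cache: make(map[string]*Directory)}
}

func (d *Directory) ListNames() []string {
	names := make([]string, 0, len(d.node.children))
	for _, c := range d.node.children {
		names = append(names, c.name)
	}
	return names
}

// Child returns (and caches) the named entry, like the access behavior described above.
func (d *Directory) Child(name string) *Directory {
	if cached, ok := d.cache[name]; ok {
		return cached
	}
	for _, c := range d.node.children {
		if c.name == name {
			child := NewDirectory(c)
			d.cache[name] = child
			return child
		}
	}
	return nil
}

// Uncache drops a single cached entry so it can be garbage collected.
func (d *Directory) Uncache(name string) {
	delete(d.cache, name)
}

// finalize walks the tree depth-first, evicting each child once it is done with it.
// Without the Uncache call, the root would end up pinning the entire tree.
func finalize(d *Directory) {
	for _, name := range d.ListNames() {
		child := d.Child(name)
		if child != nil {
			finalize(child)
		}
		d.Uncache(name)
	}
}

func main() {
	root := &Node{name: "root", children: []*Node{
		{name: "a", children: []*Node{{name: "a1"}}},
		{name: "b"},
	}}
	dir := NewDirectory(root)
	finalize(dir)
	fmt.Println("entries still cached at root:", len(dir.cache)) // 0
}
```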

@schomatis (Contributor)

MFS caches directory entries on access.

Thanks! I wasn't aware of that (another thing that should be documented about the cache mechanics - ipfs/go-mfs#37).

Then we should provide a Child method that doesn't cache the entry, and maybe even just provide a method at the MFS layer that lists all the entries in a Directory (without modifying the cache), but I definitely don't want the user to access the cache directly. Does that sound fair to you?

@Stebalien (Member)

Ideally, we'd clear the cache when it reaches some size. I'm not sure about adding another method; that feels like pushing a concern to the user that they shouldn't have to think about (it's better than Uncache, but Uncache is at least sufficiently bad that nobody could argue for keeping it).

Although, for this case, an option when constructing the MFS instance would also suffice (for now).
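
As a sketch of the "clear the cache when it reaches some size" idea only (hypothetical code, not go-mfs), the entry cache could be bounded with a small LRU, and the bound could be the construction-time option mentioned above:

```go
package main

import (
	"container/list"
	"fmt"
)

// boundedCache is an LRU-bounded cache: once it holds max entries,
// adding another evicts the least recently used one.
type boundedCache struct {
	max   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // name -> element whose Value is *entry
}

type entry struct {
	name  string
	value interface{} // would be an FSNode in MFS
}

func newBoundedCache(max int) *boundedCache {
	return &boundedCache{max: max, order: list.New(), items: make(map[string]*list.Element)}
}

func (c *boundedCache) Get(name string) (interface{}, bool) {
	if el, ok := c.items[name]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).value, true
	}
	return nil, false
}

func (c *boundedCache) Put(name string, value interface{}) {
	if el, ok := c.items[name]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[name] = c.order.PushFront(&entry{name: name, value: value})
	if c.order.Len() > c.max {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).name)
	}
}

func main() {
	c := newBoundedCache(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Put("c", 3) // evicts "a", the least recently used entry
	_, ok := c.Get("a")
	fmt.Println("a still cached:", ok) // false
}
```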

@schomatis (Contributor) commented Dec 19, 2018

that feels like pushing a concern to the user that they shouldn't have to think about

Yes, that's why I suggested (though not very clearly, sorry) just providing a method like ListEntries that, instead of returning names as ListNames does, returns the FSNodes contained in that directory (sketched below). Internally we can do that without the cache (transparently to the user), which would also simplify the code in coreunix (it wouldn't have to convert names to nodes). WDYT?


I think I need to give the current cache some more thought; we are using it for (maybe too) many things. Ideally I want a cache with dirty states that is wiped out when I flush/sync the directory (something I think we're not doing at the moment), and, separately, a read cache with a maximum capacity of N and an LRU policy. That second cache doesn't need to be cleaned up, just assigned a maximum size to keep memory use in line.
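
A rough sketch of the ListEntries idea from the first paragraph (hypothetical types and names, not the real go-mfs API): the directory returns its FSNodes directly, without going through the per-entry cache, so callers never have to touch cache internals.

```go
package main

import "fmt"

// FSNode is a stand-in for the MFS node type handed back to callers.
type FSNode struct {
	Name string
	Size int
}

// Directory is a stand-in MFS directory with an internal entry cache.
type Directory struct {
	children map[string]*FSNode
	cache    map[string]*FSNode // populated by Child; ListEntries below skips it
}

// Child looks up one entry and caches it (the existing access pattern).
func (d *Directory) Child(name string) *FSNode {
	if n, ok := d.cache[name]; ok {
		return n
	}
	n := d.children[name]
	if n != nil {
		d.cache[name] = n
	}
	return n
}

// ListEntries returns every entry directly, bypassing the cache entirely,
// so listing a large directory does not pin all of its children in memory.
func (d *Directory) ListEntries() []*FSNode {
	out := make([]*FSNode, 0, len(d.children))
	for _, n := range d.children {
		out = append(out, n)
	}
	return out
}

func main() {
	d := &Directory{
		children: map[string]*FSNode{
			"a.txt": {Name: "a.txt", Size: 1},
			"b.txt": {Name: "b.txt", Size: 2},
		},
		cache: make(map[string]*FSNode),
	}
	for _, n := range d.ListEntries() {
		fmt.Println(n.Name, n.Size)
	}
	fmt.Println("cached after listing:", len(d.cache)) // 0: listing didn't touch the cache
}
```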

@Stebalien (Member)

WDYT?

It makes sense to have a method that returns both the names and the entries atomically. However, as a user, I'd expect that to cache the same way Child does. That is, I'd expect to be able to use both interfaces interchangeably (modulo the atomicity guarantees).

I think I need to give the current cache some more thought

Thinking about this a bit, the "cache" isn't really a cache; it's a list of open files. If we clear it, we invalidate any open directories. I think we might need to (a) track open files/directories separately from the "cache" and (b) add a Close function to FSNode.

If we do that, we'll probably want some kind of ListEntries method that returns directory entries instead of open file objects (so the user doesn't have to close each one).
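
A loose sketch of points (a) and (b) above (hypothetical types, not a concrete go-mfs proposal): open nodes live in their own registry, FSNode gets a Close that deregisters it, and clearing the cache can no longer invalidate anything that is still open.

```go
package main

import "fmt"

// FSNode is a stand-in for an open MFS file or directory.
type FSNode struct {
	name string
	dir  *Directory // owning directory, so Close knows where to deregister
}

// Close removes the node from its directory's open-node registry.
func (n *FSNode) Close() {
	delete(n.dir.open, n.name)
}

// Directory keeps open nodes and cached-but-closed entries separately:
// the cache can be cleared at any time without touching open nodes.
type Directory struct {
	open  map[string]*FSNode
	cache map[string]*FSNode
}

func NewDirectory() *Directory {
	return &Directory{open: make(map[string]*FSNode), cache: make(map[string]*FSNode)}
}

// Open returns the already-open node if there is one, otherwise opens it.
func (d *Directory) Open(name string) *FSNode {
	if n, ok := d.open[name]; ok {
		return n
	}
	n := &FSNode{name: name, dir: d}
	d.open[name] = n
	return n
}

// ClearCache drops closed, cached entries only; open nodes survive.
func (d *Directory) ClearCache() {
	d.cache = make(map[string]*FSNode)
}

func main() {
	d := NewDirectory()
	f := d.Open("notes.txt")
	d.ClearCache()              // safe: f is tracked in d.open, not d.cache
	same := d.Open("notes.txt") // still the same open node
	fmt.Println("same instance:", f == same)
	f.Close()
	fmt.Println("open after close:", len(d.open))
}
```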

@schomatis (Contributor)

That is, I'd expect to be able to use both interfaces interchangeably (modulo the atomicity guarantees).

Sorry, I'm lost, what atomicity are you referring to?

Thinking about this a bit, the "cache" isn't really a cache; it's a list of open files.

Why opened? I think it's any entry we access when retrieving a child or adding one.

If we clear it, we invalidate any open directories.

Why? If the information there is saved in the lower UnixFS layer when doing something like GetNode, I'm not following how clearing the cache list of persisted entries would invalidate open directories.

If we do that, we'll probably want some kind of ListEntries method that returns directory entries instead of open file objects (so the user doesn't have to close each one).

I meant to use ListEntries to avoid the logic

https://github.com/ipfs/go-ipfs/blob/3f6925dfce3df5bfa6d1fd3bc92b02d9d3c2fe62/core/coreunix/add.go#L251-L269

which attempts to list everything, not just opened file objects.
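
To make the contrast concrete, an illustrative sketch (the Directory below is a stand-in, and the linked coreunix code is more involved): today's shape lists names and resolves each one separately, while a ListEntries call would return the nodes in one step.

```go
package main

import "fmt"

// Directory is an illustrative stand-in; nodes are plain strings here,
// where the real code deals with FSNodes/IPLD nodes.
type Directory struct {
	entries map[string]string
}

func (d *Directory) ListNames() []string {
	names := make([]string, 0, len(d.entries))
	for name := range d.entries {
		names = append(names, name)
	}
	return names
}

func (d *Directory) Child(name string) string { return d.entries[name] }

// ListEntries returns the entries directly, replacing the name-to-node loop.
func (d *Directory) ListEntries() map[string]string {
	out := make(map[string]string, len(d.entries))
	for name, node := range d.entries {
		out[name] = node
	}
	return out
}

func main() {
	d := &Directory{entries: map[string]string{"a": "node-a", "b": "node-b"}}

	// Current shape: list names, then resolve each one to a node.
	for _, name := range d.ListNames() {
		fmt.Println(name, "->", d.Child(name))
	}

	// Proposed shape: one call that already yields the nodes.
	for name, node := range d.ListEntries() {
		fmt.Println(name, "->", node)
	}
}
```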

(So basically at this point I'm pretty sure that I missed some important point in the conversation; my confusion is mostly about the shift from avoiding Uncache to the subjects of atomicity and opened files in the cache.)

@Stebalien (Member)

Sorry I'm lost, what atomicity are referring to?

(this was a technically correct but unnecessary detail, you can ignore it)

I'd expect ListEntries to give me an atomic snapshot of the directory. If I use ListNames and then look up each name, the directory could have been edited under my feet (lookups may fail, etc.). Basically, I'd expect ListNames followed by multiple calls to Child to be identical except when the directory is being modified concurrently. But that difference isn't really relevant here.

Why opened? I think it's any entry we access when retrieving a child or adding one.

Because of how we lock files open for reading/writing. If we open a file for writing, clear the "cache", and then try to reopen the file for writing, it'll work because we'll be using a different FSNode.
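
A small sketch of that failure mode (hypothetical types): if the "cache" is really the registry of open nodes and each node carries its own lock, then clearing it makes a second open return a different instance with a different lock, and the exclusion is silently lost.

```go
package main

import (
	"fmt"
	"sync"
)

// FSNode carries its own lock, like the per-file write lock described above.
type FSNode struct {
	mu sync.Mutex
}

// Directory's "cache" doubles as the registry of open nodes.
type Directory struct {
	cache map[string]*FSNode
}

// Open returns the cached node if present, otherwise creates a new one.
func (d *Directory) Open(name string) *FSNode {
	if n, ok := d.cache[name]; ok {
		return n
	}
	n := &FSNode{}
	d.cache[name] = n
	return n
}

func main() {
	d := &Directory{cache: make(map[string]*FSNode)}

	first := d.Open("file")
	first.mu.Lock() // "open for writing": hold the node's lock

	// Clearing the cache forgets that "file" is already open...
	d.cache = make(map[string]*FSNode)

	// ...so a second open gets a different FSNode with a different lock,
	// and acquiring it succeeds even though the file is still "open for writing".
	second := d.Open("file")
	second.mu.Lock()

	fmt.Println("same node:", first == second) // false: exclusion was lost
}
```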

@schomatis (Contributor)

I'd expect ListEntries to give me an atomic snapshot of the directory. If I use ListNames and then look up each name, the directory could have been edited under my feet (lookups may fail, etc.).

Yes, exactly; providing a ListEntries method at the MFS layer would also have the added benefit that I'd be able to lock access to the directory while generating the list (something the user can't do). I'll prepare a PoC PR so we have something tangible to discuss further.
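
Not the PoC itself, just an illustration of the locking point (hypothetical types): since the directory's lock is internal, only an MFS-layer ListEntries can hold it for the whole listing and return a consistent snapshot, which a caller combining ListNames with per-name lookups cannot do from outside.

```go
package main

import (
	"fmt"
	"sync"
)

// Directory is a stand-in MFS directory; its lock is internal, so only
// methods on the directory itself can hold it across a whole operation.
type Directory struct {
	mu      sync.Mutex
	entries map[string]string // name -> node stand-in
}

// ListEntries snapshots all entries while holding the directory lock, so
// concurrent mutations cannot interleave with the listing.
func (d *Directory) ListEntries() map[string]string {
	d.mu.Lock()
	defer d.mu.Unlock()
	out := make(map[string]string, len(d.entries))
	for name, node := range d.entries {
		out[name] = node
	}
	return out
}

// AddChild is one of the mutating operations the snapshot is protected against.
func (d *Directory) AddChild(name, node string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.entries[name] = node
}

func main() {
	d := &Directory{entries: map[string]string{"a": "node-a"}}
	snapshot := d.ListEntries()
	d.AddChild("b", "node-b") // happens after the snapshot, not during it
	fmt.Println("snapshot size:", len(snapshot), "current size:", len(d.ListEntries()))
}
```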
