Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

RFC: Dentry cleanup #2321

Closed
pwmarcz opened this issue Apr 19, 2021 · 12 comments
Closed

RFC: Dentry cleanup #2321

pwmarcz opened this issue Apr 19, 2021 · 12 comments

Comments

@pwmarcz
Copy link
Contributor

pwmarcz commented Apr 19, 2021

Master issue: #1803

I started looking at the filesystem and want to do the following cleanup. It is intended to be a clear improvement, but a much smaller effort than a complete rewrite of all filesystems.

This issue is me asking for a sanity check: is there anything that sounds like a bad idea, or will turn out to be impossible on closer look?

Summary

  • Get rid of the hashes from the API, at least for now. This is an (only partially implemented) optimization that can be added later.

  • Introduce proper locking discipline for the data stored in dentry: dcache_lock for modifying the actual tree structure, dent->lock for operations on a file. Keep it simple; we might think about optimizations later, but for now let's make sure this is correct.

  • Trim down the list of flags: DENTRY_ISLINK, DENTRY_ISDIRECTORY, and DENTRY_MOUNTPOINT are reduntant with the already-present dentry fields (type and mounted).

  • Do not split off an inode structure yet; this might be useful, but is too big a change yet. I am aware this will complicate implementation of several features (hardlinks, renaming, unlinking an open file).

  • Refactor the "lookup" API, and change it so that it distinguishes between "file does not exist but can be created" (return 0 and a negative dentry) and "file does not exist and cannot be created" (return -ENOENT).

  • Make sure the reference counting is correct, and we at least delete unused non-positive dentries. Later we might want to also delete positive dentries, but the tmpfs filesystem stores data in them, so they need to persist for now.

  • Keep the current mount semantics for now (creating synthetic directories whenever necessary), but make sure the mount points actually appear; currently readdir("/") doesn't show /lib, /proc, etc.

    Later, we might want to get rid of the synthetic directories, and instead make the root-level filesystem a tmpfs. This will bring the semantics closer to normal Linux.

  • Fix lookup flags: there is LOOKUP_FOLLOW (disabled most of the time), but actually it means following symlinks along the path to target, whereas Linux O_NOFOLLOW and lstat are only concerned with whether the target file (at the end of the path) is a symlink or not

API draft

/*
 * Directory entries. This object currently serves two purposes:
 *
 * 1. Information about file names, to help with path lookups (dentry in Linux).
 * 2. Information about files themselves (inode in Linux).
 *
 * This has some disadvantages (e.g. hardlinks, or renaming, is harder to implement), so later we
 * might want to create a separate `shim_inode` struct.
 */
struct shim_dentry {
    /*
     * Locking rules for fields in this struct:
     *
     * 1. Immutable properties, do not change after creating:
     *    - fs
     *    - name
     *    - rel_path
     *    - parent
     *
     * 2. Mutable properties, need to take `dent->lock` to read/write:
     *    - state
     *    - mounted
     *    - type
     *    - mode
     *    - data
     *
     * 3. Dentry cache properties, need to take `dcache_lock` to read/write:
     *    - children
     *    - siblings
     */

    /* Lock for all the mutable properties, see above. */
    struct shim_lock lock;

    /*
     * Reference count. Any calls returning a dentry should increase `ref_count`, and the caller
     * should decrease it when done.
     *
     * A dentry may be deleted when from the dentry cache when it has no external references, no
     * children, and is invalid/negative.
     */
    REFTYPE ref_count;

    /*
     * Flags:
     *
     * - DENTRY_VALID: if the flag is not set, the filesystem has not been queried yet, and most
     *   operations should disregard this dentry
     *
     * - DENTRY_NEGATIVE: cached negative result; the file with this name has been determined not to
     *   exist (fs,
     *
     * - DENTRY_LISTED: the directory has been listed; the list of children is guaranteed to be
     *   complete (so `readdir` can just check all children that are valid and non-negative)
     *
     * - DENTRY_ANCESTOR: this is a synthetic directory created to make sure a mount point exist
     */
    unsigned int state;

    struct shim_mount* fs;

    /* If non-null, this is a mountpoint: most operations should transparently retrieve
     * `mounted->root` instead of this dentry */
    struct shim_mount* mounted;

    struct shim_str name;
    struct shim_qstr rel_path;

    /* Dentry cache info (file tree) */
    struct shim_dentry* parent;
    LISTP_TYPE(shim_dentry) children;
    LIST_TYPE(shim_dentry) siblings;

    /* File type (S_IFREG, S_IFDIR, S_IFCHR, S_IFLNK */
    mode_t type;

    /* Mode (PERM_rwxrwxrwx, etc.) */
    mode_t mode;

    /* Filesystem-specific data */
    void* data;
};

/*
 * The following operations accept a dentry. Unless otherwise specified, the dentries passed to them
 * are valid (already initialized by `lookup()` and locked (caller holds `lock`).
 */
struct shim_d_ops {
    /* Look up a file and initialize a dentry (filling `type`, `mode`, `data`).  The `dent` is newly
     * created by `get_new_dentry()` and not valid yet. */
    int (*lookup)(struct shim_dentry* dent);

    /* Uninitialize a dentry (deallocate `data`). Will be called before destroying the dentry. */
    int (*dput)(struct shim_dentry* dent);

    /* Open an existing file (`dentry` is non-negative) */
    int (*open)(struct shim_handle* hdl, struct shim_dentry* dent, int flags);

    /* Create a new file (`dentry` is negative) */
    int (*creat)(struct shim_handle* hdl, struct shim_dentry* dent, int flags);

    /* Other filesystem operations: */
    int (*unlink)(struct shim_dentry* dir, struct shim_dentry* dent);
    int (*mkdir)(struct shim_dentry* dir, struct shim_dentry* dent, mode_t mode);
    int (*stat)(struct shim_dentry* dent, struct stat* buf);
    int (*follow_link)(struct shim_dentry* dent, struct shim_qstr* link);
    int (*set_link)(struct shim_dentry* dent, const char* link);
    int (*chmod)(struct shim_dentry* dent, mode_t mode);
    int (*chown)(struct shim_dentry* dent, int uid, int gid);
    int (*rename)(struct shim_dentry* old, struct shim_dentry* new);
    int (*readdir)(struct shim_dentry* dent, struct shim_dirent** dirent);
};

/* Lock for the dentry tree */
extern struct shim_lock dcache_lock;

/* Root of the dentry tree */
extern struct shim_dentry* dentry_root;

/*
 * Create a new dentry under `parent`. This will initialize the dentry as having the right
 * filesystem and position in the tree, but still invalid (`state == 0`). Assumes `dcache_lock` is
 * held.
 */
struct shim_dentry* get_new_dentry(struct shim_dentry* parent, const char* name, int namelen);

/* Increment the reference count on `dent` */
void get_dentry(struct shim_dentry* dent);

/* Decrement the reference count on `dent` */
void put_dentry(struct shim_dentry* dent);

/* Decrement reference count, and attempt to delete `dent` from the tree. Must be called with
 * `dentry_cache` held. */
void put_dentry_maybe_delete(struct shim_dentry* dent);

/*
 * Look up a path. Assumes `dcache_lock` is held. This will:
 * - query the filesystems for all intermediate directories, creating dentries,
 * - check permissions and file types,
 * - put a new dentry in `*dent` if possible.
 *
 * The flags are:
 * - LOOKUP_NOFOLLOW: open the link instead of file it points to
 * - LOOKUP_DIRECTORY: require a directory
 *
 * The following outcomes are possible:
 * - file found: the function will return 0 and set `*dent` to a valid, non-negative dentry,
 * - file not found, but the parent directory exists: the function will return 0 and set `*dent` to
 *   a valid, negative dentry; the caller might create the file, transforming the dentry to
 *   non-negative,
 * - lookup error (ENOENT, ENODIR, EACCES, ...) otherwise; `*dent` will not be set.
 */
int path_lookupat(struct shim_dentry* start, const char* name, int flags, struct shim_dentry** dent);
@dimakuv
Copy link

dimakuv commented Apr 19, 2021

>     struct shim_str name;
>     struct shim_qstr rel_path;

Isn't rel_path an optimization over name + parent->name + parent->parent->name + ...? Do you want to keep rel_path or we can remove it and construct the name dynamically?

I agree with all your other points. (Though I'm not very well-versed in Graphene's FS subsystem.)

@boryspoplawski
Copy link
Contributor

+1 to removing rel_path. Why do we even need it?

DENTRY_LISTED - what is this flag for? I've never really understood. If we handle listing purely in Graphene we should never query host fs and we use host fs to obtain directory entries, then we need to do it always (to notice other processes' changes), don't we?
DENTRY_ANCESTOR - probably needs a better name. This is like a virtual dentry, that lives purely in Graphene and has nothing backing it on the host fs, right?

@mkow
Copy link
Member

mkow commented Apr 21, 2021

Refactor the "lookup" API, and change it so that it distinguishes between "file does not exist but can be created" (return 0 and a negative dentry) and "file does not exist and cannot be created" (return -ENOENT).

Hmm, I think I don't get this point. Could you show an example of both cases? Also, isn't such a check inherently racy? How will you lock on this "can be created" property?

Make sure the reference counting is correct, and we at least delete unused non-positive dentries. Later we might want to also delete positive dentries, but the tmpfs filesystem stores data in them, so they need to persist for now.

A bit off-topic, but could we change this "negative"/"positive" naming for dentries to something more intuitive? Each time I stumble upon this I have no idea what's the meaning of them both in a context of a dentry.

@pwmarcz
Copy link
Contributor Author

pwmarcz commented Apr 21, 2021

Isn't rel_path an optimization over name + parent->name + parent->parent->name + ...? Do you want to keep rel_path or we can remove it and construct the name dynamically?

+1 to removing rel_path. Why do we even need it?

Agreed, I will look closer into it. Usually we need the absolute path anyway, not rel_path (rel_path is from current filesystem root, not global root).

DENTRY_LISTED - what is this flag for? I've never really understood. If we handle listing purely in Graphene we should never query host fs

The dentries are a cache. They are not guaranteed to contain all files, even for filesystems "purely in Graphene" (such as /proc), but the entries are created during operations such as lookup or listing the directory. So for a given directory, we either can be sure that we have all the dentries, or we need to query underlying fs (whether it's host, pseudo-FS, or whatever else).

and we use host fs to obtain directory entries, then we need to do it always (to notice other processes' changes), don't we?

Right now we assume that the filesystem doesn't change without us noticing. And of course, that assumption is reasonable for OS with total control over its filesystem, but not for Graphene. I think that later we should introduce some cache invalidation (which would clear DENTRY_LISTED). But that probably doesn't mean we have to list the directory always: if so, it would also mean we can never cache path lookups, etc.

On the other hand, maybe removing DENTRY_LISTED right now is still a good idea, if it makes things simpler. Path lookups would still be cached, only readdir/getdents would never be.

DENTRY_ANCESTOR - probably needs a better name. This is like a virtual dentry, that lives purely in Graphene and has nothing backing it on the host fs, right?

Yes. It's needed for mounts because mount semantics are kind of bad. Later, I want to remove it and require that a mountpoint always exists. For now, maybe a better name would be DENTRY_SYNTHETIC?

Refactor the "lookup" API, and change it so that it distinguishes between "file does not exist but can be created" (return 0 and a negative dentry) and "file does not exist and cannot be created" (return -ENOENT).

Hmm, I think I don't get this point. Could you show an example of both cases?

Imagine open("/foo/bar/baz", O_CREAT). If /foo/bar does not exist (or is not a directory, etc.), then the internal lookup should fail with -ENOENT. There is no way to open the file, and open also fails.

If /foo/bar is a directory, but /foo/bar/baz does not exist, the internal lookup shouldn't fail. It should succeed and yield a negative ("empty") dentry that can be then filled. (It's not reflected in the draft, but that should probably be a flag like LOOKUP_CREATE - we want this behaviour only if we actually want to create a file).

The current semantics of the lookup functions is that they return -ENOENT but also yield a negative dentry, if possible. So there are several places checking if the error code is -ENOENT and if the dentry is there (despite the error). The comments in these places universally complain about "terrible semantics", and I agree.

Also, isn't such a check inherently racy? How will you lock on this "can be created" property?

Yes, I imagine there should be a lock. Probably the negative dentry itself has to be returned locked? And whoever wants to create the file, has to lock its dentry first.

But to be honest, I'm not sure about how to do locking yet. The current code depends on a global lock (dcache_lock) for traversing the dentry tree, filesystem lookups, and also cases like above (finding a new file). However, many places also access the retrieved dentries without taking any lock, which doesn't seem right.

My initial idea (see the draft) was that basically filesystem operations on dentries have to be protected by a more fine-grained lock on the dentry itself. However, that sounds complicated: what if we want to operate on more than one file? We first need to look up both of them somehow, without causing a deadlock...

A bit off-topic, but could we change this "negative"/"positive" naming for dentries to something more intuitive? Each time I stumble upon this I have no idea what's the meaning of them both in a context of a dentry.

How about a "empty" (and non-empty)? The flag would be called DENTRY_EMPTY.

@dimakuv
Copy link

dimakuv commented Apr 21, 2021

For now, maybe a better name would be DENTRY_SYNTHETIC?

I like this name, but also could be DENTRY_MOCK, DENTRY_PSEUDO, DENTRY_FAKE.

The flag would be called DENTRY_EMPTY.

I was actually fine with "negative", but "empty" also sounds fine.

However, that sounds complicated: what if we want to operate on more than one file? We first need to look up both of them somehow, without causing a deadlock...

Do you mean cases like rename()? First of all, I guess you can just continue your refactorings with the global lock for now. Second, yes, we'll need some hierarchical locking for these cases.

@mkow
Copy link
Member

mkow commented Apr 21, 2021

Imagine open("/foo/bar/baz", O_CREAT). If /foo/bar does not exist (or is not a directory, etc.), then the internal lookup should fail with -ENOENT. There is no way to open the file, and open also fails.
[...]

Ok, I get it now. So, this point IMO makes sense.

Yes, I imagine there should be a lock. Probably the negative dentry itself has to be returned locked? And whoever wants to create the file, has to lock its dentry first.

Yup, something like this, I guess. But it still be tricky with multiple processes which may remove a directory in the meantime. Or maybe you mean taking a multi-process (IPC/sync-engine-based) lock for this?

How about a "empty" (and non-empty)? The flag would be called DENTRY_EMPTY.

Sounds much better!

I like this name, but also could be DENTRY_MOCK, DENTRY_PSEUDO, DENTRY_FAKE.

If we want to go this way then I'd prefer something like DENTRY_PLACEHOLDER.

@boryspoplawski
Copy link
Contributor

Agreed, I will look closer into it. Usually we need the absolute path anyway, not rel_path (rel_path is from current filesystem root, not global root).

What, why? Where do we need the absolute path and why? Doesn't sound right...

Right now we assume that the filesystem doesn't change without us noticing. And of course, that assumption is reasonable for OS with total control over its filesystem, but not for Graphene. I think that later we should introduce some cache invalidation (which would clear DENTRY_LISTED). But that probably doesn't mean we have to list the directory always: if so, it would also mean we can never cache path lookups, etc.

My point was that we don't even notice changes from other Graphene processes, so this flag sounds super useless. I guess it could make some sense, but that would require dentry invalidation.

For now, maybe a better name would be DENTRY_SYNTHETIC?

Sounds good.

The flag would be called DENTRY_EMPTY.

I was actually fine with "negative", but "empty" also sounds fine.

Actually I've never had problems with "negative" in dentries context as well and I prefer the old name than the "empty".

@pwmarcz
Copy link
Contributor Author

pwmarcz commented Apr 21, 2021

What, why? Where do we need the absolute path and why? Doesn't sound right...

See all the usages of dentry_get_path. Many of these are for internal logging, but there is also implementation of getcwd, information available in /proc, internal socket addresses, and such. And of course all the path lookups recognize absolute paths.

But yes, I see the mount-relative path is also needed in some places, mostly related to specific filesystems. So maybe it won't be so easy to remove.

@boryspoplawski
Copy link
Contributor

Many of these are for internal logging

These we can ignore

but there is also implementation of getcwd

This should probably traverse fs on each call

information available in /proc

What kind of information? Other than cwd etc (which fall in the above category)

internal socket addresses

What does that have to do with fs? Or if you mean named unix domain sockets, why we need a full path for them?

And of course all the path lookups recognize absolute paths.

But the path comes from caller not from dentry.

@pwmarcz
Copy link
Contributor Author

pwmarcz commented Apr 21, 2021

So what's the point here? There are places in Graphene that need an absolute path, they use dentry_get_path (which does traverse the tree on each call), and they probably should stay that way. I'm not suggesting to put the path in dentry, if that's what this is about - sorry if it sounded that way.

@boryspoplawski
Copy link
Contributor

Sorry, I've misunderstood you want to change the rel_path into absolute_path (i.e. keep cached path in dentry).

@pwmarcz
Copy link
Contributor Author

pwmarcz commented Jun 29, 2021

This is now mostly implemented, and the rest is outdated, so I'm closing this issue. Thanks for all the feedback.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants