Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

[LibOS] Don't store dentry paths #2342

Merged
merged 2 commits into from
May 12, 2021
Merged

[LibOS] Don't store dentry paths #2342

merged 2 commits into from
May 12, 2021

Conversation

pwmarcz
Copy link
Contributor

@pwmarcz pwmarcz commented Apr 30, 2021

This is the next step in FS rewrite, after #2333. See also the issue for dentry cleanup (#2321).

This change removes shim_dentry.rel_path and shim_handle.path fields. Instead, the path is computed whenever necessary. This also means that the length limit for relative path is lifted.

To obtain a path for a dentry, use dentry_abs_path or dentry_rel_path. The constructed path is allocated on heap and must be
freed afterwards.

Because computing the path can fail when out of memory, the functions for hashing dentries do not compute the path explicitly, but follow the dentry chain to the root.

(There were many places in the code was responsible for maintaining the stored path, or enforcing the length. It looked pretty brittle. So I think this is a positive change).

Next step: rewrite mount_fs and make mount semantics more sane (as mentioned in #2324). More specifically, I want mounting to not replace a dentry, but attach a new tree under it. This way, both the original mountpoint and the mounted directory can exist in the dentry tree.

  • if a dentry is marked as a mountpoint, you're supposed to go to dentry->mounted->root,
  • dentry->parent can be null not only for the global root, but also for the filesystem root, in which case you can go to dentry->fs->mountpoint.

This change is Reviewable

@pwmarcz pwmarcz force-pushed the pawel/rel-path branch 5 times, most recently from 63002ed to 6beb06e Compare May 5, 2021 18:33
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 22 of 22 files at r1.
Reviewable status: all files reviewed, 13 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @pwmarcz)

a discussion (no related file):

Because computing the path can fail when out of memory, the functions
for hashing dentries do not compute the path explicitly, but follow the
dentry chain to the root.

This explanation in the commit message is not understandable. I will check what the code actually does and will propose to reword it.



LibOS/shim/include/shim_fs.h, line 496 at r1 (raw file):

 * \returns pointer to the path, or NULL if out of memory
 *
 * This function computes a path for dentry, allocating a new buffer for it.  The returned string

You have double space after ., please remove one space.


LibOS/shim/include/shim_fs.h, line 499 at r1 (raw file):

 * should be freed using `free`.
 *
 * An absolute path is a combination of all names up to the root (not including the root), separated

I don't understand the remark not including the root. Don't you include the root simply by the fact that the path starts with /?


LibOS/shim/include/shim_fs.h, line 513 at r1 (raw file):

char* dentry_rel_path(struct shim_dentry* dent, size_t* sizep);

char* dentry_abs_path_into_qstr(struct shim_dentry* dent, struct shim_qstr* str);

BTW, I hate this qstr trick. At some point, I would prefer to replace it with normal char* strings.


LibOS/shim/include/shim_fs.h, line 580 at r1 (raw file):

 */

int init_hash(void);

What is this function? I don't see it anywhere.


LibOS/shim/src/bookkeep/shim_handle.c, line 793 at r1 (raw file):

        hdl->fs->fs_ops->checkin(hdl);

#if DEBUG_RESUME == 1

Is this macro really useful? If you haven't used it, maybe just delete this macro and corresponding logic?


LibOS/shim/src/bookkeep/shim_vma.c, line 1341 at r1 (raw file):

    }

#if DEBUG_RESUME == 1

ditto


LibOS/shim/src/fs/shim_dcache.c, line 128 at r1 (raw file):

    int64_t count = REF_DEC(dent->ref_count);
#ifdef DEBUG_REF
    const char* path = dentry_abs_path(dent);

Forgot to free(path) afterwards?


LibOS/shim/src/fs/shim_dcache.c, line 266 at r1 (raw file):

}

static char* dentry_path_into_buf(struct shim_dentry* dent, bool relative, char* buf, size_t size) {

It's worth mentioning that this function returns a pointer inside buf where the constructed path starts (not necessarily the beginning of buf), because it constructs the path from the end


LibOS/shim/src/fs/shim_dcache.c, line 274 at r1 (raw file):

    /* Add names, starting from the last one, until we encounter root */
    while (dent && dent->parent) {
        /* If the path is relative, also finish when encountering a mountpoint */

So in the context of our FS subsystem, "relative" means "ending at the first encountered mountpoint", right? It has nothing to do with the conventional meaning of "ending at the CWD"?


LibOS/shim/src/fs/shim_dcache.c, line 478 at r1 (raw file):

    }

#if DEBUG_RESUME == 1

ditto


LibOS/shim/src/fs/shim_fs.c, line 444 at r1 (raw file):

     * rehash the path; this should be handled by get_new_dentry, or already be
     * hashed if mount_root exists.  I'm going to leave this line here for now
     * as documentation in case there is a problem later.

Oh these comments from the past "I don't know what's this but let's keep it", I will miss you :)


LibOS/shim/src/fs/shim_fs.c, line 562 at r1 (raw file):

    dent->state |= DENTRY_MOUNTPOINT;
    if ((ret = _path_lookupat(g_dentry_root, mount_point, lookup_flags, &dent2)) < 0) {
        dent->state &= ~DENTRY_MOUNTPOINT;

Why does it even matter? Aren't we supposed to destroy this dentry on failure to look it up? On the other hand, you will rewrite this function in a next PR, so may ignore this comment.


LibOS/shim/src/fs/chroot/fs.c, line 26 at r1 (raw file):

#include "stat.h"

#define READDIR_BUF_SIZE 4096

Don't you want to move it to some header file? You use it already twice.


LibOS/shim/src/sys/shim_epoll.c, line 550 at r1 (raw file):

        CP_REBASE(epoll_item->list);

#if DEBUG_RESUME == 1

ditto

@pwmarcz pwmarcz requested a review from dimakuv May 6, 2021 09:18
Copy link
Contributor Author

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 18 of 22 files reviewed, 13 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @pwmarcz)


LibOS/shim/include/shim_fs.h, line 496 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

You have double space after ., please remove one space.

Fixed. And I changed the setting in my Emacs that does this :)


LibOS/shim/include/shim_fs.h, line 499 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I don't understand the remark not including the root. Don't you include the root simply by the fact that the path starts with /?

I wanted to give a definition that closely describes what the code does. Consider a path like /foo/bar. The dentries on the path are name_0 = "" (the root), name_1 = "foo", and name_2 = "bar".

By my definition, the absolute path is a combination of "foo" and "bar", separated by /, and beginning with /: "/" + name_1 + "/" + name_2.

I could also define the absolute path as "a combination of all names up to the root; separated by /", and note that because the last name is empty, the path always begins with /. The above path would be then defined as name_0 + "/" + name_1 + "/" + name_2.

However, I don't like it because:

  • the fact that an absolute path starts with root becomes an incidental property instead of essential part of the definition,
  • the root itself (/) becomes an exception, because by above definition the absolute path would be empty.

Does that make sense? Should I reword the comment to make this clearer? Maybe "An absolute path is a combination of all names up to the root (not including the root, which by convention has an empty name), separated by / and beginning with /".


LibOS/shim/include/shim_fs.h, line 513 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

BTW, I hate this qstr trick. At some point, I would prefer to replace it with normal char* strings.

Agreed, I'm trying not to use it in new code, and we can replace the remaining places that use it.

I think it was taken from Linux filesystem code, which uses it (among other things) to store a hash together with a string? But we never did that consistently.


LibOS/shim/include/shim_fs.h, line 580 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

What is this function? I don't see it anywhere.

Right. Removed.

(It's a leftover from when I briefly wanted to use CRC64 for the hashes, and the function initialized a lookup table. I decided not to bother, I think a proper solution wouldn't use path hashes anyway.)


LibOS/shim/src/bookkeep/shim_handle.c, line 793 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Is this macro really useful? If you haven't used it, maybe just delete this macro and corresponding logic?

I haven't used it but I never did anything with the checkpoint/restore system. I expect it might be useful for debugging the checkpoints, so I'm not comfortable deleting that code while making an unrelated (to some degree) change.


LibOS/shim/src/fs/shim_dcache.c, line 128 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Forgot to free(path) afterwards?

Fixed.

By the way, unlike DEBUG_RESUME, I did use these messages at one point.


LibOS/shim/src/fs/shim_dcache.c, line 266 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

It's worth mentioning that this function returns a pointer inside buf where the constructed path starts (not necessarily the beginning of buf), because it constructs the path from the end

Done.


LibOS/shim/src/fs/shim_dcache.c, line 274 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

So in the context of our FS subsystem, "relative" means "ending at the first encountered mountpoint", right? It has nothing to do with the conventional meaning of "ending at the CWD"?

Yes.

I guess calling it "mount-relative" would be less confusing, but it looks pretty bad in the code, and the name (rel_path etc.) is pretty well established in the code already.


LibOS/shim/src/fs/shim_fs.c, line 562 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why does it even matter? Aren't we supposed to destroy this dentry on failure to look it up? On the other hand, you will rewrite this function in a next PR, so may ignore this comment.

Yes, I decided this function is beyond repair, it will work differently anyway. This PR prepares the ground, and fortunately, the next PR is finally the moment I can rewrite it.


LibOS/shim/src/fs/chroot/fs.c, line 26 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Don't you want to move it to some header file? You use it already twice.

Okay, moved to shim_fs.h. I left it that way because it's an implementation detail of these two readdir implementations (and from the looks of it, they're not similar), but I guess having the definition in one place would be less confusing anyway.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r2.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @pwmarcz)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Because computing the path can fail when out of memory, the functions
for hashing dentries do not compute the path explicitly, but follow the
dentry chain to the root.

This explanation in the commit message is not understandable. I will check what the code actually does and will propose to reword it.

OK, the commit message actually makes sense. The hash functions iterate through dentry->name and update the hash based on this, instead of concatenating names into one string and then taking a hash over it. Unblocking.



LibOS/shim/include/shim_fs.h, line 499 at r1 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

I wanted to give a definition that closely describes what the code does. Consider a path like /foo/bar. The dentries on the path are name_0 = "" (the root), name_1 = "foo", and name_2 = "bar".

By my definition, the absolute path is a combination of "foo" and "bar", separated by /, and beginning with /: "/" + name_1 + "/" + name_2.

I could also define the absolute path as "a combination of all names up to the root; separated by /", and note that because the last name is empty, the path always begins with /. The above path would be then defined as name_0 + "/" + name_1 + "/" + name_2.

However, I don't like it because:

  • the fact that an absolute path starts with root becomes an incidental property instead of essential part of the definition,
  • the root itself (/) becomes an exception, because by above definition the absolute path would be empty.

Does that make sense? Should I reword the comment to make this clearer? Maybe "An absolute path is a combination of all names up to the root (not including the root, which by convention has an empty name), separated by / and beginning with /".

I like your proposed rewording, please add that part about "by convention"


LibOS/shim/src/bookkeep/shim_handle.c, line 793 at r1 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

I haven't used it but I never did anything with the checkpoint/restore system. I expect it might be useful for debugging the checkpoints, so I'm not comfortable deleting that code while making an unrelated (to some degree) change.

OK, fair enough.


LibOS/shim/src/fs/shim_dcache.c, line 128 at r1 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

Fixed.

By the way, unlike DEBUG_RESUME, I did use these messages at one point.

Yes, DEBUG_REF is sometimes useful :)


LibOS/shim/src/fs/shim_dcache.c, line 274 at r1 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

Yes.

I guess calling it "mount-relative" would be less confusing, but it looks pretty bad in the code, and the name (rel_path etc.) is pretty well established in the code already.

OK. I like the naming scheme, just wanted to make sure.

@pwmarcz pwmarcz requested a review from dimakuv May 6, 2021 10:26
Copy link
Contributor Author

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 21 of 22 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)


LibOS/shim/include/shim_fs.h, line 499 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I like your proposed rewording, please add that part about "by convention"

Done.

dimakuv
dimakuv previously approved these changes May 6, 2021
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 18 of 22 files at r1, 3 of 4 files at r2, 1 of 1 files at r3.
Reviewable status: all files reviewed, 3 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @pwmarcz)

a discussion (no related file):
@boryspoplawski wanted to take a look at this PR because he tried to do the same some time ago. So, I'm doing just a quick review and will rely on his judgements ;)



LibOS/shim/src/fs/shim_fs_pseudo.c, line 376 at r3 (raw file):

int pseudo_follow_link(struct shim_dentry* dent, struct shim_qstr* link,
                       const struct pseudo_ent* root_ent) {
    char* rel_path = dentry_rel_path(dent, /*sizep=*/NULL);

Missing check for retval?


LibOS/shim/src/ipc/shim_ipc_pid.c, line 306 at r3 (raw file):

            struct shim_dentry* dent = NULL;
            switch(msgin->code) {

missing space after switch

Copy link
Contributor Author

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @mkow)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

@boryspoplawski wanted to take a look at this PR because he tried to do the same some time ago. So, I'm doing just a quick review and will rely on his judgements ;)

OK, thanks!



LibOS/shim/src/fs/shim_fs_pseudo.c, line 376 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Missing check for retval?

Fixed.


LibOS/shim/src/ipc/shim_ipc_pid.c, line 306 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

missing space after switch

Fixed.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r4.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski)

yamahata
yamahata previously approved these changes May 8, 2021
Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 18 of 22 files at r1, 3 of 4 files at r2, 1 of 1 files at r3, 2 of 2 files at r4.
Reviewable status: all files reviewed, 1 unresolved discussion, "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion, "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski)

a discussion (no related file):

Previously, pwmarcz (Paweł Marczewski) wrote…

OK, thanks!

(small correction, for the record: I misread one of Borys' messages, he didn't work on this previously, but just wanted to take a look before merging)


Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 17 of 22 files at r1, 2 of 4 files at r2, 1 of 1 files at r3, 2 of 2 files at r4.
Reviewable status: all files reviewed, 16 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @pwmarcz)


LibOS/shim/include/shim_fs.h, line 217 at r4 (raw file):

    struct shim_d_ops* d_ops;

    /* TODO: this is unused right now (see `mount_point`) */

Why not remove it for now then?


LibOS/shim/include/shim_fs.h, line 494 at r4 (raw file):

 * \param[out] sizep if not NULL, will be set to path size, including null terminator
 *
 * \returns pointer to the path, or NULL if out of memory

Isn't the keyword \return?


LibOS/shim/include/shim_fs.h, line 494 at r4 (raw file):

 * \param[out] sizep if not NULL, will be set to path size, including null terminator
 *
 * \returns pointer to the path, or NULL if out of memory

Could you return the calues with out-params and errors directly from the function? We recently switched to that convention, at least in PAL API, but IMO we should go with it everywhere.


LibOS/shim/include/shim_fs.h, line 505 at r4 (raw file):

 * mountpoint), separated by `/`. A relative path never begins with `/`.
 *
 * Instead of using this function directly, you should use `dentry_abs_path` or `dentry_rel_path`,

Why this function is even exported then? It's not used anywhere outside of dentry_*_path and as you suggested shouldn't be.


LibOS/shim/include/shim_fs.h, line 513 at r4 (raw file):

char* dentry_rel_path(struct shim_dentry* dent, size_t* sizep);

char* dentry_abs_path_into_qstr(struct shim_dentry* dent, struct shim_qstr* str);

Could you just return int errors here? No call places uses the return value for anything except checking for errors.


LibOS/shim/src/shim_rtld.c, line 873 at r4 (raw file):

                      "case Graphene requires a specially-crafted memory layout. You can enable it "
                      "by adding 'sgx.nonpie_binary = 1' to the manifest.\n",
                      path ? path : "(unknown)");

Hmm, wouldn't be a host path better to print in this place?
Not sure, just an idea.


LibOS/shim/src/bookkeep/shim_handle.c, line 89 at r4 (raw file):

            return ret;
        }
        if (exec->dentry->fs != fs) {

Couldn't we check that exec->dentry points to a real file (not a mountpoint, not a directory etc)? Like check the type or something.


LibOS/shim/src/fs/shim_dcache.c, line 128 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Yes, DEBUG_REF is sometimes useful :)

Can confirm, used DEBUG_REF a few times, but never DEBUG_RESUME :)


LibOS/shim/src/fs/shim_dcache.c, line 172 at r4 (raw file):

    if (!qstrsetstr(&dent->name, name, name_len))
        return NULL;

Shouldn't we cleanup here? I.e. put_mount(fs) and free_dentry(dent).


LibOS/shim/src/fs/shim_dcache.c, line 254 at r4 (raw file):

            break;

        if (!first)

Not blocking, but I think we could remove first, always add 1 here and change if (!relative) to if (relative && size > 1) { size--; } after the loop. Not sure if that would be cleaner though.


LibOS/shim/src/fs/shim_dcache.c, line 255 at r4 (raw file):

        if (!first)
            size++;

Could you add a quick comment this accounts for / separating two path segments?


LibOS/shim/src/fs/shim_dcache.c, line 262 at r4 (raw file):

    if (!relative)
        size++;

Could you add a quick comment this accounts for leading /?


LibOS/shim/src/fs/shim_dcache.c, line 271 at r4 (raw file):

static char* dentry_path_into_buf(struct shim_dentry* dent, bool relative, char* buf, size_t size) {
    bool first = true;
    size_t pos = size - 1;

Sounds like a good thing to assert (size > 0) or even explicitly check.


LibOS/shim/src/fs/shim_dcache.c, line 276 at r4 (raw file):

    /* Add names, starting from the last one, until we encounter root */
    while (dent && dent->parent) {

Why is dent == NULL allowed to be passed to this function?


LibOS/shim/src/fs/shim_fs.c, line 560 at r4 (raw file):

    /*Now go ahead and do a lookup so the dentry is valid */
    dent->state |= DENTRY_MOUNTPOINT;

Why do you mark this dent as mountpoint before lookup?
Update: nevermind, this function is all weird, I won't even try to understand it, waiting for the rewrite in the future :)


LibOS/shim/src/fs/shim_fs_hash.c, line 40 at r4 (raw file):

}

HASHTYPE hash_abs_path(struct shim_dentry* dent) {

This function is not equivalent to the old version, why did you modify it?
It would be nice if you purged it and introduced something sane (probably not in this PR) and I just wonder why change it now.


LibOS/shim/src/fs/shim_fs_pseudo.c, line 332 at r4 (raw file):

    return 0;

err:

NP: maybe join these two return paths into one and just have if (ret == 0) { *dirent = buf; }?


LibOS/shim/src/sys/shim_socket.c, line 291 at r4 (raw file):

    if (size > ARRAY_SIZE(un->sun_path)) {

Unneeded empty line.

Copy link
Contributor Author

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 14 of 22 files reviewed, 16 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, @pwmarcz, and @yamahata)


LibOS/shim/include/shim_fs.h, line 217 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why not remove it for now then?

I'd rather not, because there is checkpointing code that handles this field. So if I remove it here, in the next branch I would add the same exact code again.


LibOS/shim/include/shim_fs.h, line 494 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Isn't the keyword \return?

Doxygen allows both, so I was using \returns so that this reads better. But I see that we consistently use return. Updated.


LibOS/shim/include/shim_fs.h, line 494 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Could you return the calues with out-params and errors directly from the function? We recently switched to that convention, at least in PAL API, but IMO we should go with it everywhere.

Do you mean that these functions should look like this?

int dentry_abs_path(struct shim_dentry* dent, char** path, size_t* sizep);
int dentry_rel_path(struct shim_dentry* dent, char** path, size_t* sizep);

I can do that, but I want to be sure first - that's a lot of call sites to rewrite, for extra verbosity and (to me) no clear benefit. I get that it's more uniform, and could allow you to return other errors than ENOMEM, but it's also less convenient to use.

Also, how far does this convention extend? Do you generally want all functions that return a pointer to use a two-star out-parameter instead? There are many functions that allocate/create an object, and where returning a pointer (same as malloc) seems more convenient to me.


LibOS/shim/include/shim_fs.h, line 505 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why this function is even exported then? It's not used anywhere outside of dentry_*_path and as you suggested shouldn't be.

I wanted to avoid duplicating the comment, but yeah, it's hacky.

I removed the _dentry_path function and added comments for dentry_abs_path and dentry_rel_path instead. I don't like how there is a lot of duplicated text, but maybe that's okay.


LibOS/shim/include/shim_fs.h, line 513 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Could you just return int errors here? No call places uses the return value for anything except checking for errors.

Makes sense. Done.


LibOS/shim/src/shim_rtld.c, line 873 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Hmm, wouldn't be a host path better to print in this place?
Not sure, just an idea.

I agree. Changed to URI.


LibOS/shim/src/bookkeep/shim_handle.c, line 89 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Couldn't we check that exec->dentry points to a real file (not a mountpoint, not a directory etc)? Like check the type or something.

I could check for DENTRY_MOUNTPOINT, but that's only because right now DENTRY_MOUNTPOINT actually means root of another filesystem. In the next branch, these are two different things (and the actual mountpoint is never returned by lookup, just traversed). That's why checking dentry->fs makes more sense to me.

I did add a check for other "irregular" cases (a directory, and a synthetic node).


LibOS/shim/src/fs/shim_dcache.c, line 172 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Shouldn't we cleanup here? I.e. put_mount(fs) and free_dentry(dent).

Done (free_dentry does put_mount already).


LibOS/shim/src/fs/shim_dcache.c, line 254 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Not blocking, but I think we could remove first, always add 1 here and change if (!relative) to if (relative && size > 1) { size--; } after the loop. Not sure if that would be cleaner though.

I want this function to work like the one computing the path, so that it's clear the size is correct (and also I want relative and absolute cases to be reasonably similar). The path is constructed starting from the end, so I can't adjust it this way after the loop.


LibOS/shim/src/fs/shim_dcache.c, line 255 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Could you add a quick comment this accounts for / separating two path segments?

I added the same comments as in the next function.


LibOS/shim/src/fs/shim_dcache.c, line 262 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Could you add a quick comment this accounts for leading /?

Done.


LibOS/shim/src/fs/shim_dcache.c, line 271 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Sounds like a good thing to assert (size > 0) or even explicitly check.

Makes sense. This function returns NULL when size is insufficient, so I added a check for zero.


LibOS/shim/src/fs/shim_dcache.c, line 276 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why is dent == NULL allowed to be passed to this function?

It's not. I think checking dent was necessary in some iteration of this function, but now it's redundant. Fixed.


LibOS/shim/src/fs/shim_fs.c, line 560 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Why do you mark this dent as mountpoint before lookup?
Update: nevermind, this function is all weird, I won't even try to understand it, waiting for the rewrite in the future :)

To make sure the relative path is correct already :)

Yes, the next version does not do this.


LibOS/shim/src/fs/shim_fs_hash.c, line 40 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

This function is not equivalent to the old version, why did you modify it?
It would be nice if you purged it and introduced something sane (probably not in this PR) and I just wonder why change it now.

Because now that path can be arbitrarily large, the previous version (take a path and compute a hash based on it) can fail on allocation. Instead of adding error handling everywhere, I decided to write a version that doesn't allocate memory.

As for "something sane": this is mostly for inventing inode values, so hashing is just not correct here (because two paths might resolve to the same hash). I wonder how important this is - if we don't care, I could use a better function (CRC64?), if we do, then probably we need to allocate the numbers in a different way.


LibOS/shim/src/fs/shim_fs_pseudo.c, line 332 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

NP: maybe join these two return paths into one and just have if (ret == 0) { *dirent = buf; }?

OK. Done.


LibOS/shim/src/sys/shim_socket.c, line 291 at r4 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Unneeded empty line.

Fixed.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r4, 8 of 8 files at r5.
Reviewable status: all files reviewed, 17 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @pwmarcz)


LibOS/shim/include/shim_fs.h, line 494 at r4 (raw file):

We recently switched to that convention, at least in PAL API

I don't think we should go to this extreme... For internal functions, most of the time a NULL/non-NULL distinction is sufficient.


LibOS/shim/src/bookkeep/shim_handle.c, line 95 at r5 (raw file):

        }
        if (exec->dentry->fs != fs) {
            log_error("init_exec_handle: cannot find executable in filesystem,"

Why did you remove a space at the end? It was needed.


LibOS/shim/src/fs/shim_dcache.c, line 128 at r1 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Can confirm, used DEBUG_REF a few times, but never DEBUG_RESUME :)

Let's remove it in some other new PR, this will be simple.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 22 files at r1, 7 of 8 files at r5, 1 of 1 files at r6.
Reviewable status: all files reviewed, 17 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @boryspoplawski and @pwmarcz)


LibOS/shim/include/shim_fs.h, line 494 at r4 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

Doxygen allows both, so I was using \returns so that this reads better. But I see that we consistently use return. Updated.

Didn't know it's allowed, I like returns much more! But let's change and unify this in another PR.


LibOS/shim/include/shim_fs.h, line 494 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We recently switched to that convention, at least in PAL API

I don't think we should go to this extreme... For internal functions, most of the time a NULL/non-NULL distinction is sufficient.

Yeah, I think if the only possible failure is ENOMEM then returning NULL should be fine.


LibOS/shim/src/shim_rtld.c, line 873 at r4 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

I agree. Changed to URI.

Hmm, not sure if this is a good idea. What if we start a binary from tmpfs? (after its fixed to be shared between processes) And conceptually, Graphene is starting files from its internal "mount namespace", not from the host.


LibOS/shim/src/fs/shim_fs_hash.c, line 40 at r4 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

Because now that path can be arbitrarily large, the previous version (take a path and compute a hash based on it) can fail on allocation. Instead of adding error handling everywhere, I decided to write a version that doesn't allocate memory.

As for "something sane": this is mostly for inventing inode values, so hashing is just not correct here (because two paths might resolve to the same hash). I wonder how important this is - if we don't care, I could use a better function (CRC64?), if we do, then probably we need to allocate the numbers in a different way.

I think we care about this, inode collisions may be a security issue, so I'd allocate them properly.

Copy link
Contributor Author

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 18 of 22 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @yamahata)


LibOS/shim/include/shim_fs.h, line 502 at r8 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

+1 though it's obvious that 0 on success, negative error code otherwise. But we typically write it, so please add.

Done.


LibOS/shim/include/shim_fs.h, line 519 at r8 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Described the return value here as well.


LibOS/shim/src/fs/shim_dcache.c, line 323 at r8 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

+1. I prefer without these p suffixes.

I did it this way because this function already uses path as size for actual values. I renamed them to _path and _size - if we already have a convention for such cases, I can change.


LibOS/shim/src/fs/shim_dcache.c, line 340 at r8 (raw file):
OK, I changed the code not to touch the path on failure, and updated the call sites that needed to null-initialize the path.

These two return paths are so similar, please join them.

This not relevant anymore, but for future reference: how did you want to join? If we're setting two different values, I'm not seeing a good option.

I'm considering something like:

*pathp = (ret == 0 ? path : NULL);
if (sizep)
    *sizep = (ret == 0 ? size : 0);

Or:

if (ret == 0) {
    *pathp = path;
    if (sizep)
       *sizep = size;
} else {
    *pathp = NULL;
    if (sizep)
       *sizep = 0;
}

Both of these look worse to me than having a separate error path.


LibOS/shim/src/fs/chroot/fs.c, line 158 at r8 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Please realign.

Fixed.


LibOS/shim/src/sys/shim_getcwd.c, line 40 at r8 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

I would prefer not to as well, but then duplicating cleanup code is even worse. Maybe good old goto is needed?

Changed to use goto.

mkow
mkow previously approved these changes May 12, 2021
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r9.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski)

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r9.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @pwmarcz)


LibOS/shim/src/fs/shim_dcache.c, line 340 at r8 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

OK, I changed the code not to touch the path on failure, and updated the call sites that needed to null-initialize the path.

These two return paths are so similar, please join them.

This not relevant anymore, but for future reference: how did you want to join? If we're setting two different values, I'm not seeing a good option.

I'm considering something like:

*pathp = (ret == 0 ? path : NULL);
if (sizep)
    *sizep = (ret == 0 ? size : 0);

Or:

if (ret == 0) {
    *pathp = path;
    if (sizep)
       *sizep = size;
} else {
    *pathp = NULL;
    if (sizep)
       *sizep = 0;
}

Both of these look worse to me than having a separate error path.

Generally speaking in other places we do not set return values on errors, because usually it makes no sense (if we had something to return there wouldn't be an error). In such case you could set return values before the goto label and just do cleanup + return value after it.


LibOS/shim/src/sys/shim_getcwd.c, line 37 at r9 (raw file):

    unlock(&g_process.fs_lock);

    char* path;

Please initialize path. We do not touch it on errors now so there is no real problem here, but now that the API does not clean it, it would be better not to worry if it's initialized in later parts of the code.

boryspoplawski
boryspoplawski previously approved these changes May 12, 2021
Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r10.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners

Copy link
Contributor Author

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners


LibOS/shim/src/fs/shim_dcache.c, line 340 at r8 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Generally speaking in other places we do not set return values on errors, because usually it makes no sense (if we had something to return there wouldn't be an error). In such case you could set return values before the goto label and just do cleanup + return value after it.

Okay. I agree with that reasoning, and it works that way now, but I understood "these paths are similar, join them" as refactoring the code, not completely changing what the error path does :)


LibOS/shim/src/sys/shim_getcwd.c, line 37 at r9 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Please initialize path. We do not touch it on errors now so there is no real problem here, but now that the API does not clean it, it would be better not to worry if it's initialized in later parts of the code.

Done.

mkow
mkow previously approved these changes May 12, 2021
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r10.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners

dimakuv
dimakuv previously approved these changes May 12, 2021
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 4 files at r9, 1 of 1 files at r10.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners

The `root` field was intended to be an actual root of the filesystem,
but in practice, it was initialized completely outside of the dentry
tree (while the actual root was kept in the `mountpoint` field).

I'm planning to change mount semantics so that `root` is used again, but
for now, I'm removing the code that uses it, because it worked only by
accident and interferes with refactoring.

As a side-effect, this breaks the regression test that runs an
executable called "dev", because the /dev filesystem is mounted over it.
(Previously, the dentry for the executable was created under the
detached `shim_mount.root` dentry). I'm renaming the executable to
"devfs" instead, and adding a check that a dentry for executable is not
shadowed by a mount.

Signed-off-by: Paweł Marczewski <[email protected]>
This change removes `shim_dentry.rel_path` and `shim_handle.path`
fields. Instead, the path is computed whenever necessary. This also
means that the length limit for relative path is lifted.

To obtain a path for a dentry, use `dentry_abs_path` or
`dentry_rel_path`. The constructed path is allocated on heap and must be
freed afterwards.

Because computing the path can fail when out of memory, the functions
for hashing dentries do not compute the path explicitly, but follow the
dentry chain to the root.

Signed-off-by: Paweł Marczewski <[email protected]>
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r11.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: ITL)

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 9 files at r11.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@dimakuv
Copy link

dimakuv commented May 12, 2021

Jenkins, retest Jenkins-20.04 please (kill11 LTP test failed)

@dimakuv
Copy link

dimakuv commented May 12, 2021

Jenkins, retest Jenkins-SGX-20.04-apps please (Cannot contact <jenkins worker>, looks like Jenkins glitches)

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r11.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@boryspoplawski
Copy link
Contributor

Jenkins, retest Jenkins-SGX-20.04-apps please (Jenkins worker glitches)

1 similar comment
@boryspoplawski
Copy link
Contributor

Jenkins, retest Jenkins-SGX-20.04-apps please (Jenkins worker glitches)

@mkow mkow merged commit 601cd46 into master May 12, 2021
@mkow mkow deleted the pawel/rel-path branch May 12, 2021 15:34
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants