-
Notifications
You must be signed in to change notification settings - Fork 260
Conversation
8a90cf3
to
fb31495
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 13 of 13 files at r1.
Reviewable status: all files reviewed, 14 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @pwmarcz)
a discussion (no related file):
(@dimakuv you asked about this recently - this commit does not implement lseek, but it makes it relatively simple to do so)
Thanks! Looks like adding directory handling to lseek will be trivial after this PR -- we can simply set dir_handle.pos
to the requested value.
LibOS/shim/include/shim_fs.h, line 455 at r1 (raw file):
* \param hdl a directory handle * * This function initializes the `hdl->dir_info` for a structure, so that the directory can be
initializes ... for a structure
? This doesn't sound like correct English.
LibOS/shim/include/shim_fs.h, line 457 at r1 (raw file):
* This function initializes the `hdl->dir_info` for a structure, so that the directory can be * listed using `getdents/getdents64` syscalls. It should only be called once, i.e. only if * `hdl->dir_info->dents` is NULL.
Why only called once
? What if there is a file created in this directory by Graphene at runtime? If this is not supported, we should add a TODO/explanation.
Or maybe it doesn't list
the directory (as the name suggests), but only inits some metadata? Then I suggest a rename of this function.
LibOS/shim/src/bookkeep/shim_handle.c, line 685 at r1 (raw file):
/* we don't checkpoint children dentries of a directory dentry, so need to list * directory again in child process; mark handle to indicate no cached dentries */ hdl->dir_info.dents = NULL;
How did this survive our tests?! (Hm, or maybe it's just a memory leak.)
You must update new_hdl
, not hdl
. Right now you're "destroying" the information in the parent's directory handle. But what you want to do is to mark new_hdl
as an "uninitialized" directory.
It looks like there was this same bug in the old code as well...
LibOS/shim/src/fs/shim_fs_pseudo.c, line 227 at r1 (raw file):
/*! * \brief Create and populate buffer with dirents.
The description is wrong now, there is no buffer.
LibOS/shim/src/fs/shim_fs_pseudo.c, line 231 at r1 (raw file):
* Generic function for pseudo-filesystems. Example usage for the `/proc` FS is * `pseudo_readdir("/proc/3", callback, arg, proc_root_ent)` -- this calls `callback` on "root", * "cwd", "exe", "fd", etc.
I got lost here. So what does this callback do? How do we get the list of dir entries using this function?
UPDATE: Ok, after reading the implementation, the callback is typically (always?) used to create a temporary list of child entries, and arg
is the pointer to this list. Please add a comment in the header file about this typical usage.
LibOS/shim/src/fs/shim_fs_pseudo.c, line 234 at r1 (raw file):
* * \param[in] dent Dentry with path to the requested directory. * \param[out] dirent Pointer to newly created buffer with dirents.
There is no dirent
, now you have a callback.
LibOS/shim/src/fs/shim_namei.c, line 579 at r1 (raw file):
LISTP_FOR_EACH_ENTRY(ent, &ents, list) { struct shim_dentry* child; int ret = lookup_dentry(dent, ent->name, ent->name_len, &child);
Why do we need to double-check what readdir
returned? What is the case when readdir
lies to us and this additional loop over children detects?
LibOS/shim/src/fs/shim_namei.c, line 579 at r1 (raw file):
LISTP_FOR_EACH_ENTRY(ent, &ents, list) { struct shim_dentry* child; int ret = lookup_dentry(dent, ent->name, ent->name_len, &child);
You have a redefinition of int ret
here. Not important though.
LibOS/shim/src/fs/shim_namei.c, line 601 at r1 (raw file):
assert(locked(&hdl->lock)); assert(hdl->dentry); assert(!dirhdl->dents);
Why do we have this assert? I still don't understand why we do this only once. I would expect if (dirhdl->dents) return 0
instead.
LibOS/shim/src/fs/shim_namei.c, line 608 at r1 (raw file):
int ret; lock(&g_dcache_lock);
This looks brittle. The caller first grabs the lock over hdl
, and only then we grab the global lock g_dcache_lock
. Shouldn't it be the other way around? Or at least shouldn't both these operations happen in the caller?
LibOS/shim/src/fs/shim_namei.c, line 613 at r1 (raw file):
goto out; size_t capacity = hdl->dentry->nchildren + 1;
Maybe add a quick comment that +1
is .
?
LibOS/shim/src/fs/shim_namei.c, line 627 at r1 (raw file):
if (hdl->dentry->parent) { get_dentry(hdl->dentry->parent); assert(count < capacity);
This assert is trivial, I suggest to remove it. Or do you do this just for readability?
LibOS/shim/src/fs/proc/ipc-thread.c, line 299 at r1 (raw file):
for (p = pid; p; p /= 10) { ptr->name[l--] = p % 10 + '0'; }
BTW, we still find these cute snippets of code in Graphene :)
LibOS/shim/src/sys/shim_open.c, line 317 at r1 (raw file):
/* we are grabbing the lock because the handle content is actually updated */
Looks like we got to the really old code of Graphene, with nonsense comments :)
LibOS/shim/src/sys/shim_open.c, line 347 at r1 (raw file):
if (is_getdents64) { ent_size = ALIGN_UP(sizeof(struct linux_dirent64) + name_len + 1, alignof(struct linux_dirent));
Shouldn't it be alignof(struct linux_dirent64)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome cleanup!
Reviewed 13 of 13 files at r1.
Reviewable status: all files reviewed, 28 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv and @pwmarcz)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
(@dimakuv you asked about this recently - this commit does not implement lseek, but it makes it relatively simple to do so)
Thanks! Looks like adding directory handling to lseek will be trivial after this PR -- we can simply set
dir_handle.pos
to the requested value.
To whoever will be adding this later: Please remember to add a test for it :)
LibOS/shim/src/bookkeep/shim_handle.c, line 685 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
How did this survive our tests?! (Hm, or maybe it's just a memory leak.)
You must update
new_hdl
, nothdl
. Right now you're "destroying" the information in the parent's directory handle. But what you want to do is to marknew_hdl
as an "uninitialized" directory.It looks like there was this same bug in the old code as well...
@pwmarcz: When fixing this, could you add a test which would be able to detect this bug? Seems this was completely untested?
LibOS/shim/src/fs/shim_fs_pseudo.c, line 231 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I got lost here. So what does this callback do? How do we get the list of dir entries using this function?
UPDATE: Ok, after reading the implementation, the callback is typically (always?) used to create a temporary list of child entries, and
arg
is the pointer to this list. Please add a comment in the header file about this typical usage.
Tbh current description was pretty clear to me (it's a generic function which calls the given function for all subfiles in a dir), but if you want we can describe an example usage.
LibOS/shim/src/fs/shim_namei.c, line 528 at r1 (raw file):
its names
Rather its subfiles names?
LibOS/shim/src/fs/shim_namei.c, line 560 at r1 (raw file):
list_directory
I think it's not really listing it, from the API point of view? It just verifies if all the dentries are in place, without returning the entries. So, I'd reflect this in the name somehow (assuming my understanding is correct).
LibOS/shim/src/fs/shim_namei.c, line 582 at r1 (raw file):
if (ret == 0) { put_dentry(child); } else if (ret != -EACCES) {
Should this actually be an error? Sometimes we have a dir with unreadable files. I assume that the idea here is that they still should be possible to be "looked up"? (yup, I'm not sure when exactly lookup_dentry()
may fail with -EACCES
, so I'm confused about what is actually checked here)
LibOS/shim/src/fs/shim_namei.c, line 611 at r1 (raw file):
if ((ret = list_directory(hdl->dentry)) < 0) goto out;
When exactly can this happen? (that we lack some dentries)
LibOS/shim/src/fs/shim_namei.c, line 614 at r1 (raw file):
size_t capacity = hdl->dentry->nchildren + 1; if (hdl->dentry->parent)
Actually, how is this possible? On Linux even /
has a parent (itself) and the corresponding ..
entry.
LibOS/shim/src/fs/shim_namei.c, line 617 at r1 (raw file):
capacity++; dents = malloc(sizeof(struct shim_dentry) * capacity);
Only partially related to the PR, but: All the data we work here on may be (I think) externally controlled, e.g. in chroot mounts. So, the attacker may theoretically pretend to have tons of subdirs and overflow this multiplication. Probably impossible on 64-bit, but on 32-bit it's harder to be sure. In general, I think all our dentry-related code isn't careful enough...
Anyways, this sounds theoretical, but I think this is actually triggerable on 64-bit - due to the quality of Graphene legacy code, shim_dentry::nchildren
is int
, and on 64-bit systems it should be possible to overflow it :)
LibOS/shim/src/fs/chroot/fs.c, line 273 at r1 (raw file):
/* DEP 3/18/17: If we have a directory, we need to find out how many * children it has by hand. */ /* XXX: Keep coherent with rmdir/mkdir/creat, etc */
I don't get both of these comments, are they still valid?
LibOS/shim/src/fs/chroot/fs.c, line 828 at r1 (raw file):
/* By the PAL convention, if a name ends with '/', it is a directory. However, we ignore * that distinction here and pass the name without '/' to the callback. */ if (buf[end - 1] == '/')
This whole loop assumes that empty names are impossible. Nothing bad happens if they aren't, but I'd feel safer if this was handled better.
LibOS/shim/src/fs/proc/ipc-thread.c, line 283 at r1 (raw file):
IDTYPE pid = status->status[i].pid; char name[10]; snprintf(name, sizeof(name), "%u", pid);
I think %u
may print up to 10 characters + null byte. Weird that no compiler warning was triggered...
LibOS/shim/src/fs/proc/thread.c, line 249 at r1 (raw file):
for (int i = 0; i < handle_map->fd_size; i++) if (handle_map->map[i] && handle_map->map[i]->handle) { char name[10];
ditto
LibOS/shim/src/fs/proc/thread.c, line 689 at r1 (raw file):
IDTYPE pid = thread->tid; char name[10];
ditto
LibOS/shim/src/sys/shim_open.c, line 303 at r1 (raw file):
return -EBADF; int ret = -EACCES;
Why do we set it here to this particular error? It may get overridden by ret = list_directory_handle(hdl)
and nothing reads it before.
LibOS/shim/src/sys/shim_open.c, line 352 at r1 (raw file):
struct linux_dirent64* ent = (struct linux_dirent64*)(buf + buf_pos); memset(ent, 0, ent_size);
This is a bit obscure, please add a comment that we're setting the terminating null-byte for ent->d_name
here.
LibOS/shim/src/sys/shim_open.c, line 361 at r1 (raw file):
} else { ent_size = ALIGN_UP( sizeof(struct linux_dirent) + sizeof(struct linux_dirent_tail) + name_len,
No +1 for the terminating null byte? I guess it's obscured somewhere in this linux_dirent_tail
? Why do we even need it? I guess the non-64 bit format is that weird?
LibOS/shim/src/sys/shim_open.c, line 383 at r1 (raw file):
} ret = buf_pos;
We're trimming this into int
, which may overflow it, and I think this is externally-triggerable. So, I think we should guard against trying to return something bigger than INT_MAX
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 16 files reviewed, 28 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
a discussion (no related file):
Thanks! Looks like adding directory handling to lseek will be trivial after this PR -- we can simply set dir_handle.pos to the requested value.
Almost... this still caches the directory contents, and the user doing lseek
expects to see the new files, so you probably need to also refresh (clear_directory_handle
).
LibOS/shim/include/shim_fs.h, line 455 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
initializes ... for a structure
? This doesn't sound like correct English.
Reworded.
LibOS/shim/include/shim_fs.h, line 457 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why
only called once
? What if there is a file created in this directory by Graphene at runtime? If this is not supported, we should add a TODO/explanation.Or maybe it doesn't
list
the directory (as the name suggests), but only inits some metadata? Then I suggest a rename of this function.
I changed to populate_directory_handle
/ clear_directory_handle
, both are no-op if the handle is already in desired state.
LibOS/shim/src/bookkeep/shim_handle.c, line 685 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
@pwmarcz: When fixing this, could you add a test which would be able to detect this bug? Seems this was completely untested?
Done. I modified the getdents
test so that it checks what happens with a handle across fork.
LibOS/shim/src/fs/shim_fs_pseudo.c, line 227 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
The description is wrong now, there is no buffer.
Updated.
LibOS/shim/src/fs/shim_fs_pseudo.c, line 231 at r1 (raw file):
the callback is typically (always?) used to create a temporary list of child entries, and arg is the pointer to this list.
Currently, I also use it in one place to just count the entries. And a previous implementation called lookup_dentry()
on each name, but that's problematic if we're already inside readdir
.
My point: this is a generic "for each" mechanism, and I believe a void*
argument is a common pattern in C for these, to use for any kind of necessary context. We already do the same for walk_thread_list
.
I fleshed out the description for readdir
. I couldn't think of an example short enough to be worth including, so I explained that arg
can be used to pass additional data, for example a list.
LibOS/shim/src/fs/shim_fs_pseudo.c, line 234 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
There is no
dirent
, now you have a callback.
Fixed.
LibOS/shim/src/fs/shim_namei.c, line 528 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
its names
Rather its subfiles names?
Reworded.
LibOS/shim/src/fs/shim_namei.c, line 560 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
list_directory
I think it's not really listing it, from the API point of view? It just verifies if all the dentries are in place, without returning the entries. So, I'd reflect this in the name somehow (assuming my understanding is correct).
Renamed to populate_directory
.
LibOS/shim/src/fs/shim_namei.c, line 579 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why do we need to double-check what
readdir
returned? What is the case whenreaddir
lies to us and this additional loop over children detects?
readdir
gives us only a list of names, here we call the underlying lookup to create a dentry if necessary. This is to make underlying filesystem implementation easier: it doesn't need to construct the same information twice.
(It's also less efficient but given the state of FS code I really want to simplify it as much as I can first).
LibOS/shim/src/fs/shim_namei.c, line 579 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
You have a redefinition of
int ret
here. Not important though.
Fixed.
LibOS/shim/src/fs/shim_namei.c, line 582 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Should this actually be an error? Sometimes we have a dir with unreadable files. I assume that the idea here is that they still should be possible to be "looked up"? (yup, I'm not sure when exactly
lookup_dentry()
may fail with-EACCES
, so I'm confused about what is actually checked here)
Whoops, I already had a comment on master about exactly that, and deleted it during refactoring. Added again.
(In short, -EACCES
might happen if there is a symlink on host, and it points to an inaccessible file).
LibOS/shim/src/fs/shim_namei.c, line 601 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why do we have this assert? I still don't understand why we do this only once. I would expect
if (dirhdl->dents) return 0
instead.
Done: this function can now be called without checking dirhdl->dents
.
LibOS/shim/src/fs/shim_namei.c, line 608 at r1 (raw file):
This looks brittle. The caller first grabs the lock over
hdl
, and only then we grab the global lockg_dcache_lock
. Shouldn't it be the other way around?
I'm not sure if we ever did that before, but yeah, it makes sense.
Or at least shouldn't both these operations happen in the caller?
I think that's better, at least for now.
LibOS/shim/src/fs/shim_namei.c, line 611 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
When exactly can this happen? (that we lack some dentries)
You mean when list_directory
can fail? The underlying readdir
or lookup
on a dentry might fail unexpectedly, an intermediate allocation for name can fail with -ENOMEM
...
LibOS/shim/src/fs/shim_namei.c, line 613 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Maybe add a quick comment that
+1
is.
?
Done.
LibOS/shim/src/fs/shim_namei.c, line 614 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Actually, how is this possible? On Linux even
/
has a parent (itself) and the corresponding..
entry.
Oh, I did not know that! And apparently neither did whoever wrote the previous code with dot
and dotdot
pointers.
I now hardcode 2 first dentries to be .
and ..
, I think it's still better than storing the name together with the dentries.
LibOS/shim/src/fs/shim_namei.c, line 617 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Only partially related to the PR, but: All the data we work here on may be (I think) externally controlled, e.g. in chroot mounts. So, the attacker may theoretically pretend to have tons of subdirs and overflow this multiplication. Probably impossible on 64-bit, but on 32-bit it's harder to be sure. In general, I think all our dentry-related code isn't careful enough...
Anyways, this sounds theoretical, but I think this is actually triggerable on 64-bit - due to the quality of Graphene legacy code,
shim_dentry::nchildren
isint
, and on 64-bit systems it should be possible to overflow it :)
What would be the best way of mitigating that? Change all the operations here (+2
, *capacity
) to overflow versions? Or maybe put a limit on the number of children?
LibOS/shim/src/fs/shim_namei.c, line 627 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
This assert is trivial, I suggest to remove it. Or do you do this just for readability?
Removed in the new version.
LibOS/shim/src/fs/chroot/fs.c, line 273 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
I don't get both of these comments, are they still valid?
- "If we have a directory, we need to find out how many children it has by hand." - Indeed, we do that, we cannot just count dentries because there might be files that we haven't looked up.
- "Keep coherent with rmdir/mkdir" - I believe the idea was to not list the directory every time, but calculate the
nlink
once and then increase/decrease it as we add or remove nodes. It looks pretty bad to list the directory every time someone doesstat
. - Also, this does not take into account all "synthetic" entries: the ones created during mount, but also Unix sockets and named pipes.
I can rewrite these comments, but more importantly: it's a pain to report accurate nlink
. Maybe we don't need to do that? Do you know if applications might do something with it?
LibOS/shim/src/fs/chroot/fs.c, line 828 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
This whole loop assumes that empty names are impossible. Nothing bad happens if they aren't, but I'd feel safer if this was handled better.
I added a check here.
I thought also about adding an assert
and enforcing that in PAL (SGX PAL should probably examine the data from host), but since the PAL implementation is per host, I think this is more solid.
LibOS/shim/src/fs/proc/ipc-thread.c, line 283 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
I think
%u
may print up to 10 characters + null byte. Weird that no compiler warning was triggered...
Yeah, it's pretty weird. It seems that somehow a warning triggers here only on the lower bound (char name[1]
produces an error, as does a format like "xxxxxxxxxx%u"
). I have no idea why...
Anyway, changed all these to 11 characters.
LibOS/shim/src/fs/proc/thread.c, line 249 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto
Done.
LibOS/shim/src/fs/proc/thread.c, line 689 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto
Done.
LibOS/shim/src/sys/shim_open.c, line 303 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Why do we set it here to this particular error? It may get overridden by
ret = list_directory_handle(hdl)
and nothing reads it before.
Looks like this got carried over from previous version. Removed.
LibOS/shim/src/sys/shim_open.c, line 347 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Shouldn't it be
alignof(struct linux_dirent64)
?
Yes. Fixed.
LibOS/shim/src/sys/shim_open.c, line 352 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
This is a bit obscure, please add a comment that we're setting the terminating null-byte for
ent->d_name
here.
Done.
LibOS/shim/src/sys/shim_open.c, line 361 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
No +1 for the terminating null byte? I guess it's obscured somewhere in this
linux_dirent_tail
? Why do we even need it? I guess the non-64 bit format is that weird?
Yes, this is what the comment in previous version suggested: linux_dirent_tail
starts with a zero padding byte.
struct linux_dirent {
unsigned long d_ino; /* Inode number */
unsigned long d_off; /* Offset to next linux_dirent */
unsigned short d_reclen; /* Length of this linux_dirent */
char d_name[]; /* Filename (null-terminated) */
/* length is actually (d_reclen - 2 -
offsetof(struct linux_dirent, d_name)) */
/*
char pad; // Zero padding byte
char d_type; // File type (only since Linux
// 2.6.4); offset is (d_reclen - 1)
*/
}
On the other hand, maybe just adding +1
for clarity would be better?
LibOS/shim/src/sys/shim_open.c, line 383 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
We're trimming this into
int
, which may overflow it, and I think this is externally-triggerable. So, I think we should guard against trying to return something bigger thanINT_MAX
.
So actually getdents64
should return ssize_t
, I had a broken man page.
I added a check for the return value, does that look good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 12 files at r2.
Reviewable status: all files reviewed, 21 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @mkow and @pwmarcz)
LibOS/shim/include/shim_fs.h, line 193 at r2 (raw file):
* * \param dentry the dentry, must be valid, non-negative and describing a directory * \param callback the callback to invoke on each
on each file name
LibOS/shim/include/shim_fs.h, line 197 at r2 (raw file):
* * Calls `callback(name, arg)` for all file names in the directory. `name` is not guaranteed to * be valid after callback returns, so the callback should copy it necessary.
... it if necessary
(forgot if
)
LibOS/shim/include/shim_fs.h, line 482 at r2 (raw file):
* \param hdl a directory handle * * This functions discards an array of dentries previously prepared by `populate_directory_handle`.
function
LibOS/shim/src/fs/shim_fs_pseudo.c, line 231 at r1 (raw file):
My point: this is a generic "for each" mechanism, and I believe a void* argument is a common pattern in C for these
Yes, I understand this. I didn't have a problem with understanding "for each" logic; I had a problem with "why so complex and generic at all". I'm fine with the current comments, so resolving this.
LibOS/shim/src/fs/shim_namei.c, line 579 at r1 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
readdir
gives us only a list of names, here we call the underlying lookup to create a dentry if necessary. This is to make underlying filesystem implementation easier: it doesn't need to construct the same information twice.(It's also less efficient but given the state of FS code I really want to simplify it as much as I can first).
OK, thanks for explanation!
LibOS/shim/src/fs/shim_namei.c, line 617 at r1 (raw file):
Or maybe put a limit on the number of children?
This solution sounds reasonable. I would do this.
LibOS/shim/src/fs/chroot/fs.c, line 273 at r1 (raw file):
Maybe we don't need to do that? Do you know if applications might do something with it?
IIRC, we haven't had any issues with any applications that would complain/be confused about nlink
. So I'd say let's just ignore accurate reporting of nlink.
LibOS/shim/test/regression/getdents.c, line 36 at r2 (raw file):
int count = syscall(SYS_getdents, fd, buf, buf_size); if (count < 0) { fprintf(stderr, "%s: %s\n", name, strerror(errno));
Why not perror()
?
LibOS/shim/test/regression/getdents.c, line 41 at r2 (raw file):
for (size_t offs = 0; offs < count;) { struct linux_dirent* d32 = (struct linux_dirent*)(buf + offs); char d_type = *(buf + offs + d32->d_reclen - 1);
This indentation looks ugly, please unindent.
LibOS/shim/test/regression/getdents.c, line 155 at r2 (raw file):
* * Note that we currently do not synchronize handle state between parent and child in Graphene, * so both calls after fork will return the same entries.
Is this comment valid? I thought that this PR fixes this: the directory handle retains pos
, so even after fork, the child proceeds with where the parent stopped.
Also, doesn't it directly contradict the comment line above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 12 files at r2.
Reviewable status: all files reviewed, 11 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @pwmarcz)
LibOS/shim/src/bookkeep/shim_handle.c, line 685 at r1 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
Done. I modified the
getdents
test so that it checks what happens with a handle across fork.
Thanks!
LibOS/shim/src/bookkeep/shim_handle.c, line 681 at r2 (raw file):
so that `getdents/getdents64` will resume from the same place.
What's the semantics of this if the contents changed in the meantime? (I mean, on Linux)
LibOS/shim/src/fs/shim_namei.c, line 617 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Or maybe put a limit on the number of children?
This solution sounds reasonable. I would do this.
Yup, seems the easiest to cap it at e.g. 1 million or something similar. But this nchildren
type should be fixed regardless of this capping.
LibOS/shim/src/fs/chroot/fs.c, line 273 at r1 (raw file):
I can rewrite these comments
That would be helpful :)
Ad nlink
I think we can ignore it, but please reflect this problem in the comment.
LibOS/shim/src/fs/chroot/fs.c, line 828 at r2 (raw file):
if (end == start) { log_warning("chroot_readdir: empty name returned from PAL\n"); ret = -EINVAL;
I'd die_or_inf_loop()
here, as this should be only possible when the PAL is bugged or the host is attacking the enclave.
LibOS/shim/src/sys/shim_open.c, line 361 at r1 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
Yes, this is what the comment in previous version suggested:
linux_dirent_tail
starts with a zero padding byte.struct linux_dirent { unsigned long d_ino; /* Inode number */ unsigned long d_off; /* Offset to next linux_dirent */ unsigned short d_reclen; /* Length of this linux_dirent */ char d_name[]; /* Filename (null-terminated) */ /* length is actually (d_reclen - 2 - offsetof(struct linux_dirent, d_name)) */ /* char pad; // Zero padding byte char d_type; // File type (only since Linux // 2.6.4); offset is (d_reclen - 1) */ }
On the other hand, maybe just adding
+1
for clarity would be better?
I'm fine with the current, commented version. Thanks!
LibOS/shim/src/sys/shim_open.c, line 383 at r1 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
So actually
getdents64
should returnssize_t
, I had a broken man page.I added a check for the return value, does that look good?
Yup, looks good, although it silently assumes that ssize_t
is not smaller than long
(because of the type of ret
). You can add a static assert for this if you want.
LibOS/shim/test/regression/getdents.c, line 152 at r2 (raw file):
* with a buffer size small enough so that all three calls return something.
I don't get this part, didn't you mean "big enough"? Even if, I don't understand it... Won't the calls always return something > 0 for the first invocation, if the buffer is big enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 12 of 17 files reviewed, 11 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
LibOS/shim/include/shim_fs.h, line 193 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
on each file name
Done.
LibOS/shim/include/shim_fs.h, line 197 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
... it if necessary
(forgotif
)
Done.
LibOS/shim/include/shim_fs.h, line 482 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
function
I must have been tired... Fixed.
LibOS/shim/src/bookkeep/shim_handle.c, line 681 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
so that `getdents/getdents64` will resume from the same place.
What's the semantics of this if the contents changed in the meantime? (I mean, on Linux)
On Linux? I'm not sure of the exact implementation, but experimentally, subsequent getdents
calls continue from the same file, but iterate over an updated directory if we update it in the meantime.
Here, because we only store a position and the child recreates the listing, unfortunately we might jump over or repeat an entry...
LibOS/shim/src/fs/shim_fs_pseudo.c, line 231 at r1 (raw file):
I didn't have a problem with understanding "for each" logic; I had a problem with "why so complex and generic at all".
I misunderstood then, sorry. I hope my other comments explain the choice.
(Now that I think of it, I could also make readdir
return a list of names, and provide a set of functions to construct this list... but that seems pretty close to the callback solution, and would require more error handling from the filesystems).
LibOS/shim/src/fs/shim_namei.c, line 617 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Yup, seems the easiest to cap it at e.g. 1 million or something similar. But this
nchildren
type should be fixed regardless of this capping.
I converted nchildren
to size_t
, and added a limit of 1 million.
LibOS/shim/src/fs/chroot/fs.c, line 273 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
I can rewrite these comments
That would be helpful :)
Ad
nlink
I think we can ignore it, but please reflect this problem in the comment.
I think some code uses nlink
to determine number of files in a directory. sys/fs.c
mentions hwloc, see here).
So we might have to revisit this, but let's try to hardcode nlink
in here for now.
LibOS/shim/src/fs/chroot/fs.c, line 828 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
I'd
die_or_inf_loop()
here, as this should be only possible when the PAL is bugged or the host is attacking the enclave.
OK. I'm changing the message to error, then - it would be bad for Graphene to abort without any explanation.
LibOS/shim/src/sys/shim_open.c, line 383 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Yup, looks good, although it silently assumes that
ssize_t
is not smaller thanlong
(because of the type ofret
). You can add a static assert for this if you want.
I added a static assert.
LibOS/shim/test/regression/getdents.c, line 36 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Why not
perror()
?
perror()
doesn't do format strings, and I wanted to parametrize by name
here... but you're right, I can just do perror(name)
. Fixed.
LibOS/shim/test/regression/getdents.c, line 41 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
This indentation looks ugly, please unindent.
Done.
LibOS/shim/test/regression/getdents.c, line 152 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
* with a buffer size small enough so that all three calls return something.
I don't get this part, didn't you mean "big enough"? Even if, I don't understand it... Won't the calls always return something > 0 for the first invocation, if the buffer is big enough?
If the buffer is too big, then the first call (the one before fork) will retrieve all the entries, and the calls after the fork will return 0.
LibOS/shim/test/regression/getdents.c, line 155 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Is this comment valid? I thought that this PR fixes this: the directory handle retains
pos
, so even after fork, the child proceeds with where the parent stopped.Also, doesn't it directly contradict the comment line above?
We retain pos
after fork, but don't synchronize it between parent and child after that. Under Linux, the two calls after fork
get subsequent entries because the position is shared between the processes, Graphene has two independent handles.
Running this example under Linux displays:
getdents64 before fork: dir3 [0x4]
getdents64 before fork: file1 [0x8]
parent getdents64: file2 [0x8]
parent getdents64: .. [0x4]
child getdents64: . [0x4]
Under Graphene:
getdents64 before fork: . [0x4]
getdents64 before fork: .. [0x4]
child getdents64: file2 [0x8]
child getdents64: file1 [0x8]
parent getdents64: file1 [0x8]
parent getdents64: file2 [0x8]
I'm not sure how to make this comment clearer, but I can reword if you have an idea how.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r3.
Reviewable status: all files reviewed, 9 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @pwmarcz)
LibOS/shim/src/bookkeep/shim_handle.c, line 681 at r2 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
On Linux? I'm not sure of the exact implementation, but experimentally, subsequent
getdents
calls continue from the same file, but iterate over an updated directory if we update it in the meantime.Here, because we only store a position and the child recreates the listing, unfortunately we might jump over or repeat an entry...
@boryspoplawski Do you know maybe if Linux has any guarantees in this situation? I remember we discussed this long time ago, but I'm not sure which of these were actually documented/guaranteed anywhere.
LibOS/shim/src/fs/shim_dcache.c, line 172 at r3 (raw file):
if (!qstrsetstr(&dent->name, name, name_len)) { free_dentry(dent);
Is this call correct? It does qstrfree(&dent->name)
, on a qstr which failed to be allocated, and I don't think qstrsetstr()
has any guarantees about the qstr
contents after a failure.
LibOS/shim/src/fs/shim_dcache.c, line 178 at r3 (raw file):
if (parent->nchildren >= DENTRY_MAX_CHILDREN) { log_warning("get_new_dentry: nchildren limit reached\n"); free_dentry(dent);
ditto - dent
is only partially initialized here, is this really correct to call a destructor on it?
LibOS/shim/src/fs/shim_fs_pseudo.c, line 231 at r1 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
I didn't have a problem with understanding "for each" logic; I had a problem with "why so complex and generic at all".
I misunderstood then, sorry. I hope my other comments explain the choice.
(Now that I think of it, I could also make
readdir
return a list of names, and provide a set of functions to construct this list... but that seems pretty close to the callback solution, and would require more error handling from the filesystems).
I like the current solution, I don't think there's any simpler/better one we could have.
LibOS/shim/test/regression/getdents.c, line 152 at r2 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
If the buffer is too big, then the first call (the one before fork) will retrieve all the entries, and the calls after the fork will return 0.
Aaaah. Could you add a parenthesis explaining that briefly?
LibOS/shim/test/regression/getdents.c, line 155 at r2 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
We retain
pos
after fork, but don't synchronize it between parent and child after that. Under Linux, the two calls afterfork
get subsequent entries because the position is shared between the processes, Graphene has two independent handles.Running this example under Linux displays:
getdents64 before fork: dir3 [0x4] getdents64 before fork: file1 [0x8] parent getdents64: file2 [0x8] parent getdents64: .. [0x4] child getdents64: . [0x4]
Under Graphene:
getdents64 before fork: . [0x4] getdents64 before fork: .. [0x4] child getdents64: file2 [0x8] child getdents64: file1 [0x8] parent getdents64: file1 [0x8] parent getdents64: file2 [0x8]
I'm not sure how to make this comment clearer, but I can reword if you have an idea how.
@boryspoplawski please also take a look at this ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 17 files reviewed, 9 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, and @mkow)
LibOS/shim/src/fs/shim_dcache.c, line 172 at r3 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Is this call correct? It does
qstrfree(&dent->name)
, on a qstr which failed to be allocated, and I don't thinkqstrsetstr()
has any guarantees about theqstr
contents after a failure.
Looking at qstrsetstr()
and qstrfree()
, I think it's correct: qstrsetstr
can fail to allocate the oflow
field, but then qstrfree
will not free any memory. It would be a pretty serious bug if qstrsetstr
left the string in an inconsistent state from which you cannot even clean up...
LibOS/shim/src/fs/shim_dcache.c, line 178 at r3 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto -
dent
is only partially initialized here, is this really correct to call a destructor on it?
Yes. I think the idea was that alloc_dentry
performs the initialization, i.e. ensures the object is in a consistent state, after which you can call free_dentry
on it. This function (get_new_dentry
) performs additional setup, i.e. installs the newly created dentry in the dentry cache.
LibOS/shim/test/regression/getdents.c, line 152 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Aaaah. Could you add a parenthesis explaining that briefly?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 5 files at r3, 1 of 1 files at r4.
Reviewable status: all files reviewed, 3 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @mkow)
LibOS/shim/test/regression/getdents.c, line 155 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
@boryspoplawski please also take a look at this ^
OK, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 12 files at r2, 1 of 5 files at r3.
Reviewable status: all files reviewed, 3 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @mkow)
LibOS/shim/src/bookkeep/shim_handle.c, line 681 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
@boryspoplawski Do you know maybe if Linux has any guarantees in this situation? I remember we discussed this long time ago, but I'm not sure which of these were actually documented/guaranteed anywhere.
This depends on the actual filesystem used, afaik there are no general guarantees. E.g. ext4 can return already deleted entries, because caching and what not.
LibOS/shim/test/regression/getdents.c, line 155 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
OK, thanks.
Yup, handle is shared on Linux, hence the position too. I guess that will be the case in Graphene, once we have the sync code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners
LibOS/shim/src/fs/shim_dcache.c, line 172 at r3 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
Looking at
qstrsetstr()
andqstrfree()
, I think it's correct:qstrsetstr
can fail to allocate theoflow
field, but thenqstrfree
will not free any memory. It would be a pretty serious bug ifqstrsetstr
left the string in an inconsistent state from which you cannot even clean up...
Hmm, I think it depends. qstrsetstr()
actually leaves it uninitialized (or rather just unchanged) in case of an error, not in any particular state, but this function seems to also be a "constructor"? And alloc_dentry()
doesn't "construct" this structure anyhow, just zeroes the whole memory, so it kinda works by an accident?
Unblocking for now, this code looks really fishy, but it's more like these "qstrings" being broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners
LibOS/shim/src/fs/shim_dcache.c, line 172 at r3 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Hmm, I think it depends.
qstrsetstr()
actually leaves it uninitialized (or rather just unchanged) in case of an error, not in any particular state, but this function seems to also be a "constructor"? Andalloc_dentry()
doesn't "construct" this structure anyhow, just zeroes the whole memory, so it kinda works by an accident?
Unblocking for now, this code looks really fishy, but it's more like these "qstrings" being broken.
We're getting close to removing them :)
22f2a9d
to
491d017
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r5.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: ITL)
LibOS/shim/src/fs/shim_dcache.c, line 172 at r3 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
We're getting close to removing them :)
Yay!
This changes the filesystem `readdir` operation so that it only needs to call a callback. This is a big simplification for underlying filesystems: a lot of code was responsible for dynamically allocating the right amount of memory. Additionally, the only piece of information required from `readdir` is a name, not file type or inode number. We perform a lookup on the underlying dentries anyway, so this prevents code duplication. The directory handle stores an array of dentries, along with size and position in that array, instead of special `dot` and `dotdot` pointers. This simplifies the implementation of `getdents/getdents64` system calls, and also should allow for easier implementation of `lseek` on directory handles. Signed-off-by: Paweł Marczewski <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r5.
Reviewable status: complete! all files reviewed, all discussions resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r5.
Reviewable status: complete! all files reviewed, all discussions resolved
Description of the changes
This changes the filesystem
readdir
operation so that it only needs to call a callback. This is a big simplification for underlying filesystems: a lot of code was responsible for dynamically allocating the right amount of memory.Additionally, the only piece of information required from
readdir
is a name, not file type or inode number. We perform a lookup on the underlying dentries anyway, so this prevents code duplication.The directory handle stores an array of dentries, along with size and position in that array, instead of special
dot
anddotdot
pointers. This simplifies the implementation ofgetdents/getdents64
system calls, and also should allow for easier implementation oflseek
on directory handles.(@dimakuv you asked about this recently - this commit does not implement
lseek
, but it makes it relatively simple to do so)How to test this PR?
Existing tests (regression and fs) should be enough here.
This change is