-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LibOS] Convert pseudo
filesystem to use inodes
#402
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL)
LibOS/shim/include/shim_fs_pseudo.h, line 69 at r1 (raw file):
#define PSEUDO_PERM_FILE_RW PERM_rw_rw_rw_ /* Maximum count of created nodes (`pseudo_add_*` calls) */
FYI: The current number is something around 70.
LibOS/shim/include/shim_fs_pseudo.h, line 79 at r1 (raw file):
* The node can then be further customized by directly modifying other fields. * * NOTE: We assume that all Gramine processes within a single run create exactly the same set of
FYI: The exact pseudo nodes created depends on filesystems that we initialize. Currently, /sys
is initialized depending on manifest, and /dev/attestation
depending on whether we're running under SGX.
A safer way would be to use paths for checkpointing, e.g. /dev/null
or /proc/!pid/cmdline
; note that some pseudo-nodes like !pid
would need an artificial name). But it's more complicated, and IMO not worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @pwmarcz)
LibOS/shim/include/shim_fs_pseudo.h, line 79 at r1 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
FYI: The exact pseudo nodes created depends on filesystems that we initialize. Currently,
/sys
is initialized depending on manifest, and/dev/attestation
depending on whether we're running under SGX.A safer way would be to use paths for checkpointing, e.g.
/dev/null
or/proc/!pid/cmdline
; note that some pseudo-nodes like!pid
would need an artificial name). But it's more complicated, and IMO not worth it.
Hmm, how does this work with pseudo nodes created dynamically, like e.g. /proc/<pid>/fd/*
? I think we don't create them right now, but this implementation seems to disallow such a thing to exist?
LibOS/shim/src/fs/shim_fs_pseudo.c, line 213 at r2 (raw file):
Note that we might not be holding `g_dcache_lock` here, so we cannot call `pseudo_readdir`.
Is it still safe to traverse the children list without any locks? I guess you're assuming here that the pseudo directories can't change in runtime?
LibOS/shim/src/fs/shim_fs_pseudo.c, line 503 at r2 (raw file):
g_pseudo_node_count++
Shouldn't this be atomic? Or you're also assuming here that we create pseudo nodes only during initialization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @mkow)
LibOS/shim/include/shim_fs_pseudo.h, line 79 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Hmm, how does this work with pseudo nodes created dynamically, like e.g.
/proc/<pid>/fd/*
? I think we don't create them right now, but this implementation seems to disallow such a thing to exist?
That's right. The list of pseudo-nodes is static. This checkpointing scheme also assumes that: otherwise you couldn't easily send a handle to a child process and expect it to find the same node.
However, these nodes can generate a dynamic list of files. We do have /proc/<pid>/fd/<n>
, and it is a single node. The name_exists
/ list_names
callbacks define the file names (e.g. 0
, 1
, 2
...).
LibOS/shim/src/fs/shim_fs_pseudo.c, line 213 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Note that we might not be holding `g_dcache_lock` here, so we cannot call `pseudo_readdir`.
Is it still safe to traverse the children list without any locks? I guess you're assuming here that the pseudo directories can't change in runtime?
Yes, it cannot change in runtime.
LibOS/shim/src/fs/shim_fs_pseudo.c, line 503 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
g_pseudo_node_count++
Shouldn't this be atomic? Or you're also assuming here that we create pseudo nodes only during initialization?
Yes. This is an assumption this module always made. Notice that there are no locks taken for adding the new node to linked lists below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @mkow and @pwmarcz)
a discussion (no related file):
I'm confused as to what happens in e.g. this case:
- Parent process creates a child process and opens
/proc/<child-pid>/exe
. - Child process terminates.
- Parent process creates another child process.
- New child process gets a checkpoint with the inode for
/proc/<child-pid>/exe
and the opened shim handle. - Child process reads from the opened shim handle:
- In the old code,
pseudo_read()
would return-ENOENT
because it would try topseudo_find
and fail to find that<child-pid>
- In your new code,
pseudo_read()
would return the string that was memorized by the parent process inhdl->info.str.mem
.
- In the old code,
Is my understanding correct? Also, is this how the Linux kernel does it?
In other words, this PR fixes the problem of "dangling shim handles" -- the handles' inodes (and corresponding pseudo_node
s) are now correctly checkpointed and "survive" in the child process.
LibOS/shim/include/shim_fs_pseudo.h, line 79 at r1 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
That's right. The list of pseudo-nodes is static. This checkpointing scheme also assumes that: otherwise you couldn't easily send a handle to a child process and expect it to find the same node.
However, these nodes can generate a dynamic list of files. We do have
/proc/<pid>/fd/<n>
, and it is a single node. Thename_exists
/list_names
callbacks define the file names (e.g.0
,1
,2
...).
The assumption sounds reasonable. I cannot come up with a hypothetical case when we would diverge a list of pseudo nodes we create in the child process...
LibOS/shim/src/fs/shim_fs_pseudo.c, line 22 at r2 (raw file):
static struct pseudo_node* g_pseudo_nodes[PSEUDO_MAX_NODES]; static unsigned int g_pseudo_node_count;
Maybe add a comment here that g_pseudo_nodes
is used only in two cases:
- Filled during each-process initialization (mounting of pseudo-FSes), assumed to be deterministic in all processes of the same Gramine instance
- Queried during checkpoint-restore in the child process, to bind inodes to
pseudo_node
s created in step (1)
Maybe also add a node that in the parent process, this is basically not used at all.
LibOS/shim/src/fs/shim_fs_pseudo.c, line 46 at r2 (raw file):
* Note that `pseudo_find` might fail for such a checkpointed dentry: for instance, we might have a * dentry for `/proc/<pid>` where the process does not exist anymore. Ideally, we would invalidate * such dentries; for now, all operations on them will return -ENOENT.
"Ideally, we would invalidate such dentries" -- we don't do it even in the new PR, right?
LibOS/shim/src/fs/shim_fs_pseudo.c, line 64 at r2 (raw file):
static int pseudo_icheckpoint(struct shim_inode* inode, void** out_data, size_t* out_size) { unsigned int* id = malloc(sizeof(*id));
Do we just memory leak this object? How is it supposed to be freed / garbage collected?
Ah, I guess it will be freed in the generic checkpoint code (or I guess in the inode-checkpoint code), after it sends all out_data
to the child process. Is it so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
a discussion (no related file):
I think that's correct.
Also, is this how the Linux kernel does it?
I haven't looked too closely. I think there is a similar layer of abstraction to our "str" files, so I would expect the value to be memoized.
LibOS/shim/include/shim_fs_pseudo.h, line 79 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
The assumption sounds reasonable. I cannot come up with a hypothetical case when we would diverge a list of pseudo nodes we create in the child process...
Mostly I'd be afraid of doing this by mistake - so far, the initialization has some conditions, but they at most depend on the manifest.
LibOS/shim/src/fs/shim_fs_pseudo.c, line 22 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Maybe add a comment here that
g_pseudo_nodes
is used only in two cases:
- Filled during each-process initialization (mounting of pseudo-FSes), assumed to be deterministic in all processes of the same Gramine instance
- Queried during checkpoint-restore in the child process, to bind inodes to
pseudo_node
s created in step (1)Maybe also add a node that in the parent process, this is basically not used at all.
I added a comment, explaining what this array contains and how we use it. The assumption is explained in the header file as well, but maybe it doesn't hurt to repeat it here.
LibOS/shim/src/fs/shim_fs_pseudo.c, line 46 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
"Ideally, we would invalidate such dentries" -- we don't do it even in the new PR, right?
Right. However, the rest of the paragraph does not apply to pseudo_find
(we don't even call it for dentries restored from a checkpoint). And the fact that we don't invalidate no-longer-existing files is documented in the header file already.
By the way, this is similar to the problem we discussed recently: files disappearing on the host, or modified by another process. The solution is also to mark these files (dentries) as invalid once we notice that. But this should be less problematic after inodes are introduced everywhere, and dentries are just a name-to-inode association.
LibOS/shim/src/fs/shim_fs_pseudo.c, line 64 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Do we just memory leak this object? How is it supposed to be freed / garbage collected?
Ah, I guess it will be freed in the generic checkpoint code (or I guess in the inode-checkpoint code), after it sends all
out_data
to the child process. Is it so?
Yes, it will be freed by the caller, and this is explained in the documentation for icheckpoint
.
This interface is a bit awkward - as explained in the previous PR (#297), it is done this way so that it's self-contained. If/when we rewrite the checkpoint system, it will probably make more sense for icheckpoint/irestore
to send/receive data directly "over the wire".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)
a discussion (no related file):
Previously, pwmarcz (Paweł Marczewski) wrote…
I think that's correct.
Also, is this how the Linux kernel does it?
I haven't looked too closely. I think there is a similar layer of abstraction to our "str" files, so I would expect the value to be memoized.
And, this is speculation, but I would expect Linux to do what I'm planning to do later (see the comment about invalidating files): keep a handle associated with an inode object, even though that inode object is no longer reachable by other means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), "fixup! " found in commit messages' one-liners (waiting on @mkow)
LibOS/shim/src/fs/shim_fs_pseudo.c, line 46 at r2 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
Right. However, the rest of the paragraph does not apply to
pseudo_find
(we don't even call it for dentries restored from a checkpoint). And the fact that we don't invalidate no-longer-existing files is documented in the header file already.By the way, this is similar to the problem we discussed recently: files disappearing on the host, or modified by another process. The solution is also to mark these files (dentries) as invalid once we notice that. But this should be less problematic after inodes are introduced everywhere, and dentries are just a name-to-inode association.
Agreed, thanks for the explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners
LibOS/shim/include/shim_fs_pseudo.h, line 79 at r1 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
Mostly I'd be afraid of doing this by mistake - so far, the initialization has some conditions, but they at most depend on the manifest.
Fair. I feel a bit uneasy about the fact that we can't assert that the trees are exactly the same in every process, but I don't have an idea for a simple solution to this.
In addition, `pseudo_node` objects are now tracked by number and checkpointed. Signed-off-by: Paweł Marczewski <[email protected]>
4435774
to
b571f2a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved
Description of the changes
In addition,
pseudo_node
objects are now tracked by number and checkpointed.How to test this PR?
This should have a pretty good coverage on Jenkins.
This change is