-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Add support for copying up directories into tmpfs when a tmpfs is mounted over them #845
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| extFlags |= f.flag | ||
| } | ||
| } else { | ||
| data = append(data, o) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure there are no mount data or fstypes that take a copyup option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found one here http://linux.die.net/man/1/funionfs.
WDYT about calling it runc_copyup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there something shorter? Maybe runccopyup rcopyup rccopyup tmpcopyup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like tmpcopyup. I'll push updated code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no implementation of -o delete and -o copyup for now.
😉
But I think we should have a nicer way of doing this than adding to a flag list that might eventually clash with the kernel. Maybe we should add an extension field to mounts? Or would that be overkill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cyphar I think we can revisit when we have more extensions 😉
|
Pushed update to parse it as |
libcontainer/rootfs_linux.go
Outdated
| } | ||
| } | ||
| if copyUp { | ||
| tmpDir, err = ioutil.TempDir("/run", "runctmpdir") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you using /run here? Shouldn't this be under the machines configured temp dir and not fillup space on /run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move it to /tmp
|
Why don't we do it like this:
That way, we don't have to do any double-copying. In fact, we could just use |
|
@mrunalp Also (and I know this sounds like a trivial thing), but please put your |
|
The tmpdir should probably look for environment variables Something like checking for the existence of |
This would no longer be true if we decide to change the mode to something like |
|
Yes, also if this is in a different mount namespace it could be hidden from other users. |
|
@cyphar With your suggested approach, I suspect that tmpfs won't be charged to cgroup of the container if bind mounted (I'll take a look anyway). Though the double copy take more time, it doesn't change the semantics of how --tmpfs works today. |
|
@mrunalp We could use |
|
@cyphar Yeah, MS_MOVE should work. I'll try that. |
|
I pushed a new commit (I'll clean up the PR after review) that uses MS_MOVE. Also, added code for using XDG_RUNTIME_DIR as prefix if it is set in the env. PTAL. |
|
@mrunalp do you need to rebase? |
|
Rebased. |
libcontainer/rootfs_linux.go
Outdated
| if err := mountPropagate(m, rootfs, mountLabel); err != nil { | ||
| return err | ||
| if copyUp { | ||
| tmpDirPrefix := os.Getenv("XDG_RUNTIME_DIR") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how this would ever be set inside the container's init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we can't leak it from outside with this code so was thinking whether setting in the config might be an acceptable compromise. Else we can special case leak it.
Sent from my iPhone
On May 27, 2016, at 3:15 PM, Michael Crosby [email protected] wrote:
In libcontainer/rootfs_linux.go:
stat, err := os.Stat(dest) if err != nil { if err := os.MkdirAll(dest, 0755); err != nil { return err } }
if err := mountPropagate(m, rootfs, mountLabel); err != nil {return errif copyUp { I don't see how this would ever be set inside the container's init.tmpDirPrefix := os.Getenv("XDG_RUNTIME_DIR")—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use the container's /tmp dir or some hidden dir inside the rootfs to make this. Since its temporary and is only there to make the initial tmpfs destination it can be anywhere or even back in your container's /run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to make this work with read only rootfs that we can't write to at all.
Sent from my iPhone
On May 27, 2016, at 3:36 PM, Michael Crosby [email protected] wrote:
In libcontainer/rootfs_linux.go:
stat, err := os.Stat(dest) if err != nil { if err := os.MkdirAll(dest, 0755); err != nil { return err } }
if err := mountPropagate(m, rootfs, mountLabel); err != nil {return errif copyUp { Just use the container's /tmp dir or some hidden dir inside the rootfs to make this. Since its temporary and is only there to make the initial tmpfs destination it can be anywhere or even back in your container's /run.tmpDirPrefix := os.Getenv("XDG_RUNTIME_DIR")—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't set it to readonly until after the mounts and after we pivot. Also you can just use /dev/ or /run to do this inside the container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant read only even before we make it read only and I know we will need more changes to get there.
Could possibly use some other tmpfs mount if specified in the config before this mount.
Sent from my iPhone
On May 27, 2016, at 3:48 PM, Michael Crosby [email protected] wrote:
In libcontainer/rootfs_linux.go:
stat, err := os.Stat(dest) if err != nil { if err := os.MkdirAll(dest, 0755); err != nil { return err } }
if err := mountPropagate(m, rootfs, mountLabel); err != nil {return errif copyUp { We don't set it to readonly until after the mounts and after we pivot. Also you can just use /dev/ or /run to do this inside the container.tmpDirPrefix := os.Getenv("XDG_RUNTIME_DIR")—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably safe enough to just use /tmp with 0700 and a random directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed XDG_RUNTIME_DIR lookup and switched to using /tmp which works well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrunalp I don't quite understand where we are here? You wrote that you removed lookup, but it's still here.
96507db to
7a8734d
Compare
|
@cyphar @crosbymichael PTAL |
|
@mrunalp There's a bug in The directories are owned by I've opened this PR: mrunalp/fileutils#1. Also, I would really like it if that repo had some reference to where the code came from as well as a maintainers file with @mrunalp as the sole maintainer. |
|
@mrunalp Since mrunalp/fileutils#1 was merged can you rebase with the vendor updated? |
|
@cyphar Updated. |
|
@opencontainers/runc-maintainers PTAL. All comments have been addressed. |
|
I really want to start pushing the concept of readonly containers, and this patch is critical to this. Being able to mount tmpfs over certain directories allows us to preserve the layout of a container image and still run the container in read-only mode. |
cyphar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to review this over the weekend. From a first look, this looks good.
| return nil | ||
| case "tmpfs": | ||
| copyUp := m.Extensions&configs.EXT_COPYUP == configs.EXT_COPYUP | ||
| tmpDir := "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, you need it now. But you can use m.Destination in the second copyUp block. Doesn't really matter though it was just a thought.
|
Dammit. GitHub doesn't let me retract the review with an "abstain". I'll LGTM this explicitly in the comments because clearly GitLab is still beating GitHub in the "reviewing code changes" space. |
|
@mrunalp This looks really nice. My main concern with this mode is the foot-gunning is quite easy. For example, if you put a mount for Now, while it's true that you shouldn't be foot-gunning like this -- is there a nicer way we can handle these errors? If not, that's fine but it does make me a bit nervous wrt the bug reports we're going to get. I'm also currently testing user namespace interactions and will probably test with rootless containers. |
|
As expected, this doesn't play nicely with user namespace setups where there are unmapped UIDs in the thing you're copying over. Unfortunately, the only /nice/ way of handling this would be to do something exceptionally dodgy like: create all of the mounts until you hit a This issue will be come quite pertinent with rootless containers. |
cyphar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding an integration test would be a really good idea too, if the current framework (of using sed to modify config.json) is sufficient.
|
Added an integration test. |
|
+1 on the integration test. While I'm concerned about the fact that this would break in some user namespaced container setups, there's not much we can do without some pretty dodgy modify-the-mount-namespace-from-another-process-that-is-actually-root shennanigans. So I'm fine with the current limitations, but would like that the error message we get is slightly nicer -- maybe just use a |
If copyup is specified for a tmpfs mount, then the contents of the underlying directory are copied into the tmpfs mounted over it. Signed-off-by: Mrunal Patel <[email protected]>
Signed-off-by: Mrunal Patel <[email protected]>
|
@cyphar Updated error messages. |
|
@opencontainers/runc-maintainers PTAL. All comments have been addressed. |
|
WooHoo. Took a long time to finally get this functionality into Docker, but it is finally there. |
|
Is this functionality accessible from Docker? @rhatdan had a Moby PR but it was not merged, and the comments lead here... At the moment, |
|
@bchallenor I haven't tried this, but if you can specify |
|
Any idea what the syntax would be for that? The only options that the Docker docs mention are |
|
I believe it should be the default. I am saddened to see that this behaviour does not work with k8s yet either. We are missing an opportunity to run containres as Read-Only mode. And this feature makes it easier. I believe the default for Kubernetes should be to run container in read-only mode. But this issue is containers tend to need to write to /run, /tmp, and /var/tmp. If we could setup container runtimes to mount tmpfs on these directories if the image is running in readonly mode by default, then most containers would run, except for the fact that sometimes the underlying directories contain content that is needed for the container. httpd for example usually ships with a /run/httpd directory and expects to write to this directory. If you just mount a tmpfs over /run and don't copy up, the httpd will fail, because it does not create the /run/httpd directory if it does not exist. |
|
By "I believe it should be the default" do you mean that you think it should be the default, or that it already is? If the former, I agree, as I am also motivated by wanting a readonly rootfs with a few read/write holes punched in it. You mentioned above "getting this functionality into Docker" - can it actually be used via the Docker UI, or is it confined to runc right now? |
|
I am not sure if this ever got fully merged into upstream docker. Simple enough to check docker run -ti --tmpfs /etc fedora ls /etc |
|
@bchallenor If you use the |
|
I believe it should be the default. I am saddened to see that this behaviour does not work with k8s yet either. We are missing an opportunity to run containres as Read-Only mode. And this feature makes it easier. I believe the default for Kubernetes should be to run container in read-only mode. But this issue is containers tend to need to write to /run, /tmp, and /var/tmp. If we could setup container runtimes to mount tmpfs on these directories if the image is running in readonly mode by default, then most containers would run, except for the fact that sometimes the underlying directories contain content that is needed for the container. httpd for example usually ships with a /run/httpd directory and expects to write to this directory. If you just mount a tmpfs over /run and don't copy up, the httpd will fail, because it does not create the /run/httpd directory if it does not exist. |
|
So far I haven't been able to get this working. Does anyone currently have this working or know what options I need to give the mount declaration when using Docker? == DETAILS == I tried each of the following options separately with no luck. I scraped these mount options from comments in the post and through review the code. Regardless of which option I tried I receive the following error: Docker documentation on volumes only includes mount options |
|
@Joshuaks I suggest you ask Docker support about it. |
This replaces #707 by simply copying over a directory instead of using tar.