@mrunalp
Contributor

@mrunalp mrunalp commented May 26, 2016

This replaces #707 by simply copying over a directory instead of using tar.

extFlags |= f.flag
}
} else {
data = append(data, o)
Member

Are we sure there are no mount data or fstypes that take a copyup option?

Contributor Author

I found one here http://linux.die.net/man/1/funionfs.
WDYT about calling it runc_copyup?

Member

@crosbymichael crosbymichael May 26, 2016

Is there something shorter? Maybe runccopyup, rcopyup, rccopyup, or tmpcopyup.

Contributor Author

I like tmpcopyup. I'll push updated code.

Member

There is no implementation of -o delete and -o copyup for now.

😉

But I think we should have a nicer way of doing this than adding to a flag list that might eventually clash with the kernel. Maybe we should add an extension field to mounts? Or would that be overkill?

Contributor Author

@cyphar I think we can revisit when we have more extensions 😉

@mrunalp
Contributor Author

mrunalp commented May 26, 2016

Pushed update to parse it as tmpcopyup
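
For readers following along, here is a minimal sketch of how an option such as tmpcopyup could be kept apart from kernel mount flags and filesystem data while parsing. The names `parseMountOptions`, `kernelFlags`, `extensionFlags` and `extCopyUp` are hypothetical stand-ins, not the identifiers used in the patch, and the flag table is only an illustrative subset.

```go
package main

import (
	"fmt"
	"strings"
)

// extCopyUp is a hypothetical stand-in for the extension bit used in the patch.
const extCopyUp = 1

// Illustrative subset of kernel flags; values mirror Linux <sys/mount.h>.
var kernelFlags = map[string]uintptr{
	"ro":     0x1, // MS_RDONLY
	"nosuid": 0x2, // MS_NOSUID
	"nodev":  0x4, // MS_NODEV
	"noexec": 0x8, // MS_NOEXEC
}

// Runtime-only options live in their own table so they never collide
// with a kernel flag bit.
var extensionFlags = map[string]int{
	"tmpcopyup": extCopyUp,
}

// parseMountOptions splits options into kernel flags, extension flags and
// the comma-joined data string handed to the filesystem.
func parseMountOptions(options []string) (flags uintptr, ext int, data string) {
	var rest []string
	for _, o := range options {
		if f, ok := kernelFlags[o]; ok {
			flags |= f
		} else if e, ok := extensionFlags[o]; ok {
			ext |= e
		} else {
			rest = append(rest, o)
		}
	}
	return flags, ext, strings.Join(rest, ",")
}

func main() {
	flags, ext, data := parseMountOptions([]string{"nosuid", "tmpcopyup", "size=64m"})
	fmt.Printf("flags=%#x copyUp=%v data=%q\n", flags, ext&extCopyUp == extCopyUp, data)
}
```

Keeping extensions in a separate bit field means tmpcopyup is never passed to mount(2), either as a flag or as data.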

}
}
if copyUp {
tmpDir, err = ioutil.TempDir("/run", "runctmpdir")
Member

Why are you using /run here? Shouldn't this be under the machine's configured temp dir, so that it doesn't fill up space on /run?

Contributor Author

I'll move it to /tmp

@cyphar
Member

cyphar commented May 27, 2016

@mrunalp

Why don't we do it like this:

  1. Create a tmpfs somewhere (/tmp/runc-copyup.XXXXX).
  2. Copy the contents of the directory into that tmpfs.
  3. Bind-mount the tmpfs over the directory.
  4. Lazy unmount and remove the initial temporary directory.

That way, we don't have to do any double-copying. In fact, we could just use mount(..., MS_MOVE) and completely skip step 4.

@cyphar
Member

cyphar commented May 27, 2016

@mrunalp Also (and I know this sounds like a trivial thing), please put your fileutils project under a free software license (currently it is technically proprietary).

@rhatdan
Contributor

rhatdan commented May 27, 2016

The tmpdir should probably be picked by looking at environment variables: something like checking for the existence of $XDG_RUNTIME_DIR, and then falling back to /tmp. I always worry about creating directories in /tmp since other users could potentially read/write the content in this directory. But since we want runc to be able to run without being UID=0, we don't know for sure if it can write to /run.

@cyphar
Member

cyphar commented May 27, 2016

@rhatdan

I always worry about creating directories in /tmp since other users could potentially read/write the content in this directory.

This would no longer be true if we decide to change the mode to something like 0700 (which I would actually recommend).

@rhatdan
Contributor

rhatdan commented May 27, 2016

Yes, also if this is in a different mount namespace it could be hidden from other users.

@mrunalp
Contributor Author

mrunalp commented May 27, 2016

@cyphar With your suggested approach, I suspect that the tmpfs won't be charged to the cgroup of the container if it is bind mounted (I'll take a look anyway). Though the double copy takes more time, it doesn't change the semantics of how --tmpfs works today.

@cyphar
Member

cyphar commented May 27, 2016

@mrunalp We could use MS_MOVE instead if you want. But since the tmpfs was created after cgroup join, I'm fairly sure kmemcg will track it properly (if not, that's a kernel bug).

@mrunalp
Contributor Author

mrunalp commented May 27, 2016

@cyphar Yeah, MS_MOVE should work. I'll try that.

@mrunalp
Contributor Author

mrunalp commented May 27, 2016

I pushed a new commit (I'll clean up the PR after review) that uses MS_MOVE. Also, added code for using XDG_RUNTIME_DIR as prefix if it is set in the env. PTAL.

@crosbymichael
Member

@mrunalp do you need to rebase?

@mrunalp
Contributor Author

mrunalp commented May 27, 2016

Rebased.

if err := mountPropagate(m, rootfs, mountLabel); err != nil {
return err
if copyUp {
tmpDirPrefix := os.Getenv("XDG_RUNTIME_DIR")
Member

I don't see how this would ever be set inside the container's init.

Contributor Author

Yeah, we can't leak it in from outside with this code, so I was thinking that setting it in the config might be an acceptable compromise. Otherwise we can special-case leaking it.

Member

Just use the container's /tmp dir or some hidden dir inside the rootfs to make this. Since it's temporary and is only there to make the initial tmpfs destination, it can be anywhere, or even back in your container's /run.

Contributor Author

I want to make this work with a read-only rootfs that we can't write to at all.

Member

We don't set it to readonly until after the mounts and after we pivot. Also you can just use /dev/ or /run to do this inside the container.

Contributor Author

I meant read only even before we make it read only and I know we will need more changes to get there.

Could possibly use some other tmpfs mount if specified in the config before this mount.

Contributor

Probably safe enough to just use /tmp with 0700 and a random directory.

Contributor Author

I removed XDG_RUNTIME_DIR lookup and switched to using /tmp which works well.

Contributor

@mrunalp I don't quite understand where we are here? You wrote that you removed lookup, but it's still here.

@mrunalp mrunalp force-pushed the cp_tmpfs branch 2 times, most recently from 96507db to 7a8734d Compare May 31, 2016 16:25
@mrunalp
Contributor Author

mrunalp commented Jun 9, 2016

@cyphar @crosbymichael PTAL

@cyphar
Member

cyphar commented Jun 10, 2016

@mrunalp There's a bug in fileutils, with the owner of directories (I've set /etc/ to be tmpcopyup and everything is owned by my user 1000:1000):

/ # ls /etc/ -la
total 72
drwxr-xr-x    6 root     root           480 Jun 10 08:44 .
drwxr-xr-x    1 1000     users          206 Jun 10 08:44 ..
-rw-r--r--    1 1000     users          466 Jun 10 08:44 fstab
-rw-r--r--    1 1000     users          298 Jun 10 08:44 group
-rw-r--r--    1 1000     users           10 Jun 10 08:44 hostname
-rw-r--r--    1 1000     users           40 Jun 10 08:44 hosts
drwxr-xr-x    2 root     root           140 Jun 10 08:44 init.d
-rw-r--r--    1 1000     users         1053 Jun 10 08:44 inittab
-rw-r--r--    1 1000     users         1180 Jun 10 08:44 inputrc
drwxr-xr-x    2 root     root           180 Jun 10 08:44 iproute2
-rw-r--r--    1 1000     users           21 Jun 10 08:44 issue
-rw-r--r--    1 1000     users            0 Jun 10 08:44 ld.so.conf
drwxr-xr-x    2 root     root            40 Jun 10 08:44 ld.so.conf.d
lrwxrwxrwx    1 1000     users           12 Jun 10 08:44 mtab -> /proc/mounts
drwxr-xr-x    8 root     root           180 Jun 10 08:44 network
-rw-r--r--    1 1000     users           95 Jun 10 08:44 os-release
-rw-r--r--    1 1000     users          371 Jun 10 08:44 passwd
-rw-r--r--    1 1000     users         1388 Jun 10 08:44 profile
-rw-r--r--    1 1000     users         2744 Jun 10 08:44 protocols
-rw-------    1 1000     users          512 Jun 10 08:44 random-seed
-rw-r--r--    1 1000     users          833 Jun 10 08:44 resolv.conf
-rw-r--r--    1 1000     users          386 Jun 10 08:44 securetty
-rw-r--r--    1 1000     users        10873 Jun 10 08:44 services
-rw-------    1 1000     users          239 Jun 10 08:44 shadow

The directories are owned by root because CopyDirectory doesn't Lchown directories. I'll open a PR against fileutils to fix this. We'll need to take the code from Docker that does MkdirAllAs.

I've opened this PR: mrunalp/fileutils#1.

Also, I would really like it if that repo had some reference to where the code came from as well as a maintainers file with @mrunalp as the sole maintainer.

@cyphar
Member

cyphar commented Jun 12, 2016

@mrunalp Since mrunalp/fileutils#1 was merged can you rebase with the vendor updated?

@mrunalp
Contributor Author

mrunalp commented Jun 13, 2016

@cyphar Updated.

@mrunalp
Contributor Author

mrunalp commented Sep 8, 2016

@opencontainers/runc-maintainers PTAL. All comments have been addressed.

@rhatdan
Contributor

rhatdan commented Sep 8, 2016

I really want to start pushing the concept of readonly containers, and this patch is critical to this. Being able to mount tmpfs over certain directories allows us to preserve the layout of a container image and still run the container in read-only mode.

Member

@cyphar cyphar left a comment

I'm going to review this over the weekend. From a first look, this looks good.

return nil
case "tmpfs":
copyUp := m.Extensions&configs.EXT_COPYUP == configs.EXT_COPYUP
tmpDir := ""
Member

Well, you need it now. But you can use m.Destination in the second copyUp block. Doesn't really matter though it was just a thought.

@cyphar
Member

cyphar commented Sep 30, 2016

Dammit. GitHub doesn't let me retract the review with an "abstain". I'll LGTM this explicitly in the comments because clearly GitLab is still beating GitHub in the "reviewing code changes" space.

@cyphar
Member

cyphar commented Oct 1, 2016

@mrunalp This looks really nice. My main concern with this mode is the foot-gunning is quite easy. For example, if you put a mount for / with tmpcopyup after you've mounted /proc you get a lovely error like this:

container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"rootfs_linux.go:54: mounting \\\"tmpfs\\\" to rootfs \\\"/home/cyphar/src/runc/rootfs\\\" at \\\"/tmp/runctmpdir171994909\\\" caused \\\"failed to copy /home/cyphar/src/runc/rootfs to /tmp/runctmpdir171994909: read /home/cyphar/src/runc/rootfs/proc/1/attr/exec: invalid argument\\\"\""

Now, while it's true that you shouldn't be foot-gunning like this -- is there a nicer way we can handle these errors? If not, that's fine but it does make me a bit nervous wrt the bug reports we're going to get. I'm also currently testing user namespace interactions and will probably test with rootless containers.

@cyphar
Member

cyphar commented Oct 1, 2016

As expected, this doesn't play nicely with user namespace setups where there are unmapped UIDs in the thing you're copying over. Unfortunately, the only /nice/ way of handling this would be to do something exceptionally dodgy like: create all of the mounts until you hit a tmpcopyup, then request the calling runC process to run a privileged process inside the mount namespace that does the copyup operation, which is then MS_MOVEd. Since that is not going to be pretty code by any stretch of the imagination, we should come up with some nice error message for things like this so that users don't spam us with "why doesn't this work". Here's an example error:

container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"rootfs_linux.go:54: mounting \\\"tmpfs\\\" to rootfs \\\"/home/cyphar/src/runc/rootfs\\\" at \\\"/tmp/runctmpdir618894030\\\" caused \\\"failed to copy /home/cyphar/src/runc/rootfs to /tmp/runctmpdir618894030: lchown /tmp/runctmpdir618894030/bin: invalid argument\\\"\""

This issue will become quite pertinent with rootless containers.
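
One conceivable pre-check, sketched under assumptions: parse the namespace's uid_map (three columns per line: ID inside the namespace, ID outside, length) and test whether an ID reported by stat inside the namespace falls in a mapped range before attempting lchown. `parseIDMap` and `isMapped` are hypothetical helpers and are not part of runc.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// idRange is one line of /proc/<pid>/uid_map or gid_map.
type idRange struct {
	inside, outside, length int64
}

// parseIDMap parses the text of a uid_map or gid_map file.
func parseIDMap(content string) ([]idRange, error) {
	var ranges []idRange
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var r idRange
		if _, err := fmt.Sscan(line, &r.inside, &r.outside, &r.length); err != nil {
			return nil, fmt.Errorf("parsing %q: %w", line, err)
		}
		ranges = append(ranges, r)
	}
	return ranges, sc.Err()
}

// isMapped reports whether id, as seen from inside the namespace, is
// covered by any range.
func isMapped(ranges []idRange, id int64) bool {
	for _, r := range ranges {
		if id >= r.inside && id < r.inside+r.length {
			return true
		}
	}
	return false
}

func main() {
	// Sample map: only root inside the namespace is mapped (to host UID 1000).
	ranges, err := parseIDMap("0 1000 1\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, id := range []int64{0, 65534} {
		fmt.Printf("uid %d mapped: %v\n", id, isMapped(ranges, id))
	}
}
```

A check like this could let the copy step fail early with a clear message instead of surfacing a bare "invalid argument" from lchown.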

Member

@cyphar cyphar left a comment

Adding an integration test would be a really good idea too, if the current framework (of using sed to modify config.json) is sufficient.

@mrunalp
Contributor Author

mrunalp commented Oct 4, 2016

Added an integration test.

@cyphar
Member

cyphar commented Oct 4, 2016

+1 on the integration test. While I'm concerned about the fact that this would break in some user namespaced container setups, there's not much we can do without some pretty dodgy modify-the-mount-namespace-from-another-process-that-is-actually-root shenanigans. So I'm fine with the current limitations, but I would like the error message we get to be slightly nicer -- maybe just use a genericError with a cause mentioning tmpcopyup?

If copyup is specified for a tmpfs mount, then the contents of the
underlying directory are copied into the tmpfs mounted over it.

Signed-off-by: Mrunal Patel <[email protected]>
@mrunalp
Contributor Author

mrunalp commented Oct 4, 2016

@cyphar Updated error messages.

@mrunalp
Contributor Author

mrunalp commented Oct 12, 2016

@opencontainers/runc-maintainers PTAL. All comments have been addressed.

cyphar
cyphar approved these changes Oct 12, 2016
@cyphar
Member

cyphar commented Oct 12, 2016

LGTM

Approved with PullApprove

@LK4D4
Contributor

LK4D4 commented Oct 21, 2016

LGTM

Approved with PullApprove

@LK4D4 LK4D4 merged commit 1ab9d5e into opencontainers:master Oct 21, 2016
@rhatdan
Contributor

rhatdan commented Oct 21, 2016

WooHoo. Took a long time to finally get this functionality into Docker, but it is finally there.

@bchallenor

Is this functionality accessible from Docker? @rhatdan had a Moby PR but it was not merged, and the comments lead here...

At the moment, docker --tmpfs seems to hide the underlying contents:

$ docker --version
Docker version 18.01.0-ce, build 03596f51b1

$ docker run --tmpfs /run debian:stable find /run
/run

$ docker run debian:stable find /run
/run
/run/lock
/run/mount
/run/mount/utab
/run/utmp

@cyphar
Member

cyphar commented Feb 14, 2018

@bchallenor I haven't tried this, but if you can specify tmpcopyup as one of the mount options with --tmpfs then it should trigger this functionality (as Docker just passes down the mount options to runc).

@bchallenor

Any idea what the syntax would be for that? The only options that the Docker docs mention are tmpfs-size and tmpfs-mode.

@rhatdan
Contributor

rhatdan commented Feb 14, 2018

I believe it should be the default. I am saddened to see that this behaviour does not work with k8s yet either. We are missing an opportunity to run containers in read-only mode, and this feature makes it easier. I believe the default for Kubernetes should be to run containers in read-only mode. But the issue is that containers tend to need to write to /run, /tmp, and /var/tmp. If we could set up container runtimes to mount tmpfs on these directories by default when the image is running in read-only mode, then most containers would run, except that sometimes the underlying directories contain content the container needs. httpd, for example, usually ships with a /run/httpd directory and expects to write to it. If you just mount a tmpfs over /run and don't copy up, httpd will fail, because it does not create the /run/httpd directory if it does not exist.

@bchallenor

By "I believe it should be the default" do you mean that you think it should be the default, or that it already is? If the former, I agree, as I am also motivated by wanting a readonly rootfs with a few read/write holes punched in it.

You mentioned above "getting this functionality into Docker" - can it actually be used via the Docker UI, or is it confined to runc right now?

@rhatdan
Contributor

rhatdan commented Feb 14, 2018

I am not sure if this ever got fully merged into upstream docker. Simple enough to check

docker run -ti --tmpfs /etc fedora ls /etc

@cyphar
Member

cyphar commented Feb 14, 2018

@bchallenor If you use the --mount flag you can specify the mount options explicitly. The docs you linked to explain how to use --mount instead of --tmpfs.

@josh1703658784

josh1703658784 commented Apr 3, 2022

So far I haven't been able to get this working. Does anyone currently have this working or know what options I need to give the mount declaration when using Docker?

== DETAILS ==

# docker info
Docker version 20.10.3, build b455053
Docker Compose version v2.3.3
# container / docker-compose info
  radarr:
    image: lscr.io/linuxserver/radarr:latest
    read_only: true
    cap_drop:
      - all
    cap_add:
      ...
    environment:
      - S6_READ_ONLY_ROOT=1
      ...
    volumes:
      ...
    tmpfs:
      ...
      - /app
      ...
    expose:
      - 7878

I tried each of the following options separately with no luck. I scraped these mount options from comments in this thread and by reviewing the code.

# docker-compose tmpfs modes
      tmpfs:
        - /app:tmpcopyup
        - /app:runctmpdir
        - /app:runccopyup 
        - /app:runc_copyup
        - /app:rcopyup 
        - /app:rccopyup

Regardless of which option I tried I receive the following error:

# error
./run: line 3: cd: /app/radarr/bin: No such file or directory

The Docker documentation on volumes only lists the mount options tmpfs-size and tmpfs-mode.

@kolyshkin
Contributor

@Joshuaks I suggest you ask Docker support about it.
