Skip to content

Commit

Permalink
init: close internal fds before execve
Browse files Browse the repository at this point in the history
If we leak a file descriptor referencing the host filesystem, an
attacker could use a /proc/self/fd magic-link as the source for execve
to execute a host binary in the container. This would allow the binary
itself (or a process inside the container in the 'runc exec' case) to
write to a host binary, leading to a container escape.

The simple solution is to make sure we close all file descriptors
immediately before the execve(2) step. Doing this earlier can lead to very
serious issues in Go (as file descriptors can be reused, any (*os.File)
reference could start silently operating on a different file) so we have
to do it as late as possible.

Unfortunately, there are some Go runtime file descriptors that we must
not close (otherwise the Go scheduler panics randomly). The only way of
being sure which file descriptors cannot be closed is to sneakily
go:linkname the runtime internal "internal/poll.IsPollDescriptor"
function. This is almost certainly not recommended but there isn't any
other way to be absolutely sure, while also closing any other possible
files.

In addition, we can keep the logrus forwarding logfd open because you
cannot execve a pipe and the contents of the pipe are so restricted
(JSON-encoded in a format we pick) that it seems unlikely you could even
construct shellcode. Closing the logfd causes issues if there is an
error returned from execve.

In mainline runc, runc-dmz protects us against this attack because the
intermediate execve(2) closes all of the O_CLOEXEC internal runc file
descriptors and thus runc-dmz cannot access them to attack the host.

Fixes: GHSA-xr7r-f8xq-vfvv CVE-2024-21626
Signed-off-by: Aleksa Sarai <[email protected]>
  • Loading branch information
cyphar committed Jan 23, 2024
1 parent 8e1cd2f commit f2f1621
Show file tree
Hide file tree
Showing 5 changed files with 117 additions and 13 deletions.
2 changes: 1 addition & 1 deletion libcontainer/init_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ func Init() {
}
// Normally, StartInitialization() never returns, meaning
// if we are here, it had failed.
os.Exit(1)
os.Exit(255)
}

// Normally, this function does not return. If it returns, with or without an
Expand Down
9 changes: 9 additions & 0 deletions libcontainer/logs/logs.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,19 @@ import (
"bufio"
"encoding/json"
"io"
"os"

"github.com/sirupsen/logrus"
)

// IsLogrusFd returns whether the provided fd matches the one that logrus is
// currently outputting to. This should only ever be called by UnsafeCloseFrom
// from `runc init`.
func IsLogrusFd(fd uintptr) bool {
file, ok := logrus.StandardLogger().Out.(*os.File)
return ok && file.Fd() == fd
}

func ForwardLogs(logPipe io.ReadCloser) chan error {
done := make(chan error, 1)
s := bufio.NewScanner(logPipe)
Expand Down
19 changes: 19 additions & 0 deletions libcontainer/setns_init_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ import (
"github.com/opencontainers/runc/libcontainer/keys"
"github.com/opencontainers/runc/libcontainer/seccomp"
"github.com/opencontainers/runc/libcontainer/system"
"github.com/opencontainers/runc/libcontainer/utils"
)

// linuxSetnsInit performs the container's initialization for running a new process
Expand Down Expand Up @@ -139,5 +140,23 @@ func (l *linuxSetnsInit) Init() error {
l.config.Args[0] = name
return system.Fexecve(l.dmzExe.Fd(), l.config.Args, os.Environ())
}
// Close all file descriptors we are not passing to the container. This is
// necessary because the execve target could use internal runc fds as the
// execve path, potentially giving access to binary files from the host
// (which can then be opened by container processes, leading to container
// escapes). Note that because this operation will close any open file
// descriptors that are referenced by (*os.File) handles from underneath
// the Go runtime, we must not do any file operations after this point
// (otherwise the (*os.File) finaliser could close the wrong file). See
// CVE-2024-21626 for more information as to why this protection is
// necessary.
//
// This is not needed for runc-dmz, because the extra execve(2) step means
// that all O_CLOEXEC file descriptors have already been closed and thus
// the second execve(2) from runc-dmz cannot access internal file
// descriptors from runc.
if err := utils.UnsafeCloseFrom(l.config.PassedFilesCount + 3); err != nil {
return err
}
return system.Exec(name, l.config.Args, os.Environ())
}
18 changes: 18 additions & 0 deletions libcontainer/standard_init_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -282,5 +282,23 @@ func (l *linuxStandardInit) Init() error {
l.config.Args[0] = name
return system.Fexecve(l.dmzExe.Fd(), l.config.Args, os.Environ())
}
// Close all file descriptors we are not passing to the container. This is
// necessary because the execve target could use internal runc fds as the
// execve path, potentially giving access to binary files from the host
// (which can then be opened by container processes, leading to container
// escapes). Note that because this operation will close any open file
// descriptors that are referenced by (*os.File) handles from underneath
// the Go runtime, we must not do any file operations after this point
// (otherwise the (*os.File) finaliser could close the wrong file). See
// CVE-2024-21626 for more information as to why this protection is
// necessary.
//
// This is not needed for runc-dmz, because the extra execve(2) step means
// that all O_CLOEXEC file descriptors have already been closed and thus
// the second execve(2) from runc-dmz cannot access internal file
// descriptors from runc.
if err := utils.UnsafeCloseFrom(l.config.PassedFilesCount + 3); err != nil {
return err
}
return system.Exec(name, l.config.Args, os.Environ())
}
82 changes: 70 additions & 12 deletions libcontainer/utils/utils_unix.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,13 @@ import (
"runtime"
"strconv"
"sync"
_ "unsafe" // for go:linkname

securejoin "github.com/cyphar/filepath-securejoin"
"github.com/sirupsen/logrus"
"golang.org/x/sys/unix"

"github.com/opencontainers/runc/libcontainer/logs"
)

// EnsureProcHandle returns whether or not the given file handle is on procfs.
Expand Down Expand Up @@ -53,14 +56,11 @@ func haveCloseRangeCloexec() bool {
return haveCloseRangeCloexecBool
}

// CloseExecFrom applies O_CLOEXEC to all file descriptors currently open for
// the process (except for those below the given fd value).
func CloseExecFrom(minFd int) error {
if haveCloseRangeCloexec() {
err := unix.CloseRange(uint(minFd), math.MaxUint, unix.CLOSE_RANGE_CLOEXEC)
return os.NewSyscallError("close_range", err)
}
type fdFunc func(fd int)

// fdRangeFrom calls the passed fdFunc for each file descriptor that is open in
// the current process.
func fdRangeFrom(minFd int, fn fdFunc) error {
procSelfFd, closer := ProcThreadSelf("fd")
defer closer()

Expand Down Expand Up @@ -88,15 +88,73 @@ func CloseExecFrom(minFd int) error {
if fd < minFd {
continue
}
// Intentionally ignore errors from unix.CloseOnExec -- the cases where
// this might fail are basically file descriptors that have already
// been closed (including and especially the one that was created when
// os.ReadDir did the "opendir" syscall).
unix.CloseOnExec(fd)
// Ignore the file descriptor we used for readdir, as it will be closed
// when we return.
if uintptr(fd) == fdDir.Fd() {
continue
}
// Run the closure.
fn(fd)
}
return nil
}

// CloseExecFrom sets the O_CLOEXEC flag on all file descriptors greater or
// equal to minFd in the current process.
func CloseExecFrom(minFd int) error {
// Use close_range(CLOSE_RANGE_CLOEXEC) if possible.
if haveCloseRangeCloexec() {
err := unix.CloseRange(uint(minFd), math.MaxUint, unix.CLOSE_RANGE_CLOEXEC)
return os.NewSyscallError("close_range", err)
}
// Otherwise, fall back to the standard loop.
return fdRangeFrom(minFd, unix.CloseOnExec)
}

//go:linkname runtime_IsPollDescriptor internal/poll.IsPollDescriptor

// In order to make sure we do not close the internal epoll descriptors the Go
// runtime uses, we need to ensure that we skip descriptors that match
// "internal/poll".IsPollDescriptor. Yes, this is a Go runtime internal thing,
// unfortunately there's no other way to be sure we're only keeping the file
// descriptors the Go runtime needs. Hopefully nothing blows up doing this...
func runtime_IsPollDescriptor(fd uintptr) bool //nolint:revive

// UnsafeCloseFrom closes all file descriptors greater or equal to minFd in the
// current process, except for those critical to Go's runtime (such as the
// netpoll management descriptors).
//
// NOTE: That this function is incredibly dangerous to use in most Go code, as
// closing file descriptors from underneath *os.File handles can lead to very
// bad behaviour (the closed file descriptor can be re-used and then any
// *os.File operations would apply to the wrong file). This function is only
// intended to be called from the last stage of runc init.
func UnsafeCloseFrom(minFd int) error {
// We cannot use close_range(2) even if it is available, because we must
// not close some file descriptors.
return fdRangeFrom(minFd, func(fd int) {
if runtime_IsPollDescriptor(uintptr(fd)) {
// These are the Go runtimes internal netpoll file descriptors.
// These file descriptors are operated on deep in the Go scheduler,
// and closing those files from underneath Go can result in panics.
// There is no issue with keeping them because they are not
// executable and are not useful to an attacker anyway. Also we
// don't have any choice.
return
}
if logs.IsLogrusFd(uintptr(fd)) {
// Do not close the logrus output fd. We cannot exec a pipe, and
// the contents are quite limited (very little attacker control,
// JSON-encoded) making shellcode attacks unlikely.
return
}
// There's nothing we can do about errors from close(2), and the
// only likely error to be seen is EBADF which indicates the fd was
// already closed (in which case, we got what we wanted).
_ = unix.Close(fd)
})
}

// NewSockPair returns a new SOCK_STREAM unix socket pair.
func NewSockPair(name string) (parent, child *os.File, err error) {
fds, err := unix.Socketpair(unix.AF_LOCAL, unix.SOCK_STREAM|unix.SOCK_CLOEXEC, 0)
Expand Down

0 comments on commit f2f1621

Please sign in to comment.