This repository has been archived by the owner on May 12, 2021. It is now read-only.

virtcontainers: Add support for ephemeral volumes #307

Closed
15 changes: 14 additions & 1 deletion cli/create.go
@@ -15,6 +15,7 @@ import (
	"strings"

	vc "github.com/kata-containers/runtime/virtcontainers"
	"github.com/kata-containers/runtime/virtcontainers/device/config"
	"github.com/kata-containers/runtime/virtcontainers/pkg/oci"
	"github.com/sirupsen/logrus"
	"github.com/urfave/cli"
@@ -103,7 +104,6 @@ func create(containerID, bundlePath, console, pidFilePath string, detach bool,
	disableOutput := noNeedForOutput(detach, ociSpec.Process.Terminal)

	var process vc.Process

	switch containerType {
	case vc.PodSandbox:
		process, err = createSandbox(ociSpec, runtimeConfig, containerID, bundlePath, console, disableOutput)
@@ -259,6 +259,19 @@ func createContainer(ociSpec oci.CompatOCISpec, containerID, bundlePath,
		return vc.Process{}, err
	}

	// Add the ephemeral device if ephemeral volume
	// has to be attached to the container. For the given pod
	// ephemeral volume is created only once backed by tmpfs
	// inside the VM. For successive containers of the same
	// pod the already existing volume is reused.
	for _, mnt := range contConfig.Mounts {
		if IsEphemeralStorage(mnt.Source) {
			deviceInfo := config.DeviceInfo{}
			deviceInfo.DevType = "e"
			deviceInfo.ContainerPath = mnt.Source
			contConfig.DeviceInfos = append(contConfig.DeviceInfos, deviceInfo)
		}
	}
	_, c, err := vci.CreateContainer(sandboxID, contConfig)
	if err != nil {
		return vc.Process{}, err
25 changes: 24 additions & 1 deletion cli/utils.go
@@ -15,7 +15,10 @@ import (
	"strings"
)

const unknown = "<<unknown>>"
const (
	unknown = "<<unknown>>"
	k8sEmptyDir = "kubernetes.io~empty-dir"
)

// variables to allow tests to modify the values
var (
@@ -43,6 +46,26 @@ func getFileContents(file string) (string, error) {
	return string(bytes), nil
}

// IsEphemeralStorage returns true if the given path
// to the storage belongs to kubernetes ephemeral storage
//
// This method depends on a specific path used by k8s
// to detect if it's of type ephemeral. As of now,
// this is a very k8s specific solution that works
// but in future there should be a better way for this
// method to determine if the path is for ephemeral
// volume type
func IsEphemeralStorage(path string) bool {
	splitSourceSlice := strings.Split(path, "/")
	if len(splitSourceSlice) > 1 {
		storageType := splitSourceSlice[len(splitSourceSlice)-2]
		if storageType == k8sEmptyDir {
			return true
		}
	}
	return false
}

Member

I still don't like this IsEphemeralStorage.
Can Kubernetes guarantee this name will be maintained and kept backward compatible? In other words, is this function reliable?

Contributor Author

I wish I had a better way here. :(

In future, if k8s decides to change that path, kata might just end up provisioning the storage over 9pfs just like it does right now.

I am trying to see if we can come up with some sort of standard for ephemeral storage that everyone can comply with (k8s, runtimes etc). Until then, I don't see any other way.
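
The fallback mentioned in this comment can be illustrated with a small sketch. It is not the runtime's code: `classifyMount` and the `mountKind` values are hypothetical names invented for the example, and the check simply mirrors the PR's `IsEphemeralStorage` logic (the second-to-last path element must equal `kubernetes.io~empty-dir`). The renamed directory in `main` is likewise made up, to show what a path change on the Kubernetes side would do.

```go
package main

import (
	"fmt"
	"strings"
)

// k8sEmptyDir is the directory name the PR matches against.
const k8sEmptyDir = "kubernetes.io~empty-dir"

// mountKind is a hypothetical label used only in this sketch.
type mountKind string

const (
	guestTmpfs mountKind = "guest-tmpfs" // volume backed by tmpfs inside the VM
	shared9p   mountKind = "shared-9p"   // host directory shared over 9pfs
)

// classifyMount mirrors the PR's check: the parent directory of the
// volume must be named kubernetes.io~empty-dir. Any other source is
// treated as a regular shared mount.
func classifyMount(source string) mountKind {
	parts := strings.Split(source, "/")
	if len(parts) > 1 && parts[len(parts)-2] == k8sEmptyDir {
		return guestTmpfs
	}
	return shared9p
}

func main() {
	base := "/var/lib/kubelet/pods/366c3a75-4869-11e8-b479-507b9ddd5ce4/volumes/"

	// Matches today's kubelet layout.
	fmt.Println(classifyMount(base + "kubernetes.io~empty-dir/cache-volume")) // guest-tmpfs

	// A made-up renamed directory no longer matches, so the mount is
	// classified as a shared mount again.
	fmt.Println(classifyMount(base + "kubernetes.io~renamed-dir/cache-volume")) // shared-9p
}
```

In other words, a path change would cost the tmpfs optimisation rather than break the container, which matches the behaviour described above.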

Contributor

I do feel that this will break at some point in the future, so I'd rather we get absolute clarification on the best approach from the k8s folk (and ideally some sort of assurances).

Does anyone have a contact from that project who could comment here?

/cc @sameo.

Contributor Author

/cc @saad-ali

Contributor Author

@jodh-intel @WeiZhang555 @sboeuf I have created a topic in k8s community on the stability of ephemeral volumes they provision, https://groups.google.com/forum/#!topic/kubernetes-dev/4JtfGSSUO_A

Feel free to join the discussion.

Nice, thx @harche.
In the meantime, just make sure our code has enough comment to explain this is not a perfect/stable solution.

Contributor Author

@harche Jun 14, 2018

@sboeuf sure.

Meanwhile, looking at Michelle Au's (Google) comment and Tim Hockin's (Google) comment, the general tone there echoes some of the concerns raised here.

However, Michelle mentioned in her reply on Slack:
"to answer your question, emptydir path on the kubelet should be stable"

Quoting Tim's reply on the mailing list:
"I would probably NAK any change to the volume implementations that changed the path, but I don't think it is guaranteed."

For the longer term, though, we need to come up with a solution that doesn't depend on these assurances (or the lack of them). There are a couple of approaches:

  1. As Michelle suggested, we need to find a way to generically invoke volume plugins in the runtime.

  2. Or, in my opinion, we should decouple storage drivers from kata. Storage drivers should be pluggable entities that anyone can write for the storage they are dealing with, much like how docker decoupled the runtime from their code.
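
To make the second approach a little more concrete, here is one shape such a pluggable layer could take. Everything in this sketch is hypothetical: the `StorageDriver` interface, the `registry` type, and `emptyDirDriver` do not exist in Kata Containers. They only illustrate the runtime asking registered drivers whether they handle a mount source, instead of hard-coding a path check in the CLI.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// StorageDriver is a hypothetical plug-in interface: each driver
// decides for itself whether it is responsible for a mount source.
type StorageDriver interface {
	Name() string
	Handles(source string) bool
}

// emptyDirDriver is a hypothetical driver for k8s emptyDir volumes.
type emptyDirDriver struct{}

func (emptyDirDriver) Name() string { return "ephemeral" }

// Handles reports whether the parent directory of source is the
// kubelet's empty-dir directory.
func (emptyDirDriver) Handles(source string) bool {
	return filepath.Base(filepath.Dir(source)) == "kubernetes.io~empty-dir"
}

// registry holds the drivers known to the (hypothetical) runtime.
type registry struct {
	drivers []StorageDriver
}

// Register adds a driver to the registry.
func (r *registry) Register(d StorageDriver) {
	r.drivers = append(r.drivers, d)
}

// Lookup returns the first driver that claims the source.
// The boolean is false when no driver handles it.
func (r *registry) Lookup(source string) (StorageDriver, bool) {
	for _, d := range r.drivers {
		if d.Handles(source) {
			return d, true
		}
	}
	return nil, false
}

func main() {
	r := &registry{}
	r.Register(emptyDirDriver{})

	src := "/var/lib/kubelet/pods/366c3a75-4869-11e8-b479-507b9ddd5ce4/volumes/kubernetes.io~empty-dir/cache-volume"
	if d, ok := r.Lookup(src); ok {
		fmt.Println("handled by driver:", d.Name())
	} else {
		fmt.Println("no driver, fall back to the shared mount")
	}
}
```

With a layout like this, the k8s-specific path knowledge would live in one driver that could be replaced without touching the rest of the runtime.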

@egernst @WeiZhang555 @sboeuf @bergwolf @jodh-intel
Let me know your thoughts on how we should proceed in the short term as well as in the long term.

Member

Since this is still a reasonable improvement, especially on the security side, I'm fine with merging it as a short-term enhancement.
But it would be better to find a solid method for doing this later.
LGTM for now to merge this.


func getKernelVersion() (string, error) {
	contents, err := getFileContents(procVersion)
	if err != nil {
14 changes: 14 additions & 0 deletions cli/utils_test.go
@@ -38,6 +38,20 @@ func TestFileExists(t *testing.T) {
		fmt.Sprintf("File %q should exist", file))
}

func TestIsEphemeralStorage(t *testing.T) {
	sampleEphePath := "/var/lib/kubelet/pods/366c3a75-4869-11e8-b479-507b9ddd5ce4/volumes/kubernetes.io~empty-dir/cache-volume"
	isEphe := IsEphemeralStorage(sampleEphePath)
	if !isEphe {
		t.Fatalf("Unable to correctly determine volume type")
	}

	sampleEphePath = "/var/lib/kubelet/pods/366c3a75-4869-11e8-b479-507b9ddd5ce4/volumes/cache-volume"
	isEphe = IsEphemeralStorage(sampleEphePath)
	if isEphe {
		t.Fatalf("Unable to correctly determine volume type")
	}
}

func TestGetFileContents(t *testing.T) {
	type testData struct {
		contents string
13 changes: 12 additions & 1 deletion virtcontainers/container.go
@@ -418,6 +418,7 @@ func (c *Container) createContainersDirs() error {
// available when we will need to unmount those mounts.
func (c *Container) mountSharedDirMounts(hostSharedDir, guestSharedDir string) ([]Mount, error) {
	var sharedDirMounts []Mount

	for idx, m := range c.mounts {
		if isSystemMount(m.Destination) || m.Type != "bind" {
			continue
@@ -435,6 +436,17 @@ func (c *Container) mountSharedDirMounts(hostSharedDir, guestSharedDir string) (
			return nil, err
		}

		// If there exists an ephemeral device corresponding
		// to the given mount point we will skip the shared mounts
		//
		// We are not going to bind mount host ephemeral directory
		// provided by k8s, instead at later stage we are going to
		// create a tmpfs inside a VM and use it to bind mount to the
		// containers of the pod.
		if utils.IsEphemeralDevice(c.config.DeviceInfos, m.Source) {
			continue
		}

		// Check if mount is a block device file. If it is, the block device will be attached to the host
		// instead of passing this as a shared mount.
		if c.checkBlockDeviceSupport() && stat.Mode&unix.S_IFBLK == unix.S_IFBLK {
@@ -476,7 +488,6 @@ func (c *Container) mountSharedDirMounts(hostSharedDir, guestSharedDir string) (
		if err := bindMount(m.Source, mountDest, false); err != nil {
			return nil, err
		}

		// Save HostPath mount value into the mount list of the container.
		c.mounts[idx].HostPath = mountDest
2 changes: 2 additions & 0 deletions virtcontainers/device/config/config.go
@@ -60,6 +60,8 @@ type DeviceInfo struct {
	// p - FIFO
	// b - block(buffered) special file
	// More info in mknod(1).
	// also,
	// e - ephemeral volume
	DevType string

	// Major, minor numbers for device.
1 change: 0 additions & 1 deletion virtcontainers/filesystem.go
@@ -319,7 +319,6 @@ func (fs *filesystem) fetchDeviceFile(fileData []byte, devices *[]api.Device) er
}
tempDevices = append(tempDevices, &device)
l.Infof("Generic device unmarshalled [%v]", device)

default:
return fmt.Errorf("Unknown device type, could not unmarshal")
}
28 changes: 28 additions & 0 deletions virtcontainers/kata_agent.go
@@ -19,6 +19,7 @@ import (
	kataclient "github.com/kata-containers/agent/protocols/client"
	"github.com/kata-containers/agent/protocols/grpc"
	"github.com/kata-containers/runtime/virtcontainers/device/api"
	"github.com/kata-containers/runtime/virtcontainers/device/config"
	"github.com/kata-containers/runtime/virtcontainers/device/drivers"
	vcAnnotations "github.com/kata-containers/runtime/virtcontainers/pkg/annotations"
	ns "github.com/kata-containers/runtime/virtcontainers/pkg/nsenter"
@@ -51,6 +52,7 @@ var (
	sharedDir9pOptions = []string{"trans=virtio,version=9p2000.L", "nodev"}
	shmDir = "shm"
	kataEphemeralDevType = "ephemeral"
	ephemeralPath = filepath.Join(kataGuestSandboxDir, kataEphemeralDevType)
)

// KataAgentConfig is a structure storing information needed
@@ -775,12 +777,38 @@ func (k *kataAgent) createContainer(sandbox *Sandbox, c *Container) (p *Process,
		return nil, err
	}

	var devInfos []config.DeviceInfo
	for _, devInfo := range c.config.DeviceInfos {
		if devInfo.DevType == "e" {
			filename := filepath.Join(ephemeralPath, filepath.Base(devInfo.ContainerPath))

Member

Is the Base guaranteed to be unique here? If not, I would append a randomly generated string to it to make sure there are no conflicts.

Contributor Author

@harche Jun 19, 2018

Yes, the Base is guaranteed to be unique. In k8s, an ephemeral volume is defined in the pod YAML like this:

  volumes:
  - name: cache-volume
    emptyDir:
      medium: "Memory"

The Base comes from the name of the volume, which is unique within the pod (cache-volume in the example above).
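
As a small illustration of why the volume name matters here: the guest mount point is a fixed prefix joined with the last element of the host path. In the sketch below, `guestEphemeralDir` is a placeholder value assumed for the example (the PR builds the real prefix from `kataGuestSandboxDir`, whose value is not shown in this diff), and `guestMountPoint` is a hypothetical helper, not a function from the runtime.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// guestEphemeralDir is a placeholder for the directory inside the VM
// under which ephemeral volumes are mounted. The real value is an
// assumption here.
const guestEphemeralDir = "/guest-sandbox/ephemeral"

// guestMountPoint maps a host emptyDir path to its tmpfs mount point
// in the guest, using only the volume name (the path's last element).
func guestMountPoint(hostSource string) string {
	return filepath.Join(guestEphemeralDir, filepath.Base(hostSource))
}

func main() {
	pod := "/var/lib/kubelet/pods/366c3a75-4869-11e8-b479-507b9ddd5ce4/volumes/kubernetes.io~empty-dir/"

	fmt.Println(guestMountPoint(pod + "cache-volume")) // /guest-sandbox/ephemeral/cache-volume
	fmt.Println(guestMountPoint(pod + "scratch"))      // /guest-sandbox/ephemeral/scratch
}
```

Two volumes of the same pod have different names and so get different guest paths, while two containers of the pod that mount the same volume resolve to the same guest path, which is the reuse the PR relies on.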

			epheStorage := &grpc.Storage{
				Driver: kataEphemeralDevType,
				Source: "tmpfs",
				Fstype: "tmpfs",
				MountPoint: filename,
			}
			ctrStorages = append(ctrStorages, epheStorage)

		} else {
			devInfos = append(devInfos, devInfo)
		}
	}

	// Handle container mounts
	newMounts, err := c.mountSharedDirMounts(kataHostSharedDir, kataGuestSharedDir)
	if err != nil {
		return nil, err
	}

	// Modify the mount source for ephemeral volume
	for idx, mnt := range ociSpec.Mounts {
		if utils.IsEphemeralDevice(c.config.DeviceInfos, mnt.Source) {
			ociSpec.Mounts[idx].Source = filepath.Join(ephemeralPath, filepath.Base(mnt.Source))
		}
	}

	c.config.DeviceInfos = devInfos

	// We replace all OCI mount sources that match our container mount
	// with the right source path (The guest one).
	if err = k.replaceOCIMountSource(ociSpec, newMounts); err != nil {
14 changes: 14 additions & 0 deletions virtcontainers/utils/utils.go
@@ -12,6 +12,8 @@ import (
	"os"
	"os/exec"
	"path/filepath"

	"github.com/kata-containers/runtime/virtcontainers/device/config"
)

const cpBinaryName = "cp"
@@ -67,6 +69,18 @@ func ReverseString(s string) string {
	return string(r)
}

// IsEphemeralDevice returns true if there exists an ephemeral
// device in the configuration whose ContainerPath matches
// the mount source
func IsEphemeralDevice(deviceInfos []config.DeviceInfo, source string) bool {
	for _, devInfo := range deviceInfos {
		if devInfo.DevType == "e" && devInfo.ContainerPath == source {
			return true
		}
	}
	return false
}

// CleanupFds closed bundles of open fds in batch
func CleanupFds(fds []*os.File, numFds int) {
	maxFds := len(fds)