
agent: Add support for ephemeral volumes #236

Merged May 29, 2018 (1 commit)

Conversation

@harche (Contributor) commented May 14, 2018

Ephemeral volumes need to be mounted inside the VM
and not passed as 9pfs mounts.

Signed-off-by: Harshal Patil [email protected]

@harche (Contributor, Author) commented May 14, 2018

Corresponding PR in runtime, kata-containers/runtime#307

@egernst (Member) left a comment

Great idea - thanks for the PR, Harshal. Couple questions/comments.

@amshinde PTAL?

mount.go Outdated
absSource, err = filepath.EvalSymlinks(source)
if err != nil {
	return grpcStatus.Errorf(codes.Internal, "Could not resolve symlink for source %v", source)
}
if fsType == typeTmpFs {

Member:

Since we have more than a couple of cases for fsType now, I wonder if it'd warrant throwing this into a switch. WDYT?

Contributor (Author):

Yes, switch case makes more sense here.

grpc.go Outdated
if len(splitSourceSlice) > 1 {
	k8sStorageType := splitSourceSlice[len(splitSourceSlice)-2]
	if k8sStorageType == "emphemeral" {
		// Only mount tmpfs once on the host, and reuse the mount destination

Member:

Just my own ignorance likely -- can you clarify "mount tmpfs once on the host"? The agent is running in the guest.

Member:

Maybe we shouldn't mount tmpfs on the host, and instead mount it directly in the guest VM.

Contributor (Author):

@egernst Sorry, I will update that comment in the code to avoid the confusion. It should read: "Only mount tmpfs once inside the VM, and reuse the mount destination".

Ephemeral volumes live and die with the pod. They are created and mounted on the host by kubelet and then bind mounted into every container in the pod that wants to use the volume. In the case of Kata, the pod is our VM sandbox, so the volume has to be confined within the VM context. This tmpfs has to be created inside the VM, hence the code to create tmpfs in the agent. This way we increase the isolation Kata provides to the containers: containers that want to share data using ephemeral volumes don't have to leak their data to the host, because the backing volume stays within VM memory.

@jshachm By the time control comes to Kata, kubelet has already provisioned tmpfs on the host. We can choose to ignore that and provision our own tmpfs inside the guest, which is what this patch does.
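
For illustration, the guest-side operation the agent ends up performing is just a tmpfs mount inside the VM. A minimal sketch, assuming an illustrative /ephemeral path and using golang.org/x/sys/unix for the mount syscall (not the PR's actual code):

package main

import (
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	// Back the pod's ephemeral volume with guest memory: create the
	// mountpoint and mount a tmpfs over it inside the VM. Containers in
	// the pod then bind mount this directory, so shared data never
	// touches the host.
	if err := os.MkdirAll("/ephemeral", 0755); err != nil {
		panic(err)
	}
	if err := unix.Mount("tmpfs", "/ephemeral", "tmpfs", 0, ""); err != nil {
		panic(err)
	}
}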

@jodh-intel (Contributor) left a comment

Can you create some tests for this change (either unit tests here, or maybe raise an issue in https://github.com/kata-containers/tests for a different type of test)?

grpc.go Outdated
splitSourceSlice := strings.Split(ociMnt.Source, "/")
if len(splitSourceSlice) > 1 {
	k8sStorageType := splitSourceSlice[len(splitSourceSlice)-2]
	if k8sStorageType == "emphemeral" {

Contributor:

  • Is this a reliable test? This appears to be looking for the word "emphemeral" in the ociMnt.Source path (and that "magic tag" might change at some point).
  • This block uses deep indentation. You could save one level by restructuring slightly:
    if k8sStorageType != "emphemeral" {
        continue
    }
    
    ...
    

Contributor (Author):

@jodh-intel It is very reliable to check for the k8sStorageType == "emphemeral" condition. The string "emphemeral" is introduced by this patch: https://github.com/kata-containers/runtime/pull/307/files#diff-8a56f5a1fa5971cc43e650ffa24f1e2bR376

That patch in the runtime loops over the given OCI mounts to look for the mounts used by k8s for ephemeral storage. It then replaces those mount paths on the host with mount paths in the guest by adding "/ephemeral" to the path. So the existence of that string is very much in our control. Right now, without this patch, an ephemeral volume is passed to the guest via 9p.

e.g. a typical ephemeral volume path looks like /var/lib/kubelet/pods/366c3a75-4869-11e8-b479-507b9ddd5ce4/volumes/kubernetes.io~empty-dir/cache-volume

But this resides on the host. We want that volume to be provisioned inside our VM. We could keep the same path, or we can make it simpler since we have total control over what happens inside the sandbox VM. So I replace that original source path with the much simpler "/ephemeral". Inside the VM, the above path becomes /ephemeral/cache-volume.
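
A standalone sketch of that runtime-side rewrite; the helper name is hypothetical, and the detection keys off the kubernetes.io~empty-dir component from the example path above:

package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// rewriteEphemeralSource maps a kubelet emptyDir host path to a simpler
// guest path under /ephemeral, as described above. It reports false for
// paths that are not ephemeral volumes.
func rewriteEphemeralSource(source string) (string, bool) {
	parts := strings.Split(source, "/")
	if len(parts) < 2 || parts[len(parts)-2] != "kubernetes.io~empty-dir" {
		return source, false
	}
	return filepath.Join("/ephemeral", parts[len(parts)-1]), true
}

func main() {
	host := "/var/lib/kubelet/pods/366c3a75-4869-11e8-b479-507b9ddd5ce4/volumes/kubernetes.io~empty-dir/cache-volume"
	guest, ok := rewriteEphemeralSource(host)
	fmt.Println(guest, ok) // prints: /ephemeral/cache-volume true
}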

Contributor (Author):

@jodh-intel I will add some unit test cases.

grpc.go Outdated
for idx, ociMnt := range ociSpec.Mounts {
	splitSourceSlice := strings.Split(ociMnt.Source, "/")
	if len(splitSourceSlice) > 1 {
		k8sStorageType := splitSourceSlice[len(splitSourceSlice)-2]

Contributor:

You could create a variable for len(splitSourceSlice) as you've scanned the string twice here.

Contributor (Author):

sure.

@bergwolf (Member) commented May 16, 2018

Instead of checking k8sStorageType, we should add an ephemeral storage driver type and let ociSpec.Mounts just reference it. For example, let req.storage have

Storage{Driver: "ephemeral", Fstype: "tmpfs", MountPoint: <sandbox-mountpoint-foo>}

and in ociSpec.Mounts, let ociSpec.Mounts[i] have

Mount{Source: <sandbox-mountpoint-foo>, Destination: <ephemeral-volume-inside-of-container>, Type: "bind"}

The main change is in addStorages: we create the tmpfs mountpoint at <sandbox-mountpoint-foo> with a new storage driver handler.
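
As a rough sketch of that shape (field names follow the pseudo-structs above; the import paths, helper name, and guest path are assumptions, not the PR's code):

package main

import (
	"fmt"

	pb "github.com/kata-containers/agent/protocols/grpc"
	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// buildEphemeralVolume pairs a Storage entry handled by the new
// "ephemeral" driver with the bind Mount the container's OCI spec uses
// to reference it.
func buildEphemeralVolume(volumeName, containerDest string) (*pb.Storage, specs.Mount) {
	sandboxMountPoint := "/ephemeral/" + volumeName // hypothetical guest path

	storage := &pb.Storage{
		Driver:     "ephemeral",
		Source:     "tmpfs",
		Fstype:     "tmpfs",
		MountPoint: sandboxMountPoint,
	}
	mount := specs.Mount{
		Source:      sandboxMountPoint,
		Destination: containerDest,
		Type:        "bind",
		Options:     []string{"rbind"},
	}
	return storage, mount
}

func main() {
	storage, mount := buildEphemeralVolume("cache-volume", "/cache")
	fmt.Printf("storage: %+v\nmount: %+v\n", storage, mount)
}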

@harche (Contributor, Author) commented May 16, 2018

@bergwolf Sounds good, I will give it a shot and update the PR.

@sboeuf commented May 17, 2018

@bergwolf I completely agree this should be the way to do it. We have designed Storage and Device for this purpose.
@harche please let us know when the code is ready :)

@sboeuf commented May 17, 2018

@harche just to clarify about ephemeral volumes: do you expect anything to be present on the host at a location like /var/lib/kubelet/pods/366c3a75-4869-11e8-b479-507b9ddd5ce4/volumes/kubernetes.io~empty-dir/cache-volume?

I ask because if this is basically an empty volume, then great, a simple volume backed by tmpfs on the guest will do the trick.
But if we expect some data to be there initially, then we would also have to share it, or copy it into the guest as a first step.

Just want to make sure we're on the same page!

@harche (Contributor, Author) commented May 18, 2018

@sboeuf There isn't any data initially. Ephemeral volumes are used only for sharing data between the containers of the same pod, so the volume is empty to begin with unless a container from that pod writes something to it.

@sboeuf commented May 18, 2018

@harche okay thanks for clarification ;)

@harche (Contributor, Author) commented May 22, 2018

@sboeuf @bergwolf @jodh-intel I have updated this PR as well as the corresponding PR in the runtime.

@bergwolf (Member) commented May 22, 2018

@harche Thanks for the change. LGTM!

Approved with PullApprove

@caoruidong (Member) left a comment

LGTM. Once CI is green, it can be merged.

mount.go Outdated
	_, err = commonStorageHandler(storage)
	return "", err
} else {

Member:

You can delete this else to make CI happy.

Contributor (Author):

sure.

@codecov (bot) commented May 22, 2018

Codecov Report

Merging #236 into master will increase coverage by 0.42%.
The diff coverage is 64.7%.


@@            Coverage Diff             @@
##           master     #236      +/-   ##
==========================================
+ Coverage   43.42%   43.84%   +0.42%     
==========================================
  Files          14       14              
  Lines        2091     2178      +87     
==========================================
+ Hits          908      955      +47     
- Misses       1070     1103      +33     
- Partials      113      120       +7
Impacted Files Coverage Δ
device.go 38.11% <ø> (ø) ⬆️
mount.go 64.65% <64.7%> (+2.03%) ⬆️
network.go 49.16% <0%> (+0.4%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e7d934d...08674c0. Read the comment docs.

@jodh-intel (Contributor) commented May 22, 2018

Thanks @harche!

lgtm

Approved with PullApprove

@sboeuf left a comment

One comment!
Looks good to me otherwise.

mount.go Outdated

func ephemeralStorageHandler(storage pb.Storage, s *sandbox) (string, error) {
	if _, err := os.Stat(storage.MountPoint); os.IsNotExist(err) {
		os.MkdirAll(storage.MountPoint, os.ModePerm)

@sboeuf:

Please test the return of MkdirAll().

Contributor:

Good catch @sboeuf!

@amshinde (Member):

@harche You do not even need to check if the directory exists before calling MkdirAll. From the docs, "If path is already a directory, MkdirAll does nothing and returns nil."

@sboeuf:

@amshinde Well I think you need to check this because in case it does exist, you don't need to call into commonStorageHandler().

@amshinde (Member):

And why are we not calling commonStorageHandler() then? It may happen that the directory exists, but we skip mounting altogether.

@harche (Contributor, Author) commented May 25, 2018

@amshinde the tmpfs mount point has to be mounted on the guest only once. Then the rest of the containers will bind mount that directory.

e.g. if all the containers in the pod need an ephemeral volume, they will bind mount, say, the /sharedMount directory on the VM. But before that, /sharedMount has to be mounted on the VM backed by tmpfs (only once).

The above code was tested with a k8s YAML with three containers sharing the mount point:

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  annotations:
    io.kubernetes.cri.untrusted-workload: "true"
  labels:
    app: myapp
spec:
  containers:
  # Application Container
  - name: myapp-container
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
    imagePullPolicy: Never
    image: app_image:latest
    command: ['sh', '-c', 'cat /cache/test && sleep 3600']
  initContainers:
  # Copy Data container
  - name: data-container
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
    imagePullPolicy: Never
    image: data_image:latest
    command: ['sh', '-c', 'cp /test /cache/']
  # Append Data container
  - name: init-append-data
    volumeMounts:
    - mountPath: /cache
      name: cache-volume
    imagePullPolicy: Never
    image: test_image:latest
    command: ['sh', '-c', 'echo test >> /cache/test']
  volumes:
  - name: cache-volume
    emptyDir:
      medium: "Memory"

@harche (Contributor, Author) commented May 25, 2018

@amshinde Mounting is intentionally skipped if the dir exists, because otherwise every single container ends up pointing to a separate tmpfs volume and they don't end up sharing the volume. So only once (the first time) you have to make sure the dir exists and then mount it on the guest backed by tmpfs. The corresponding entries in every container's OCI spec already point to bind mounting that dir.

@amshinde (Member):

@harche Thanks for the explanation, my question was based on the assumption that you would be adding storage just once for the first container.
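
Putting this thread together, a minimal sketch of the guarded handler, with the MkdirAll return checked as @sboeuf requested (pb.Storage, commonStorageHandler, and sandbox are the names quoted in the snippets above; this is illustrative, not the merged code):

func ephemeralStorageHandler(storage pb.Storage, s *sandbox) (string, error) {
	// Create and mount the tmpfs backing store only once per sandbox.
	if _, err := os.Stat(storage.MountPoint); os.IsNotExist(err) {
		if err := os.MkdirAll(storage.MountPoint, os.ModePerm); err != nil {
			return "", err
		}
		// commonStorageHandler performs the actual tmpfs mount at
		// storage.MountPoint.
		_, err = commonStorageHandler(storage)
		return "", err
	}
	// The directory already exists: the tmpfs was mounted for an earlier
	// container in the pod, so skip mounting and let this container bind
	// mount the shared directory.
	return "", nil
}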

@jodh-intel (Contributor):
Adding dnm label until the mkdir check is added.

if fsType != type9pFs {
var err error

var err error

Member:

nit: you don't need to declare err here, since you use a new instance in the inner scope.

@harche (Contributor, Author) commented May 24, 2018

yes. thanks.


mount.go Outdated
var err error
switch fsType {
case typeTmpFs, type9pFs:
	if err := createDestinationDir(destination); err != nil {

Member:

@harche Are you not already creating the mount destination directory in the "ephemeral" storage handler?

Contributor (Author):

Yes, I will update the PR. Thanks.

@amshinde (Member) left a comment

lgtm

@amshinde (Member)

@harche CI is failing, can you take a look so that this can be merged?

@sboeuf commented May 25, 2018

@harche please fix the unit tests.

Ephemeral volumes need to be mounted inside the VM
and not passed as 9pfs mounts.

Fixes: kata-containers#235

Signed-off-by: Harshal Patil <[email protected]>
@harche (Contributor, Author) commented May 28, 2018

@sboeuf @amshinde done.

@sboeuf sboeuf merged commit f6db83c into kata-containers:master May 29, 2018