
Velero+restic backup gets stuck when NFS persistent volume is included #2721

Closed
guimenezes opened this issue Jul 18, 2020 · 25 comments
Labels: Needs info (Waiting for information), Needs investigation, Restic (Relates to the restic integration)

guimenezes commented Jul 18, 2020

What steps did you take and what happened:
I am testing Velero to back up my application.

In a minikube deployment, I followed the basic mysql + wordpress tutorial here:
https://kubernetes.io/docs/tutorials/stateful-application/mysql-wordpress-persistent-volume/

I set up an NFS server in a VM and changed the example to use an NFS persistent volume instead.

The MySQL pod config I used is the following:

apiVersion: v1
kind: Service
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
spec:
  ports:
    - port: 3306
  selector:
    app: wordpress
    tier: mysql
  clusterIP: None
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-nfs-pv
  labels:
    app: wordpress
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.120.177
    path: /mnt/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-nfs-pvc
  labels:
    app: wordpress
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: wordpress-mysql
  labels:
    app: wordpress
spec:
  selector:
    matchLabels:
      app: wordpress
      tier: mysql
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: wordpress
        tier: mysql
    spec:
      containers:
      - image: mysql:5.6
        name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-pass
              key: password
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-persistent-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-persistent-storage
        persistentVolumeClaim:
          claimName: mysql-nfs-pvc

The example also has a wordpress-deployment.yaml file and a kustomization.yaml file, which I did not modify.
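For reference, the unmodified kustomization.yaml from that tutorial looks roughly like this (a sketch from the tutorial; YOUR_PASSWORD is a placeholder, and the file names are the tutorial's):

```yaml
# kustomization.yaml as shipped by the Kubernetes tutorial (sketch)
secretGenerator:
- name: mysql-pass
  literals:
  - password=YOUR_PASSWORD
resources:
- mysql-deployment.yaml
- wordpress-deployment.yaml
```

The generated secret gets a content hash suffix, which is why the pod description below shows the secret as mysql-pass-c57bb4t7mf.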

After deployment:

kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
hello-node-7bf657c596-xbghl        1/1     Running   0          3d
wordpress-5bbd7fd785-b87zj         1/1     Running   1          118s
wordpress-mysql-5dcc45d9f9-l4fzs   1/1     Running   0          118s
$ kubectl describe pod wordpress-mysql-5dcc45d9f9-l4fzs
Name:         wordpress-mysql-5dcc45d9f9-l4fzs
Namespace:    default
Priority:     0
Node:         minikube/192.168.120.166
Start Time:   Fri, 17 Jul 2020 17:28:47 -0700
Labels:       app=wordpress
              pod-template-hash=5dcc45d9f9
              tier=mysql
Annotations:  <none>
Status:       Running
IP:           172.17.0.6
IPs:
  IP:           172.17.0.6
Controlled By:  ReplicaSet/wordpress-mysql-5dcc45d9f9
Containers:
  mysql:
    Container ID:   docker://60ca381a170fb123f0e3851382ff39bbae12fb76cd50dee8e4ff479c1793882a
    Image:          mysql:5.6
    Image ID:       docker-pullable://mysql@sha256:19a164794d3cef15c9ac44754604fa079adb448f82d40e4b8be8381148c785fa
    Port:           3306/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 17 Jul 2020 17:28:48 -0700
    Ready:          True
    Restart Count:  0
    Environment:
      MYSQL_ROOT_PASSWORD:  <set to the key 'password' in secret 'mysql-pass-c57bb4t7mf'>  Optional: false
    Mounts:
      /var/lib/mysql from mysql-persistent-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-q77zk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  mysql-persistent-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mysql-nfs-pvc
    ReadOnly:   false
  default-token-q77zk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-q77zk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  2m33s (x2 over 2m33s)  default-scheduler  persistentvolumeclaim "mysql-nfs-pvc" not found
  Normal   Scheduled         2m31s                  default-scheduler  Successfully assigned default/wordpress-mysql-5dcc45d9f9-l4fzs to minikube
  Normal   Pulled            2m30s                  kubelet, minikube  Container image "mysql:5.6" already present on machine
  Normal   Created           2m30s                  kubelet, minikube  Created container mysql
  Normal   Started           2m30s                  kubelet, minikube  Started container mysql

The S3 endpoint is a vanilla Minio S3 server. I installed velero with:

$ velero install      --provider aws      --plugins velero/velero-plugin-for-aws:v1.0.0      --bucket backup      --secret-file ./credentials-velero      --use-volume-snapshots=false      --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://172.27.254.245:9000

I started a backup (without the volume annotation) and everything works perfectly for metadata:

$ velero backup create mysql-backup-no-nfs
$ velero backup describe mysql-backup-no-nfs
Name:         mysql-backup-no-nfs
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.18.3
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=18

Phase:  Completed

Errors:    0
Warnings:  0

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2020-07-17 17:30:02 -0700 PDT
Completed:  2020-07-17 17:30:09 -0700 PDT

Expiration:  2020-08-16 17:30:02 -0700 PDT

Total items to be backed up:  427
Items backed up:              427

Velero-Native Snapshots: <none included>

However, after adding the annotation, the backup gets stuck:

$ kubectl -n default annotate pod/wordpress-mysql-5dcc45d9f9-l4fzs backup.velero.io/backup-volumes=mysql-persistent-storage
pod/wordpress-mysql-5dcc45d9f9-l4fzs annotated
$ velero backup create mysql-backup-with-nfs
Backup request "mysql-backup-with-nfs" submitted successfully.
$ velero backup describe mysql-backup-with-nfs
Name:         mysql-backup-with-nfs
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.18.3
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=18

Phase:  InProgress

Errors:    0
Warnings:  0

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2020-07-17 17:33:38 -0700 PDT
Completed:  <n/a>

Expiration:  2020-08-16 17:33:38 -0700 PDT

Estimated total items to be backed up:  410
Items backed up so far:                 4

Velero-Native Snapshots: <none included>

Restic Backups (specify --details for more information):
  New:  1

These are the last messages in the log:

$ kubectl logs deployment/velero -n velero
...
time="2020-07-18T00:33:40Z" level=info msg="Backed up 4 items out of an estimated total of 410 (estimate will change throughout the backup)" backup=velero/mysql-backup-with-nfs logSource="pkg/backup/backup.go:411" name=wordpress-5bbd7fd785-b87zj namespace=default progress= resource=pods
time="2020-07-18T00:33:40Z" level=info msg="Processing item" backup=velero/mysql-backup-with-nfs logSource="pkg/backup/backup.go:371" name=wordpress-mysql-5dcc45d9f9-l4fzs namespace=default progress= resource=pods
time="2020-07-18T00:33:40Z" level=info msg="Backing up item" backup=velero/mysql-backup-with-nfs logSource="pkg/backup/item_backupper.go:120" name=wordpress-mysql-5dcc45d9f9-l4fzs namespace=default resource=pods
time="2020-07-18T00:33:40Z" level=info msg="Executing custom action" backup=velero/mysql-backup-with-nfs logSource="pkg/backup/item_backupper.go:326" name=wordpress-mysql-5dcc45d9f9-l4fzs namespace=default resource=pods
time="2020-07-18T00:33:40Z" level=info msg="Executing podAction" backup=velero/mysql-backup-with-nfs cmd=/velero logSource="pkg/backup/pod_action.go:51" pluginName=velero
time="2020-07-18T00:33:40Z" level=info msg="Adding pvc mysql-nfs-pvc to additionalItems" backup=velero/mysql-backup-with-nfs cmd=/velero logSource="pkg/backup/pod_action.go:67" pluginName=velero
time="2020-07-18T00:33:40Z" level=info msg="Done executing podAction" backup=velero/mysql-backup-with-nfs cmd=/velero logSource="pkg/backup/pod_action.go:77" pluginName=velero
time="2020-07-18T00:33:40Z" level=info msg="Backing up item" backup=velero/mysql-backup-with-nfs logSource="pkg/backup/item_backupper.go:120" name=mysql-nfs-pvc namespace=default resource=persistentvolumeclaims
time="2020-07-18T00:33:40Z" level=info msg="Executing custom action" backup=velero/mysql-backup-with-nfs logSource="pkg/backup/item_backupper.go:326" name=mysql-nfs-pvc namespace=default resource=persistentvolumeclaims
time="2020-07-18T00:33:40Z" level=info msg="Executing PVCAction" backup=velero/mysql-backup-with-nfs cmd=/velero logSource="pkg/backup/backup_pv_action.go:49" pluginName=velero
time="2020-07-18T00:33:40Z" level=info msg="Backing up item" backup=velero/mysql-backup-with-nfs logSource="pkg/backup/item_backupper.go:120" name=mysql-nfs-pv namespace= resource=persistentvolumes
time="2020-07-18T00:33:40Z" level=info msg="Executing takePVSnapshot" backup=velero/mysql-backup-with-nfs logSource="pkg/backup/item_backupper.go:404" name=mysql-nfs-pv namespace= resource=persistentvolumes
time="2020-07-18T00:33:40Z" level=info msg="Skipping snapshot of persistent volume because volume is being backed up with restic." backup=velero/mysql-backup-with-nfs logSource="pkg/backup/item_backupper.go:422" name=mysql-nfs-pv namespace= persistentVolume=mysql-nfs-pv resource=persistentvolumes
time="2020-07-18T00:33:40Z" level=info msg="Initializing restic repository" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:155" name=default-default-8rhfs namespace=velero

I enabled debug logs as described in https://velero.io/docs/master/troubleshooting/.
I also re-deployed to a clean state, and the pod name changed to wordpress-mysql-5dcc45d9f9-2zgkn.

$ date; velero backup create mysql-backup-with-nfs2
Fri Jul 17 17:40:48 PDT 2020
Backup request "mysql-backup-with-nfs2" submitted successfully.
Run `velero backup describe mysql-backup-with-nfs2` or `velero backup logs mysql-backup-with-nfs2` for more details.
$ kubectl logs deployment/velero -n velero
...
time="2020-07-18T00:40:50Z" level=info msg="Skipping snapshot of persistent volume because volume is being backed up with restic." backup=velero/mysql-backup-with-nfs2 logSource="pkg/backup/item_backupper.go:422" name=mysql-nfs-pv namespace= persistentVolume=mysql-nfs-pv resource=persistentvolumes
time="2020-07-18T00:40:50Z" level=debug msg="Executing post hooks" backup=velero/mysql-backup-with-nfs2 logSource="pkg/backup/item_backupper.go:204" name=mysql-nfs-pv namespace= resource=persistentvolumes
time="2020-07-18T00:40:50Z" level=debug msg="Resource persistentvolumes/mysql-nfs-pv, version= v1, preferredVersion=v1" backup=velero/mysql-backup-with-nfs2 logSource="pkg/backup/item_backupper.go:255" name=mysql-nfs-pv namespace= resource=persistentvolumes
time="2020-07-18T00:40:50Z" level=debug msg="Skipping action because it does not apply to this resource" backup=velero/mysql-backup-with-nfs2 logSource="pkg/backup/item_backupper.go:307" name=mysql-nfs-pvc namespace=default resource=persistentvolumeclaims
time="2020-07-18T00:40:50Z" level=debug msg="Executing post hooks" backup=velero/mysql-backup-with-nfs2 logSource="pkg/backup/item_backupper.go:204" name=mysql-nfs-pvc namespace=default resource=persistentvolumeclaims
time="2020-07-18T00:40:50Z" level=debug msg="Resource persistentvolumeclaims/mysql-nfs-pvc, version= v1, preferredVersion=v1" backup=velero/mysql-backup-with-nfs2 logSource="pkg/backup/item_backupper.go:255" name=mysql-nfs-pvc namespace=default resource=persistentvolumeclaims
time="2020-07-18T00:40:50Z" level=debug msg="Skipping action because it does not apply to this resource" backup=velero/mysql-backup-with-nfs2 logSource="pkg/backup/item_backupper.go:307" name=wordpress-mysql-5dcc45d9f9-2zgkn namespace=default resource=pods
time="2020-07-18T00:40:50Z" level=debug msg="Skipping action because it does not apply to this resource" backup=velero/mysql-backup-with-nfs2 logSource="pkg/backup/item_backupper.go:307" name=wordpress-mysql-5dcc45d9f9-2zgkn namespace=default resource=pods
time="2020-07-18T00:40:50Z" level=debug msg="Acquiring lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:122" volumeNamespace=default
time="2020-07-18T00:40:50Z" level=debug msg="Acquired lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:131" volumeNamespace=default
time="2020-07-18T00:40:50Z" level=debug msg="No repository found, creating one" backupLocation=default logSource="pkg/restic/repository_ensurer.go:151" volumeNamespace=default
time="2020-07-18T00:40:50Z" level=debug msg="Running processQueueItem" controller=restic-repository key=velero/default-default-tbrht logSource="pkg/controller/restic_repository_controller.go:110"
time="2020-07-18T00:40:50Z" level=info msg="Initializing restic repository" controller=restic-repository logSource="pkg/controller/restic_repository_controller.go:155" name=default-default-tbrht namespace=velero
time="2020-07-18T00:40:51Z" level=debug msg="Backup has not expired yet, skipping" backup=velero/mysql-backup-with-nfs2 controller=gc-controller expiration="2020-08-17 00:40:49 +0000 UTC" logSource="pkg/controller/gc_controller.go:127"
time="2020-07-18T00:40:51Z" level=debug msg="Ran restic command" command="restic snapshots --repo=s3:http://172.27.254.245:9000/backup/restic/default --password-file=/tmp/velero-restic-credentials-default827864671 --cache-dir=/scratch/.cache/restic --last" logSource="pkg/restic/repository_manager.go:291" repository=default stderr= stdout="created new cache in /scratch/.cache/restic\n"
time="2020-07-18T00:40:51Z" level=debug msg="Released lock" backupLocation=default logSource="pkg/restic/repository_ensurer.go:128" volumeNamespace=default
time="2020-07-18T00:41:12Z" level=debug msg="Checking for existing backup storage locations to sync into cluster" controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:124"
time="2020-07-18T00:41:12Z" level=debug msg="Checking if backups need to be synced at this time for this location" backupLocation=default controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:156"
time="2020-07-18T00:41:12Z" level=debug msg="Checking backup location for backups to sync into cluster" backupLocation=default controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:163"
time="2020-07-18T00:41:12Z" level=debug msg="looking for plugin in registry" controller=backup-sync kind=ObjectStore logSource="pkg/plugin/clientmgmt/manager.go:99" name=velero.io/aws
time="2020-07-18T00:41:12Z" level=debug msg="creating new restartable plugin process" command=/plugins/velero-plugin-for-aws controller=backup-sync kind=ObjectStore logSource="pkg/plugin/clientmgmt/manager.go:114" name=velero.io/aws
time="2020-07-18T00:41:12Z" level=debug msg="starting plugin" args="[/plugins/velero-plugin-for-aws --log-level debug --features ]" cmd=/plugins/velero-plugin-for-aws controller=backup-sync logSource="pkg/plugin/clientmgmt/logrus_adapter.go:74" path=/plugins/velero-plugin-for-aws
time="2020-07-18T00:41:12Z" level=debug msg="plugin started" cmd=/plugins/velero-plugin-for-aws controller=backup-sync logSource="pkg/plugin/clientmgmt/logrus_adapter.go:74" path=/plugins/velero-plugin-for-aws pid=88
time="2020-07-18T00:41:12Z" level=debug msg="waiting for RPC address" cmd=/plugins/velero-plugin-for-aws controller=backup-sync logSource="pkg/plugin/clientmgmt/logrus_adapter.go:74" path=/plugins/velero-plugin-for-aws
time="2020-07-18T00:41:12Z" level=debug msg="plugin address" address=/tmp/plugin150964435 cmd=/plugins/velero-plugin-for-aws controller=backup-sync logSource="pkg/plugin/clientmgmt/logrus_adapter.go:74" network=unix pluginName=velero-plugin-for-aws
...

Full output is here: velero.txt

What did you expect to happen:
Backup should either fail or succeed.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
    velero.txt

  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml

$ velero backup describe mysql-backup-with-nfs2
Name:         mysql-backup-with-nfs2
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.18.3
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=18

Phase:  InProgress

Errors:    0
Warnings:  0

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2020-07-17 17:40:49 -0700 PDT
Completed:  <n/a>

Expiration:  2020-08-16 17:40:49 -0700 PDT

Estimated total items to be backed up:  443
Items backed up so far:                 4

Velero-Native Snapshots: <none included>

Restic Backups (specify --details for more information):
  New:  1

  • velero backup logs <backupname>
$ velero backup logs mysql-backup-with-nfs2
Logs for backup "mysql-backup-with-nfs2" are not available until it's finished processing. Please wait until the backup has a phase of Completed or Failed and try again.
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
    NA
  • velero restore logs <restorename>
    NA

Anything else you would like to add:
NA

Environment:

  • Velero version (use velero version):
$ velero version
Client:
	Version: v1.4.2
	Git commit: -
Server:
	Version: v1.4.2
  • Velero features (use velero client config get features):
$ velero client config get features
features: <NOT SET>
  • Kubernetes version (use kubectl version):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-06-26T03:47:41Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.3", GitCommit:"2e7996e3e2712684bc73f0dec0200d64eec7fe40", GitTreeState:"clean", BuildDate:"2020-05-20T12:43:34Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version: minikube (vanilla)
  • Cloud provider or hardware configuration: macOS
  • OS (e.g. from /etc/os-release): macOS

THANKS!!

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@reddiablo85

Hi @guimenezes,
When running the velero install command you need to tell it to also install restic, with the "--use-restic" flag. Without this, Velero doesn't install the restic component, so even though you annotate the workload it doesn't have the tools to do the PVC backup.

Give it a go and run another backup, then run "velero backup describe <backupname> --details"; at the bottom you should see your PVC backup listed.

@guimenezes
Author

Hi, thanks for your response.

This is my installation command now:
velero install --use-restic --provider aws --plugins velero/velero-plugin-for-aws:v1.0.0 --bucket backup --secret-file ./credentials-velero --use-volume-snapshots=false --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://172.27.254.245:9000

However, the backup now seems to be stuck in the "New" phase:

Name:         mysql-backup-with-restic
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  New

Errors:    0
Warnings:  0

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  0

Started:    <n/a>
Completed:  <n/a>

Expiration:  <nil>

Velero-Native Snapshots: <none included> 

Pods:
kubectl get pods

NAME                               READY   STATUS    RESTARTS   AGE
hello-node-7bf657c596-xbghl        1/1     Running   0          5d16h
wordpress-5bbd7fd785-4l6vw         1/1     Running   0          7m36s
wordpress-mysql-5dcc45d9f9-nntmc   1/1     Running   0          7m36s

kubectl get pods -n velero

NAME                     READY   STATUS    RESTARTS   AGE
restic-w8bjc             1/1     Running   0          3m38s
velero-8dc7498d9-ghnzp   1/1     Running   1          3m38s

kubectl logs deployment/velero -n velero

time="2020-07-20T17:04:36Z" level=info msg="setting log-level to INFO" logSource="pkg/cmd/server/server.go:177"
time="2020-07-20T17:04:36Z" level=info msg="Starting Velero server v1.4.2 (56a08a4d695d893f0863f697c2f926e27d70c0c5)" logSource="pkg/cmd/server/server.go:179"
time="2020-07-20T17:04:36Z" level=info msg="1 feature flags enabled []" logSource="pkg/cmd/server/server.go:181"
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/crd-remap-version
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/pod
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/pv
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/service-account
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/add-pv-from-pvc
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/add-pvc-from-pod
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/change-pvc-node-selector
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/change-storage-class
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/cluster-role-bindings
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/crd-preserve-fields
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/job
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/pod
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/restic
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/role-bindings
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/service
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/service-account
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-aws kind=VolumeSnapshotter logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/aws
time="2020-07-20T17:04:36Z" level=info msg="registering plugin" command=/plugins/velero-plugin-for-aws kind=ObjectStore logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/aws
time="2020-07-20T17:04:36Z" level=info msg="Checking existence of namespace" logSource="pkg/cmd/server/server.go:361" namespace=velero
time="2020-07-20T17:04:36Z" level=info msg="Namespace exists" logSource="pkg/cmd/server/server.go:367" namespace=velero
time="2020-07-20T17:04:39Z" level=info msg="Checking existence of Velero custom resource definitions" logSource="pkg/cmd/server/server.go:396"
time="2020-07-20T17:04:42Z" level=info msg="All Velero custom resource definitions exist" logSource="pkg/cmd/server/server.go:430"
time="2020-07-20T17:04:42Z" level=info msg="Checking that all backup storage locations are valid" logSource="pkg/cmd/server/server.go:437"

kubectl logs restic-w8bjc -n velero

$ kubectl logs restic-w8bjc -n velero
time="2020-07-20T17:00:11Z" level=info msg="Setting log-level to INFO"
time="2020-07-20T17:00:11Z" level=info msg="Starting Velero restic server v1.4.2 (56a08a4d695d893f0863f697c2f926e27d70c0c5)" logSource="pkg/cmd/cli/restic/server.go:62"
time="2020-07-20T17:00:11Z" level=info msg="Starting controllers" logSource="pkg/cmd/cli/restic/server.go:156"
time="2020-07-20T17:00:11Z" level=info msg="Controllers started successfully" logSource="pkg/cmd/cli/restic/server.go:199"
time="2020-07-20T17:00:11Z" level=info msg="Starting controller" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:76"
time="2020-07-20T17:00:11Z" level=info msg="Waiting for caches to sync" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:81"
time="2020-07-20T17:00:11Z" level=info msg="Starting controller" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:76"
time="2020-07-20T17:00:11Z" level=info msg="Waiting for caches to sync" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:81"
time="2020-07-20T17:00:11Z" level=info msg="Caches are synced" controller=pod-volume-restore logSource="pkg/controller/generic_controller.go:85"
time="2020-07-20T17:00:11Z" level=info msg="Caches are synced" controller=pod-volume-backup logSource="pkg/controller/generic_controller.go:85"

Does any known issue ring a bell?

Please let me know what other information to provide. I am sure I am missing something here.

@reddiablo85

I still think there are issues with your install command, specifically the s3 parameters section. Here's a redacted version of mine; note the use of quotations around the s3 parameters and also the region setting. I use a cert for backups over https, but you don't need that if you're going over http. I'm also using an internal registry, hence the image and plugin repo flags, which is something else you don't need.

velero install \
  --image dtr01.abc.com/velero/velero:v1.4.2 \
  --provider aws \
  --plugins dtr01.abc.com:5000/velero/velero-plugin-for-aws:v1.1.0 \
  --bucket velero-eu02 \
  --prefix prod01 \
  --backup-location-config region="us-east-1",s3ForcePathStyle="true",s3Url="https://s3-prd.abc.com:10443" \
  --snapshot-location-config region="us-east-1" \
  --secret-file /root/rancherlab/Velero/crds_prd \
  --cacert /root/certs/ca_root_bundle.crt \
  --use-restic

@guimenezes
Author

Thanks.

Honestly I believe the s3 configuration is fine since I can back up other resources if the pod is not annotated with the NFS PVC.

I was able, though, to make some progress. After ssh'ing into the minikube VM, I listed the contents of /var/lib/kubelet/pods/ to see if I could access the NFS mount point manually. It seems like my commands get stuck when trying to contact the NFS server from inside the minikube VM:

ioctl(1, TIOCGWINSZ, {ws_row=25, ws_col=80, ws_xpixel=640, ws_ypixel=200}) = 0
brk(NULL)                               = 0x120a000
brk(0x122b000)                          = 0x122b000
stat("/mnt/sda1/var/lib/kubelet/pods/bf768f02-294d-489d-872e-e9545f4fee42/volumes/kubernetes.io~nfs/mysql-nfs-pv", ^C^X^C^C^Z
[1]+  Stopped                 strace ls /mnt/sda1/var/lib/kubelet/pods/bf768f02-294d-489d-872e-e9545f4fee42/volumes/kubernetes.io~nfs/mysql-nfs-pv

However, I do see that the actual MySQL pod can access it normally to write its data. This is the NFS partition on the NFS server:

ubuntu@ubuntu:~$ ls /mnt/data/
auto.cnf  ibdata1  ib_logfile0  ib_logfile1  mysql  performance_schema  wordpress

So it sounds like any access to this NFS-mounted directory from inside the minikube VM gets stuck, and I believe restic would hit exactly the same problem. Unless I am missing something.

I am not sure exactly why this happens: the NFS server is accessible and visible from inside the MySQL container.
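One way to confirm this kind of hang without wedging the shell is to bound the probe with a timeout. A minimal sketch (probe_mount and the 5-second limit are illustrative helpers, not Velero tooling):

```shell
# Classify a path that may sit on a hung NFS mount. GNU `timeout` kills `stat`
# after 5 seconds: exit 0 means the mount answered, 124 means the call hung
# (e.g. the NFS server is unreachable from this host), anything else means
# the path simply does not exist.
probe_mount() {
  timeout 5 stat "$1" >/dev/null 2>&1
  case $? in
    0)   echo "responsive" ;;
    124) echo "hung" ;;
    *)   echo "missing" ;;
  esac
}

# Inside the minikube VM this would point at the kubelet volume path from the
# strace above, e.g.:
#   probe_mount /mnt/sda1/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~nfs/mysql-nfs-pv
probe_mount /tmp
```

Timeout's exit code 124 distinguishes a wedged NFS mount from a path that simply does not exist, which a plain ls cannot do without hanging.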

@ashish-amarnath
Member

@guimenezes If I am understanding your issue correctly, you are able to back up and restore applications that don't use NFS PVs, but when you try to back up applications that do use NFS PVs, the backup makes no progress and gets stuck in the New phase.
Can you please share the output of kubectl -n velero get podvolumebackups?

If the Backup is stuck in New state, then it appears that the backup controller is not processing any backups.
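To make that check concrete, the relevant PodVolumeBackup fields can be pulled out with custom columns (a sketch to be run against the live cluster; the column paths assume the PodVolumeBackup schema of Velero v1.4):

```shell
# List restic PodVolumeBackups with the fields that matter for a stuck backup.
# A PodVolumeBackup sitting in New or InProgress points at the restic
# daemonset pods rather than the main Velero server.
kubectl -n velero get podvolumebackups \
  -o custom-columns=NAME:.metadata.name,POD:.spec.pod.name,VOLUME:.spec.volume,PHASE:.status.phase
```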

ashish-amarnath added the "Needs info (Waiting for information)" label on Aug 4, 2020

marco-seramondi commented Aug 6, 2020

Hello there, the same thing is happening on my OpenShift 4.3 cluster. Velero and restic are installed and running properly.

kubectl get pod -n velero
NAME                      READY   STATUS    RESTARTS   AGE
restic-2zvfr              1/1     Running   0          6d22h
restic-72lhw              1/1     Running   0          6d22h
restic-7vzsh              1/1     Running   0          6d22h
restic-8t24h              1/1     Running   0          6d22h
restic-f6m5r              1/1     Running   0          6d22h
restic-f9r47              1/1     Running   0          6d22h
restic-h7nnh              1/1     Running   0          6d22h
restic-lf65l              1/1     Running   0          6d22h
restic-mqfg9              1/1     Running   0          6d22h
restic-p6hwj              1/1     Running   0          6d22h
restic-sgjhj              1/1     Running   0          6d22h
velero-7d5db6bd4f-tsx28   1/1     Running   0          13d

I get this error if I check for errors in the restic pod:
kubectl logs restic-h7nnh -n velero | grep error
time="2020-08-03T10:24:39Z" level=error msg="Error running command=restic backup --repo=s3:s3-u-west-2.amazonaws.com/velero-backup/restic/test4 --password-file=/tmp/velero-restic-credentials-test5556019 --cache-dir=/scratch/.cache/restic . --tag=pod-uid=934c-433a-b810-05f1ffb8 --tag=volume=cfg --tag=backup=help --tag=backup-uid=47c2-b322-589a09f967df --tag=ns=test4 --tag=pod=jarvis-api-2466m --host=velero --json, stdout=, stderr=Fatal: unable to open config file: Stat: The specified key does not exist.\nIs there a repository at the following location?\ns3:s3-u-west-2.amazonaws.com/velero-backup/restic/test4\n" backup=velero/help controller=pod-volume-backup error="unable to find summary in restic backup command output"

velero backup describe 000-test4-okd-nprod --details
Name: 000-test4-okd-nprod
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/source-cluster-k8s-gitversion=v1.16.2
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=16+

Phase: PartiallyFailed (run velero backup logs 000-test4-okd-nprod for more information)

Errors: 6
Warnings: 0

Namespaces:
Included: test4
Excluded:

Resources:
Included: *
Excluded:
Cluster-scoped: auto

Label selector:

Storage Location: default

Velero-Native Snapshot PVs: auto

TTL: 720h0m0s

Hooks:

Backup Format Version: 1

Started: 2020-08-06 11:19:12 +0000 UTC
Completed: 2020-08-06 11:19:30 +0000 UTC

Expiration: 2020-09-05 11:19:12 +0000 UTC

Total items to be backed up: 176
Items backed up: 176

Resource List:
apps/v1/ControllerRevision:
- test4/logstash-65986dc4fb
- test4/logstash-85894fbfc
- test4/redis-6b475c8b5f
- test4/redis-6f8c784ddf
apps/v1/Deployment:
- test4/api-node
- test4/web-ui
- test4/worker
apps/v1/ReplicaSet:
- test4/api-node-65d6b9dcb7
- test4/web-ui-575ccf9b95
- test4/worker-77fc978c87
apps/v1/StatefulSet:
- test4/postgres-postgresql
- test4/redis
authorization.openshift.io/v1/RoleBinding:
- test4/admin
- test4/rb-test4
batch/v1/Job:
- test4/reconcile-equity-1596676200
batch/v1beta1/CronJob:
- test4/merge-equity
extensions/v1beta1/Ingress:
- test4/test4-wpl
networking.k8s.io/v1/NetworkPolicy:
- test4/jvs-access
networking.k8s.io/v1beta1/Ingress:
- test4/test4-wpl
- test4/test4-wpl-internal
rbac.authorization.k8s.io/v1/RoleBinding:
- test4/admin
- test4/rb-test4
- test4/system:deployers
route.openshift.io/v1/Route:
- test4/test4-wpl-6j6mz
- test4/test4-wpl-6rp42
- test4/test4-wpl-z7zvj
v1/ConfigMap:
- test4/cj-patch
- test4/k8s-cfg
- test4/logstash-patterns
v1/Endpoints:
- test4/api-node-svc
- test4/tezos-sig
- test4/wpl-svc
- test4/wpl-ui
v1/Namespace:
- test4
v1/PersistentVolume:
- pvc-139781c1-b0e5-457-8e09-d32b800d434
- pvc-4208bc0b-770-4a70-be62-11d484b3b2d
- pvc-52ddce34-e4a-42f-bd63-8c35bfeb3ad
- pvc-54e1f97-a4a-4ed9-b2e2-aba001cd7c
- pvc-62cab2c-cce3-480-b3cd-dcddec2db
- test4/data-logstash-0
- test4/redis-data-redis-0
- test4/shared-logs
- test4/tezos-logs
- test4/var-test4
v1/Pod:
- test4/console-ui-b9c4f5794-2xq55
- test4/mongo-test4-mongodb-primary-0
- test4/mongo-test4-mongodb-secondary-0
- test4/postgres-postgresql-0
- test4/worker-77fc978c87-zddgm
v1/Secret:
- test4/builder-dockercfg-bn97k
- test4/builder-token-4jq6m
- test4/builder-token-4ttrj
v1/Service:
- test4/api-node-svc
- test4/tezos-api-svc
v1/ServiceAccount:
- test4/builder
- test4/default
- test4/deployer

Velero-Native Snapshots:

Restic Backups:
Failed:
test4/web-platform-5cf78f6b5-c6bm: certs, logs, run-sh, varfile, varshared

Can someone give me some guidance? It seems that velero + restic cannot back up AWS EFS yet. Please comment here asap.

@ashish-amarnath
Member

Sorry that you are having issues with this.
I've seen a couple of recent issues with using restic and NFS as object storage. I don't know why this isn't working as expected. We'll have to investigate this further.
In the meantime, are you able to test this out without using a custom ca-cert?

Related #2789

@cloudcafetech

cloudcafetech commented Aug 7, 2020

I use automatic annotation (https://github.com/duyanghao/velero-volume-controller) with NFS, which works fine with NFS PVs.

But I have not tested with minikube; my setup uses kubeadm.
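For context, what a controller like that automates is the per-pod opt-in annotation that velero's restic integration reads (velero 1.5 later added an opt-out mode via --default-volumes-to-restic). A minimal sketch of the annotation on the mysql Deployment's pod template; the volume name "mysql-persistent-storage" is a placeholder for whatever the pod spec calls the volume backed by mysql-nfs-pvc:

```yaml
# Fragment of the Deployment spec; only the annotation is the point here.
template:
  metadata:
    annotations:
      backup.velero.io/backup-volumes: mysql-persistent-storage
```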

@marco-seramondi

marco-seramondi commented Aug 7, 2020 via email

@guimenezes
Author

@ashish-amarnath in the original issue the backup gets stuck "In Progress". Unfortunately I don't have the setup anymore but I can reproduce it again as soon as I get some time.

Also, in the original issue, when looking inside the minikube VM I got into the same stuck condition just by ls'ing the NFS mount point. So I think it is likely restic was getting stuck in the same way.

I am in the process of deploying Kubernetes in EKS and I can verify if this happens or not there as well.

I suspect (hopefully) this may be just a minikube issue, but let's see.

@dbgoytia

I think this is a Velero issue. I am having the same issue on a cluster that was installed using kubeadm + an NFS server. By the way @guimenezes, your Velero installation was just fine. The issue here is that I think Velero doesn't support NFS migrations. Is that correct?

@dbgoytia

@guimenezes Related #1229 Check this one out.

@cloudcafetech

kubeadm + NFS server + Minio without cacert works fine.

What I faced with cacert was a "restic repository is not ready" error.

@dbgoytia

@cloudcafetech that's interesting. May I know how you managed to fix the issue of the cacert error? Thank you in advance.

@cloudcafetech

I am not using cacert; instead I use a Minio setup with HTTP, not HTTPS.

kubeadm + NFS server + Minio without cacert

@ashish-amarnath ashish-amarnath added Needs investigation Restic Relates to the restic integration labels Aug 24, 2020
@zubron zubron assigned zubron and unassigned zubron Oct 21, 2020
@nrb nrb self-assigned this Oct 23, 2020
@nrb
Contributor

nrb commented Oct 23, 2020

@ironknight78 @cloudcafetech @guimenezes

I'm curious, do you know what the hostPath is on the PVs that restic is trying to mount? Our restic support uses the hostPath for volumes on a given node, and I'm wondering if some NFS implementations provide a hostPath mounting that is incompatible with what we're trying to do generically.

How is the NFS volume mounted into the pod in each of your cases? Can I see the YAML of a pod?
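The host path in question can be reconstructed from the strace earlier in this issue (the pod UID and PV name below are copied from there; on minikube the kubelet root is additionally prefixed with /mnt/sda1):

```shell
# Kubelet mounts each NFS PV under a per-pod directory; this host path is
# what restic traverses. Values below are from the strace earlier in this
# issue, reused purely for illustration.
KUBELET_ROOT="/var/lib/kubelet"            # minikube: /mnt/sda1/var/lib/kubelet
POD_UID="bf768f02-294d-489d-872e-e9545f4fee42"
PV_NAME="mysql-nfs-pv"

echo "${KUBELET_ROOT}/pods/${POD_UID}/volumes/kubernetes.io~nfs/${PV_NAME}"
# → /var/lib/kubelet/pods/bf768f02-294d-489d-872e-e9545f4fee42/volumes/kubernetes.io~nfs/mysql-nfs-pv
```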

@carlisia
Contributor

This issue has been inactive for a while so I'm closing it.

@pixlcrashr

pixlcrashr commented Dec 8, 2020

This issue has been inactive for a while so I'm closing it.

Since when is closing an issue for inactivity a proper solution to the issue itself? The problem is still there. This issue even got tagged "needs-investigation", and you close it for a meaningless reason? "Standards".

I mean, if there were a reason like "closing due to a duplicate" or "we don't have time and don't want to waste time and effort (and of course money) on this", it would be understandable.
But inactivity - really?

Can be opened again.

@osaffer

osaffer commented Feb 2, 2021

Did you find the problem?

Did you see any error in logs from nfs server side?

Did rpcbind start?

@pixlcrashr

pixlcrashr commented Feb 3, 2021

Did you find the problem?

Did you see any error in logs from nfs server side?

Did rpcbind start?

If it worked, I wouldn't suggest reopening it.

The backup gets stuck, as the title of this issue states, and I therefore do not get any errors from either the NFS server or velero.
The only thing that is important to note is that some volumes are re-scheduled to be backed up again.

@marco-seramondi

marco-seramondi commented Feb 3, 2021 via email

@fsz285

fsz285 commented Feb 10, 2021

Having the same issue. Restic gets stuck when trying to back up an NFS PV.

@marco-seramondi

marco-seramondi commented Feb 11, 2021 via email

@fsz285

fsz285 commented Feb 12, 2021

see #3450

@MnrGreg

MnrGreg commented Feb 13, 2021

This might be an implementation-specific issue. Velero and restic are able to successfully back up and restore kube-nfs-client-provisioner NFS persistent volumes.

nfs-client-provisioner: v3.1.0-k8s1.11
velero: v1.5.2

$ velero backup describe ecommerce-etcd-qa-02-13-2021  --details
Name:         ecommerce-etcd-qa-02-13-2021
Namespace:    kube-velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.17.6-docker-1
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=17+

Phase:  Completed

Errors:    0
Warnings:  0

Namespaces:
  Included:  ecommerce-etcd-qa
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2021-02-13 07:38:21 -0800 PST
Completed:  2021-02-13 07:41:31 -0800 PST

Expiration:  2021-03-15 08:38:21 -0700 PDT

Total items to be backed up:  38
Items backed up:              38

Resource List:
...
  v1/PersistentVolume:
    - pvc-192e7963-a024-4c53-80c6-21552b30ee91
    - pvc-41b92c87-1a8c-413e-aae8-5d6c1ef0a2b2
    - pvc-685729c3-d6a9-40e1-be57-1455b6cf9780
$ kubectl describe pv pvc-192e7963-a024-4c53-80c6-21552b30ee91
Name:            pvc-192e7963-a024-4c53-80c6-21552b30ee91
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: fuseim.pri/ifs
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    managed-nfs-storage
Status:          Bound
Claim:           ecommerce-etcd-qa/data-etcd-1
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        8Gi
Node Affinity:   <none>
Message:
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    redacted
    Path:      /redacted/ecommerce-etcd-qa-data-etcd-1-pvc-192e7963-a024-4c53-80c6-21552b30ee91
    ReadOnly:  false
Events:        <none>
