
DataVolume occasionally fails when cloning a DataVolume from a PVC concurrently #3259

Closed
mengyu1987 opened this issue May 14, 2024 · 12 comments

@mengyu1987

What happened:

  1. Clone a DV from a PVC in another namespace (screenshot: clone).
  2. Clone 20 DVs concurrently (screenshot: clone-batch).
  3. 2 of the DVs fail; see the check below (screenshot: failed).
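A quick way to see which DVs in a batch like this ended up Failed (a hedged sketch; it assumes the dv short name registered by CDI is available):

kubectl get dv -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase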

What you expected to happen:

All DVs should succeed.

How to reproduce it (as minimally and precisely as possible):
Steps to reproduce the behavior.

Additional context:
Add any other context about the problem here.

Environment:

  • CDI version (use kubectl get deployments cdi-deployment -o yaml): 1.58.1
  • Kubernetes version (use kubectl version): N/A
  • DV specification: N/A
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Install tools: N/A
  • Others: N/A
@awels
Member

awels commented May 15, 2024

@mhenriks Any thoughts on this? Looks like something is going wrong with the namespace transfer since we are seeing the lost claim?

@mhenriks
Member

mhenriks commented May 16, 2024

@mengyu1987 what version of CDI is this?

EDIT: I see 1.58

@mhenriks
Member

@awels lost claims are expected but should be deleted

@awels
Member

awels commented May 16, 2024

The description says it is CDI 1.58.1

@mengyu1987
Author

I have the same issue using version 1.59.0 of CDI. Please note that when the source PVC and cloned DV are in the same namespace, the issue does not occur.

@mhenriks
Member

mhenriks commented May 17, 2024

@mengyu1987 I was unable to reproduce on kubevirtci with:
KUBEVIRT_PROVIDER=k8s-1.28
KUBEVIRT_STORAGE=rook-ceph-default
KUBEVIRT_NUM_NODES=2

Can you post the following:
Lost PVC/DV yamls
Events on lost PVC/DV
StorageClass yaml

Any relevant errors in cdi-deployment log?
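For reference, a minimal set of commands that would gather those (a hedged sketch; it assumes CDI is deployed in the cdi namespace and uses the DV name and namespace posted in the reply below):

# Manifests and events for the failed clone
kubectl get dv azimv -n vm-4fc0ae73 -o yaml
kubectl get pvc azimv -n vm-4fc0ae73 -o yaml
kubectl describe dv azimv -n vm-4fc0ae73
kubectl describe pvc azimv -n vm-4fc0ae73

# StorageClass and CDI controller errors
kubectl get sc ceph-block -o yaml
kubectl logs -n cdi deployment/cdi-deployment | grep -i error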

@mengyu1987
Author

@mhenriks Thanks for your reply. I can only reproduce it when more than 20 DVs are cloned concurrently. My k8s cluster runs with:
KUBEVIRT_PROVIDER=k8s-1.27.6
KUBEVIRT_STORAGE=rook-ceph.rbd.csi
KUBEVIRT_NUM_NODES=3

The YAMLs follow:

failed DV yaml:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  annotations:
    cdi.kubevirt.io/cloneType: csi-clone
    cdi.kubevirt.io/storage.clone.token: eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjE3MTYxODQ5OTQsImlhdCI6MTcxNjE4NDY5NCwiaXNzIjoiY2RpLWFwaXNlcnZlciIsIm5hbWUiOiJpbWctMzlmNXkydG0iLCJuYW1lc3BhY2UiOiJkZWZhdWx0IiwibmJmIjoxNzE2MTg0Njk0LCJvcGVyYXRpb24iOiJDbG9uZSIsInBhcmFtcyI6eyJ0YXJnZXROYW1lIjoiYXppbXYiLCJ0YXJnZXROYW1lc3BhY2UiOiJ2bS00ZmMwYWU3MyJ9LCJyZXNvdXJjZSI6eyJncm91cCI6IiIsInJlc291cmNlIjoicGVyc2lzdGVudHZvbHVtZWNsYWltcyIsInZlcnNpb24iOiJ2MSJ9fQ.N1rHKXOtPyY9qjk6kpawAUihTbBFnDNqUBvYucw5R7RNaWDh-mwbCX9AUgKW84l5-36avAUEesawGAHnU-XyAqT-p69OCxlsOlRMfDN8wHNJe3T05kiaDH2R-T_shTPRWpzypoUO4L9OsoRhrQuZ2sW9PruyOavhhRLTgyixIw7GFzbeiZzAPi_rudMZAEVv0KOK3v1A951v6n13MJkHkANh5spO7qYJIAPeg3JLzW8m7bBviLNGBR9wkMepN1VuIxpobJxty0aUKluCgTqRAHJccY9q9-m0yvPY0jLG2JT3A1foiuwdiT8MJBycxfep3zwwvfsiud9HaSuQWDDrUA
    cdi.kubevirt.io/storage.extended.clone.token: eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjIwMzE1NDQ2OTUsImlhdCI6MTcxNjE4NDY5NSwiaXNzIjoiY2RpLWRlcGxveW1lbnQiLCJuYW1lIjoiaW1nLTM5ZjV5MnRtIiwibmFtZXNwYWNlIjoiZGVmYXVsdCIsIm5iZiI6MTcxNjE4NDY5NSwib3BlcmF0aW9uIjoiQ2xvbmUiLCJwYXJhbXMiOnsidGFyZ2V0TmFtZSI6ImF6aW12IiwidGFyZ2V0TmFtZXNwYWNlIjoidm0tNGZjMGFlNzMiLCJ1aWQiOiJhNWE1NzJiNS1lNWQ1LTQ4MDktYTQzYy0wYzZlODQ0MTE4YjkifSwicmVzb3VyY2UiOnsiZ3JvdXAiOiIiLCJyZXNvdXJjZSI6InBlcnNpc3RlbnR2b2x1bWVjbGFpbXMiLCJ2ZXJzaW9uIjoidjEifX0.g5NNJ8K9oS1EJAWDQ2Kj4zeQxWIwPluI7pv_G5vDRbDS6LsoUQDoBKGS_VlqHi6kw96GQnov5wgemRWN6onUJWuKrCmXSVhz9Mk9hf9Y-AwEwfvNBcyKZ5OPiVHbKRhA2fslIwW4G_ZT9xCpzDiTjU5LZ3vi-MFhWMKP230xDij7SkJCA7wp6sQT9tM1oewsbPS9Tmf6wX86a_QzGVvldSu7qg_5o4LN0QVV68veoD1xTWDJO0gtwcOBkyL6y1pNR26KVDIGi2p3R7Q8mByoQBpASyfqxnyq12RxSADqm_iRmwBI5Y-EY3Y9UMfFxc40-kh1gjRpKsGmytphoc2BZg
    cdi.kubevirt.io/storage.usePopulator: "true"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cdi.kubevirt.io/v1beta1","kind":"DataVolume","metadata":{"annotations":{},"name":"azimv","namespace":"vm-4fc0ae73"},"spec":{"pvc":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"20Gi"}},"storageClassName":"ceph-block","volumeMode":"Block"},"source":{"pvc":{"name":"img-39f5y2tm","namespace":"default"}}}}
  creationTimestamp: "2024-05-20T05:58:14Z"
  generation: 1
  name: azimv
  namespace: vm-4fc0ae73
  resourceVersion: "4788828"
  uid: a5a572b5-e5d5-4809-a43c-0c6e844118b9
spec:
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
    storageClassName: ceph-block
    volumeMode: Block
  source:
    pvc:
      name: img-39f5y2tm
      namespace: default
status:
  claimName: azimv
  conditions:
  - lastHeartbeatTime: "2024-05-20T05:58:25Z"
    lastTransitionTime: "2024-05-20T05:58:25Z"
    message: Claim Lost
    reason: ClaimLost
    status: "False"
    type: Bound
  - lastHeartbeatTime: "2024-05-20T05:58:25Z"
    lastTransitionTime: "2024-05-20T05:58:25Z"
    status: "False"
    type: Ready
  - lastHeartbeatTime: "2024-05-20T05:58:20Z"
    lastTransitionTime: "2024-05-20T05:58:20Z"
    message: Clone Complete
    reason: Completed
    status: "False"
    type: Running
  phase: Failed
  progress: 100.0%

Lost PVC yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    cdi.kubevirt.io/clonePhase: Succeeded
    cdi.kubevirt.io/cloneType: csi-clone
    cdi.kubevirt.io/dataSourceNamespace: default
    cdi.kubevirt.io/storage.clone.token: eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjE3MTYxODQ5OTQsImlhdCI6MTcxNjE4NDY5NCwiaXNzIjoiY2RpLWFwaXNlcnZlciIsIm5hbWUiOiJpbWctMzlmNXkydG0iLCJuYW1lc3BhY2UiOiJkZWZhdWx0IiwibmJmIjoxNzE2MTg0Njk0LCJvcGVyYXRpb24iOiJDbG9uZSIsInBhcmFtcyI6eyJ0YXJnZXROYW1lIjoiYXppbXYiLCJ0YXJnZXROYW1lc3BhY2UiOiJ2bS00ZmMwYWU3MyJ9LCJyZXNvdXJjZSI6eyJncm91cCI6IiIsInJlc291cmNlIjoicGVyc2lzdGVudHZvbHVtZWNsYWltcyIsInZlcnNpb24iOiJ2MSJ9fQ.N1rHKXOtPyY9qjk6kpawAUihTbBFnDNqUBvYucw5R7RNaWDh-mwbCX9AUgKW84l5-36avAUEesawGAHnU-XyAqT-p69OCxlsOlRMfDN8wHNJe3T05kiaDH2R-T_shTPRWpzypoUO4L9OsoRhrQuZ2sW9PruyOavhhRLTgyixIw7GFzbeiZzAPi_rudMZAEVv0KOK3v1A951v6n13MJkHkANh5spO7qYJIAPeg3JLzW8m7bBviLNGBR9wkMepN1VuIxpobJxty0aUKluCgTqRAHJccY9q9-m0yvPY0jLG2JT3A1foiuwdiT8MJBycxfep3zwwvfsiud9HaSuQWDDrUA
    cdi.kubevirt.io/storage.condition.running: "false"
    cdi.kubevirt.io/storage.condition.running.message: Clone Complete
    cdi.kubevirt.io/storage.condition.running.reason: Completed
    cdi.kubevirt.io/storage.contentType: kubevirt
    cdi.kubevirt.io/storage.extended.clone.token: eyJhbGciOiJQUzI1NiJ9.eyJleHAiOjIwMzE1NDQ2OTYsImlhdCI6MTcxNjE4NDY5NiwiaXNzIjoiY2RpLWRlcGxveW1lbnQiLCJuYW1lIjoiaW1nLTM5ZjV5MnRtIiwibmFtZXNwYWNlIjoiZGVmYXVsdCIsIm5iZiI6MTcxNjE4NDY5Niwib3BlcmF0aW9uIjoiQ2xvbmUiLCJwYXJhbXMiOnsidGFyZ2V0TmFtZSI6ImF6aW12IiwidGFyZ2V0TmFtZXNwYWNlIjoidm0tNGZjMGFlNzMiLCJ1aWQiOiI2MjA5ZjRmNS03MzU0LTRmM2EtYjlmNC1iZjQ5NzY2M2FmMDYifSwicmVzb3VyY2UiOnsiZ3JvdXAiOiIiLCJyZXNvdXJjZSI6InBlcnNpc3RlbnR2b2x1bWVjbGFpbXMiLCJ2ZXJzaW9uIjoidjEifX0.bJv06O4R8N2pnYBWIUrU2ViKN5KxguIENjcsTyVILEZXQxdJwzecl5H6LfDehWvQpdMXI2LkXi-QhRpAJhPCYlD3nbnzjIsziThybi8qQZn20VZKaQPWiriVg1y3tqGAe42jvPGhksdtTjEMIqpiJYFGz9w4TFcblWH7-0MDuwr6TQIDVYx7pP86LuKNCqO4EscPI9nQ1ILN5HPUDX0-HuTQDQ0Frp4jHuXNTkR3BLtP1A_fObRAUFffOfZ-6qlAGfZ9GbmlULfLN98lZ-WNa3HTV3OgolgI32ohFDDk1MVl9hHDtULIcgNSMgwDKxsAT0jtP16HuvII7_jyBam7DQ
    cdi.kubevirt.io/storage.pod.restarts: "0"
    cdi.kubevirt.io/storage.populator.progress: 100.0%
    cdi.kubevirt.io/storage.preallocation.requested: "false"
    cdi.kubevirt.io/storage.usePopulator: "true"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cdi.kubevirt.io/v1beta1","kind":"DataVolume","metadata":{"annotations":{},"name":"azimv","namespace":"vm-4fc0ae73"},"spec":{"pvc":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"20Gi"}},"storageClassName":"ceph-block","volumeMode":"Block"},"source":{"pvc":{"name":"img-39f5y2tm","namespace":"default"}}}}
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
    volume.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
  creationTimestamp: "2024-05-20T05:58:16Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app: containerized-data-importer
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
  name: azimv
  namespace: vm-4fc0ae73
  ownerReferences:
  - apiVersion: cdi.kubevirt.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: DataVolume
    name: azimv
    uid: a5a572b5-e5d5-4809-a43c-0c6e844118b9
  resourceVersion: "4788825"
  uid: 6209f4f5-7354-4f3a-b9f4-bf497663af06
spec:
  accessModes:
  - ReadWriteOnce
  dataSource:
    apiGroup: cdi.kubevirt.io
    kind: VolumeCloneSource
    name: volume-clone-source-a5a572b5-e5d5-4809-a43c-0c6e844118b9
  dataSourceRef:
    apiGroup: cdi.kubevirt.io
    kind: VolumeCloneSource
    name: volume-clone-source-a5a572b5-e5d5-4809-a43c-0c6e844118b9
  resources:
    requests:
      storage: 20Gi
  storageClassName: ceph-block
  volumeMode: Block
  volumeName: pvc-48e24d59-725f-41fc-8cca-f406710ff7bf
status:
  phase: Lost

Events on lost DV:

Events:
  Type     Reason              Age   From                             Message
  ----     ------              ----  ----                             -------
  Normal   CloneScheduled      23m   datavolume-pvc-clone-controller  Cloning from default/img-39f5y2tm into vm-4fc0ae73/azimv scheduled
  Normal   Pending             23m   datavolume-pvc-clone-controller  PVC azimv Pending
  Warning  Pending             23m   datavolume-pvc-clone-controller  Clone Pending
  Normal   CSICloneInProgress  23m   datavolume-pvc-clone-controller  CSI Volume clone in progress (for pvc default/img-39f5y2tm)
  Warning  Error               22m   datavolume-pvc-clone-controller  actual PVC size missing
  Normal   Bound               22m   datavolume-pvc-clone-controller  PVC azimv Bound
  Normal   CloneSucceeded      22m   datavolume-pvc-clone-controller  Successfully cloned from default/img-39f5y2tm into vm-4fc0ae73/azimv
  Warning  ErrClaimLost        22m   datavolume-pvc-clone-controller  PVC azimv lost
  Normal   ClaimLost           22m   datavolume-pvc-clone-controller  Claim Lost

Events on lost PVC:

Events:
  Type     Reason                Age                From                                                                                                        Message
  ----     ------                ----               ----                                                                                                        -------
  Normal   ExternalProvisioning  22m (x7 over 22m)  persistentvolume-controller                                                                                 waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          22m (x6 over 22m)  rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-76f966fdd8-gh558_ae0d33a0-c292-4c40-bf36-49e2fd466e8b  External provisioner is provisioning volume for claim "vm-4fc0ae73/azimv"
  Normal   Provisioning          22m (x6 over 22m)  external-provisioner                                                                                        Assuming an external populator will provision the volume
  Warning  ClaimMisbound         22m                persistentvolume-controller                                                                                 Two claims are bound to the same volume, this one is bound incorrectly
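The ClaimMisbound warning points at the PV's claimRef; a hedged one-liner to see which claim the PV actually references (PV name taken from the lost PVC's spec.volumeName above):

kubectl get pv pvc-48e24d59-725f-41fc-8cca-f406710ff7bf -o jsonpath='{.spec.claimRef.namespace}/{.spec.claimRef.name}{"\n"}'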

StorageClass yaml:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    meta.helm.sh/release-name: os-rook-ceph-cluster
    meta.helm.sh/release-namespace: default
    storageclass.kubernetes.io/is-default-class: "false"
  creationTimestamp: "2024-05-15T12:01:27Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: ceph-block
  resourceVersion: "2289"
  uid: dd454972-29e1-4932-bbc1-789ef28e5c6e
parameters:
  clusterID: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  imageFeatures: layering
  imageFormat: "2"
  pool: ceph-blockpool
provisioner: rook-ceph.rbd.csi.ceph.com
reclaimPolicy: Retain
volumeBindingMode: Immediate

PV yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: rook-ceph.rbd.csi.ceph.com
    volume.kubernetes.io/provisioner-deletion-secret-name: rook-csi-rbd-provisioner
    volume.kubernetes.io/provisioner-deletion-secret-namespace: rook-ceph
  creationTimestamp: "2024-05-20T05:58:20Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-48e24d59-725f-41fc-8cca-f406710ff7bf
  resourceVersion: "4788637"
  uid: b576d6e4-4a1e-43fe-895c-60f0aacecc4e
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 20Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: tmp-pvc-6209f4f5-7354-4f3a-b9f4-bf497663af06
    namespace: default
    resourceVersion: "4788443"
    uid: 48e24d59-725f-41fc-8cca-f406710ff7bf
  csi:
    controllerExpandSecretRef:
      name: rook-csi-rbd-provisioner
      namespace: rook-ceph
    driver: rook-ceph.rbd.csi.ceph.com
    nodeStageSecretRef:
      name: rook-csi-rbd-node
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      imageFeatures: layering
      imageFormat: "2"
      imageName: csi-vol-a68f3967-ec00-42ae-91d4-161c9ed87dae
      journalPool: ceph-blockpool
      pool: ceph-blockpool
      storage.kubernetes.io/csiProvisionerIdentity: 1715856786508-2790-rook-ceph.rbd.csi.ceph.com
    volumeHandle: 0001-0009-rook-ceph-0000000000000001-a68f3967-ec00-42ae-91d4-161c9ed87dae
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ceph-block
  volumeMode: Block
status:
  phase: Released

Some relevant entries from the cdi-deployment log:

{"level":"error","ts":"2024-05-20T05:58:16Z","logger":"controller.datavolume-pvc-clone-controller","msg":"unable to update datavolume status","name":"azimv","error":"Operation cannot be fulfilled on datavolumes.cdi.kubevirt.io \"azimv\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"kubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).emitEvent\n\tpkg/controller/datavolume/controller-base.go:1012\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).updateStatus\n\tpkg/controller/datavolume/controller-base.go:909\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).reconcile\n\tpkg/controller/datavolume/controller-base.go:434\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*PvcCloneReconciler).Reconcile\n\tpkg/controller/datavolume/pvc-clone-controller.go:179\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:323\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
{"level":"error","ts":"2024-05-20T05:58:16Z","msg":"Reconciler error","controller":"datavolume-pvc-clone-controller","object":{"name":"azimv","namespace":"vm-4fc0ae73"},"namespace":"vm-4fc0ae73","name":"azimv","reconcileID":"b4f03528-12c1-42dc-b99c-9082ccfc80e5","error":"Operation cannot be fulfilled on datavolumes.cdi.kubevirt.io \"azimv\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
{"level":"error","ts":"2024-05-20T05:58:16Z","logger":"controller.datavolume-pvc-clone-controller","msg":"unable to update datavolume status","name":"azimv","error":"Operation cannot be fulfilled on datavolumes.cdi.kubevirt.io \"azimv\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"kubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).emitEvent\n\tpkg/controller/datavolume/controller-base.go:1012\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).updateDataVolumeStatusPhaseWithEvent\n\tpkg/controller/datavolume/controller-base.go:828\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).updateDataVolumeStatusPhaseSync\n\tpkg/controller/datavolume/controller-base.go:804\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).updateStatus\n\tpkg/controller/datavolume/controller-base.go:846\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).reconcile\n\tpkg/controller/datavolume/controller-base.go:434\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*PvcCloneReconciler).Reconcile\n\tpkg/controller/datavolume/pvc-clone-controller.go:179\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:323\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
{"level":"error","ts":"2024-05-20T05:58:16Z","msg":"Reconciler error","controller":"datavolume-pvc-clone-controller","object":{"name":"azimv","namespace":"vm-4fc0ae73"},"namespace":"vm-4fc0ae73","name":"azimv","reconcileID":"4ca351d4-54ca-430e-bd84-a0268dad08e7","error":"Operation cannot be fulfilled on datavolumes.cdi.kubevirt.io \"azimv\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}


{"level":"error","ts":"2024-05-20T05:58:16Z","msg":"Reconciler error","controller":"clone-populator","object":{"name":"azimv","namespace":"vm-4fc0ae73"},"namespace":"vm-4fc0ae73","name":"azimv","reconcileID":"945ab067-aa17-4596-a8d5-2f5a782adeff","error":"Operation cannot be fulfilled on persistentvolumeclaims \"azimv\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}

{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"controller.clone-controller","msg":"Should not reconcile this PVC","PVC":"default/tmp-pvc-6209f4f5-7354-4f3a-b9f4-bf497663af06","checkPVC(AnnCloneRequest)":false,"NOT has annotation(AnnCloneOf)":true,"isBound":false,"has finalizer?":false}

{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"controller.clone-controller","msg":"Should not reconcile this PVC","PVC":"default/tmp-pvc-6209f4f5-7354-4f3a-b9f4-bf497663af06","checkPVC(AnnCloneRequest)":false,"NOT has annotation(AnnCloneOf)":true,"isBound":true,"has finalizer?":false}

{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"controller.upload-controller","msg":"not doing anything with PVC","PVC":"default/tmp-pvc-6209f4f5-7354-4f3a-b9f4-bf497663af06","isUpload":false,"isCloneTarget":false,"isBound":true,"podSucceededFromPVC":false,"deletionTimeStamp set?":true}


{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"controller.clone-populator","msg":"removing finalizer","PVC":"vm-4fc0ae73/azimv"}
{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"events","msg":"Successfully cloned from default/img-39f5y2tm into vm-4fc0ae73/azimv","type":"Normal","object":{"kind":"DataVolume","namespace":"vm-4fc0ae73","name":"azimv","uid":"a5a572b5-e5d5-4809-a43c-0c6e844118b9","apiVersion":"cdi.kubevirt.io/v1beta1","resourceVersion":"4788596"},"reason":"CloneSucceeded"}


{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"controller.import-controller","msg":"PVC not bound, skipping pvc","PVC":"default/tmp-pvc-6209f4f5-7354-4f3a-b9f4-bf497663af06","Phase":"Lost"}



{"level":"error","ts":"2024-05-20T05:58:20Z","msg":"Reconciler error","controller":"clone-populator","object":{"name":"azimv","namespace":"vm-4fc0ae73"},"namespace":"vm-4fc0ae73","name":"azimv","reconcileID":"830576eb-a951-4fe6-a647-cd9ca8b8daf4","error":"actual PVC size missing","stacktrace":"kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}

{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"events","msg":"actual PVC size missing","type":"Warning","object":{"kind":"DataVolume","namespace":"vm-4fc0ae73","name":"azimv","uid":"a5a572b5-e5d5-4809-a43c-0c6e844118b9","apiVersion":"cdi.kubevirt.io/v1beta1","resourceVersion":"4788588"},"reason":"Error"}
{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"controller.clone-controller","msg":"Should not reconcile this PVC","PVC":"vm-4fc0ae73/azimv","checkPVC(AnnCloneRequest)":false,"NOT has annotation(AnnCloneOf)":true,"isBound":true,"has finalizer?":false}
{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"controller.datavolume-pvc-clone-controller","msg":"Unknown populator phase","phase":"Error"}


{"level":"debug","ts":"2024-05-20T05:58:20Z","logger":"events","msg":"Successfully cloned from default/img-39f5y2tm into vm-4fc0ae73/azimv","type":"Normal","object":{"kind":"DataVolume","namespace":"vm-4fc0ae73","name":"azimv","uid":"a5a572b5-e5d5-4809-a43c-0c6e844118b9","apiVersion":"cdi.kubevirt.io/v1beta1","resourceVersion":"4788596"},"reason":"CloneSucceeded"}


{"level":"error","ts":"2024-05-20T05:58:21Z","logger":"controller.datavolume-pvc-clone-controller","msg":"Unable to sync update dv meta","name":"azimv","error":"Operation cannot be fulfilled on datavolumes.cdi.kubevirt.io \"azimv\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"kubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).syncUpdate\n\tpkg/controller/datavolume/controller-base.go:540\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).syncCommon\n\tpkg/controller/datavolume/controller-base.go:449\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*PvcCloneReconciler).syncClone\n\tpkg/controller/datavolume/pvc-clone-controller.go:247\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*PvcCloneReconciler).sync\n\tpkg/controller/datavolume/pvc-clone-controller.go:239\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*ReconcilerBase).reconcile\n\tpkg/controller/datavolume/controller-base.go:433\nkubevirt.io/containerized-data-importer/pkg/controller/datavolume.(*PvcCloneReconciler).Reconcile\n\tpkg/controller/datavolume/pvc-clone-controller.go:179\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:323\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}
{"level":"error","ts":"2024-05-20T05:58:21Z","msg":"Reconciler error","controller":"datavolume-pvc-clone-controller","object":{"name":"azimv","namespace":"vm-4fc0ae73"},"namespace":"vm-4fc0ae73","name":"azimv","reconcileID":"ea1b450d-b15f-4e83-8223-92a4ad1a04a0","error":"Operation cannot be fulfilled on datavolumes.cdi.kubevirt.io \"azimv\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:274\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tvendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235"}

{"level":"debug","ts":"2024-05-20T05:58:25Z","logger":"controller.clone-controller","msg":"PVC annotation not found, skipping pvc","PVC":"vm-4fc0ae73/azimv","annotation":"k8s.io/CloneRequest"}

{"level":"debug","ts":"2024-05-20T05:58:25Z","logger":"controller.clone-controller","msg":"PVC not bound, skipping pvc","PVC":"vm-4fc0ae73/azimv","Phase":"Lost"}

{"level":"debug","ts":"2024-05-20T05:58:25Z","logger":"events","msg":"PVC azimv lost","type":"Warning","object":{"kind":"DataVolume","namespace":"vm-4fc0ae73","name":"azimv","uid":"a5a572b5-e5d5-4809-a43c-0c6e844118b9","apiVersion":"cdi.kubevirt.io/v1beta1","resourceVersion":"4788828"},"reason":"ErrClaimLost"}
{"level":"debug","ts":"2024-05-20T05:58:25Z","logger":"events","msg":"Claim Lost","type":"Normal","object":{"kind":"DataVolume","namespace":"vm-4fc0ae73","name":"azimv","uid":"a5a572b5-e5d5-4809-a43c-0c6e844118b9","apiVersion":"cdi.kubevirt.io/v1beta1","resourceVersion":"4788828"},"reason":"ClaimLost"}

And my test shell script:

#!/bin/bash
# Apply 20 DataVolumes in a row, each with a random 5-character name,
# so the clones run concurrently on the CDI side.

count=0
while [ "$count" -lt 20 ]
do
  count=$((count + 1))
  length=5
  random_string=$(tr -dc 'a-z0-9' < /dev/urandom | fold -w "$length" | head -n 1)
  echo "$random_string"

  # Substitute the random name into the DV template and apply it.
  sed "s/DV-NAME/$random_string/g" /root/my/batch-dv.yaml > /root/my/batch-dv-1.yaml

  kubectl apply -f /root/my/batch-dv-1.yaml
done
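A slightly tighter variant of the same loop, piping straight into kubectl instead of writing a temp file (a sketch using the same template path as above):

#!/bin/bash
for i in $(seq 1 20); do
  name=$(tr -dc 'a-z0-9' < /dev/urandom | head -c 5)
  echo "$name"
  sed "s/DV-NAME/$name/g" /root/my/batch-dv.yaml | kubectl apply -f -
done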

My test DV YAML:

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: DV-NAME
  namespace: vm-4fc0ae73
spec:
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 20Gi
    storageClassName: ceph-block
    volumeMode: Block
  source:
    pvc:
      name: img-39f5y2tm
      namespace: default

@mhenriks
Member

@mengyu1987 looks like you are not using the ceph provisioner included in kubevirtci

I could not reproduce with the following params

KUBEVIRT_PROVIDER=k8s-1.27.6
KUBEVIRT_STORAGE=rook-ceph-default
KUBEVIRT_NUM_NODES=3

KUBEVIRT_STORAGE=rook-ceph-default will install rook/ceph provisioner

I noticed your StorageClass has Retain rather than Delete so I tried testing with that setting and still could not reproduce with rook-ceph-default.

I suggest you run with rook-ceph-default. If that works, then there may be an issue with your provisioner config.
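For completeness, a sketch of bringing that environment up from a CDI checkout (hedged; the make targets assumed here are the usual kubevirtci conventions):

export KUBEVIRT_PROVIDER=k8s-1.27.6
export KUBEVIRT_STORAGE=rook-ceph-default
export KUBEVIRT_NUM_NODES=3
make cluster-up     # start the kubevirtci cluster
make cluster-sync   # build and deploy CDI into it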

@mengyu1987
Author


Thanks @mhenriks. I ran with rook-ceph-default and the issue does not reproduce. It is resolved.

@mengyu1987
Author

mengyu1987 commented May 23, 2024

@mhenriks The issue reproduced once more with rook-ceph-default.
There are two StorageClasses:

[root@node152 ~]# kubectl get sc
NAME                   PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ceph-block (default)   rook-ceph.rbd.csi.ceph.com      Retain          Immediate           true                   7d15h
ceph-fs                rook-ceph.cephfs.csi.ceph.com   Retain          Immediate           true                   7d15h

And the ceph-block (default) StorageClass YAML:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    meta.helm.sh/release-name: os-rook-ceph-cluster
    meta.helm.sh/release-namespace: default
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2024-05-15T12:01:27Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: ceph-block
  resourceVersion: "8172170"
  uid: dd454972-29e1-4932-bbc1-789ef28e5c6e
parameters:
  clusterID: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  imageFeatures: layering
  imageFormat: "2"
  pool: ceph-blockpool
provisioner: rook-ceph.rbd.csi.ceph.com
reclaimPolicy: Retain
volumeBindingMode: Immediate

I don't know why the DV is reconciled to Failed after it has already succeeded (screenshot: succeeded-failed).
The DV shows as succeeded in the cdi-deployment log (screenshot: succeeded-log).
Then, on the next reconcile, the DV is failed (screenshot: reconsile-log).

mengyu1987 reopened this May 23, 2024
@mhenriks
Member

@mengyu1987 this setting is not compatible with rook-ceph-default

pool: ceph-blockpool

✗ k get  cephblockpools -n rook-ceph
NAME          PHASE
replicapool   Ready

There is no blockpool named ceph-blockpool.
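A hedged one-liner to confirm which pool the StorageClass is actually pointing at:

kubectl get sc ceph-block -o jsonpath='{.parameters.pool}{"\n"}'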

@aglitke
Member

aglitke commented Jun 3, 2024

This seems related to your custom implementation of rook-ceph and not an issue with CDI. Therefore I am going to close the issue.
