Race condition happens in shallow copy volume #1221

Open
saurabhwani5 opened this issue Sep 25, 2024 · 0 comments
Labels
Customer Impact: Localized high impact (3). Reduction of function. Significant impact to workload.
Customer Probability: Low (1). Issue only occurs during failure condition - disk, server, network, test assert, ...
Found In: 2.13.0. For Bug issues to identify what release level issue was found in 2.13.0.
Severity: 2. Indicates that the issue is critical and must be addressed before milestone.
Type: Bug. Indicates issue is an undesired behavior, usually caused by code error.

Describe the bug

When a shallow copy volume and a snapshot are deleted at the same time, a race condition occurs and the snapshot is not deleted. The complete test run report is uploaded at /u/DUMPS/scale-csi/D.1221. The following errors are logged:

E0926 06:15:36.386636       1 rest_v2.go:162] [0b437ec0-2a66-4d93-922c-647ebd5a251c] Async Job failed: {{200 The request finished successfully.} [{{[] [] 6 [EFSSG0264C The path /var/mnt/local-sample/pvc-4a29738e-01e1-4431-bbd7-d17dbff22080/snapshot-191e5413-29a8-4fd2-997e-8cc8f940b2c4 does not exist.] []} {GET /scalemgmt/v2/filesystems/local-sample/directory/pvc-4a29738e-01e1-4431-bbd7-d17dbff22080%2Fsnapshot-191e5413-29a8-4fd2-997e-8cc8f940b2c4 map[]} 1000000023847 2024-09-26 06:15:34,370 2024-09-26 06:15:34,470 FAILED}]}
E0926 06:15:36.386706       1 controllerserver.go:3129] [0b437ec0-2a66-4d93-922c-647ebd5a251c] unable to stat directory using FS [local-sample] at path [pvc-4a29738e-01e1-4431-bbd7-d17dbff22080/snapshot-191e5413-29a8-4fd2-997e-8cc8f940b2c4]. Error [unable to stat dir pvc-4a29738e-01e1-4431-bbd7-d17dbff22080/snapshot-191e5413-29a8-4fd2-997e-8cc8f940b2c4:[EFSSG0264C The path /var/mnt/local-sample/pvc-4a29738e-01e1-4431-bbd7-d17dbff22080/snapshot-191e5413-29a8-4fd2-997e-8cc8f940b2c4 does not exist.]]
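
One possible direction, sketched under assumptions: the CSI specification requires DeleteSnapshot to be idempotent, so a delete path could treat the `EFSSG0264C ... does not exist` response as "already gone" instead of failing. The sketch below is not the driver's code. `isPathMissing`, `statForDelete`, and the `stat` callback are hypothetical names, and matching on the message code in the error string is an assumption based only on the log lines above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errCodePathMissing is the message code seen in the log above when the
// directory is already gone.
const errCodePathMissing = "EFSSG0264C"

// sampleMissing mimics the error text from the log (paths shortened).
var sampleMissing = errors.New("EFSSG0264C The path /var/mnt/local-sample/pvc-x/snapshot-y does not exist.")

// isPathMissing reports whether err carries the "path does not exist" code.
func isPathMissing(err error) bool {
	return err != nil && strings.Contains(err.Error(), errCodePathMissing)
}

// statForDelete is a hypothetical wrapper around whatever call stats the
// directory. A missing path is reported as exists=false with a nil error so
// that a delete request can complete idempotently; other errors are returned.
func statForDelete(stat func(path string) error, path string) (bool, error) {
	err := stat(path)
	switch {
	case err == nil:
		return true, nil
	case isPathMissing(err):
		return false, nil
	default:
		return false, fmt.Errorf("unable to stat dir %s: %w", path, err)
	}
}

func main() {
	// Simulate the situation from the log: the directory was removed by a
	// concurrent operation before the stat call ran.
	gone := func(string) error { return sampleMissing }
	exists, err := statForDelete(gone, "pvc-x/snapshot-y")
	fmt.Println(exists, err)
}
```

Whether skipping the stat failure is safe depends on what the delete path does next; this only illustrates how the two outcomes (missing path and real failure) could be told apart.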

How to Reproduce?

  1. PVCs :
    [root@k8s ~]# kubectl get pvc -w
    NAME                                        STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    VOLUMEATTRIBUTESCLASS   AGE
    pvc-mjltdvgkgw                              Bound     pvc-4a29738e-01e1-4431-bbd7-d17dbff22080   1Gi        RWX            sc-htfdiygqaw   <unset>                 75s
    restored-pvc-rhckeut-0                      Bound     pvc-8847b497-775b-4fac-a192-5fc0d33abd74   1Gi        ROX            sc-htfdiygqaw   <unset>                 7s
    restored-pvc-rhckeut-0                      Bound     pvc-8847b497-775b-4fac-a192-5fc0d33abd74   1Gi        ROX            sc-htfdiygqaw   <unset>                 32s
    clone-restored-pvc-rhckeut-0-0-0-lfwoizlj   Bound     pvc-03e74d8d-8942-438b-901c-a248817d4ae6   1Gi        RWX            sc-htfdiygqaw   <unset>                 57s
    clone-restored-pvc-rhckeut-0-1-0-erkmd      Bound     pvc-41503de7-4256-45e0-8171-203677dc6323   1Gi        RWO            sc-htfdiygqaw   <unset>                 73s
  2. VS :
    [root@k8s ~]# kubectl get vs -w
    NAME           READYTOUSE   SOURCEPVC        SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS      SNAPSHOTCONTENT                                    CREATIONTIME   AGE
    vs-rhckeut-0   true         pvc-mjltdvgkgw                           1Gi           vs-class-dlbkqlr   snapcontent-191e5413-29a8-4fd2-997e-8cc8f940b2c4   4s             6s

Expected behavior

The source PV and the snapshot should both be deleted.
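
A common way to keep two delete requests from interleaving is to serialize operations that touch the same source volume. The following is a minimal sketch of a per-key lock, assuming the race comes from the two deletions running concurrently against the same source directory. The issue does not confirm the root cause, and `keyedLocks`, `newKeyedLocks`, and `withLock` are hypothetical names, not part of the driver.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedLocks hands out one mutex per key (here: the source volume name).
type keyedLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newKeyedLocks() *keyedLocks {
	return &keyedLocks{locks: make(map[string]*sync.Mutex)}
}

// withLock runs fn while holding the mutex that belongs to key, so two
// callers using the same key never run fn at the same time.
func (k *keyedLocks) withLock(key string, fn func()) {
	k.mu.Lock()
	l, ok := k.locks[key]
	if !ok {
		l = &sync.Mutex{}
		k.locks[key] = l
	}
	k.mu.Unlock()

	l.Lock()
	defer l.Unlock()
	fn()
}

func main() {
	locks := newKeyedLocks()
	source := "pvc-4a29738e-01e1-4431-bbd7-d17dbff22080"

	var wg sync.WaitGroup
	inFlight, maxInFlight := 0, 0
	// Two stand-in operations for "delete shallow copy volume" and
	// "delete snapshot"; both are keyed on the same source volume.
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			locks.withLock(source, func() {
				inFlight++
				if inFlight > maxInFlight {
					maxInFlight = inFlight
				}
				inFlight--
			})
		}()
	}
	wg.Wait()
	fmt.Println("max concurrent operations on source:", maxInFlight) // prints 1
}
```

The sketch never removes entries from the map, which keeps it short; a long-running controller would need some form of cleanup or reference counting.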

Data Collection and Debugging

CSI snap and failed test case log :
/u/DUMPS/scale-csi/D.1221

@saurabhwani5 added the Type: Bug label Sep 25, 2024
@hemalathagajendran self-assigned this Sep 25, 2024
@Jainbrt added the Found In: 2.13.0, Severity: 2, Customer Probability: Low (1), Customer Impact: Localized low impact (2), and Customer Impact: Localized high impact (3) labels, and removed the Customer Impact: Localized low impact (2) label Sep 25, 2024
@Jainbrt added this to the v2.13.0 milestone Sep 25, 2024

3 participants