
Fileset is not deleted in a GUI HA setup when a different GUI node is in use #1061

Open
saurabhwani5 opened this issue Nov 9, 2023 · 0 comments
Labels
Customer Impact: Localized high impact (3) Reduction of function. Significant impact to workload.
Customer Probability: Low (1) Issue only occurs during failure condition - disk, server, network, test assert, ...
Found In: 2.10.0
Severity: 3 Indicates the issue is on the priority list for next milestone.
Type: Bug Indicates issue is an undesired behavior, usually caused by code error.



saurabhwani5 commented Nov 9, 2023

Describe the bug

Bring down all GUI nodes except GUI node 1 and create a PVC. Then stop GUI node 1 and delete the PVC. When another GUI node is brought up, the PV is deleted after some time, but the fileset is not.

How to Reproduce?

  1. Install the GM build on a GUI HA cluster (3 GUI nodes):
[root@multiguiubu-master pr1050]# oc get pods
NAME                                                  READY   STATUS    RESTARTS      AGE
ibm-spectrum-scale-csi-59sx7                          3/3     Running   0             18s
ibm-spectrum-scale-csi-attacher-846bcfb84c-fx2mh      1/1     Running   1 (20s ago)   29s
ibm-spectrum-scale-csi-attacher-846bcfb84c-qrsd7      1/1     Running   0             28s
ibm-spectrum-scale-csi-operator-85f6545658-qr6z9      1/1     Running   0             3h23m
ibm-spectrum-scale-csi-provisioner-754c55ff57-lvxkf   1/1     Running   0             29s
ibm-spectrum-scale-csi-qn9ll                          3/3     Running   0             26s
ibm-spectrum-scale-csi-resizer-6b966777fc-hntmf       1/1     Running   0             29s
ibm-spectrum-scale-csi-snapshotter-55448875d8-qvhp6   1/1     Running   0             29s
[root@multiguiubu-master pr1050]# oc describe pod | grep quay
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:v2.10.0-GM
    Image ID:      quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:57b4ee494ca48342d1ffaf22a166286202b0406b88316e4dcbe87212df6ca8f0
  Normal  Pulled     24s   kubelet            Container image "quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:v2.10.0-GM" already present on machine
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator:v2.10.0-GM
    Image ID:      quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-operator@sha256:52fac65c6258a874a1fcd3a2feb45d21a2129b1d5b6914fd8699ee36f5e9d787
      CSI_DRIVER_IMAGE:      quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:v2.10.0-GM
    Image:         quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:v2.10.0-GM
    Image ID:      quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver@sha256:57b4ee494ca48342d1ffaf22a166286202b0406b88316e4dcbe87212df6ca8f0
  Normal  Pulled     31s   kubelet            Container image "quay.io/ibm-spectrum-scale-dev/ibm-spectrum-scale-csi-driver:v2.10.0-GM" already present on machine
[root@multiguiubu-master pr1050]# oc get cso
NAME                     VERSION   SUCCESS
ibm-spectrum-scale-csi   2.10.0    True
  1. Bring down all GUI nodes except one (GUI 1):
[root@multiguiubu-scalegui-1 ~]# systemctl status gpfsgui
● gpfsgui.service - IBM_Spectrum_Scale Administration GUI
   Loaded: loaded (/usr/lib/systemd/system/gpfsgui.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2023-11-09 00:59:10 PST; 1h 31min ago
[root@multiguiubu-scalegui-2 ~]# systemctl stop gpfsgui
[root@multiguiubu-scalegui-2 ~]# systemctl status gpfsgui
● gpfsgui.service - IBM_Spectrum_Scale Administration GUI
   Loaded: loaded (/usr/lib/systemd/system/gpfsgui.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2023-11-09 02:31:26 PST; 8s ago
 [root@multiguiubu-scalegui-3 ~]# systemctl stop gpfsgui
[root@multiguiubu-scalegui-3 ~]#  systemctl status gpfsgui
● gpfsgui.service - IBM_Spectrum_Scale Administration GUI
   Loaded: loaded (/usr/lib/systemd/system/gpfsgui.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2023-11-09 02:32:49 PST; 27s ago
  1. Create the PVC and StorageClass as follows:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scale-advance-pvc-5
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: ibm-spectrum-scale-csi-advance

---

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: ibm-spectrum-scale-csi-advance
provisioner: spectrumscale.csi.ibm.com
parameters:
   volBackendFs: "fs1"
   version: "2"
reclaimPolicy: Delete
[root@multiguiubu-master pr1050]# oc apply -f apply.yaml
persistentvolumeclaim/scale-advance-pvc-5 created
storageclass.storage.k8s.io/ibm-spectrum-scale-csi-advance created
[root@multiguiubu-master pr1050]# oc get pvc -w
NAME                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                     AGE
scale-advance-pvc-5   Bound    pvc-a1df5bd1-05a1-4f05-b7b5-d5075af376c7   1Gi        RWX            ibm-spectrum-scale-csi-advance   42s
  1. Stop the remaining GUI node (GUI 1):
[root@multiguiubu-scalegui-1 ~]# systemctl stop gpfsgui
[root@multiguiubu-scalegui-1 ~]# systemctl status gpfsgui
● gpfsgui.service - IBM_Spectrum_Scale Administration GUI
   Loaded: loaded (/usr/lib/systemd/system/gpfsgui.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2023-11-09 02:35:28 PST; 2min 30s ago
  1. Delete the PVC (the PV stays in Released state):
[root@multiguiubu-master pr1050]# oc delete -f apply.yaml
persistentvolumeclaim "scale-advance-pvc-5" deleted
storageclass.storage.k8s.io "ibm-spectrum-scale-csi-advance" deleted
[root@multiguiubu-master pr1050]# oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                               STORAGECLASS                     REASON   AGE
pvc-a1df5bd1-05a1-4f05-b7b5-d5075af376c7   1Gi        RWX            Delete           Released   ibm-spectrum-scale-csi-driver/scale-advance-pvc-5   ibm-spectrum-scale-csi-advance            4m55s
  1. Start GUI node 2:
[root@multiguiubu-scalegui-2 ~]# systemctl start gpfsgui
[root@multiguiubu-scalegui-2 ~]# systemctl status gpfsgui
● gpfsgui.service - IBM_Spectrum_Scale Administration GUI
   Loaded: loaded (/usr/lib/systemd/system/gpfsgui.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2023-11-09 02:39:53 PST; 1min 2s ago
  1. Check whether the PV is deleted:
[root@multiguiubu-master pr1050]# oc get pv
No resources found
  1. Check whether the fileset is still present (its directory still exists):
[root@multiguiubu-scalegui-2 pvc-a1df5bd1-05a1-4f05-b7b5-d5075af376c7]# ls
[root@multiguiubu-scalegui-2 pvc-a1df5bd1-05a1-4f05-b7b5-d5075af376c7]# pwd
/ibm/fs1/6505846e-5027-418c-8030-6fcf39da580b-ibm-spectrum-scale-csi-driver/pvc-a1df5bd1-05a1-4f05-b7b5-d5075af376c7

Expected behavior

The fileset should be deleted. If the fileset cannot be deleted because of a GUI issue, CSI should not delete the PV.

Logs:

/scale-csi/D.1061
csisnap.tar.gz
