PVC creation taking more time when 50K PVCs are created #993

Open
saurabhwani5 opened this issue Jul 6, 2023 · 2 comments
Labels
Customer Impact: Localized high impact (3) Reduction of function. Significant impact to workload.
Customer Probability: Medium (3) Issue occurs in normal path but specific limited timing window, or other mitigating factor
Found In: 2.10.0
Severity: 3 Indicates the issue is on the priority list for next milestone.
Type: Bug Indicates issue is an undesired behavior, usually caused by code error.

Comments

@saurabhwani5
Member

saurabhwani5 commented Jul 6, 2023

Describe the bug

I am currently testing the creation of 50,000 PVCs. Of these, 3,000 are independent PVCs and the rest are dependent PVCs.

How to Reproduce?

  1. Install CSI version 2.9.0.
  2. Since PVC creation consumes significant CPU resources, we should reduce the CPU allocation for all sidecars in the deployment and set the replica count of the operator to 0.
resources:
          limits:
            cpu: 1500m
            ephemeral-storage: 25Gi
            memory: 1500Mi
  3. Create 3,000 independent PVCs and associate 20 dependent PVCs with each independent PVC (totaling 60,000 dependent PVCs).
  4. As the number of PVCs grows, each new PVC takes longer to create (around 10 minutes or more).
[root@saurabh5-master ~]# oc get pods
NAME                                                  READY   STATUS    RESTARTS        AGE
ibm-spectrum-scale-csi-2bjh9                          3/3     Running   2 (3h32m ago)   20h
ibm-spectrum-scale-csi-5qrsp                          3/3     Running   3 (116m ago)    20h
ibm-spectrum-scale-csi-attacher-79849cffcb-c8kbd      1/1     Running   0               20h
ibm-spectrum-scale-csi-attacher-79849cffcb-r78rp      1/1     Running   0               20h
ibm-spectrum-scale-csi-provisioner-6fb458cb77-5npbm   1/1     Running   2 (3h32m ago)   20h
ibm-spectrum-scale-csi-resizer-78b6699ff4-m7p2w       1/1     Running   0               20h
ibm-spectrum-scale-csi-snapshotter-59fb55f65b-7vhnk   1/1     Running   0               20h
[root@saurabh5-master ~]# kubectl top pod --namespace ibm-spectrum-scale-csi-driver
NAME                                                  CPU(cores)   MEMORY(bytes)
ibm-spectrum-scale-csi-2bjh9                          2m           50Mi
ibm-spectrum-scale-csi-5qrsp                          1m           37Mi
ibm-spectrum-scale-csi-attacher-79849cffcb-c8kbd      1m           379Mi
ibm-spectrum-scale-csi-attacher-79849cffcb-r78rp      1m           14Mi
ibm-spectrum-scale-csi-provisioner-6fb458cb77-5npbm   35m          684Mi
ibm-spectrum-scale-csi-resizer-78b6699ff4-m7p2w       1m           660Mi
ibm-spectrum-scale-csi-snapshotter-59fb55f65b-7vhnk   1m           18Mi
[root@saurabh5-master ~]# oc get pvc -A | wc -l
45120

[root@saurabh5-master ~]# oc get pvc | grep scale-fset-dependent-sc-2107-pvc-1
scale-fset-dependent-sc-2107-pvc-1    Bound     pvc-99206a74-3840-47b0-9055-eb217b982bfb   1Gi        RWX            ibm-spectrum-scale-csi-fileset-dependent-2107   21m
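
To put a number on the per-PVC delay instead of reading the AGE column, something like the following could be used. It is only a sketch: `time_pvc_bind` is a hypothetical helper, not part of the scripts in this issue, and it assumes a kubectl version that supports `kubectl wait --for=jsonpath=...` (1.23 or later).

```shell
#!/bin/bash
# Sketch only: time_pvc_bind is a hypothetical helper, not from the original scripts.
# Assumes kubectl >= 1.23, which supports `kubectl wait --for=jsonpath=...`.

# time_pvc_bind <manifest.yaml> <pvc-name> [timeout]
# Applies the manifest, blocks until the PVC reports phase Bound,
# then prints the wall-clock seconds that elapsed.
time_pvc_bind() {
    local manifest=$1 name=$2 timeout=${3:-30m}
    local start end
    start=$(date +%s)
    kubectl apply -f "$manifest" >/dev/null || return 1
    kubectl wait --for=jsonpath='{.status.phase}'=Bound "pvc/$name" \
        --timeout="$timeout" >/dev/null || return 1
    end=$(date +%s)
    echo "$name bound after $((end - start))s"
}

# Example (not executed here):
#   time_pvc_bind test.yaml scale-fset-dependent-sc-2107-pvc-1
```

Logging this value for every PVC would show whether the bind time grows steadily with the PVC count or jumps at some threshold.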


Scripts used:

  1. For creation of independent PVCs:
[root@saurabh5-master 50K]# cat sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: ibm-spectrum-scale-csi-fileset-independent
provisioner: spectrumscale.csi.ibm.com
parameters:
    volBackendFs: "fs1"
    inodeLimit: "1024"
reclaimPolicy: Delete
[root@saurabh5-master 50K]# cat pvc.sh
#!/bin/bash
for (( i=1 ; i<=$1 ; i++ ))
do
    echo "apiVersion: v1"> test.yaml
    echo "kind: PersistentVolumeClaim">> test.yaml
    echo "metadata:">> test.yaml
    echo "  name: scale-fset-independent-pvc-$i">> test.yaml
    echo "spec:">> test.yaml
    echo "  accessModes:">> test.yaml
    echo "  - ReadWriteMany">> test.yaml
    echo "  resources:">> test.yaml
    echo "    requests:">> test.yaml
    echo "      storage: 1Gi">> test.yaml
    echo "  storageClassName: ibm-spectrum-scale-csi-fileset-independent">> test.yaml
    kubectl apply -f test.yaml
done
  2. For creation of dependent PVCs:
    First, collect the names of all independent filesets. Each name is then used as the parentFileset of a StorageClass for dependent PVCs:
[root@saurabh5-master dep]# cat 3000sc.sh
#!/bin/bash

read -p "Enter the path to the file: " file_path

if [[ ! -f "$file_path" ]]; then
    echo "File not found: $file_path"
    exit 1
fi

# Read the file line by line
i=1
while IFS= read -r line; do
    # Process each line
    echo "apiVersion: storage.k8s.io/v1"> testsc.yaml
    echo "kind: StorageClass">> testsc.yaml
    echo "metadata:">> testsc.yaml
    echo "  name: ibm-spectrum-scale-csi-fileset-dependent-$i">> testsc.yaml
    echo "provisioner: spectrumscale.csi.ibm.com">> testsc.yaml
    echo "parameters:">> testsc.yaml
    echo "    volBackendFs: fs1">> testsc.yaml
    echo "    filesetType: dependent">> testsc.yaml
    echo "    parentFileset: $line">> testsc.yaml
    echo "reclaimPolicy: Delete">> testsc.yaml
    kubectl apply -f testsc.yaml
    i=$((i+1))

done < "$file_path"

Creating the dependent PVCs, in batches of 20:

[root@saurabh5-master dep]# cat apply.sh
#!/bin/bash

for ((i=1; i<=3000; i++))
do
    while true; do
        pending_count=$(oc get pvc | grep Pending | wc -l)

        if [ "$pending_count" -eq 0 ]; then
            echo "All PVCs are in a bound state. Proceeding..."
            break
        else
            echo "There are $pending_count PVC(s) in a pending state. Waiting..."
            #sleep 10
        fi
    done
    for ((j=1; j<=20; j++))
    do
        echo "apiVersion: v1"> test.yaml
        echo "kind: PersistentVolumeClaim">> test.yaml
        echo "metadata:">> test.yaml
        echo "  name: scale-fset-dependent-sc-$i-pvc-$j">> test.yaml
        echo "spec:">> test.yaml
        echo "  accessModes:">> test.yaml
        echo "  - ReadWriteMany">> test.yaml
        echo "  resources:">> test.yaml
        echo "    requests:">> test.yaml
        echo "      storage: 1Gi">> test.yaml
        echo "  storageClassName: ibm-spectrum-scale-csi-fileset-dependent-$i">> test.yaml
        kubectl apply -f test.yaml
    done
done
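
The scripts above build each manifest with a series of `echo` lines and poll for Pending PVCs without a pause between polls. A more compact variant might look like the sketch below. The helper names (`render_pvc`, `count_pending`, `wait_no_pending`) are hypothetical, and the 10 second default interval is an assumption taken from the commented-out `sleep 10`. It assumes STATUS is the second column of `oc get pvc --no-headers` output, as in the listing earlier in this report.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of the original scripts.

# render_pvc <name> <storage-class>: print a PVC manifest on stdout.
render_pvc() {
    cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: $2
EOF
}

# count_pending: read `oc get pvc --no-headers` output on stdin and count
# the rows whose STATUS column (field 2) is exactly "Pending".
count_pending() {
    awk '$2 == "Pending" { n++ } END { print n + 0 }'
}

# wait_no_pending [interval-seconds]: poll until no PVC in the current
# namespace is Pending, sleeping between polls (default 10s, an assumption).
wait_no_pending() {
    local interval=${1:-10}
    while [ "$(oc get pvc --no-headers 2>/dev/null | count_pending)" -gt 0 ]; do
        sleep "$interval"
    done
}

# Example batch for storage class number $i (not executed here):
#   wait_no_pending
#   for j in $(seq 1 20); do
#       render_pvc "scale-fset-dependent-sc-$i-pvc-$j" \
#                  "ibm-spectrum-scale-csi-fileset-dependent-$i" | kubectl apply -f -
#   done
```

Sleeping between polls also avoids listing tens of thousands of PVCs in a tight loop, which may itself add load while the test runs.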

Expected behavior

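PVC creation should take less time, and the time per PVC should not grow to 10 minutes or more as the number of existing PVCs increases.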

Data Collection and Debugging

CSI Snap : /scale-csi/D.993

@saurabhwani5 saurabhwani5 added the Type: Bug Indicates issue is an undesired behavior, usually caused by code error. label Jul 6, 2023
@Jainbrt Jainbrt added Severity: 2 Indicates that the issue is critical and must be addressed before milestone. Customer Probability: Medium (3) Issue occurs in normal path but specific limited timing window, or other mitigating factor Customer Impact: Localized high impact (3) Reduction of function. Significant impact to workload. Found In: 2.10.0 labels Jul 6, 2023
@Jainbrt Jainbrt added this to the v2.10.0 milestone Jul 6, 2023
@amdabhad
Member

amdabhad commented Jul 11, 2023

While the setup with close to 50k fileset-based PVCs is still available, please capture the following:

  1. Current resources set on all the CSI pods
  2. Resources in use while creating PVC and pod
  3. Time taken for each of the following:
  • mmlsfileset <fs> <existing fileset name> :
[root@saurabh5-scalegui ~]# date; mmlsfileset fs1 check8; date
Mon Jul 17 23:13:54 PDT 2023

Unable to start tslsfileset on 'fs1' because conflicting program tslsfileset is running. Waiting until it completes or moves to the next phase, which may allow the current command to start.
tslsfileset on 'fs1' is finished waiting.  Processing continues ...
Filesets in file system 'fs1':
Name                     Status    Path
check8                   Linked    /ibm/fs1/pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75/check8
Mon Jul 17 23:21:26 PDT 2023
  • REST call to GUI for the same mmlsfileset:
[root@saurabh5-master ~]# curl --insecure -u 'username:password' -X GET https://saurabh5-scalegui.fyre.ibm.com:443/scalemgmt/v2/filesystems/fs1/filesets/pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75
{
  "filesets" : [ {
    "config" : {
      "comment" : "Fileset created by IBM Container Storage Interface driver",
      "created" : "2023-06-08 23:42:28,000",
      "iamMode" : "off",
      "id" : 2467,
      "inodeSpace" : 2467,
      "inodeSpaceMask" : 2096640,
      "isInodeSpaceOwner" : true,
      "maxNumInodes" : 1024,
      "oid" : 5866,
      "parentId" : 0,
      "path" : "/ibm/fs1/pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75",
      "permissionChangeMode" : "chmodAndSetacl",
      "rootInode" : 1293418499,
      "snapId" : 0,
      "status" : "Linked"
    },
    "filesetName" : "pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75",
    "filesystemName" : "fs1",
    "usage" : {
      "allocatedInodes" : 1024,
      "inodeSpaceFreeInodes" : 976,
      "inodeSpaceUsedInodes" : 48,
      "usedBytes" : 0,
      "usedInodes" : 48
    }
  } ],
  "status" : {
    "code" : 200,
    "message" : "The request finished successfully."
  }
}
  • Create a new fileset using mmcrfileset:
[root@saurabh5-scalegui ~]# date; mmcrfileset fs1 check8 --inode-space pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75; date
Mon Jul 17 22:08:53 PDT 2023
Fileset check8 created with id 55000 root inode 1293418544.
Mon Jul 17 22:08:54 PDT 2023
  • Link fileset using mmlinkfileset:
[root@saurabh5-scalegui ~]# date; mmlinkfileset fs1 check8 -J /ibm/fs1/pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75/check8; date
Mon Jul 17 22:12:30 PDT 2023
Fileset check8 linked at /ibm/fs1/pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75/check8
Mon Jul 17 22:12:31 PDT 2023
[root@saurabh5-scalegui ~]#
  • Time taken for a PVC to go from creation to Bound state, and the time taken for the corresponding volumeCreate call in the driver logs
  • REST call to GUI to create a fileset. This returns a job id; keep checking the job status with that id until it completes: took 5 min 45 s
[root@saurabh5-master ~]# curl --insecure -u 'username:password' -X GET https://saurabh5-scalegui.fyre.ibm.com:443/scalemgmt/v2/jobs/1000000270533
{
 "jobs" : [ {
   "jobId" : 1000000270533,
   "status" : "COMPLETED",
   "submitted" : "2023-07-17 07:51:06,090",
   "completed" : "2023-07-17 07:56:51,061",
   "runtime" : 344971,
   "request" : {
     "data" : {
       "filesetName" : "check3",
       "inodeSpace" : "pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75"
     },
     "type" : "POST",
     "url" : "/scalemgmt/v2/filesystems/fs1/filesets"
   },
   "result" : {
     "progress" : [ ],
     "commands" : [ "mmcrfileset 'fs1' 'check3' --inode-space 'pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75' --allow-permission-change 'chmodAndSetAcl' ", "mmlinkfileset 'fs1' 'check3' -J '/ibm/fs1/pvc-bbaa3ce6-3126-4efa-b91f-ebbefc559a75/check3' " ],
     "stdout" : [ "EFSSA0194I Waiting for concurrent operation to complete.", "EFSSA0194I Waiting for concurrent operation to complete.", "EFSSA0194I Waiting for concurrent operation to complete.", "EFSSA0194I Waiting for concurrent operation to complete.", "EFSSA0194I Waiting for concurrent operation to complete.", "EFSSA0194I Waiting for concurrent operation to complete.", "EFSSG0070I File set check3 created successfully.", "EFSSG0078I File set check3 successfully linked.\n" ],
     "stderr" : [ ],
     "exitCode" : 0
   },
   "pids" : [ ]
 } ],
 "status" : {
   "code" : 200,
   "message" : "The request finished successfully."
 }
}
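
The timings above were taken by hand with `date; <command>; date` and by reading the job JSON. A rough sketch of helpers that could make these measurements repeatable follows. All function names are hypothetical, `SCALE_GUI_AUTH` is an assumed environment variable holding `username:password`, and the status extraction relies on the response layout shown above, where the job's `"status"` value is a quoted string.

```shell
#!/bin/bash
# Sketch only: function names and SCALE_GUI_AUTH are hypothetical.

# timed <command...>: run a command, report wall-clock seconds on stderr,
# and return the command's own exit code.
timed() {
    local start end rc
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s (exit $rc)" >&2
    return "$rc"
}

# ms_to_min_sec <milliseconds>: format a job "runtime" value as XmY.ZZZs.
ms_to_min_sec() {
    local ms=$1
    printf '%dm%d.%03ds\n' "$((ms / 60000))" "$(((ms % 60000) / 1000))" "$((ms % 1000))"
}

# job_status <gui-base-url> <job-id>: print the job's status string.
# Matches only quoted status values, so the outer "status" : { ... }
# object of the response is skipped.
job_status() {
    curl --silent --insecure -u "$SCALE_GUI_AUTH" -X GET "$1/scalemgmt/v2/jobs/$2" |
        sed -n 's/.*"status" *: *"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Examples (not executed here):
#   timed mmlsfileset fs1 check8
#   ms_to_min_sec 344971        # the runtime value from the job above
#   job_status https://saurabh5-scalegui.fyre.ibm.com:443 1000000270533
```

For the job shown above, the `runtime` of 344971 ms corresponds to the reported 5 minutes 45 seconds, most of which appears to be spent in the repeated "Waiting for concurrent operation to complete" phase.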


@amdabhad amdabhad changed the title PVC creation taking more time when many pvc are created PVC creation taking more time when 50K PVCs are created Sep 21, 2023
@Jainbrt Jainbrt modified the milestones: v2.10.0, v2.11.0 Dec 14, 2023
@Jainbrt Jainbrt removed this from the v2.11.0 milestone Mar 9, 2024
@deeghuge deeghuge added Severity: 3 Indicates the issue is on the priority list for next milestone. and removed Severity: 2 Indicates that the issue is critical and must be addressed before milestone. labels Mar 19, 2024

4 participants