This repository has been archived by the owner on Oct 20, 2022. It is now read-only.

the status.phase is always Pending after deleting a pod #96

Closed
mackwong opened this issue Aug 21, 2019 · 5 comments
Labels
bug Something isn't working

Comments

@mackwong
Contributor

I deployed a Cassandra cluster, and the CassandraCluster resource is:

$ kubectl get cassandracluster -n test my-cluster -o yaml

apiVersion: db.orange.com/v1alpha1
kind: CassandraCluster
metadata:
  finalizers:
  - kubernetes.io/pvc-to-delete
  name: my-cluster
  namespace: test
spec:
  baseImage: xxx.com/diamond/service-providers/cassandra
  dataCapacity: 20Gi
  dataStorageClass: ceph-rbd
  deletePVC: true
  gcStdout: true
  imageJolokiaSecret: {}
  imagePullSecret: {}
  imagepullpolicy: IfNotPresent
  maxPodUnavailable: 1
  nodesPerRacks: 1
  resources:
    limits:
      cpu: "1"
      memory: 2Gi
    requests:
      cpu: "1"
      memory: 2Gi
  runAsUser: 1000
  topology:
    dc:
    - name: dc1
      nodesPerRacks: 2
      rack:
      - name: rack1
  version: 3.11.4-8u212-0.3.2-release-cqlsh
status:
  cassandraRackStatus:
    dc1-rack1:
      cassandraLastAction:
        Name: Initializing
        endTime: "2019-08-21T10:21:48Z"
        status: Done
      phase: Running
      podLastOperation: {}
  lastClusterAction: Initializing
  lastClusterActionStatus: Done
  phase: Running
  seedlist:
  - my-cluster-dc1-rack1-0.my-cluster.test
  - my-cluster-dc1-rack1-1.my-cluster.test

I delete a pod:

kubectl delete pod my-cluster-dc1-rack1-0 -n test

After a while, the CassandraCluster becomes the following and doesn't change anymore:

apiVersion: db.orange.com/v1alpha1
kind: CassandraCluster
metadata:
  finalizers:
  - kubernetes.io/pvc-to-delete
  generation: 6
  name: my-cluster
  namespace: test
spec:
  baseImage: xxx.com/diamond/service-providers/cassandra
  dataCapacity: 20Gi
  dataStorageClass: ceph-rbd
  deletePVC: true
  gcStdout: true
  imageJolokiaSecret: {}
  imagePullSecret: {}
  imagepullpolicy: IfNotPresent
  maxPodUnavailable: 1
  nodesPerRacks: 1
  resources:
    limits:
      cpu: "1"
      memory: 2Gi
    requests:
      cpu: "1"
      memory: 2Gi
  runAsUser: 1000
  topology:
    dc:
    - name: dc1
      nodesPerRacks: 2
      rack:
      - name: rack1
  version: 3.11.4-8u212-0.3.2-release-cqlsh
status:
  cassandraRackStatus:
    dc1-rack1:
      cassandraLastAction:
        Name: Initializing
        endTime: "2019-08-21T10:21:48Z"
        status: Done
      phase: Running
      podLastOperation: {}
  lastClusterAction: Initializing
  lastClusterActionStatus: Done
  phase: Pending
  seedlist:
  - my-cluster-dc1-rack1-0.my-cluster.test
  - my-cluster-dc1-rack1-1.my-cluster.test

The status.phase is stuck at Pending.

@allamand

Hi @mackwong, thanks for reporting.

Do you have access to the operator logs and to the Kubernetes events from when this happened?
Was the deleted pod recreated?

thanks

@mackwong
Contributor Author

mackwong commented Aug 28, 2019

$ kubectl logs -f cassandra-operator-f97949c48-5cglh -n cassandra-demo

time="2019-08-28T06:05:36Z" level=debug msg="Statefulsets Are Equal: No Update" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:05:41Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:05:41Z" level=debug msg="Statefulsets Are Equal: No Update" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:05:46Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:05:46Z" level=debug msg="Statefulsets Are Equal: No Update" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:05:51Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:05:51Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:05:51Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:05:56Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:05:56Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:05:56Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:01Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:01Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:06:01Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:06Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:06Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:06:06Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:11Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:11Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:06:11Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:16Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:06:16Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:16Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:06:17Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:17Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:17Z" level=info msg="Update Rack Status: Pending" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:17Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:06:17Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:17Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:06:17Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:17Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:17Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:06:17Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:17Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:06:17Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:17Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:22Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:06:22Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:22Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:06:22Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:22Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:27Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:06:27Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:27Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:06:27Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:27Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:32Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:06:32Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:32Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:06:32Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:32Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:37Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:06:37Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:37Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:06:37Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:37Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:42Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:06:42Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:42Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:06:42Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:42Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:47Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:06:47Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:47Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:06:47Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:47Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:52Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:06:52Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:52Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:06:52Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:52Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:57Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:06:57Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:57Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:06:57Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:06:57Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:02Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:07:02Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:02Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:07:02Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:02Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:07Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:07:07Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:07Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:07:07Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:07Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:12Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:07:12Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:12Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:07:12Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:12Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:17Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:07:17Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:17Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:07:17Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:17Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:22Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:07:22Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:22Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:07:22Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:22Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:27Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing) Replicas Number Not OK: 2 on 2, ready[1]"
time="2019-08-28T06:07:27Z" level=info msg="We don't check for new action before the cluster become stable again" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:27Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:07:27Z" level=info msg="Cluster has Disruption on Pods, we wait before applying any change to statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:27Z" level=info msg="Waiting Rack to be running before continuing, we break ReconcileRack after updated statefulset" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:32Z" level=info msg="[my-cluster][dc1-rack1]: StatefulSet(Initializing): Replicas Number OK: ready[2]"
time="2019-08-28T06:07:32Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:07:32Z" level=debug msg="Statefulsets Are Equal: No Update" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:32Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:07:32Z" level=debug msg="Statefulsets Are Equal: No Update" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:32Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:07:32Z" level=debug msg="Statefulsets Are Equal: No Update" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:37Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:07:37Z" level=debug msg="Statefulsets Are Equal: No Update" cluster=my-cluster dc-rack=dc1-rack1
time="2019-08-28T06:07:42Z" level=debug msg="Display information about pods" cluster=my-cluster podLastOperation.OperatorName= podLastOperation.Pods="[]" rack=dc1-rack1
time="2019-08-28T06:07:42Z" level=debug msg="Statefulsets Are Equal: No Update" cluster=my-cluster dc-rack=dc1-rack1
$ kc get event -n test
LAST SEEN   TYPE      REASON                    OBJECT                                              MESSAGE
7m10s       Normal    ExternalProvisioning      persistentvolumeclaim/data-my-cluster-dc1-rack1-0   waiting for a volume to be created, either by external provisioner "csi-rbdplugin" or manually created by system administrator
6m22s       Normal    Provisioning              persistentvolumeclaim/data-my-cluster-dc1-rack1-0   External provisioner is provisioning volume for claim "test/data-my-cluster-dc1-rack1-0"
6m21s       Normal    ProvisioningSucceeded     persistentvolumeclaim/data-my-cluster-dc1-rack1-0   Successfully provisioned volume pvc-5889b689-c112-43b0-a534-13a448ee834f
5m56s       Normal    ExternalProvisioning      persistentvolumeclaim/data-my-cluster-dc1-rack1-1   waiting for a volume to be created, either by external provisioner "csi-rbdplugin" or manually created by system administrator
5m9s        Normal    Provisioning              persistentvolumeclaim/data-my-cluster-dc1-rack1-1   External provisioner is provisioning volume for claim "test/data-my-cluster-dc1-rack1-1"
5m8s        Normal    ProvisioningSucceeded     persistentvolumeclaim/data-my-cluster-dc1-rack1-1   Successfully provisioned volume pvc-4f6ccb23-15ab-46b3-b3d0-178024fef2a6
6m22s       Normal    ProvisionedSuccessfully   serviceinstance/demo                                The instance was provisioned successfully
7m10s       Warning   FailedScheduling          pod/my-cluster-dc1-rack1-0                          pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
7m8s        Normal    Scheduled                 pod/my-cluster-dc1-rack1-0                          Successfully assigned test/my-cluster-dc1-rack1-0 to bj-idc1-10-10-40-57
7m8s        Normal    SuccessfulAttachVolume    pod/my-cluster-dc1-rack1-0                          AttachVolume.Attach succeeded for volume "pvc-5889b689-c112-43b0-a534-13a448ee834f"
7m2s        Normal    Pulled                    pod/my-cluster-dc1-rack1-0                          Container image "registry.test.com/diamond/service-providers/cassandra:3.11.4-8u212-0.3.2-release-cqlsh" already present on machine
7m1s        Normal    Created                   pod/my-cluster-dc1-rack1-0                          Created container cassandra
7m1s        Normal    Started                   pod/my-cluster-dc1-rack1-0                          Started container cassandra
2m57s       Normal    Killing                   pod/my-cluster-dc1-rack1-0                          Stopping container cassandra
2m29s       Warning   Unhealthy                 pod/my-cluster-dc1-rack1-0                          Liveness probe failed:
2m29s       Warning   FailedPreStopHook         pod/my-cluster-dc1-rack1-0                          Exec lifecycle hook ([/bin/bash -c /etc/cassandra/pre_stop.sh]) for Container "cassandra" in Pod "my-cluster-dc1-rack1-0_test(34fbeb0c-902e-4b84-a8a6-b73126d4ba05)" failed - error: command '/bin/bash -c /etc/cassandra/pre_stop.sh' exited with 137: , message: "INFO  [main] 2019-08-28 06:06:13,282 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml\nINFO  [main] 2019-08-28 06:06:14,001 Config.java:496 - Node configuration:[allocate_tokens_for_keyspace=null; authenticator=PasswordAuthenticator; authorizer=CassandraAuthorizer; auto_bootstrap=true; auto_snapshot=true; back_pressure_enabled=false; back_pressure_strategy=null; batch_size_fail_threshold_in_kb=50; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; broadcast_address=10.244.0.64; broadcast_rpc_address=10.244.0.64; buffer_pool_use_heap_if_exhausted=true; cas_contention_timeout_in_ms=1000; cdc_enabled=false; cdc_free_space_check_interval_ms=250; cdc_raw_directory=null; cdc_total_space_in_mb=0; client_encryption_options=<REDACTED>; cluster_name=my-cluster; column_index_cache_size_in_kb=2; column_index_size_in_kb=64; commit_failure_policy=stop; commitlog_compression=null; commitlog_directory=/var/lib/cassandra/commitlog; commitlog_max_compression_buffers_in_pool=3; commitlog_periodic_queue_size=-1; commitlog_segment_size_in_mb=32; commitlog_sync=periodic; commitlog_sync_batch_window_in_ms=NaN; commitlog_sync_period_in_ms=10000; commitlog_total_space_in_mb=null; compaction_large_partition_warning_threshold_mb=100; compaction_throughput_mb_per_sec=16; concurrent_compactors=null; concurrent_counter_writes=32; concurrent_materialized_view_writes=32; concurrent_reads=32; concurrent_replicates=null; concurrent_writes=32; counter_cache_keys_to_save=2147483647; counter_cache_save_period=7200; counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; credentials_cache_max_entries=1000; credentials_update_interval_in_ms=-1; credentials_validity_in_ms=2000; cross_node_timeout=false; data_file_directories=[Ljava.lang.String;@66d3eec0; disk_access_mode=auto; disk_failure_policy=stop; disk_optimization_estimate_percentile=0.95; disk_optimization_page_cross_chance=0.1; disk_optimization_strategy=ssd; dynamic_snitch=false; dynamic_snitch_badness_threshold=0.1; dynamic_snitch_reset_interval_in_ms=600000; dynamic_snitch_update_interval_in_ms=100; enable_materialized_views=true; enable_scripted_user_defined_functions=false; enable_user_defined_functions=false; enable_user_defined_functions_threads=true; encryption_options=null; endpoint_snitch=GossipingPropertyFileSnitch; file_cache_round_up=null; file_cache_size_in_mb=null; gc_log_threshold_in_ms=200; gc_warn_threshold_in_ms=1000; hinted_handoff_disabled_datacenters=[]; hinted_handoff_enabled=true; hinted_handoff_throttle_in_kb=1024; hints_compression=null; hints_directory=/var/lib/cassandra/hints; hints_flush_period_in_ms=10000; incremental_backups=false; index_interval=null; index_summary_capacity_in_mb=null; index_summary_resize_interval_in_minutes=60; initial_token=null; inter_dc_stream_throughput_outbound_megabits_per_sec=200; inter_dc_tcp_nodelay=false; internode_authenticator=null; internode_compression=all; internode_recv_buff_size_in_bytes=0; internode_send_buff_size_in_bytes=0; key_cache_keys_to_save=2147483647; key_cache_save_period=14400; key_cache_size_in_mb=null; listen_address=10.244.0.64; listen_interface=null; 
listen_interface_prefer_ipv6=false; listen_on_broadcast_address=false; max_hint_window_in_ms=10800000; max_hints_delivery_threads=2; max_hints_file_size_in_mb=128; max_mutation_size_in_kb=null; max_streaming_retries=3; max_value_size_in_mb=256; memtable_allocation_type=offheap_objects; memtable_cleanup_threshold=null; memtable_flush_writers=2; memtable_heap_space_in_mb=null; memtable_offheap_space_in_mb=null; min_free_space_per_drive_in_mb=50; native_transport_flush_in_batches_legacy=true; native_transport_max_concurrent_connections=-1; native_transport_max_concurrent_connections_per_ip=-1; native_transport_max_frame_size_in_mb=256; native_transport_max_threads=128; native_transport_port=9042; native_transport_port_ssl=null; num_tokens=256; otc_backlog_expiration_interval_ms=200; otc_coalescing_enough_coalesced_messages=8; otc_coalescing_strategy=DISABLED; otc_coalescing_window_us=200; partitioner=org.apache.cassandra.dht.Murmur3Partitioner; permissions_cache_max_entries=1000; permissions_update_interval_in_ms=-1; permissions_validity_in_ms=2000; phi_convict_threshold=8.0; prepared_statements_cache_size_mb=null; range_request_timeout_in_ms=10000; read_request_timeout_in_ms=5000; request_scheduler=org.apache.cassandra.scheduler.NoScheduler; request_scheduler_id=null; request_scheduler_options=null; request_timeout_in_ms=10000; role_manager=CassandraRoleManager; roles_cache_max_entries=1000; roles_update_interval_in_ms=-1; roles_validity_in_ms=2000; row_cache_class_name=org.apache.cassandra.cache.OHCProvider; row_cache_keys_to_save=2147483647; row_cache_save_period=0; row_cache_size_in_mb=0; rpc_address=0.0.0.0; rpc_interface=null; rpc_interface_prefer_ipv6=false; rpc_keepalive=true; rpc_listen_backlog=50; rpc_max_threads=2147483647; rpc_min_threads=16; rpc_port=9160; rpc_recv_buff_size_in_bytes=null; rpc_send_buff_size_in_bytes=null; rpc_server_type=sync; saved_caches_directory=/var/lib/cassandra/saved_caches; seed_provider=org.apache.cassandra.locator.SimpleSeedProvider{seeds=my-cluster-dc1-rack1-0.my-cluster.test.svc.cluster.local}; server_encryption_options=<REDACTED>; slow_query_log_timeout_in_ms=500; snapshot_before_compaction=false; ssl_storage_port=7001; sstable_preemptive_open_interval_in_mb=50; start_native_transport=true; start_rpc=false; storage_port=7000; stream_throughput_outbound_megabits_per_sec=200; streaming_keep_alive_period_in_secs=300; streaming_socket_timeout_in_ms=86400000; thrift_framed_transport_size_in_mb=15; thrift_max_message_length_in_mb=16; thrift_prepared_statements_cache_size_mb=null; tombstone_failure_threshold=100000; tombstone_warn_threshold=1000; tracetype_query_ttl=86400; tracetype_repair_ttl=604800; transparent_data_encryption_options=org.apache.cassandra.config.TransparentDataEncryptionOptions@1e04fa0a; trickle_fsync=false; trickle_fsync_interval_in_kb=10240; truncate_request_timeout_in_ms=60000; unlogged_batch_across_partitions_warn_threshold=10; user_defined_function_fail_timeout=1500; user_defined_function_warn_timeout=500; user_function_timeout_policy=die; windows_timer_interval=1; write_request_timeout_in_ms=5000]\nINFO  [main] 2019-08-28 06:06:14,002 DatabaseDescriptor.java:373 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap\nINFO  [main] 2019-08-28 06:06:14,002 DatabaseDescriptor.java:431 - Global memtable on-heap threshold is enabled at 30MB\nINFO  [main] 2019-08-28 06:06:14,002 DatabaseDescriptor.java:435 - Global memtable off-heap threshold is enabled at 30MB\nWARN  [main] 2019-08-28 06:06:14,006 
DatabaseDescriptor.java:480 - Small commitlog volume detected at /var/lib/cassandra/commitlog; setting commitlog_total_space_in_mb to 5007.  You can override this in cassandra.yaml\nWARN  [main] 2019-08-28 06:06:14,008 DatabaseDescriptor.java:556 - Only 19.517GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots\nINFO  [main] 2019-08-28 06:06:14,115 RateBasedBackPressure.java:123 - Initialized back-pressure with high ratio: 0.9, factor: 5, flow: FAST, window size: 5000.\nINFO  [main] 2019-08-28 06:06:14,115 DatabaseDescriptor.java:735 - Back-pressure is disabled with strategy null.\nINFO  [main] 2019-08-28 06:06:14,189 GossipingPropertyFileSnitch.java:68 - Unable to load cassandra-topology.properties; compatibility mode disabled\n"
2m29s       Warning   Unhealthy                 pod/my-cluster-dc1-rack1-0                          Readiness probe failed:
2m27s       Normal    Scheduled                 pod/my-cluster-dc1-rack1-0                          Successfully assigned test/my-cluster-dc1-rack1-0 to bj-idc1-10-10-40-57
2m23s       Normal    Pulled                    pod/my-cluster-dc1-rack1-0                          Container image "registry.test.com/diamond/service-providers/cassandra:3.11.4-8u212-0.3.2-release-cqlsh" already present on machine
2m23s       Normal    Created                   pod/my-cluster-dc1-rack1-0                          Created container cassandra
2m23s       Normal    Started                   pod/my-cluster-dc1-rack1-0                          Started container cassandra
5m56s       Warning   FailedScheduling          pod/my-cluster-dc1-rack1-1                          pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
5m55s       Normal    Scheduled                 pod/my-cluster-dc1-rack1-1                          Successfully assigned test/my-cluster-dc1-rack1-1 to bj-idc1-10-10-40-73
5m55s       Normal    SuccessfulAttachVolume    pod/my-cluster-dc1-rack1-1                          AttachVolume.Attach succeeded for volume "pvc-4f6ccb23-15ab-46b3-b3d0-178024fef2a6"
5m4s        Normal    Pulled                    pod/my-cluster-dc1-rack1-1                          Container image "registry.test.com/diamond/service-providers/cassandra:3.11.4-8u212-0.3.2-release-cqlsh" already present on machine
5m4s        Normal    Created                   pod/my-cluster-dc1-rack1-1                          Created container cassandra
5m3s        Normal    Started                   pod/my-cluster-dc1-rack1-1                          Started container cassandra
7m10s       Normal    SuccessfulCreate          statefulset/my-cluster-dc1-rack1                    create Claim data-my-cluster-dc1-rack1-0 Pod my-cluster-dc1-rack1-0 in StatefulSet my-cluster-dc1-rack1 success
2m27s       Normal    SuccessfulCreate          statefulset/my-cluster-dc1-rack1                    create Pod my-cluster-dc1-rack1-0 in StatefulSet my-cluster-dc1-rack1 successful
5m56s       Normal    SuccessfulCreate          statefulset/my-cluster-dc1-rack1                    create Claim data-my-cluster-dc1-rack1-1 Pod my-cluster-dc1-rack1-1 in StatefulSet my-cluster-dc1-rack1 success
5m56s       Normal    SuccessfulCreate          statefulset/my-cluster-dc1-rack1                    create Pod my-cluster-dc1-rack1-1 in StatefulSet my-cluster-dc1-rack1 successful
7m10s       Normal    NoPods                    poddisruptionbudget/my-cluster                      No matching pods found
$ kubectl get po -n test
NAME                     READY   STATUS    RESTARTS   AGE
my-cluster-dc1-rack1-0   1/1     Running   0          80s
my-cluster-dc1-rack1-1   1/1     Running   0          4m49s
$ kc get cassandracluster -n test my-cluster -o yaml
apiVersion: db.orange.com/v1alpha1
kind: CassandraCluster
metadata:
  annotations:
    cassandraclusters.db.orange.com/last-applied-configuration: '{"kind":"CassandraCluster","apiVersion":"db.orange.com/v1alpha1","metadata":{"name":"my-cluster","namespace":"test","selfLink":"/apis/db.orange.com/v1alpha1/namespaces/test/cassandraclusters/my-cluster","uid":"907c913f-7bcf-49ee-835e-235ae68d4992","generation":6,"creationTimestamp":"2019-08-28T06:01:34Z","labels":{"instance_name":"666d211f-c959-11e9-b739-a2502451f700","plan_id":"2db0b31d-6912-4d24-8704-cfdf9b98af81","service_id":"137a3ded-59ab-4ece-bbda-9cfff850a1f3"},"finalizers":["kubernetes.io/pvc-to-delete"]},"spec":{"nodesPerRacks":1,"baseImage":"registry.test.com/diamond/service-providers/cassandra","version":"3.11.4-8u212-0.3.2-release-cqlsh","imagepullpolicy":"IfNotPresent","runAsUser":1000,"resources":{"requests":{"cpu":"1","memory":"2Gi"},"limits":{"cpu":"1","memory":"2Gi"}},"deletePVC":true,"gcStdout":true,"maxPodUnavailable":1,"dataCapacity":"20Gi","dataStorageClass":"ceph-rbd","imagePullSecret":{},"imageJolokiaSecret":{},"topology":{"dc":[{"name":"dc1","rack":[{"name":"rack1"}],"nodesPerRacks":2}]}},"status":{}}'
  creationTimestamp: "2019-08-28T06:01:34Z"
  finalizers:
  - kubernetes.io/pvc-to-delete
  generation: 6
  labels:
    instance_name: 666d211f-c959-11e9-b739-a2502451f700
    plan_id: 2db0b31d-6912-4d24-8704-cfdf9b98af81
    service_id: 137a3ded-59ab-4ece-bbda-9cfff850a1f3
  name: my-cluster
  namespace: test
  resourceVersion: "4751173"
  selfLink: /apis/db.orange.com/v1alpha1/namespaces/test/cassandraclusters/my-cluster
  uid: 907c913f-7bcf-49ee-835e-235ae68d4992
spec:
  baseImage: registry.test.com/diamond/service-providers/cassandra
  dataCapacity: 20Gi
  dataStorageClass: ceph-rbd
  deletePVC: true
  gcStdout: true
  imageJolokiaSecret: {}
  imagePullSecret: {}
  imagepullpolicy: IfNotPresent
  maxPodUnavailable: 1
  nodesPerRacks: 1
  resources:
    limits:
      cpu: "1"
      memory: 2Gi
    requests:
      cpu: "1"
      memory: 2Gi
  runAsUser: 1000
  topology:
    dc:
    - name: dc1
      nodesPerRacks: 2
      rack:
      - name: rack1
  version: 3.11.4-8u212-0.3.2-release-cqlsh
status:
  cassandraRackStatus:
    dc1-rack1:
      cassandraLastAction:
        Name: Initializing
        endTime: "2019-08-28T06:04:01Z"
        status: Done
      phase: Running
      podLastOperation: {}
  lastClusterAction: Initializing
  lastClusterActionStatus: Done
  phase: Pending
  seedlist:
  - my-cluster-dc1-rack1-0.my-cluster.test
  - my-cluster-dc1-rack1-1.my-cluster.test

The deleted pod was recreated and is running, but the status.phase is still Pending.

@mackwong
Contributor Author

I think the reason is that lastClusterActionStatus stays Done while the pod is being recreated, so status.Phase cannot be updated because of this check:
https://github.com/Orange-OpenSource/cassandra-k8s-operator/blob/73fd24e4d80d853a640defa42b91345cb29847ba/pkg/controller/cassandracluster/reconcile.go#L527
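
To make the suspected mechanism concrete, here is a minimal, purely hypothetical Go sketch (simplified stand-in names, not the operator's actual types or the real code behind that link): if the reconcile loop bails out whenever the last cluster action is already Done, a phase that was set to Pending during the disruption is never recomputed once the pod is back.

package main

import "fmt"

// Simplified stand-ins for the operator's status types; the names are
// illustrative only, not the real API.
type ClusterStatus struct {
	LastClusterActionStatus string
	Phase                   string
}

// updatePhase mimics the suspected guard: when the last cluster action is
// already "Done" it returns before the phase is recomputed, so a Pending
// phase left over from the pod disruption is never reset.
func updatePhase(status *ClusterStatus, allReplicasReady bool) {
	if status.LastClusterActionStatus == "Done" {
		return // phase stays whatever it was, e.g. Pending
	}
	if allReplicasReady {
		status.Phase = "Running"
	}
}

func main() {
	status := &ClusterStatus{LastClusterActionStatus: "Done", Phase: "Pending"}
	updatePhase(status, true)  // the pod came back and is ready
	fmt.Println(status.Phase)  // prints "Pending", matching the reported behavior
}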

@allamand added the bug label on Oct 8, 2019
@allamand

Thanks @mackwong, you're right; I'm going to fix this.
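
One possible direction, sketched with the same hypothetical ClusterStatus type as in the snippet above and not the change that actually landed (see #128 below), would be to derive the phase from pod readiness even when the last action is already Done:

// Sketch only: recompute the phase from readiness first, so a stale Pending
// is cleared even when LastClusterActionStatus is already "Done".
func updatePhaseFixed(status *ClusterStatus, allReplicasReady bool) {
	if allReplicasReady {
		status.Phase = "Running"
	} else {
		status.Phase = "Pending"
	}
}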

@allamand

Closed in #128.
