[BUG] starrocks ent fe pod always PodInitializing after hscale out fe and be then restart #7663

Open
JashBook opened this issue Jun 28, 2024 · 3 comments
Assignees
Labels
kind/bug Something isn't working
Milestone

Comments

@JashBook
Collaborator

JashBook commented Jun 28, 2024

Describe the bug
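After the FE and BE components are scaled out horizontally and FE is then scaled vertically, pod strsent-nerqht-fe-0 stays in PodInitializing (0/3 ready). The VerticalScaling OpsRequest stalls at 1/2 and the cluster remains in Updating.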

To Reproduce
Steps to reproduce the behavior:

  1. install starrocks ent addon
helm upgrade --install --namespace kb-system kb-addon-starrocks kubeblocks-enterprise/starrocks --version 0.9.0
  2. create starrocks ent cluster
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: strsent-nerqht
  namespace: default
spec:
  terminationPolicy: Delete
  componentSpecs:
    - name: be
      componentDef: starrocks-be
      replicas: 1
      resources:
        requests:
          cpu: 1000m
          memory: 1Gi
        limits:
          cpu: 1000m
          memory: 1Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
    - name: fe
      componentDef: starrocks-fe-sn
      replicas: 1
      resources:
        requests:
          cpu: 1000m
          memory: 1Gi
        limits:
          cpu: 1000m
          memory: 1Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
  3. do some ops
kbcli cluster hscale strsent-nerqht --auto-approve --force=true --components fe --replicas 2 --namespace default

kbcli cluster hscale strsent-nerqht --auto-approve --force=true --components be --replicas 2 --namespace default 

kbcli cluster vscale strsent-nerqht --auto-approve --force=true --components fe --cpu 1100m --memory 2Gi --namespace default
  4. See error
➜  ~ kubectl get pod -l app.kubernetes.io/instance=strsent-nerqht
NAME                  READY   STATUS            RESTARTS   AGE
strsent-nerqht-be-0   3/3     Running           0          23m
strsent-nerqht-be-1   3/3     Running           0          22m
strsent-nerqht-fe-0   0/3     PodInitializing   0          20m
strsent-nerqht-fe-1   3/3     Running           0          20m
➜  ~ 
➜  ~ kubectl get ops  -l app.kubernetes.io/instance=strsent-nerqht
NAME                                   TYPE              CLUSTER          STATUS    PROGRESS   AGE
strsent-nerqht-verticalscaling-9vbgm   VerticalScaling   strsent-nerqht   Running   1/2        20m
➜  ~ kubectl get cluster strsent-nerqht
NAME             CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS     AGE
strsent-nerqht                                  Delete               Updating   43m
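
The stuck pod in step 4 can be picked out of the listing mechanically. A minimal sketch, assuming the default `kubectl get pod` column layout shown above (NAME, READY, STATUS, ...); the helper name `stuck_init_pods` is hypothetical and not part of kbcli or KubeBlocks.

```shell
# Print the names of pods whose STATUS is PodInitializing or Init:<n>/<m>.
# Reads `kubectl get pod` table output on stdin; the header row is skipped.
stuck_init_pods() {
    awk '$1 != "NAME" && ($3 == "PodInitializing" || $3 ~ /^Init:/) { print $1 }'
}

# Usage (label selector taken from the report):
#   kubectl get pod -l app.kubernetes.io/instance=strsent-nerqht | stuck_init_pods
```

Against the listing above this would print only strsent-nerqht-fe-0.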

describe cluster

 kubectl describe cluster strsent-nerqht 
Name:         strsent-nerqht
Namespace:    default
Labels:       app.kubernetes.io/instance=strsent-nerqht
Annotations:  kubeblocks.io/ops-request: [{"name":"strsent-nerqht-verticalscaling-9vbgm","type":"VerticalScaling"}]
              kubeblocks.io/reconcile: 2024-06-28T01:35:33.833177453Z
API Version:  apps.kubeblocks.io/v1alpha1
Kind:         Cluster
Metadata:
  Creation Timestamp:  2024-06-28T01:34:19Z
  Finalizers:
    cluster.kubeblocks.io/finalizer
  Generation:  12
  Managed Fields:
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:terminationPolicy:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2024-06-28T01:34:19Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:app.kubernetes.io/instance:
    Manager:      kbcli
    Operation:    Update
    Time:         2024-06-28T01:36:43Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:kubeblocks.io/ops-request:
          f:kubeblocks.io/reconcile:
        f:finalizers:
          .:
          v:"cluster.kubeblocks.io/finalizer":
      f:spec:
        f:componentSpecs:
        f:resources:
          .:
          f:cpu:
          f:memory:
        f:services:
        f:storage:
          .:
          f:size:
    Manager:      manager
    Operation:    Update
    Time:         2024-06-28T01:56:12Z
    API Version:  apps.kubeblocks.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        .:
        f:components:
          .:
          f:be:
            .:
            f:phase:
            f:podsReady:
            f:podsReadyTime:
          f:fe:
            .:
            f:phase:
            f:podsReady:
            f:podsReadyTime:
        f:conditions:
        f:observedGeneration:
        f:phase:
    Manager:         manager
    Operation:       Update
    Subresource:     status
    Time:            2024-06-28T01:56:14Z
  Resource Version:  410692079
  UID:               9b4a3e93-5c2c-4d6a-8240-f6d742bd3e4c
Spec:
  Component Specs:
    Component Def:  starrocks-be
    Name:           be
    Replicas:       2
    Resources:
      Limits:
        Cpu:     1100m
        Memory:  2Gi
      Requests:
        Cpu:          1100m
        Memory:       2Gi
    Service Version:  3.2.2
    Volume Claim Templates:
      Name:  data
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:  20Gi
    Component Def:    starrocks-fe-sn
    Name:             fe
    Replicas:         2
    Resources:
      Limits:
        Cpu:     1100m
        Memory:  2Gi
      Requests:
        Cpu:          1100m
        Memory:       2Gi
    Service Version:  3.2.2
    Volume Claim Templates:
      Name:  data
      Spec:
        Access Modes:
          ReadWriteOnce
        Resources:
          Requests:
            Storage:  24Gi
  Resources:
    Cpu:     0
    Memory:  0
  Services:
    Annotations:
      networking.gke.io/load-balancer-type:  Internal
    Component Selector:                      fe
    Name:                                    fe-vpc
    Service Name:                            fe-vpc
    Spec:
      Ports:
        Name:         fe-http
        Node Port:    30243
        Port:         8030
        Protocol:     TCP
        Target Port:  http-port
        Name:         fe-mysql
        Node Port:    30369
        Port:         9030
        Protocol:     TCP
        Target Port:  query-port
      Type:           LoadBalancer
  Storage:
    Size:              0
  Termination Policy:  Delete
Status:
  Components:
    Be:
      Phase:            Running
      Pods Ready:       true
      Pods Ready Time:  2024-06-28T01:56:14Z
    Fe:
      Phase:            Updating
      Pods Ready:       false
      Pods Ready Time:  2024-06-28T01:54:53Z
  Conditions:
    Last Transition Time:  2024-06-28T01:34:19Z
    Message:               The operator has started the provisioning of Cluster: strsent-nerqht
    Observed Generation:   12
    Reason:                PreCheckSucceed
    Status:                True
    Type:                  ProvisioningStarted
    Last Transition Time:  2024-06-28T01:38:05Z
    Message:               Successfully applied for resources
    Observed Generation:   12
    Reason:                ApplyResourcesSucceed
    Status:                True
    Type:                  ApplyResources
    Last Transition Time:  2024-06-28T01:56:13Z
    Message:               pods are not ready in Components: [fe], refer to related component message in Cluster.status.components
    Reason:                ReplicasNotReady
    Status:                False
    Type:                  ReplicasReady
    Last Transition Time:  2024-06-28T01:56:13Z
    Message:               pods are unavailable in Components: [fe], refer to related component message in Cluster.status.components
    Reason:                ComponentsNotReady
    Status:                False
    Type:                  Ready
  Observed Generation:     12
  Phase:                   Updating
Events:
  Type     Reason                    Age                 From                  Message
  ----     ------                    ----                ----                  -------
  Normal   ComponentPhaseTransition  47m (x2 over 47m)   cluster-controller    component is Creating
  Warning  Unhealthy                 45m (x5 over 46m)   event-controller      Pod strsent-nerqht-be-0: Startup probe failed: Get "http://10.128.2.63:8040/api/health": dial tcp 10.128.2.63:8040: connect: connection refused
  Normal   AllReplicasReady          45m                 cluster-controller    all pods of components are ready, waiting for the probe detection successful
  Normal   ClusterReady              45m                 cluster-controller    Cluster: strsent-nerqht is ready, current phase is Running
  Normal   Running                   45m                 cluster-controller    Cluster: strsent-nerqht is ready, current phase is Running
  Warning  ComponentsNotReady        43m (x2 over 45m)   cluster-controller    pods are unavailable in Components: [be], refer to related component message in Cluster.status.components
  Warning  ReplicasNotReady          43m (x2 over 45m)   cluster-controller    pods are not ready in Components: [be], refer to related component message in Cluster.status.components
  Normal   ApplyResourcesSucceed     43m (x6 over 47m)   cluster-controller    Successfully applied for resources
  Warning  ReplicasNotReady          43m                 cluster-controller    pods are not ready in Components: [be fe], refer to related component message in Cluster.status.components
  Normal   ComponentPhaseTransition  43m (x2 over 43m)   cluster-controller    component is Updating
  Normal   HorizontalScale           41m (x2 over 41m)   component-controller  start horizontal scale component fe of cluster strsent-nerqht from 1 to 2
  Normal   HorizontalScale           33m                 component-controller  start horizontal scale component fe of cluster strsent-nerqht from 2 to 0
  Normal   HorizontalScale           33m                 component-controller  start horizontal scale component be of cluster strsent-nerqht from 1 to 0
  Normal   HorizontalScale           32m                 component-controller  start horizontal scale component fe of cluster strsent-nerqht from 0 to 2
  Normal   HorizontalScale           32m                 component-controller  start horizontal scale component be of cluster strsent-nerqht from 0 to 1
  Normal   ComponentPhaseTransition  32m (x12 over 45m)  cluster-controller    component is Running
  Normal   PreCheckSucceed           26m (x11 over 47m)  cluster-controller    The operator has started the provisioning of Cluster: strsent-nerqht
  Normal   HorizontalScale           26m (x2 over 26m)   component-controller  start horizontal scale component be of cluster strsent-nerqht from 1 to 2

describe pod fe-0

kubectl describe pod strsent-nerqht-fe-0 
Name:         strsent-nerqht-fe-0
Namespace:    default
Priority:     0
Node:         gke-infracreate-gke-kbdata-e2-standar-25c8fd47-9yic/10.10.0.70
Start Time:   Fri, 28 Jun 2024 09:56:57 +0800
Labels:       app.kubernetes.io/component=starrocks-fe-sn
              app.kubernetes.io/instance=strsent-nerqht
              app.kubernetes.io/managed-by=kubeblocks
              app.kubernetes.io/name=starrocks-fe-sn
              app.kubernetes.io/version=starrocks-fe-sn
              apps.kubeblocks.io/cluster-uid=9b4a3e93-5c2c-4d6a-8240-f6d742bd3e4c
              apps.kubeblocks.io/component-name=fe
              apps.kubeblocks.io/pod-name=strsent-nerqht-fe-0
              componentdefinition.kubeblocks.io/name=starrocks-fe-sn
              controller-revision-hash=6bc67dbc6c
              workloads.kubeblocks.io/instance=strsent-nerqht-fe
              workloads.kubeblocks.io/managed-by=InstanceSet
Annotations:  apps.kubeblocks.io/component-replicas: 2
              kubeblocks.io/restart: 2024-06-28T01:50:32Z
Status:       Pending
IP:           10.128.2.114
IPs:
  IP:           10.128.2.114
Controlled By:  InstanceSet/strsent-nerqht-fe
Init Containers:
  init-lorry:
    Container ID:  containerd://8826873260aa831e8f604d768454b96fb08f28bc16c85bff916afc13dd365130
    Image:         docker.io/apecloud/kubeblocks-tools:0.9.0-beta.39
    Image ID:      docker.io/apecloud/kubeblocks-tools@sha256:5c137c9ae94ef615be726bbd35df0a31217a3701b1c64e5773321b88e287afa8
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      -r
      /bin/lorry
      /config
      /kubeblocks/
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 28 Jun 2024 09:56:59 +0800
      Finished:     Fri, 28 Jun 2024 09:57:01 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment Variables from:
      strsent-nerqht-fe-env  ConfigMap  Optional: false
    Environment:
      STARROCKS_USER:      <set to the key 'username' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      STARROCKS_PASSWORD:  <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      MYSQL_PWD:           <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      KB_POD_NAME:         strsent-nerqht-fe-0 (v1:metadata.name)
      KB_POD_UID:           (v1:metadata.uid)
      KB_NAMESPACE:        default (v1:metadata.namespace)
      KB_SA_NAME:           (v1:spec.serviceAccountName)
      KB_NODENAME:          (v1:spec.nodeName)
      KB_HOST_IP:           (v1:status.hostIP)
      KB_POD_IP:            (v1:status.podIP)
      KB_POD_IPS:           (v1:status.podIPs)
      KB_HOSTIP:            (v1:status.hostIP)
      KB_PODIP:             (v1:status.podIP)
      KB_PODIPS:            (v1:status.podIPs)
      KB_POD_FQDN:         $(KB_POD_NAME).strsent-nerqht-fe-headless.$(KB_NAMESPACE).svc
    Mounts:
      /kubeblocks from kubeblocks (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jljd4 (ro)
  starrocks-tools:
    Container ID:  containerd://5882df76793f3385db1ad3625455e5199cb8b92e6d128d75d0b1f5e68e938a7c
    Image:         docker.io/apecloud/starrocks-tools:3.2.2
    Image ID:      docker.io/apecloud/starrocks-tools@sha256:fd9b4e989932b172368cdd1de986845ea96c0d5c19efd4c7fe3bea11bd7aa0f5
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      /bin/mysql
      /kb_tools/mysql
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 28 Jun 2024 09:57:03 +0800
      Finished:     Fri, 28 Jun 2024 09:57:04 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment Variables from:
      strsent-nerqht-fe-env  ConfigMap  Optional: false
    Environment:
      STARROCKS_USER:      <set to the key 'username' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      STARROCKS_PASSWORD:  <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      MYSQL_PWD:           <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      KB_POD_NAME:         strsent-nerqht-fe-0 (v1:metadata.name)
      KB_POD_UID:           (v1:metadata.uid)
      KB_NAMESPACE:        default (v1:metadata.namespace)
      KB_SA_NAME:           (v1:spec.serviceAccountName)
      KB_NODENAME:          (v1:spec.nodeName)
      KB_HOST_IP:           (v1:status.hostIP)
      KB_POD_IP:            (v1:status.podIP)
      KB_POD_IPS:           (v1:status.podIPs)
      KB_HOSTIP:            (v1:status.hostIP)
      KB_PODIP:             (v1:status.podIP)
      KB_PODIPS:            (v1:status.podIPs)
      KB_POD_FQDN:         $(KB_POD_NAME).strsent-nerqht-fe-headless.$(KB_NAMESPACE).svc
      TOOLS_SCRIPTS_PATH:  /opt/kb-tools/reload/fe-cm
    Mounts:
      /kb_tools from kb-tools (rw)
      /opt/config-manager from config-manager-config (rw)
      /opt/kb-tools/reload/fe-cm from cm-script-fe-cm (rw)
      /opt/starrocks/fe/conf from fe-cm (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jljd4 (ro)
Containers:
  fe:
    Container ID:  
    Image:         docker.io/starrocks/fe-ubuntu:3.2.2
    Image ID:      
    Ports:         8030/TCP, 9020/TCP, 9030/TCP, 9010/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      bash
      -c
      /opt/starrocks/fe_entrypoint.sh ${FE_DISCOVERY_SERVICE_NAME}
      
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1100m
      memory:  2Gi
    Requests:
      cpu:      1100m
      memory:   2Gi
    Liveness:   http-get http://:8030/api/health delay=0s timeout=1s period=5s #success=1 #failure=3
    Readiness:  http-get http://:8030/api/health delay=0s timeout=1s period=5s #success=1 #failure=3
    Startup:    http-get http://:8030/api/health delay=0s timeout=1s period=5s #success=1 #failure=60
    Environment Variables from:
      strsent-nerqht-fe-env      ConfigMap  Optional: false
      strsent-nerqht-fe-rsm-env  ConfigMap  Optional: false
    Environment:
      STARROCKS_USER:        <set to the key 'username' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      STARROCKS_PASSWORD:    <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      MYSQL_PWD:             <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      KB_POD_NAME:           strsent-nerqht-fe-0 (v1:metadata.name)
      KB_POD_UID:             (v1:metadata.uid)
      KB_NAMESPACE:          default (v1:metadata.namespace)
      KB_SA_NAME:             (v1:spec.serviceAccountName)
      KB_NODENAME:            (v1:spec.nodeName)
      KB_HOST_IP:             (v1:status.hostIP)
      KB_POD_IP:              (v1:status.podIP)
      KB_POD_IPS:             (v1:status.podIPs)
      KB_HOSTIP:              (v1:status.hostIP)
      KB_PODIP:               (v1:status.podIP)
      KB_PODIPS:              (v1:status.podIPs)
      KB_POD_FQDN:           $(KB_POD_NAME).strsent-nerqht-fe-headless.$(KB_NAMESPACE).svc
      TZ:                    Asia/Shanghai
      POD_NAME:              strsent-nerqht-fe-0 (v1:metadata.name)
      POD_IP:                 (v1:status.podIP)
      HOST_IP:                (v1:status.hostIP)
      POD_NAMESPACE:         default (v1:metadata.namespace)
      HOST_TYPE:             FQDN
      COMPONENT_NAME:        fe
      CONFIGMAP_MOUNT_PATH:  /etc/starrocks/fe/conf
      SERVICE_PORT:          8030
    Mounts:
      /kb_tools from kb-tools (rw)
      /opt/starrocks/fe/conf from fe-cm (rw)
      /opt/starrocks/fe/log from log (rw)
      /opt/starrocks/fe/meta from data (rw)
      /scripts from scripts (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jljd4 (ro)
  lorry:
    Container ID:  
    Image:         docker.io/starrocks/fe-ubuntu:3.2.2
    Image ID:      
    Ports:         3501/TCP, 50001/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /kubeblocks/lorry
      --port
      3501
      --grpcport
      50001
      --config-path
      /kubeblocks/config/lorry/components/
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Startup:   tcp-socket :3501 delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      strsent-nerqht-fe-env      ConfigMap  Optional: false
      strsent-nerqht-fe-rsm-env  ConfigMap  Optional: false
    Environment:
      STARROCKS_USER:        <set to the key 'username' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      STARROCKS_PASSWORD:    <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      MYSQL_PWD:             <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      KB_POD_NAME:           strsent-nerqht-fe-0 (v1:metadata.name)
      KB_POD_UID:             (v1:metadata.uid)
      KB_NAMESPACE:          default (v1:metadata.namespace)
      KB_SA_NAME:             (v1:spec.serviceAccountName)
      KB_NODENAME:            (v1:spec.nodeName)
      KB_HOST_IP:             (v1:status.hostIP)
      KB_POD_IP:              (v1:status.podIP)
      KB_POD_IPS:             (v1:status.podIPs)
      KB_HOSTIP:              (v1:status.hostIP)
      KB_PODIP:               (v1:status.podIP)
      KB_PODIPS:              (v1:status.podIPs)
      KB_POD_FQDN:           $(KB_POD_NAME).strsent-nerqht-fe-headless.$(KB_NAMESPACE).svc
      KB_BUILTIN_HANDLER:    custom
      KB_SERVICE_USER:       <set to the key 'username' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      KB_SERVICE_PASSWORD:   <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      KB_SERVICE_PORT:       8030
      KB_DATA_PATH:          /opt/starrocks/fe/meta
      KB_ACTION_COMMANDS:    {"memberLeave":["/bin/bash","-c","#!/usr/bin/env bash\n\nset -x\nset -o errexit\n\nleader_host=\"\"\nleave_member_host=\"\"\nleave_member_port=\"\"\nhelper_endpoints=\"\"\ncandidate_names=\"\"\n\nfunction info() {\n    echo \"[$(date +'%Y-%m-%d %H:%M:%S')] $*\"\n}\n\n# root@x-fe-0:/opt/starrocks#  mysql -h 127.0.0.1 -P 9030 -e \"show frontends\"\n# +-------------------------------------------------------------------------------+------------------------------------------------------------+-------------+----------+-----------+---------+----------+------------+------+-------+-------------------+---------------------+----------+--------+---------------------+---------------+\n# | Name                                                                          | IP                                                         | EditLogPort | HttpPort | QueryPort | RpcPort | Role     | ClusterId  | Join | Alive | ReplayedJournalId | LastHeartbeat       | IsHelper | ErrMsg | StartTime           | Version       |\n# +-------------------------------------------------------------------------------+------------------------------------------------------------+-------------+----------+-----------+---------+----------+------------+------+-------+-------------------+---------------------+----------+--------+---------------------+---------------+\n# | x-fe-1.x-fe-headless.kubeblocks-cloud-ns.svc.cluster.local_9010_1717662978660 | x-fe-1.x-fe-headless.kubeblocks-cloud-ns.svc.cluster.local | 9010        | 8030     | 9030      | 9020    | FOLLOWER | 1847720530 | true | true  | 179               | 2024-06-06 16:42:30 | true     |        | 2024-06-06 16:36:30 | 3.2.2-269e832 |\n# | x-fe-0.x-fe-headless.kubeblocks-cloud-ns.svc.cluster.local_9010_1717662806744 | x-fe-0.x-fe-headless.kubeblocks-cloud-ns.svc.cluster.local | 9010        | 8030     | 9030      | 9020    | LEADER   | 1847720530 | true | true  | 180               | 2024-06-06 16:42:30 | true     |        | 2024-06-06 16:33:47 | 3.2.2-269e832 |\n# | x-fe-2.x-fe-headless.kubeblocks-cloud-ns.svc.cluster.local_9010_1717662978644 | x-fe-2.x-fe-headless.kubeblocks-cloud-ns.svc.cluster.local | 9010        | 8030     | 9030      | 9020    | FOLLOWER | 1847720530 | true | true  | 179               | 2024-06-06 16:42:30 | true     |        | 2024-06-06 16:36:41 | 3.2.2-269e832 |\n# +-------------------------------------------------------------------------------+------------------------------------------------------------+-------------+----------+-----------+---------+----------+------------+------+-------+-------------------+---------------------+----------+--------+---------------------+---------------+\nfunction show_frontends() {\n    mysql -N -B -h 127.0.0.1 -P 9030 -e \"show frontends\"\n}\n\nfunction switch_leader() {\n    java -jar /opt/starrocks/fe/lib/starrocks-bdb-je*.jar DbGroupAdmin -helperHosts \"${helper_endpoints}\" -groupName PALO_JOURNAL_GROUP -transferMaster -force \"${candidate_names}\" 5000\n}\n\nfunction wait_for_leader_switched() {\n    until [[ $(show_frontends | grep 'LEADER' | awk '{print $2}') != ${KB_LEAVE_MEMBER_POD_NAME}* ]]; do\n        sleep 5\n        info \"waiting for leader to be switched\"\n    done\n}\n\n# execute a mysql command and iterate the output line by line\noutput=$(show_frontends)\nwhile IFS= read -r line; do\n    name=$(echo \"$line\" | awk '{print $1}')\n    ip=$(echo \"$line\" | awk '{print $2}')\n    edit_log_port=$(echo \"$line\" | awk '{print $3}')\n    role=$(echo \"$line\" | awk '{print $7}')\n    is_leaving=False\n    if [[ ${ip} == ${KB_LEAVE_MEMBER_POD_NAME}* ]]; then\n        is_leaving=True\n        leave_member_host=${ip}\n        leave_member_port=${edit_log_port}\n    fi\n    if [ \"${role}\" == \"LEADER\" ]; then\n        leader_host=${ip}\n    fi\n    if [ \"${is_leaving}\" == \"False\" ]; then\n        if [ -n \"${helper_endpoints}\" ]; then\n            helper_endpoints=${helper_endpoints},${ip}:${edit_log_port}\n            candidate_names=${candidate_names},${name}\n        else\n            helper_endpoints=${ip}:${edit_log_port}\n            candidate_names=${name}\n        fi\n    fi\ndone \u003c\u003c\u003c \"$output\"\n\ninfo \"leave member: ${leave_member_host}:${leave_member_port}\"\ninfo \"leader: ${leader_host}\"\ninfo \"helper hosts: ${helper_endpoints}\"\ninfo \"candidate hosts: ${candidate_names}\"\n\n# The leader will exit if lost it's leader role\nif [[ ${leader_host} == ${KB_LEAVE_MEMBER_POD_NAME}* ]]; then\n    switch_leader\n    wait_for_leader_switched\nfi\n\nmysql -h \"${leader_host}\" -P 9030 -e \"alter system drop follower '${leave_member_host}:${leave_member_port}';\"\n"]}
      TZ:                    Asia/Shanghai
      POD_NAME:              strsent-nerqht-fe-0 (v1:metadata.name)
      POD_IP:                 (v1:status.podIP)
      HOST_IP:                (v1:status.hostIP)
      POD_NAMESPACE:         default (v1:metadata.namespace)
      HOST_TYPE:             FQDN
      COMPONENT_NAME:        fe
      CONFIGMAP_MOUNT_PATH:  /etc/starrocks/fe/conf
      SERVICE_PORT:          8030
    Mounts:
      /kubeblocks from kubeblocks (rw)
      /opt/starrocks/fe/meta from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jljd4 (ro)
  config-manager:
    Container ID:  
    Image:         docker.io/apecloud/kubeblocks-tools:0.9.0-beta.39
    Image ID:      
    Port:          9901/TCP
    Host Port:     0/TCP
    Command:
      env
    Args:
      PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$(TOOLS_PATH)
      /bin/reloader
      --log-level
      info
      --operator-update-enable
      --tcp
      9901
      --config
      /opt/config-manager/config-manager.yaml
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Environment Variables from:
      strsent-nerqht-fe-env      ConfigMap  Optional: false
      strsent-nerqht-fe-rsm-env  ConfigMap  Optional: false
    Environment:
      STARROCKS_USER:         <set to the key 'username' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      STARROCKS_PASSWORD:     <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      MYSQL_PWD:              <set to the key 'password' in secret 'strsent-nerqht-fe-account-root'>  Optional: false
      KB_POD_NAME:            strsent-nerqht-fe-0 (v1:metadata.name)
      KB_POD_UID:              (v1:metadata.uid)
      KB_NAMESPACE:           default (v1:metadata.namespace)
      KB_SA_NAME:              (v1:spec.serviceAccountName)
      KB_NODENAME:             (v1:spec.nodeName)
      KB_HOST_IP:              (v1:status.hostIP)
      KB_POD_IP:               (v1:status.podIP)
      KB_POD_IPS:              (v1:status.podIPs)
      KB_HOSTIP:               (v1:status.hostIP)
      KB_PODIP:                (v1:status.podIP)
      KB_PODIPS:               (v1:status.podIPs)
      KB_POD_FQDN:            $(KB_POD_NAME).strsent-nerqht-fe-headless.$(KB_NAMESPACE).svc
      CONFIG_MANAGER_POD_IP:   (v1:status.podIP)
      TOOLS_PATH:             /opt/kb-tools/reload/fe-cm:/opt/config-manager:/kb_tools
    Mounts:
      /kb_tools from kb-tools (rw)
      /opt/config-manager from config-manager-config (rw)
      /opt/kb-tools/reload/fe-cm from cm-script-fe-cm (rw)
      /opt/starrocks/fe/conf from fe-cm (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jljd4 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  log:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  fe-cm:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      strsent-nerqht-fe-fe-cm
    Optional:  false
  scripts:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      strsent-nerqht-fe-scripts
    Optional:  false
  cm-script-fe-cm:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      sidecar-starrocks-scripts-strsent-nerqht
    Optional:  false
  config-manager-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      sidecar-strsent-nerqht-fe-config-manager-config
    Optional:  false
  kb-tools:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-strsent-nerqht-fe-0
    ReadOnly:   false
  kubeblocks:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-jljd4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 kb-data=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  21m   default-scheduler  Successfully assigned default/strsent-nerqht-fe-0 to gke-infracreate-gke-kbdata-e2-standar-25c8fd47-9yic
  Normal  Pulled     21m   kubelet            Container image "docker.io/apecloud/kubeblocks-tools:0.9.0-beta.39" already present on machine
  Normal  Created    21m   kubelet            Created container init-lorry
  Normal  Started    21m   kubelet            Started container init-lorry
  Normal  Pulled     20m   kubelet            Container image "docker.io/apecloud/starrocks-tools:3.2.2" already present on machine
  Normal  Created    20m   kubelet            Created container starrocks-tools
  Normal  Started    20m   kubelet            Started container starrocks-tools
  Normal  Pulled     20m   kubelet            Container image "docker.io/starrocks/fe-ubuntu:3.2.2" already present on machine
  Normal  Created    20m   kubelet            Created container fe
  Normal  Started    20m   kubelet            Started container fe

@JashBook JashBook added the kind/bug Something isn't working label Jun 28, 2024
@JashBook JashBook added this to the Release 0.9.0 milestone Jun 28, 2024
@JashBook JashBook added the severity/major Great chance user will encounter the same problem label Jun 28, 2024
@JashBook JashBook changed the title from "[BUG]" to "[BUG] starrocks ent fe pod always PodInitializing after hscale out fe and be then restart" Jun 28, 2024
@iziang
Contributor

iziang commented Jun 29, 2024

The FE pod has a post-start hook script that sets the root account password. One SQL command in that script hangs: mysql --connect-timeout=1 -h127.0.0.1 -uroot -P9030 -px xxxxxxxx -e show databases.
img_v3_02c9_b6a1599b-e320-4f50-95a8-340301c0304g

Attempting to establish a new connection using the MySQL client also gets stuck.
img_v3_02c9_31fd0796-8222-4d6f-a561-f2d8bde51a5g
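
One way to keep a hook from blocking indefinitely on a call like this is to put a hard deadline around it. The sketch below is an assumption, not the add-on's actual script: probe_with_deadline is a hypothetical helper name, and it relies on the GNU coreutils timeout command being available in the container image.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from the starrocks add-on): run a command
# and give up after the given number of seconds.
# Assumes GNU coreutils `timeout` is installed in the image.
probe_with_deadline() {
  seconds="$1"
  shift
  # Send TERM at the deadline, then KILL 2 seconds later if still running.
  timeout --signal=TERM --kill-after=2 "$seconds" "$@"
}
```

With this wrapper, the probe from the hook could be run as probe_with_deadline 5 mysql --connect-timeout=1 -h127.0.0.1 -uroot -P9030 -e 'show databases' (password option omitted here). timeout exits with status 124 when the deadline is hit, so the caller can tell a hang apart from an ordinary failure.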

The fe-1 pod is functioning normally, and using the MySQL client to connect and execute the SQL command show frontends shows that both FEs are operating normally.

img_v3_02c9_ac475a85-c1f4-467e-a155-6ed280ce059g

The log of fe-0:
fe.log

The stack of fe-0:
stack.log

The gc stat of fe-0:
img_v3_02c9_fd581ebf-374b-4354-8450-e5c245086d0g

The jvm flags of fe-0:

root@strsent-nerqht-fe-0:/opt/starrocks# jcmd 10 VM.flags
10:
-XX:-AlwaysTenure -XX:CICompilerCount=2 -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:-CMSParallelRemarkEnabled -XX:ConcGCThreads=1 -XX:G1ConcRefinementThreads=2 -XX:G1HeapRegionSize=2097152 -XX:GCDrainStackTargetSize=64 -XX:InitialHeapSize=33554432 -XX:MarkStackSize=4194304 -XX:MaxHeapSize=8589934592 -XX:MaxNewSize=5152702464 -XX:MaxTenuringThreshold=7 -XX:MinHeapDeltaBytes=2097152 -XX:-NeverTenure -XX:NonNMethodCodeHeapSize=5825164 -XX:NonProfiledCodeHeapSize=122916538 -XX:ProfiledCodeHeapSize=122916538 -XX:ReservedCodeCacheSize=251658240 -XX:+SegmentedCodeCache -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=8 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
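
The sizes in these flags are given in bytes. A small conversion, sketched below, makes them easier to compare with the memory limits in the cluster spec. to_mib is a hypothetical helper name; the values are copied from the jcmd output above.

```shell
# Hypothetical helper: convert a byte count to whole MiB.
to_mib() { echo $(( $1 / 1024 / 1024 )); }

to_mib 33554432     # InitialHeapSize  -> 32
to_mib 8589934592   # MaxHeapSize      -> 8192
to_mib 5152702464   # MaxNewSize       -> 4914
to_mib 2097152      # G1HeapRegionSize -> 2
```

The maximum heap is therefore 8 GiB, while the fe component in the strsent-nerqht spec has a 1Gi memory limit. Whether that difference is related to the hang is not established in this thread.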

The fe.conf of fe-0:
img_v3_02c9_81fff07e-997c-4543-a7cf-11b347f50aeg

@JashBook JashBook removed the severity/major Great chance user will encounter the same problem label Jul 2, 2024
@JashBook
Collaborator Author

JashBook commented Jul 5, 2024

The same error also occurs with a shared-data cluster.

  1. Create the cluster
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: strsent-uwjbus
  namespace: default
spec:
  terminationPolicy: WipeOut
  componentSpecs:
    - name: cn
      componentDef: starrocks-cn
      replicas: 2
      resources:
        requests:
          cpu: 3000m
          memory: 8Gi
        limits:
          cpu: 3000m
          memory: 8Gi
    - name: fe
      componentDef: starrocks-fe-sd
      replicas: 2
      resources:
        requests:
          cpu: 3000m
          memory: 8Gi
        limits:
          cpu: 3000m
          memory: 8Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
  2. Stop, then start the cluster
kbcli cluster stop strsent-uwjbus --auto-approve --force=true  --namespace default

kbcli cluster start strsent-uwjbus --force=true --namespace default 
  3. See the error
kubectl get cluster strsent-uwjbus
NAME             CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS     AGE
strsent-uwjbus                                  WipeOut              Updating   17m

kubectl get pod -l app.kubernetes.io/instance=strsent-uwjbus
NAME                  READY   STATUS            RESTARTS   AGE
strsent-uwjbus-cn-0   3/3     Running           0          9m51s
strsent-uwjbus-cn-1   3/3     Running           0          9m51s
strsent-uwjbus-fe-0   0/3     PodInitializing   0          9m50s
strsent-uwjbus-fe-1   3/3     Running           0          9m51s
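
To pick out the pods that have not reached Running from a listing like the one above, the output can be filtered on its STATUS column. This is a minimal sketch, assuming the default kubectl get pod column layout shown here; not_running is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get pod` output on stdin and print the
# name and status of every pod whose STATUS column is not "Running".
not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}

# Usage (requires cluster access):
#   kubectl get pod -l app.kubernetes.io/instance=strsent-uwjbus | not_running
```

Against the listing above, this would report only strsent-uwjbus-fe-0 with status PodInitializing.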

@shanshanying
Contributor

This seems to be an issue with the StarRocks FrontEnd itself. It may already have been fixed in their latest version.

@ahjing99 ahjing99 modified the milestones: Release 0.9.0, Release 0.9.1 Jul 8, 2024