After upgrading the operator from v0.47.3 to v0.48.0, all pods of an existing VMCluster CR restarted #1116

Closed
pity77 opened this issue Sep 26, 2024 · 2 comments

pity77 commented Sep 26, 2024

Hi bro!

I installed victoria-metrics-operator-0.34.8.tgz (operator:v0.47.3) using Helm and created a vmcluster. Everything was working fine at that point. However, this afternoon I noticed that a new version of the operator (v0.48.0) was available.

After upgrading, I found that all the pods in mzc/vmcluster-01 were restarted, and some containers were in ImagePullBackOff status because my images are stored in a private registry, and some of the worker nodes didn't have the images cached.

Upon running kubectl describe, I discovered that the vmcluster.spec.imagePullSecrets configuration was missing, and the StatefulSet also lacked this configuration.
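The same check can be scripted against `kubectl get ... -o json` output instead of reading `kubectl describe` by eye. This is a minimal sketch: it compares the secrets you expect (taken from the manifest you originally applied) with what is stored on a live object. The helper names `pull_secrets` and `missing_pull_secrets` are hypothetical, and the code assumes StatefulSets and Deployments carry the field under `spec.template.spec`, which is where Kubernetes pod templates keep it.

```python
# Sketch: report which expected imagePullSecrets are missing from a live object.
# Assumes `obj` is a dict parsed from `kubectl get <kind> <name> -o json`.
# Function names are hypothetical, not part of any VictoriaMetrics tool.


def _names(entries) -> set:
    # imagePullSecrets is a list of references like {"name": "my-secret"}
    return {e["name"] for e in (entries or []) if "name" in e}


def pull_secrets(obj: dict) -> set:
    """Return the secret names referenced by a VMCluster, StatefulSet or Deployment."""
    spec = obj.get("spec", {})
    if obj.get("kind") in ("StatefulSet", "Deployment"):
        # Workloads keep the field inside their pod template
        spec = spec.get("template", {}).get("spec", {})
    return _names(spec.get("imagePullSecrets"))


def missing_pull_secrets(expected, obj: dict) -> set:
    """Secret names that were expected but are not present on `obj`."""
    return set(expected) - pull_secrets(obj)
```

For example, parse the output of `kubectl get vmcluster vmcluster-01 -n mzc -o json` with `json.loads` and pass it along with the set `{"2000102274.private.registrykey"}`; a non-empty result means the secret is no longer on that object.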

I suspect that after upgrading the operator, it performed a validation check on the vmcluster, which resulted in the loss of the vmcluster.spec.imagePullSecrets configuration. This may have triggered the restart mechanism for both the StatefulSet and Deployment.

In my testing on Kubernetes v1.27.7, adding or removing imagePullSecrets does cause the StatefulSet and Deployment to restart their pods.
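This restart behaviour follows from how Kubernetes workload controllers work: `imagePullSecrets` lives in the pod template, and a StatefulSet or Deployment rolls its pods whenever that template changes. The sketch below models the idea with a SHA-256 fingerprint of the template. It is a simplification (Kubernetes computes its own revision hashes differently), and the helper names and the container entry are illustrative only.

```python
import copy
import hashlib
import json


def template_hash(pod_template: dict) -> str:
    """Fingerprint a pod template (simplified stand-in for Kubernetes revision hashes)."""
    # sort_keys makes the serialization independent of dict insertion order
    canonical = json.dumps(pod_template, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:10]


def needs_rollout(current: dict, desired: dict) -> bool:
    """A controller replaces pods when the desired template differs from the current one."""
    return template_hash(current) != template_hash(desired)


# Hypothetical pod template resembling a vmstorage pod before the upgrade
before = {
    "spec": {
        "imagePullSecrets": [{"name": "2000102274.private.registrykey"}],
        "containers": [{
            "name": "vmstorage",
            "image": "hub-cn-beijing-6.kce.ksyun.com/victoriametrics/vmstorage:v1.103.0-cluster",
        }],
    }
}

# Same template with imagePullSecrets dropped, as observed after the upgrade
after = copy.deepcopy(before)
del after["spec"]["imagePullSecrets"]

print(needs_rollout(before, copy.deepcopy(before)))  # False: nothing changed
print(needs_rollout(before, after))                  # True: template changed, pods restart
```

The same reasoning explains why adding the secret back also restarts the pods: it is just another template change.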

Looking forward to your reply.

Below is my vmcluster configuration : )
---
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMCluster
metadata:
  name: vmcluster-01
  namespace: mzc
spec:
  imagePullSecrets: 
    - name: "2000102274.private.registrykey"
  clusterVersion: "v1.103.0-cluster"
  retentionPeriod: "1"
  replicationFactor: 2
  vmstorage:
    replicaCount: 2
    storageDataPath: "/vm-data"
    image:
      repository: "hub-cn-beijing-6.kce.ksyun.com/victoriametrics/vmstorage"
    storage:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: "10Gi"
    resources:
      limits:
        cpu: "0.5"
        memory: "500Mi"
  vmselect:
    replicaCount: 2
    cacheMountPath: "/select-cache"
    image:
      repository: "hub-cn-beijing-6.kce.ksyun.com/victoriametrics/vmselect"
    storage:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: "10Gi"
    resources:
      limits:
        cpu: "0.5"
        memory: "500Mi"
      requests:
        cpu: "0.5"
        memory: "500Mi"
  vminsert:
    replicaCount: 2
    image:
      repository: "hub-cn-beijing-6.kce.ksyun.com/victoriametrics/vminsert"
    resources:
      limits:
        cpu: "0.5"
        memory: "500Mi"
      requests:
        cpu: "0.5"
        memory: "500Mi"
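Every component in this configuration pulls from the same private registry, so every StatefulSet and Deployment the operator generates needs the pull secret. The sketch below lists the image each component resolves to and the registry host it comes from. It assumes the operator uses `clusterVersion` as the tag when `image.tag` is unset and falls back to a `victoriametrics/<component>` repository when none is given; verify both assumptions against the VMCluster API reference for your operator version. The function names are hypothetical.

```python
# Subset of the VMCluster spec above, as a Python dict
SPEC = {
    "clusterVersion": "v1.103.0-cluster",
    "vmstorage": {"image": {"repository": "hub-cn-beijing-6.kce.ksyun.com/victoriametrics/vmstorage"}},
    "vmselect": {"image": {"repository": "hub-cn-beijing-6.kce.ksyun.com/victoriametrics/vmselect"}},
    "vminsert": {"image": {"repository": "hub-cn-beijing-6.kce.ksyun.com/victoriametrics/vminsert"}},
}


def resolved_images(spec: dict) -> dict:
    """Map each component to repository:tag (assumes clusterVersion is the default tag)."""
    default_tag = spec.get("clusterVersion", "")
    images = {}
    for component in ("vmstorage", "vmselect", "vminsert"):
        image = spec.get(component, {}).get("image", {})
        # Assumed fallback repository when none is configured
        repo = image.get("repository", f"victoriametrics/{component}")
        images[component] = f"{repo}:{image.get('tag') or default_tag}"
    return images


def registry_host(image_ref: str) -> str:
    """Registry host of an image reference, following the usual Docker reference convention."""
    if "/" not in image_ref:
        return "docker.io"
    first = image_ref.split("/", 1)[0]
    # The leading segment counts as a host only if it looks like one
    if "." in first or ":" in first or first == "localhost":
        return first
    return "docker.io"


for component, ref in resolved_images(SPEC).items():
    print(component, ref, registry_host(ref))
```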

f41gh7 added the bug (Something isn't working) label Sep 26, 2024
f41gh7 (Collaborator) commented Sep 26, 2024

Thanks for reporting! The issue will be fixed soon.

f41gh7 self-assigned this Sep 26, 2024

f41gh7 (Collaborator) commented Sep 29, 2024

The issue was fixed in the v0.48.3 release.
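To tell whether a given operator release is affected, a check like the following can help. It assumes the bug is present in every release from v0.48.0 (the version reported here) up to, but not including, v0.48.3 (where it was fixed). The thread does not say whether v0.48.1 and v0.48.2 were affected, so treat that range as an assumption; the function names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn "v0.48.3" or "0.48.3" into (0, 48, 3); any "-suffix" is ignored."""
    core = version.strip().lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


AFFECTED_FROM = parse_version("v0.48.0")  # version reported in this issue
FIXED_IN = parse_version("v0.48.3")       # release containing the fix


def is_affected(operator_version: str) -> bool:
    """True if the version falls in the assumed affected range [v0.48.0, v0.48.3)."""
    return AFFECTED_FROM <= parse_version(operator_version) < FIXED_IN
```

Because pre-release suffixes are ignored, a tag such as `v0.48.3-rc1` would be treated as fixed; tighten the parser if that matters for your deployment.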

f41gh7 closed this as completed Sep 29, 2024