generated from onedr0p/cluster-template
Victoria Metrics... Again #2546
Merged
github-actions bot added the area/kubernetes (Changes made in the kubernetes directory), cluster/main, and cluster/utility labels on Jun 19, 2024
haraldkoch reviewed on Jun 19, 2024
kubernetes/main/apps/observability/grafana/app/helmrelease.yaml (outdated, resolved)
joryirving force-pushed the feat/vmetrics-again branch from ef51b67 to 424f149 on June 19, 2024 17:44
--- kubernetes/utility/flux Kustomization: flux-system/cluster HelmRepository: flux-system/victoria-metrics
+++ kubernetes/utility/flux Kustomization: flux-system/cluster HelmRepository: flux-system/victoria-metrics
@@ -0,0 +1,13 @@
+---
+apiVersion: source.toolkit.fluxcd.io/v1
+kind: HelmRepository
+metadata:
+  labels:
+    kustomize.toolkit.fluxcd.io/name: cluster
+    kustomize.toolkit.fluxcd.io/namespace: flux-system
+  name: victoria-metrics
+  namespace: flux-system
+spec:
+  interval: 1h
+  url: https://victoriametrics.github.io/helm-charts/
+
--- kubernetes/utility/apps Kustomization: flux-system/cluster-apps Kustomization: flux-system/kube-prometheus-stack
+++ kubernetes/utility/apps Kustomization: flux-system/cluster-apps Kustomization: flux-system/kube-prometheus-stack
@@ -1,38 +0,0 @@
----
-apiVersion: kustomize.toolkit.fluxcd.io/v1
-kind: Kustomization
-metadata:
- labels:
- kustomize.toolkit.fluxcd.io/name: cluster-apps
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: kube-prometheus-stack
- namespace: flux-system
-spec:
- commonMetadata:
- labels:
- app.kubernetes.io/name: kube-prometheus-stack
- decryption:
- provider: sops
- secretRef:
- name: sops-age
- dependsOn:
- - name: external-secrets-stores
- interval: 30m
- path: ./kubernetes/utility/apps/observability/kube-prometheus-stack/app
- postBuild:
- substitute:
- THANOS_VERSION: v0.35.1
- substituteFrom:
- - kind: ConfigMap
- name: cluster-settings
- - kind: Secret
- name: cluster-secrets
- prune: true
- retryInterval: 1m
- sourceRef:
- kind: GitRepository
- name: home-kubernetes
- targetNamespace: observability
- timeout: 5m
- wait: false
-
--- kubernetes/utility/apps Kustomization: flux-system/cluster-apps Kustomization: flux-system/victoria-metrics
+++ kubernetes/utility/apps Kustomization: flux-system/cluster-apps Kustomization: flux-system/victoria-metrics
@@ -0,0 +1,34 @@
+---
+apiVersion: kustomize.toolkit.fluxcd.io/v1
+kind: Kustomization
+metadata:
+  labels:
+    kustomize.toolkit.fluxcd.io/name: cluster-apps
+    kustomize.toolkit.fluxcd.io/namespace: flux-system
+  name: victoria-metrics
+  namespace: flux-system
+spec:
+  commonMetadata:
+    labels:
+      app.kubernetes.io/name: victoria-metrics
+  decryption:
+    provider: sops
+    secretRef:
+      name: sops-age
+  interval: 30m
+  path: ./kubernetes/utility/apps/observability/victoria-metrics/app
+  postBuild:
+    substituteFrom:
+    - kind: ConfigMap
+      name: cluster-settings
+    - kind: Secret
+      name: cluster-secrets
+  prune: true
+  retryInterval: 1m
+  sourceRef:
+    kind: GitRepository
+    name: home-kubernetes
+  targetNamespace: observability
+  timeout: 5m
+  wait: false
+
--- kubernetes/utility/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack ExternalSecret: observability/thanos-objstore-config
+++ kubernetes/utility/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack ExternalSecret: observability/thanos-objstore-config
@@ -1,30 +0,0 @@
----
-apiVersion: external-secrets.io/v1beta1
-kind: ExternalSecret
-metadata:
- labels:
- app.kubernetes.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: thanos-objstore-config
- namespace: observability
-spec:
- dataFrom:
- - extract:
- key: thanos
- secretStoreRef:
- kind: ClusterSecretStore
- name: bitwarden-secrets-manager
- target:
- name: thanos-objstore-config
- template:
- data:
- config: |-
- type: s3
- config:
- bucket: thanos
- endpoint: rgw.
- access_key: {{ .AWS_ACCESS_KEY_ID }}
- secret_key: {{ .AWS_SECRET_ACCESS_KEY }}
- engineVersion: v2
-
--- kubernetes/utility/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack HelmRelease: observability/kube-prometheus-stack
+++ kubernetes/utility/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack HelmRelease: observability/kube-prometheus-stack
@@ -1,193 +0,0 @@
----
-apiVersion: helm.toolkit.fluxcd.io/v2
-kind: HelmRelease
-metadata:
- labels:
- app.kubernetes.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: kube-prometheus-stack
- namespace: observability
-spec:
- chart:
- spec:
- chart: kube-prometheus-stack
- sourceRef:
- kind: HelmRepository
- name: prometheus-community
- namespace: flux-system
- version: 60.2.0
- dependsOn:
- - name: prometheus-operator-crds
- namespace: observability
- - name: longhorn
- namespace: storage
- install:
- crds: Skip
- remediation:
- retries: 3
- interval: 30m
- timeout: 15m
- upgrade:
- cleanupOnFail: true
- crds: Skip
- remediation:
- retries: 3
- strategy: rollback
- values:
- alertmanager:
- enabled: false
- cleanPrometheusOperatorObjectNames: true
- crds:
- enabled: false
- grafana:
- enabled: false
- kube-state-metrics:
- fullnameOverride: kube-state-metrics
- metricLabelsAllowlist:
- - pods=[*]
- - deployments=[*]
- - persistentvolumeclaims=[*]
- kubeApiServer:
- enabled: true
- serviceMonitor:
- metricRelabelings:
- - action: keep
- regex: (aggregator_openapi|aggregator_unavailable|apiextensions_openapi|apiserver_admission|apiserver_audit|apiserver_cache|apiserver_cel|apiserver_client|apiserver_crd|apiserver_current|apiserver_envelope|apiserver_flowcontrol|apiserver_init|apiserver_kube|apiserver_longrunning|apiserver_request|apiserver_requested|apiserver_response|apiserver_selfrequest|apiserver_storage|apiserver_terminated|apiserver_tls|apiserver_watch|apiserver_webhooks|authenticated_user|authentication|disabled_metric|etcd_bookmark|etcd_lease|etcd_request|field_validation|get_token|go|grpc_client|hidden_metric|kube_apiserver|kubernetes_build|kubernetes_feature|node_authorizer|pod_security|process_cpu|process_max|process_open|process_resident|process_start|process_virtual|registered_metric|rest_client|scrape_duration|scrape_samples|scrape_series|serviceaccount_legacy|serviceaccount_stale|serviceaccount_valid|watch_cache|workqueue)_(.+)
- sourceLabels:
- - __name__
- - action: drop
- regex: (apiserver|etcd|rest_client)_request(|_sli|_slo)_duration_seconds_bucket
- sourceLabels:
- - __name__
- - action: drop
- regex: (apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket)
- sourceLabels:
- - __name__
- kubeControllerManager:
- enabled: true
- endpoints:
- - 10.69.1.121
- serviceMonitor:
- metricRelabelings:
- - action: keep
- regex: (apiserver_audit|apiserver_client|apiserver_delegated|apiserver_envelope|apiserver_storage|apiserver_webhooks|attachdetach_controller|authenticated_user|authentication|cronjob_controller|disabled_metric|endpoint_slice|ephemeral_volume|garbagecollector_controller|get_token|go|hidden_metric|job_controller|kubernetes_build|kubernetes_feature|leader_election|node_collector|node_ipam|process_cpu|process_max|process_open|process_resident|process_start|process_virtual|pv_collector|registered_metric|replicaset_controller|rest_client|retroactive_storageclass|root_ca|running_managed|scrape_duration|scrape_samples|scrape_series|service_controller|storage_count|storage_operation|ttl_after|volume_operation|workqueue)_(.+)
- sourceLabels:
- - __name__
- kubeEtcd:
- enabled: true
- endpoints:
- - 10.69.1.121
- kubeProxy:
- enabled: false
- kubeScheduler:
- enabled: true
- endpoints:
- - 10.69.1.121
- serviceMonitor:
- metricRelabelings:
- - action: keep
- regex: (apiserver_audit|apiserver_client|apiserver_delegated|apiserver_envelope|apiserver_storage|apiserver_webhooks|authenticated_user|authentication|disabled_metric|go|hidden_metric|kubernetes_build|kubernetes_feature|leader_election|process_cpu|process_max|process_open|process_resident|process_start|process_virtual|registered_metric|rest_client|scheduler|scrape_duration|scrape_samples|scrape_series|workqueue)_(.+)
- sourceLabels:
- - __name__
- kubeStateMetrics:
- enabled: true
- kubelet:
- enabled: true
- serviceMonitor:
- metricRelabelings:
- - action: keep
- regex: (apiserver_audit|apiserver_client|apiserver_delegated|apiserver_envelope|apiserver_storage|apiserver_webhooks|authentication_token|cadvisor_version|container_blkio|container_cpu|container_fs|container_last|container_memory|container_network|container_oom|container_processes|container|csi_operations|disabled_metric|get_token|go|hidden_metric|kubelet_certificate|kubelet_cgroup|kubelet_container|kubelet_containers|kubelet_cpu|kubelet_device|kubelet_graceful|kubelet_http|kubelet_lifecycle|kubelet_managed|kubelet_node|kubelet_pleg|kubelet_pod|kubelet_run|kubelet_running|kubelet_runtime|kubelet_server|kubelet_started|kubelet_volume|kubernetes_build|kubernetes_feature|machine_cpu|machine_memory|machine_nvm|machine_scrape|node_namespace|plugin_manager|prober_probe|process_cpu|process_max|process_open|process_resident|process_start|process_virtual|registered_metric|rest_client|scrape_duration|scrape_samples|scrape_series|storage_operation|volume_manager|volume_operation|workqueue)_(.+)
- sourceLabels:
- - __name__
- - action: replace
- sourceLabels:
- - node
- targetLabel: instance
- - action: labeldrop
- regex: (uid)
- - action: labeldrop
- regex: (id|name)
- - action: drop
- regex: (rest_client_request_duration_seconds_bucket|rest_client_request_duration_seconds_sum|rest_client_request_duration_seconds_count)
- sourceLabels:
- - __name__
- nodeExporter:
- enabled: true
- prometheus:
- ingress:
- annotations:
- external-dns.alpha.kubernetes.io/target: internal-utility.
- enabled: true
- hosts:
- - prometheus-utility.
- ingressClassName: internal
- pathType: Prefix
- prometheusSpec:
- additionalAlertManagerConfigs:
- - static_configs:
- - targets:
- - alertmanager.
- enableAdminAPI: true
- enableFeatures:
- - auto-gomemlimit
- - memory-snapshot-on-shutdown
- - new-service-discovery-manager
- externalLabels:
- cluster: utility
- podMetadata:
- annotations:
- secret.reloader.stakater.com/reload: thanos-objstore-config
- podMonitorSelectorNilUsesHelmValues: false
- probeSelectorNilUsesHelmValues: false
- replicaExternalLabelName: __replica__
- replicas: 1
- resources:
- limits:
- memory: 1500Mi
- requests:
- cpu: 100m
- retention: 2d
- retentionSize: 15GB
- ruleSelectorNilUsesHelmValues: false
- scrapeConfigSelectorNilUsesHelmValues: false
- scrapeInterval: 1m
- serviceMonitorSelectorNilUsesHelmValues: false
- storageSpec:
- volumeClaimTemplate:
- spec:
- resources:
- requests:
- storage: 20Gi
- storageClassName: local-hostpath
- thanos:
- image: quay.io/thanos/thanos:v0.35.1
- objectStorageConfig:
- existingSecret:
- key: config
- name: thanos-objstore-config
- version: 0.35.1
- thanosService:
- enabled: true
- thanosServiceExternal:
- annotations:
- external-dns.alpha.kubernetes.io/hostname: thanos-svc.
- io.cilium/lb-ipam-ips: temp
- enabled: true
- externalTrafficPolicy: Cluster
- type: LoadBalancer
- thanosServiceMonitor:
- enabled: true
- prometheus-node-exporter:
- fullnameOverride: node-exporter
- prometheus:
- monitor:
- enabled: true
- relabelings:
- - action: replace
- regex: (.*)
- replacement: $1
- sourceLabels:
- - __meta_kubernetes_pod_node_name
- targetLabel: kubernetes_node
-
--- kubernetes/utility/apps/observability/victoria-metrics/app Kustomization: flux-system/victoria-metrics HelmRelease: observability/victoria-metrics
+++ kubernetes/utility/apps/observability/victoria-metrics/app Kustomization: flux-system/victoria-metrics HelmRelease: observability/victoria-metrics
@@ -0,0 +1,49 @@
+---
+apiVersion: helm.toolkit.fluxcd.io/v2
+kind: HelmRelease
+metadata:
+  labels:
+    app.kubernetes.io/name: victoria-metrics
+    kustomize.toolkit.fluxcd.io/name: victoria-metrics
+    kustomize.toolkit.fluxcd.io/namespace: flux-system
+  name: victoria-metrics
+  namespace: observability
+spec:
+  chart:
+    spec:
+      chart: victoria-metrics-agent
+      sourceRef:
+        kind: HelmRepository
+        name: victoria-metrics
+        namespace: flux-system
+      version: 0.10.9
+  install:
+    remediation:
+      retries: 3
+  interval: 30m
+  upgrade:
+    cleanupOnFail: true
+    remediation:
+      retries: 3
+      strategy: rollback
+  values:
+    config:
+      global:
+        scrape_interval: 30s
+    deployment:
+      enabled: true
+    ingress:
+      annotations:
+        external-dns.alpha.kubernetes.io/target: internal-utility.${SECRET_DOMAIN}
+      enabled: true
+      hosts:
+      - name: vmagent-utility.${SECRET_DOMAIN}
+        path: /
+        port: http
+      ingressClassName: internal
+    remoteWriteUrls:
+    - https://victoria-metrics.${SECRET_DOMAIN}/insert/0/prometheus
+    replicaCount: 1
+    service:
+      enabled: true
+
--- kubernetes/main/flux Kustomization: flux-system/cluster HelmRepository: flux-system/victoria-metrics
+++ kubernetes/main/flux Kustomization: flux-system/cluster HelmRepository: flux-system/victoria-metrics
@@ -0,0 +1,13 @@
+---
+apiVersion: source.toolkit.fluxcd.io/v1
+kind: HelmRepository
+metadata:
+  labels:
+    kustomize.toolkit.fluxcd.io/name: cluster
+    kustomize.toolkit.fluxcd.io/namespace: flux-system
+  name: victoria-metrics
+  namespace: flux-system
+spec:
+  interval: 1h
+  url: https://victoriametrics.github.io/helm-charts/
+
--- kubernetes/main/apps Kustomization: flux-system/cluster-apps Kustomization: flux-system/kube-prometheus-stack
+++ kubernetes/main/apps Kustomization: flux-system/cluster-apps Kustomization: flux-system/kube-prometheus-stack
@@ -1,38 +0,0 @@
----
-apiVersion: kustomize.toolkit.fluxcd.io/v1
-kind: Kustomization
-metadata:
- labels:
- kustomize.toolkit.fluxcd.io/name: cluster-apps
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: kube-prometheus-stack
- namespace: flux-system
-spec:
- commonMetadata:
- labels:
- app.kubernetes.io/name: kube-prometheus-stack
- decryption:
- provider: sops
- secretRef:
- name: sops-age
- dependsOn:
- - name: external-secrets-stores
- interval: 30m
- path: ./kubernetes/main/apps/observability/kube-prometheus-stack/app
- postBuild:
- substitute:
- THANOS_VERSION: v0.35.1
- substituteFrom:
- - kind: ConfigMap
- name: cluster-settings
- - kind: Secret
- name: cluster-secrets
- prune: true
- retryInterval: 1m
- sourceRef:
- kind: GitRepository
- name: home-kubernetes
- targetNamespace: observability
- timeout: 5m
- wait: false
-
--- kubernetes/main/apps Kustomization: flux-system/cluster-apps Kustomization: flux-system/thanos
+++ kubernetes/main/apps Kustomization: flux-system/cluster-apps Kustomization: flux-system/thanos
@@ -1,37 +0,0 @@
----
-apiVersion: kustomize.toolkit.fluxcd.io/v1
-kind: Kustomization
-metadata:
- labels:
- kustomize.toolkit.fluxcd.io/name: cluster-apps
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: thanos
- namespace: flux-system
-spec:
- commonMetadata:
- labels:
- app.kubernetes.io/name: thanos
- decryption:
- provider: sops
- secretRef:
- name: sops-age
- dependsOn:
- - name: dragonfly-cluster
- - name: external-secrets-stores
- interval: 30m
- path: ./kubernetes/main/apps/observability/thanos/app
- postBuild:
- substituteFrom:
- - kind: ConfigMap
- name: cluster-settings
- - kind: Secret
- name: cluster-secrets
- prune: true
- retryInterval: 1m
- sourceRef:
- kind: GitRepository
- name: home-kubernetes
- targetNamespace: observability
- timeout: 15m
- wait: false
-
--- kubernetes/main/apps Kustomization: flux-system/cluster-apps Kustomization: flux-system/victoria-metrics
+++ kubernetes/main/apps Kustomization: flux-system/cluster-apps Kustomization: flux-system/victoria-metrics
@@ -0,0 +1,34 @@
+---
+apiVersion: kustomize.toolkit.fluxcd.io/v1
+kind: Kustomization
+metadata:
+  labels:
+    kustomize.toolkit.fluxcd.io/name: cluster-apps
+    kustomize.toolkit.fluxcd.io/namespace: flux-system
+  name: victoria-metrics
+  namespace: flux-system
+spec:
+  commonMetadata:
+    labels:
+      app.kubernetes.io/name: victoria-metrics
+  decryption:
+    provider: sops
+    secretRef:
+      name: sops-age
+  interval: 30m
+  path: ./kubernetes/main/apps/observability/victoria-metrics/app
+  postBuild:
+    substituteFrom:
+    - kind: ConfigMap
+      name: cluster-settings
+    - kind: Secret
+      name: cluster-secrets
+  prune: true
+  retryInterval: 1m
+  sourceRef:
+    kind: GitRepository
+    name: home-kubernetes
+  targetNamespace: observability
+  timeout: 5m
+  wait: false
+
--- kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack ExternalSecret: observability/alertmanager-secret
+++ kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack ExternalSecret: observability/alertmanager-secret
@@ -1,86 +0,0 @@
----
-apiVersion: external-secrets.io/v1beta1
-kind: ExternalSecret
-metadata:
- labels:
- app.kubernetes.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: alertmanager-secret
- namespace: observability
-spec:
- dataFrom:
- - extract:
- key: alertmanager
- - extract:
- key: discord
- refreshInterval: 15m
- secretStoreRef:
- kind: ClusterSecretStore
- name: bitwarden-secrets-manager
- target:
- name: alertmanager-secret
- template:
- data:
- alertmanager.yaml: |
- global:
- resolve_timeout: 5m
- route:
- group_by: ["alertname", "job"]
- group_interval: 10m
- group_wait: 1m
- receiver: discord
- repeat_interval: 12h
- routes:
- - receiver: heartbeat
- group_interval: 5m
- group_wait: 0s
- matchers:
- - alertname =~ "Watchdog"
- repeat_interval: 5m
- - receiver: "null"
- matchers:
- - severity = "none"
- - alertname =~ "InfoInhibitor|Watchdog"
- - receiver: discord
- continue: true
- matchers:
- - severity = "critical"
- inhibit_rules:
- - equal: ["alertname", "namespace"]
- source_matchers:
- - severity = "critical"
- target_matchers:
- - severity = "warning"
- receivers:
- - name: heartbeat
- webhook_configs:
- - send_resolved: true
- url: "{{ .ALERTMANAGER_HEARTBEAT_URL }}"
- - name: "null"
- - name: discord
- discord_configs:
- - send_resolved: true
- webhook_url: "{{ .DISCORD_WEBHOOK_URL }}"
- title: >-
- {{ "{{" }} .CommonLabels.alertname {{ "}}" }}
- [{{ "{{" }} .Status | toUpper {{ "}}" }}{{ "{{" }} if eq .Status "firing" {{ "}}" }}:{{ "{{" }} .Alerts.Firing | len {{ "}}" }}{{ "{{" }} end {{ "}}" }}]
- message: |-
- {{ "{{-" }} range .Alerts {{ "}}" }}
- {{ "{{-" }} if ne .Annotations.description "" {{ "}}" }}
- {{ "{{" }} .Annotations.description {{ "}}" }}
- {{ "{{-" }} else if ne .Annotations.summary "" {{ "}}" }}
- {{ "{{" }} .Annotations.summary {{ "}}" }}
- {{ "{{-" }} else if ne .Annotations.message "" {{ "}}" }}
- {{ "{{" }} .Annotations.message {{ "}}" }}
- {{ "{{-" }} else {{ "}}" }}
- Alert description not available
- {{ "{{-" }} end {{ "}}" }}
- {{ "{{-" }} if gt (len .Labels.SortedPairs) 0 {{ "}}" }}
- {{ "{{-" }} range .Labels.SortedPairs {{ "}}" }}
- **{{ "{{" }} .Name {{ "}}" }}:** {{ "{{" }} .Value {{ "}}" }}
- {{ "{{-" }} end {{ "}}" }}
- {{ "{{-" }} end {{ "}}" }}
- {{ "{{-" }} end {{ "}}" }}
- engineVersion: v2
-
--- kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack HelmRelease: observability/kube-prometheus-stack
+++ kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack HelmRelease: observability/kube-prometheus-stack
@@ -1,223 +0,0 @@
----
-apiVersion: helm.toolkit.fluxcd.io/v2
-kind: HelmRelease
-metadata:
- labels:
- app.kubernetes.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: kube-prometheus-stack
- namespace: observability
-spec:
- chart:
- spec:
- chart: kube-prometheus-stack
- sourceRef:
- kind: HelmRepository
- name: prometheus-community
- namespace: flux-system
- version: 60.2.0
- dependsOn:
- - name: prometheus-operator-crds
- namespace: observability
- - name: rook-ceph-cluster
- namespace: rook-ceph
- - name: thanos
- namespace: observability
- install:
- crds: Skip
- remediation:
- retries: 3
- interval: 30m
- timeout: 15m
- upgrade:
- cleanupOnFail: true
- crds: Skip
- remediation:
- retries: 3
- strategy: rollback
- values:
- alertmanager:
- alertmanagerSpec:
- configSecret: alertmanager-secret
- replicas: 2
- storage:
- volumeClaimTemplate:
- spec:
- resources:
- requests:
- storage: 1Gi
- storageClassName: ceph-block
- useExistingSecret: true
- ingress:
- annotations:
- external-dns.alpha.kubernetes.io/target: internal.
- enabled: true
- hosts:
- - alertmanager.
- ingressClassName: internal
- pathType: Prefix
- cleanPrometheusOperatorObjectNames: true
- crds:
- enabled: false
- grafana:
- enabled: false
- forceDeployDashboards: true
- sidecar:
- dashboards:
- annotations:
- grafana_folder: Kubernetes
- multicluster:
- etcd:
- enabled: true
- kube-state-metrics:
- fullnameOverride: kube-state-metrics
- metricLabelsAllowlist:
- - pods=[*]
- - deployments=[*]
- - persistentvolumeclaims=[*]
- kubeApiServer:
- enabled: true
- serviceMonitor:
- metricRelabelings:
- - action: keep
- regex: (aggregator_openapi|aggregator_unavailable|apiextensions_openapi|apiserver_admission|apiserver_audit|apiserver_cache|apiserver_cel|apiserver_client|apiserver_crd|apiserver_current|apiserver_envelope|apiserver_flowcontrol|apiserver_init|apiserver_kube|apiserver_longrunning|apiserver_request|apiserver_requested|apiserver_response|apiserver_selfrequest|apiserver_storage|apiserver_terminated|apiserver_tls|apiserver_watch|apiserver_webhooks|authenticated_user|authentication|disabled_metric|etcd_bookmark|etcd_lease|etcd_request|field_validation|get_token|go|grpc_client|hidden_metric|kube_apiserver|kubernetes_build|kubernetes_feature|node_authorizer|pod_security|process_cpu|process_max|process_open|process_resident|process_start|process_virtual|registered_metric|rest_client|scrape_duration|scrape_samples|scrape_series|serviceaccount_legacy|serviceaccount_stale|serviceaccount_valid|watch_cache|workqueue)_(.+)
- sourceLabels:
- - __name__
- - action: drop
- regex: (apiserver|etcd|rest_client)_request(|_sli|_slo)_duration_seconds_bucket
- sourceLabels:
- - __name__
- - action: drop
- regex: (apiserver_response_sizes_bucket|apiserver_watch_events_sizes_bucket)
- sourceLabels:
- - __name__
- kubeControllerManager:
- enabled: true
- endpoints:
- - 10.69.1.21
- - 10.69.1.22
- - 10.69.1.23
- serviceMonitor:
- metricRelabelings:
- - action: keep
- regex: (apiserver_audit|apiserver_client|apiserver_delegated|apiserver_envelope|apiserver_storage|apiserver_webhooks|attachdetach_controller|authenticated_user|authentication|cronjob_controller|disabled_metric|endpoint_slice|ephemeral_volume|garbagecollector_controller|get_token|go|hidden_metric|job_controller|kubernetes_build|kubernetes_feature|leader_election|node_collector|node_ipam|process_cpu|process_max|process_open|process_resident|process_start|process_virtual|pv_collector|registered_metric|replicaset_controller|rest_client|retroactive_storageclass|root_ca|running_managed|scrape_duration|scrape_samples|scrape_series|service_controller|storage_count|storage_operation|ttl_after|volume_operation|workqueue)_(.+)
- sourceLabels:
- - __name__
- kubeEtcd:
- enabled: true
- endpoints:
- - 10.69.1.21
- - 10.69.1.22
- - 10.69.1.23
- kubeProxy:
- enabled: false
- kubeScheduler:
- enabled: true
- endpoints:
- - 10.69.1.21
- - 10.69.1.22
- - 10.69.1.23
- serviceMonitor:
- metricRelabelings:
- - action: keep
- regex: (apiserver_audit|apiserver_client|apiserver_delegated|apiserver_envelope|apiserver_storage|apiserver_webhooks|authenticated_user|authentication|disabled_metric|go|hidden_metric|kubernetes_build|kubernetes_feature|leader_election|process_cpu|process_max|process_open|process_resident|process_start|process_virtual|registered_metric|rest_client|scheduler|scrape_duration|scrape_samples|scrape_series|workqueue)_(.+)
- sourceLabels:
- - __name__
- kubeStateMetrics:
- enabled: true
- kubelet:
- enabled: true
- serviceMonitor:
- metricRelabelings:
- - action: keep
- regex: (apiserver_audit|apiserver_client|apiserver_delegated|apiserver_envelope|apiserver_storage|apiserver_webhooks|authentication_token|cadvisor_version|container_blkio|container_cpu|container_fs|container_last|container_memory|container_network|container_oom|container_processes|container|csi_operations|disabled_metric|get_token|go|hidden_metric|kubelet_certificate|kubelet_cgroup|kubelet_container|kubelet_containers|kubelet_cpu|kubelet_device|kubelet_graceful|kubelet_http|kubelet_lifecycle|kubelet_managed|kubelet_node|kubelet_pleg|kubelet_pod|kubelet_run|kubelet_running|kubelet_runtime|kubelet_server|kubelet_started|kubelet_volume|kubernetes_build|kubernetes_feature|machine_cpu|machine_memory|machine_nvm|machine_scrape|node_namespace|plugin_manager|prober_probe|process_cpu|process_max|process_open|process_resident|process_start|process_virtual|registered_metric|rest_client|scrape_duration|scrape_samples|scrape_series|storage_operation|volume_manager|volume_operation|workqueue)_(.+)
- sourceLabels:
- - __name__
- - action: replace
- sourceLabels:
- - node
- targetLabel: instance
- - action: labeldrop
- regex: (uid)
- - action: labeldrop
- regex: (id|name)
- - action: drop
- regex: (rest_client_request_duration_seconds_bucket|rest_client_request_duration_seconds_sum|rest_client_request_duration_seconds_count)
- sourceLabels:
- - __name__
- nodeExporter:
- enabled: true
- prometheus:
- ingress:
- annotations:
- external-dns.alpha.kubernetes.io/target: internal.
- gethomepage.dev/description: Monitoring Scrape Service
- gethomepage.dev/enabled: 'true'
- gethomepage.dev/group: Observability
- gethomepage.dev/icon: prometheus.png
- gethomepage.dev/name: Prometheus
- gethomepage.dev/widget.type: prometheus
- gethomepage.dev/widget.url: http://kube-prometheus-stack-prometheus.observability:9090
- enabled: true
- hosts:
- - prometheus.
- ingressClassName: internal
- pathType: Prefix
- prometheusSpec:
- enableAdminAPI: true
- enableFeatures:
- - auto-gomemlimit
- - memory-snapshot-on-shutdown
- - new-service-discovery-manager
- externalLabels:
- cluster: main
- podMetadata:
- annotations:
- secret.reloader.stakater.com/reload: thanos-objstore-config
- podMonitorSelectorNilUsesHelmValues: false
- probeSelectorNilUsesHelmValues: false
- replicaExternalLabelName: __replica__
- replicas: 2
- resources:
- limits:
- memory: 1500Mi
- requests:
- cpu: 100m
- retention: 2d
- retentionSize: 15GB
- ruleSelectorNilUsesHelmValues: false
- scrapeConfigSelectorNilUsesHelmValues: false
- scrapeInterval: 1m
- serviceMonitorSelectorNilUsesHelmValues: false
- storageSpec:
- volumeClaimTemplate:
- spec:
- resources:
- requests:
- storage: 20Gi
- storageClassName: ceph-block
- thanos:
- image: quay.io/thanos/thanos:v0.35.1
- objectStorageConfig:
- existingSecret:
- key: config
- name: thanos-objstore-config
- version: 0.35.1
- thanosService:
- enabled: true
- thanosServiceMonitor:
- enabled: true
- prometheus-node-exporter:
- fullnameOverride: node-exporter
- prometheus:
- monitor:
- enabled: true
- relabelings:
- - action: replace
- regex: (.*)
- replacement: $1
- sourceLabels:
- - __meta_kubernetes_pod_node_name
- targetLabel: kubernetes_node
-
--- kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack PrometheusRule: observability/miscellaneous-rules
+++ kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack PrometheusRule: observability/miscellaneous-rules
@@ -1,38 +0,0 @@
----
-apiVersion: monitoring.coreos.com/v1
-kind: PrometheusRule
-metadata:
- labels:
- app.kubernetes.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- prometheus: k8s
- role: alert-rules
- name: miscellaneous-rules
- namespace: observability
-spec:
- groups:
- - name: dockerhub
- rules:
- - alert: BootstrapRateLimitRisk
- annotations:
- summary: Kubernetes cluster at risk of being rate limited by dockerhub on
- bootstrap
- expr: count(time() - container_last_seen{image=~"(docker.io).*",container!=""}
- < 30) > 100
- for: 15m
- labels:
- severity: critical
- - name: oom
- rules:
- - alert: OOMKilled
- annotations:
- description: Container {{ $labels.container }} in pod {{ $labels.namespace
- }}/{{ $labels.pod }} has been OOMKilled {{ $value }} times in the last 10
- minutes.
- expr: (kube_pod_container_status_restarts_total - kube_pod_container_status_restarts_total
- offset 10m >= 1) and ignoring (reason) min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[10m])
- == 1
- labels:
- severity: critical
-
--- kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack ScrapeConfig: observability/node-exporter
+++ kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack ScrapeConfig: observability/node-exporter
@@ -1,21 +0,0 @@
----
-apiVersion: monitoring.coreos.com/v1alpha1
-kind: ScrapeConfig
-metadata:
- labels:
- app.kubernetes.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: node-exporter
- namespace: observability
-spec:
- metricsPath: /metrics
- relabelings:
- - action: replace
- replacement: node-exporter
- targetLabel: job
- staticConfigs:
- - targets:
- - voyager.internal:9100
- - pikvm.internal:9100
-
--- kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack ScrapeConfig: observability/smartctl-exporter
+++ kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack ScrapeConfig: observability/smartctl-exporter
@@ -1,20 +0,0 @@
----
-apiVersion: monitoring.coreos.com/v1alpha1
-kind: ScrapeConfig
-metadata:
- labels:
- app.kubernetes.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: smartctl-exporter
- namespace: observability
-spec:
- metricsPath: /metrics
- relabelings:
- - action: replace
- replacement: smartctl-exporter
- targetLabel: job
- staticConfigs:
- - targets:
- - voyager.internal:9633
-
--- kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack ScrapeConfig: observability/pikvm
+++ kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack ScrapeConfig: observability/pikvm
@@ -1,20 +0,0 @@
----
-apiVersion: monitoring.coreos.com/v1alpha1
-kind: ScrapeConfig
-metadata:
- labels:
- app.kubernetes.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: pikvm
- namespace: observability
-spec:
- metricsPath: /api/export/prometheus/metrics
- relabelings:
- - action: replace
- replacement: pikvm
- targetLabel: job
- staticConfigs:
- - targets:
- - pikvm.internal
-
--- kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack ScrapeConfig: observability/blocky
+++ kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack ScrapeConfig: observability/blocky
@@ -1,20 +0,0 @@
----
-apiVersion: monitoring.coreos.com/v1alpha1
-kind: ScrapeConfig
-metadata:
- labels:
- app.kubernetes.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: blocky
- namespace: observability
-spec:
- metricsPath: /metrics
- relabelings:
- - action: replace
- replacement: blocky
- targetLabel: job
- staticConfigs:
- - targets:
- - blocky.
-
--- kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack ScrapeConfig: observability/minio-job
+++ kubernetes/main/apps/observability/kube-prometheus-stack/app Kustomization: flux-system/kube-prometheus-stack ScrapeConfig: observability/minio-job
@@ -1,20 +0,0 @@
----
-apiVersion: monitoring.coreos.com/v1alpha1
-kind: ScrapeConfig
-metadata:
- labels:
- app.kubernetes.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/name: kube-prometheus-stack
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: minio-job
- namespace: observability
-spec:
- metricsPath: /minio/v2/metrics/cluster
- relabelings:
- - action: replace
- replacement: minio-job
- targetLabel: job
- staticConfigs:
- - targets:
- - s3.
-
--- kubernetes/main/apps/observability/grafana/app Kustomization: flux-system/grafana HelmRelease: observability/grafana
+++ kubernetes/main/apps/observability/grafana/app Kustomization: flux-system/grafana HelmRelease: observability/grafana
@@ -97,19 +97,18 @@
folder: System
name: system
options:
path: /var/lib/grafana/dashboards/system
orgId: 1
type: file
- - allowUiUpdates: true
- disableDeletion: false
- editable: true
- folder: Thanos
- name: thanos
- options:
- path: /var/lib/grafana/dashboards/thanos
+ - disableDeletion: false
+ editable: true
+ folder: VictoriaMetrics
+ name: victoriametrics
+ options:
+ path: /var/lib/grafana/dashboards/victoriametrics-folder
orgId: 1
type: file
dashboards:
data:
crunchy-pgbackrest:
datasource:
@@ -303,53 +302,50 @@
spegel:
datasource:
- name: DS_PROMETHEUS
value: Prometheus
gnetId: 18089
revision: 1
- thanos:
- thanos-bucket-replicate:
- datasource: Prometheus
- url: https://raw.githubusercontent.com/monitoring-mixins/website/master/assets/thanos/dashboards/bucket-replicate.json
- thanos-compact:
- datasource: Prometheus
- url: https://raw.githubusercontent.com/monitoring-mixins/website/master/assets/thanos/dashboards/compact.json
- thanos-overview:
- datasource: Prometheus
- url: https://raw.githubusercontent.com/monitoring-mixins/website/master/assets/thanos/dashboards/overview.json
- thanos-query:
- datasource: Prometheus
- url: https://raw.githubusercontent.com/monitoring-mixins/website/master/assets/thanos/dashboards/query.json
- thanos-query-frontend:
- datasource: Prometheus
- url: https://raw.githubusercontent.com/monitoring-mixins/website/master/assets/thanos/dashboards/query-frontend.json
- thanos-receieve:
- datasource: Prometheus
- url: https://raw.githubusercontent.com/monitoring-mixins/website/master/assets/thanos/dashboards/receive.json
- thanos-rule:
- datasource: Prometheus
- url: https://raw.githubusercontent.com/monitoring-mixins/website/master/assets/thanos/dashboards/rule.json
- thanos-sidecar:
- datasource: Prometheus
- url: https://raw.githubusercontent.com/monitoring-mixins/website/master/assets/thanos/dashboards/sidecar.json
- thanos-store:
- datasource: Prometheus
- url: https://raw.githubusercontent.com/monitoring-mixins/website/master/assets/thanos/dashboards/store.json
+ victoriametrics:
+ vm-cluster:
+ datasource:
+ - name: DS_PROMETHEUS
+ value: Prometheus
+ gnetId: 11176
+ revision: 37
+ vm-operator:
+ datasource:
+ - name: DS_PROMETHEUS
+ value: Prometheus
+ gnetId: 17869
+ revision: 2
+ vm-vmagent:
+ datasource:
+ - name: DS_PROMETHEUS
+ value: Prometheus
+ gnetId: 12683
+ revision: 18
+ vm-vmalert:
+ datasource:
+ - name: DS_PROMETHEUS
+ value: Prometheus
+ gnetId: 14950
+ revision: 11
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- access: proxy
isDefault: true
jsonData:
- prometheusType: Thanos
+ prometheusType: Prometheus
timeInterval: 1m
name: Prometheus
type: prometheus
uid: prometheus
- url: http://thanos-query-frontend.observability.svc.cluster.local:10902
+ url: http://vmsingle-victoria-metrics.observability.svc.cluster.local:8429
- access: proxy
jsonData:
implementation: prometheus
name: Alertmanager
type: alertmanager
url: http://alertmanager-operated.observability.svc.cluster.local:9093
--- kubernetes/main/apps/observability/thanos/app Kustomization: flux-system/thanos ObjectBucketClaim: observability/thanos-bucket
+++ kubernetes/main/apps/observability/thanos/app Kustomization: flux-system/thanos ObjectBucketClaim: observability/thanos-bucket
@@ -1,14 +0,0 @@
----
-apiVersion: objectbucket.io/v1alpha1
-kind: ObjectBucketClaim
-metadata:
- labels:
- app.kubernetes.io/name: thanos
- kustomize.toolkit.fluxcd.io/name: thanos
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: thanos-bucket
- namespace: observability
-spec:
- bucketName: thanos
- storageClassName: ceph-bucket
-
--- kubernetes/main/apps/observability/thanos/app Kustomization: flux-system/thanos HelmRelease: observability/thanos
+++ kubernetes/main/apps/observability/thanos/app Kustomization: flux-system/thanos HelmRelease: observability/thanos
@@ -1,149 +0,0 @@
----
-apiVersion: helm.toolkit.fluxcd.io/v2
-kind: HelmRelease
-metadata:
- labels:
- app.kubernetes.io/name: thanos
- kustomize.toolkit.fluxcd.io/name: thanos
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: thanos
- namespace: observability
-spec:
- chart:
- spec:
- chart: thanos
- sourceRef:
- kind: HelmRepository
- name: stevehipwell
- namespace: flux-system
- version: 1.17.2
- dependsOn:
- - name: openebs
- namespace: storage
- - name: rook-ceph-cluster
- namespace: rook-ceph
- install:
- remediation:
- retries: 3
- interval: 30m
- timeout: 15m
- upgrade:
- cleanupOnFail: true
- remediation:
- retries: 3
- strategy: rollback
- values:
- additionalEndpoints:
- - dnssrv+_grpc._tcp.kube-prometheus-stack-thanos-discovery.observability.svc.cluster.local
- additionalReplicaLabels:
- - __replica__
- compact:
- enabled: true
- extraArgs:
- - --compact.concurrency=4
- - --delete-delay=30m
- - --retention.resolution-raw=14d
- - --retention.resolution-5m=30d
- - --retention.resolution-1h=60d
- persistence:
- enabled: true
- size: 20Gi
- storageClass: ceph-block
- objstoreConfig:
- value:
- config:
- insecure: true
- type: s3
- query:
- additionalStores:
- - thanos-svc.:10901
- extraArgs:
- - --alert.query-url=https://thanos.
- replicas: 2
- queryFrontend:
- enabled: true
- extraArgs:
- - --query-range.response-cache-config=$(THANOS_CACHE_CONFIG)
- extraEnv:
- - name: THANOS_CACHE_CONFIG
- valueFrom:
- configMapKeyRef:
- key: cache.yaml
- name: thanos-cache-configmap
- ingress:
- annotations:
- external-dns.alpha.kubernetes.io/target: internal.
- enabled: true
- hosts:
- - thanos.
- ingressClassName: internal
- podAnnotations:
- configmap.reloader.stakater.com/reload: thanos-cache-configmap
- replicas: 2
- rule:
- alertmanagersConfig:
- value: |-
- alertmanagers:
- - api_version: v2
- static_configs:
- - dnssrv+_http-web._tcp.alertmanager-operated.observability.svc.cluster.local
- enabled: true
- extraArgs:
- - --web.prefix-header=X-Forwarded-Prefix
- persistence:
- enabled: true
- size: 20Gi
- storageClass: ceph-block
- replicas: 2
- rules:
- value: |-
- groups:
- - name: PrometheusWatcher
- rules:
- - alert: PrometheusDown
- annotations:
- summary: A Prometheus has disappeared from Prometheus target discovery
- expr: absent(up{job="kube-prometheus-stack-prometheus"})
- for: 5m
- labels:
- severity: critical
- serviceMonitor:
- enabled: true
- storeGateway:
- extraArgs:
- - --index-cache.config=$(THANOS_CACHE_CONFIG)
- extraEnv:
- - name: THANOS_CACHE_CONFIG
- valueFrom:
- configMapKeyRef:
- key: cache.yaml
- name: thanos-cache-configmap
- persistence:
- enabled: true
- size: 20Gi
- storageClass: ceph-block
- podAnnotations:
- configmap.reloader.stakater.com/reload: thanos-cache-configmap
- replicas: 2
- valuesFrom:
- - kind: ConfigMap
- name: thanos-bucket
- targetPath: objstoreConfig.value.config.bucket
- valuesKey: BUCKET_NAME
- - kind: ConfigMap
- name: thanos-bucket
- targetPath: objstoreConfig.value.config.endpoint
- valuesKey: BUCKET_HOST
- - kind: ConfigMap
- name: thanos-bucket
- targetPath: objstoreConfig.value.config.region
- valuesKey: BUCKET_REGION
- - kind: Secret
- name: thanos-bucket
- targetPath: objstoreConfig.value.config.access_key
- valuesKey: AWS_ACCESS_KEY_ID
- - kind: Secret
- name: thanos-bucket
- targetPath: objstoreConfig.value.config.secret_key
- valuesKey: AWS_SECRET_ACCESS_KEY
-
--- kubernetes/main/apps/observability/thanos/app Kustomization: flux-system/thanos ConfigMap: observability/thanos-cache-configmap
+++ kubernetes/main/apps/observability/thanos/app Kustomization: flux-system/thanos ConfigMap: observability/thanos-cache-configmap
@@ -1,18 +0,0 @@
----
-apiVersion: v1
-data:
- cache.yaml: |
- ---
- type: REDIS
- config:
- addr: dragonfly.database.svc.cluster.local:6379
- db: 2
-kind: ConfigMap
-metadata:
- labels:
- app.kubernetes.io/name: thanos
- kustomize.toolkit.fluxcd.io/name: thanos
- kustomize.toolkit.fluxcd.io/namespace: flux-system
- name: thanos-cache-configmap
- namespace: observability
-
--- kubernetes/main/apps/observability/victoria-metrics/app Kustomization: flux-system/victoria-metrics HelmRelease: observability/victoria-metrics
+++ kubernetes/main/apps/observability/victoria-metrics/app Kustomization: flux-system/victoria-metrics HelmRelease: observability/victoria-metrics
@@ -0,0 +1,224 @@
+---
+apiVersion: helm.toolkit.fluxcd.io/v2
+kind: HelmRelease
+metadata:
+ labels:
+ app.kubernetes.io/name: victoria-metrics
+ kustomize.toolkit.fluxcd.io/name: victoria-metrics
+ kustomize.toolkit.fluxcd.io/namespace: flux-system
+ name: victoria-metrics
+ namespace: observability
+spec:
+ chart:
+ spec:
+ chart: victoria-metrics-k8s-stack
+ sourceRef:
+ kind: HelmRepository
+ name: victoria-metrics
+ namespace: flux-system
+ version: 0.23.2
+ install:
+ remediation:
+ retries: 3
+ interval: 30m
+ upgrade:
+ cleanupOnFail: true
+ remediation:
+ retries: 3
+ strategy: rollback
+ values:
+ alertmanager:
+ enabled: true
+ ingress:
+ annotations:
+ external-dns.alpha.kubernetes.io/target: internal.${SECRET_DOMAIN}
+ enabled: true
+ hosts:
+ - alertmanager.${SECRET_DOMAIN}
+ ingressClassName: internal
+ pathType: Prefix
+ spec:
+ configSecret: alertmanager-secret
+ replicaCount: 2
+ storage:
+ volumeClaimTemplate:
+ spec:
+ resources:
+ requests:
+ storage: 1Gi
+ storageClassName: ceph-block
+ coreDns:
+ enabled: true
+ defaultDashboardsEnabled: true
+ defaultRules:
+ create: true
+ rules:
+ alertmanager: true
+ etcd: true
+ general: true
+ k8s: true
+ kubeApiserver: true
+ kubeApiserverAvailability: true
+ kubeApiserverBurnrate: true
+ kubeApiserverHistogram: true
+ kubeApiserverSlos: true
+ kubePrometheusGeneral: true
+ kubePrometheusNodeRecording: true
+ kubeScheduler: true
+ kubeStateMetrics: true
+ kubelet: true
+ kubernetesApps: true
+ kubernetesResources: true
+ kubernetesStorage: true
+ kubernetesSystem: true
+ network: true
+ node: true
+ vmagent: true
+ vmhealth: true
+ vmsingle: true
+ experimentalDashboardsEnabled: true
+ fullnameOverride: victoria-metrics
+ grafana:
+ enabled: false
+ forceDeployDashboards: true
+ sidecar:
+ dashboards:
+ annotations:
+ grafana_folder: Kubernetes
+ multicluster:
+ etcd:
+ enabled: true
+ kube-state-metrics:
+ enabled: true
+ fullnameOverride: kube-state-metrics
+ metricLabelsAllowlist:
+ - pods=[*]
+ - deployments=[*]
+ - persistentvolumeclaims=[*]
+ kubeApiServer:
+ enabled: true
+ kubeControllerManager:
+ enabled: true
+ endpoints:
+ - 10.69.1.21
+ - 10.69.1.22
+ - 10.69.1.23
+ kubeDns:
+ enabled: false
+ kubeEtcd:
+ enabled: true
+ endpoints:
+ - 10.69.1.21
+ - 10.69.1.22
+ - 10.69.1.23
+ kubeProxy:
+ enabled: false
+ kubeScheduler:
+ enabled: true
+ endpoints:
+ - 10.69.1.21
+ - 10.69.1.22
+ - 10.69.1.23
+ kubelet:
+ enabled: true
+ prometheus-node-exporter:
+ enabled: true
+ fullnameOverride: node-exporter
+ prometheus:
+ monitor:
+ enabled: true
+ relabelings:
+ - action: replace
+ regex: (.*)
+ replacement: $1
+ sourceLabels:
+ - __meta_kubernetes_pod_node_name
+ targetLabel: kubernetes_node
+ victoria-metrics-operator:
+ enabled: true
+ operator:
+ disable_prometheus_converter: false
+ enable_converter_ownership: true
+ vmagent:
+ enabled: true
+ ingress:
+ annotations:
+ external-dns.alpha.kubernetes.io/target: internal.${SECRET_DOMAIN}
+ enabled: true
+ hosts:
+ - vmagent.${SECRET_DOMAIN}
+ ingressClassName: internal
+ spec:
+ additionalScrapeConfigs:
+ key: prometheus-additional.yaml
+ name: vm-additional-scrape-configs
+ externalLabels:
+ cluster: ${CLUSTER_NAME}
+ replicaCount: 1
+ resources:
+ limits:
+ cpu: 400m
+ memory: 512Mi
+ requests:
+ cpu: 50m
+ memory: 256Mi
+ scrapeInterval: 30s
+ shardCount: 2
+ topologySpreadConstraints:
+ - labelSelector:
+ matchLabels:
+ app.kubernetes.io/name: vmagent
+ maxSkew: 1
+ topologyKey: kubernetes.io/hostname
+ whenUnsatisfiable: DoNotSchedule
+ vmalert:
+ enabled: true
+ ingress:
+ annotations:
+ external-dns.alpha.kubernetes.io/target: internal.${SECRET_DOMAIN}
+ enabled: true
+ hosts:
+ - vmalert.${SECRET_DOMAIN}
+ ingressClassName: internal
+ spec:
+ extraArgs:
+ external.url: https://vmalert.${SECRET_DOMAIN}
+ replicaCount: 2
+ resources:
+ limits:
+ cpu: 150m
+ memory: 256Mi
+ requests:
+ cpu: 50m
+ memory: 128Mi
+ topologySpreadConstraints:
+ - labelSelector:
+ matchLabels:
+ app.kubernetes.io/name: vmalert
+ maxSkew: 1
+ topologyKey: kubernetes.io/hostname
+ whenUnsatisfiable: DoNotSchedule
+ vmsingle:
+ enabled: true
+ ingress:
+ annotations:
+ external-dns.alpha.kubernetes.io/target: internal.${SECRET_DOMAIN}
+ enabled: true
+ hosts:
+ - victoria-metrics.${SECRET_DOMAIN}
+ ingressClassName: internal
+ spec:
+ extraArgs:
+ dedup.minScrapeInterval: 30s
+ maxLabelsPerTimeseries: '90'
+ search.minStalenessInterval: 5m
+ vmalert.proxyURL: http://vmalert-victoria-metrics.observability.svc.cluster.local:8080
+ retentionPeriod: 1y
+ storage:
+ accessModes:
+ - ReadWriteOnce
+ resources:
+ requests:
+ storage: 50Gi
+ storageClassName: ceph-block
+
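
For context on the `additionalScrapeConfigs` reference in the `vmagent` spec above: the diff removes the static `ScrapeConfig` resources (node-exporter, smartctl-exporter, pikvm, blocky, minio-job) but does not show the Secret that `vm-additional-scrape-configs` points to. A minimal sketch of what such a Secret might look like follows. The Secret name, key, and namespace come from the diff; the contents are an assumption reconstructed from the removed `ScrapeConfig` targets, not the actual file from this PR. The blocky and minio-job jobs are left out because their targets are truncated in the diff.

```yaml
# Hypothetical sketch, not taken from the PR.
# Name and key match vmagent.spec.additionalScrapeConfigs in the HelmRelease above.
apiVersion: v1
kind: Secret
metadata:
  name: vm-additional-scrape-configs
  namespace: observability
stringData:
  # Plain Prometheus-style scrape_configs entries (assumed contents).
  prometheus-additional.yaml: |
    - job_name: node-exporter
      metrics_path: /metrics
      static_configs:
        - targets:
            - voyager.internal:9100
            - pikvm.internal:9100
    - job_name: smartctl-exporter
      metrics_path: /metrics
      static_configs:
        - targets:
            - voyager.internal:9633
    - job_name: pikvm
      metrics_path: /api/export/prometheus/metrics
      static_configs:
        - targets:
            - pikvm.internal
```

Setting `job_name` directly would stand in for the `relabelings` blocks in the removed resources, which only rewrote the `job` label to a fixed value.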
Welp.