-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(helm): Add health probes to long-lived containers #15359
base: main
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
livenessProbe
s and startupProbe
sd281b9d
to
3c99bd4
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
1d4bc12
to
bb82bc9
Compare
This comment has been minimized.
This comment has been minimized.
The failing test worked before I rebased. I'll see if I can understand why it's failing, but I'd guess it's not my fault. |
visually it looks ok (did not actually test it) but you are changing some behaviors that I'm not sure you should be doing, for example: readiness and liveness only run after startup (not in parallel) so now instead of loki-gateway (or any other you configured) taking 15 seconds to be available ( I'm not a maintainer, just some feedback from our discussion on Slack |
Thank you, that is great input. I agree, we should set the new probes to I'll fix that. |
bb82bc9
to
9aba722
Compare
This comment has been minimized.
This comment has been minimized.
I have now updated the default values so that I introduce no new probes via the values set in values.yaml. I think there may be some changes where probes used to not exist but now they use the shared Looking forward to more input! |
Apparently, empty probes for Deployments and DaemonSets are no good. I'll fix it. |
9aba722
to
a770361
Compare
This comment has been minimized.
This comment has been minimized.
a770361
to
59b7323
Compare
This comment has been minimized.
This comment has been minimized.
The helper template definitions really got me scratching my head: - I had never seen a `with` statement with an `else` clause before and I cannot find any reference indicating that it is valid. Helm's documentation never mentions it. - There's an extra `{{- end -}}` with not apparent start, yet it did not render an error in the templates (which is very surprising). While it kind of seemed to work, I'm convinced it was more by chance than by design. The old logic should not work, I suspect that the errors got masked by the overly inclusive `with` statement. This change rewrites the template definitions so it's easier to follow and uses valid Helm template flow control. Signed-off-by: Andreas Lindhé <[email protected]>
This change makes it possibe to configure `livenessProbe` and `startupProbe` to the nginx container in the gateway pods. Also adds a guard for when the `readinessProbe` is unset. Signed-off-by: Andreas Lindhé <[email protected]>
This change makes it possible to configure `livenessProbe` and `startupProbe` to the admin-api container in the admin-api pods. Also adds a guard for when the `readinessProbe` is unset. Signed-off-by: Andreas Lindhé <[email protected]>
This change makes it possible to configure `startupProbe` for the loki-sc-rules sidecar container in the backend pods. Signed-off-by: Andreas Lindhé <[email protected]>
This change makes it possible to configure `livenessProbe` and `startupProbe` to the loki container in the backend pods. Also adds a guard for when the `readinessProbe` is unset. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `livenessProbe` and `startupProbe` to the bloom-builder container in the bloom-builder pods. Also adds a guard for when the `readinessProbe` is unset. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `livenessProbe` and `startupProbe` to the bloom-gateway container in the bloom-gateway pods. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `livenessProbe` and `startupProbe` to the bloom-planner container in the bloom-planner pods. Signed-off-by: Andreas Lindhé <[email protected]>
This change makes it possible to configure `livenessProbe`, `readinessProbe` and `startupProbe` to the exporter container in the memcached pods. Signed-off-by: Andreas Lindhé <[email protected]>
This change makes it possible to configure `livenessProbe`, `readinessProbe` and `startupProbe` to the memcached container in the memcached pods. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `startupProbe` to the compactor container in the compactor pods. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `startupProbe` to the distributor container in the distributor pods. Adds guards for when the `readinessProbe` or `livenessProbe` are unset. Signed-off-by: Andreas Lindhé <[email protected]>
This change makes it possible to configure `livenessProbe` and `startupProbe` to the gateway container in the enterprise-gateway pods. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `startupProbe` to the index-gateway container in the index-gateway pods. It also adds guards for the `readinessProbe` and `livenessProbe`. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `startupProbe` to the ingester container in the ingester pods. Signed-off-by: Andreas Lindhé <[email protected]>
This change makes the previously hard-coded `readinessProbe` in loki-canary configurable via Helm values. Signed-off-by: Andreas Lindhé <[email protected]>
This change makes it possible to configure `livenessProbe` and `startupProbe` to the loki-canary container in the loki-canary pods. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `livenessProbe` and `startupProbe` to the pattern-ingester container in the pattern-ingester pods. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `startupProbe` to the querier container in the querier pods. It also adds guards for the `readinessProbe` and `livenessProbe`. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `livenessProbe` and `startupProbe` to the query-frontend container in the query-frontend pods. It also adds a guard for when the `readinessProbe` is unset. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `startupProbe` to the query-scheduler container in the query-scheduler pods. Also adds guards for when `livenessProbe` or `readinessProbe` are unset. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `livenessProbe` and `startupProbe` to the loki container in the read pods. Also adds a guard for when the `readinessProbe` is unset. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `livenessProbe` and `startupProbe` to the ruler container in the ruler pods. Also adds a guard for when the `readinessProbe` is unset. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `startupProbe` to the loki-sc-rules sidecar container in the single-binary pods. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `startupProbe` to the table-manager container in the table-manager pods. Also adds guards for when `livenessProbe` or `readinessProbe` are unset. Signed-off-by: Andreas Lindhé <[email protected]>
Adds configurable `livenessProbe` and `startupProbe` to the loki container in the write pods. Also adds a guard for when the `readinessProbe` is unset. Signed-off-by: Andreas Lindhé <[email protected]>
Done by running this command: ``` make -C docs sources/setup/install/helm/reference.md ``` Signed-off-by: Andreas Lindhé <[email protected]>
59b7323
to
d6b22d2
Compare
Kubernetes Manifest Diff SummaryScenario: default-single-binary-values (Added: 0, Modified: 8, Removed: 0)
Summary:
Added FilesNo added files Modified Filesloki/templates/compactor/statefulset-compactor.yaml*** /home/runner/work/loki/loki/output/base/default-single-binary-values/loki/templates/compactor/statefulset-compactor.yaml 2024-12-14 15:44:10.919877504 +0000
--- /home/runner/work/loki/loki/output/pr/default-single-binary-values/loki/templates/compactor/statefulset-compactor.yaml 2024-12-14 15:44:13.991894339 +0000
***************
*** 74,79 ****
--- 74,80 ----
initialDelaySeconds: 30
timeoutSeconds: 1
+
volumeMounts:
- name: temp
mountPath: /tmp loki/templates/distributor/deployment-distributor.yaml*** /home/runner/work/loki/loki/output/base/default-single-binary-values/loki/templates/distributor/deployment-distributor.yaml 2024-12-14 15:44:10.918877499 +0000
--- /home/runner/work/loki/loki/output/pr/default-single-binary-values/loki/templates/distributor/deployment-distributor.yaml 2024-12-14 15:44:13.991894339 +0000
***************
*** 72,79 ****
port: http-metrics
initialDelaySeconds: 30
timeoutSeconds: 1
- livenessProbe:
- null
volumeMounts:
- name: config
mountPath: /etc/loki/config
--- 72,77 ---- loki/templates/querier/deployment-querier.yaml*** /home/runner/work/loki/loki/output/base/default-single-binary-values/loki/templates/querier/deployment-querier.yaml 2024-12-14 15:44:10.918877499 +0000
--- /home/runner/work/loki/loki/output/pr/default-single-binary-values/loki/templates/querier/deployment-querier.yaml 2024-12-14 15:44:13.991894339 +0000
***************
*** 79,86 ****
port: http-metrics
initialDelaySeconds: 30
timeoutSeconds: 1
- livenessProbe:
- null
volumeMounts:
- name: config
mountPath: /etc/loki/config
--- 79,84 ---- loki/templates/index-gateway/statefulset-index-gateway.yaml*** /home/runner/work/loki/loki/output/base/default-single-binary-values/loki/templates/index-gateway/statefulset-index-gateway.yaml 2024-12-14 15:44:10.919877504 +0000
--- /home/runner/work/loki/loki/output/pr/default-single-binary-values/loki/templates/index-gateway/statefulset-index-gateway.yaml 2024-12-14 15:44:13.991894339 +0000
***************
*** 69,76 ****
port: http-metrics
initialDelaySeconds: 30
timeoutSeconds: 1
- livenessProbe:
- null
volumeMounts:
- name: config
mountPath: /etc/loki/config
--- 69,74 ---- loki/templates/ingester/statefulset-ingester-zone-b.yaml*** /home/runner/work/loki/loki/output/base/default-single-binary-values/loki/templates/ingester/statefulset-ingester-zone-b.yaml 2024-12-14 15:44:10.919877504 +0000
--- /home/runner/work/loki/loki/output/pr/default-single-binary-values/loki/templates/ingester/statefulset-ingester-zone-b.yaml 2024-12-14 15:44:13.991894339 +0000
***************
*** 91,96 ****
--- 91,97 ----
initialDelaySeconds: 30
timeoutSeconds: 1
+
volumeMounts:
- name: config
mountPath: /etc/loki/config loki/templates/ingester/statefulset-ingester-zone-c.yaml*** /home/runner/work/loki/loki/output/base/default-single-binary-values/loki/templates/ingester/statefulset-ingester-zone-c.yaml 2024-12-14 15:44:10.919877504 +0000
--- /home/runner/work/loki/loki/output/pr/default-single-binary-values/loki/templates/ingester/statefulset-ingester-zone-c.yaml 2024-12-14 15:44:13.991894339 +0000
***************
*** 91,96 ****
--- 91,97 ----
initialDelaySeconds: 30
timeoutSeconds: 1
+
volumeMounts:
- name: config
mountPath: /etc/loki/config loki/templates/ingester/statefulset-ingester-zone-a.yaml*** /home/runner/work/loki/loki/output/base/default-single-binary-values/loki/templates/ingester/statefulset-ingester-zone-a.yaml 2024-12-14 15:44:10.919877504 +0000
--- /home/runner/work/loki/loki/output/pr/default-single-binary-values/loki/templates/ingester/statefulset-ingester-zone-a.yaml 2024-12-14 15:44:13.991894339 +0000
***************
*** 91,96 ****
--- 91,97 ----
initialDelaySeconds: 30
timeoutSeconds: 1
+
volumeMounts:
- name: config
mountPath: /etc/loki/config loki/templates/query-scheduler/deployment-query-scheduler.yaml*** /home/runner/work/loki/loki/output/base/default-single-binary-values/loki/templates/query-scheduler/deployment-query-scheduler.yaml 2024-12-14 15:44:10.919877504 +0000
--- /home/runner/work/loki/loki/output/pr/default-single-binary-values/loki/templates/query-scheduler/deployment-query-scheduler.yaml 2024-12-14 15:44:13.991894339 +0000
***************
*** 70,77 ****
port: http-metrics
initialDelaySeconds: 30
timeoutSeconds: 1
- livenessProbe:
- null
volumeMounts:
- name: config
mountPath: /etc/loki/config
--- 70,75 ---- Removed FilesNo removed files Scenario: default-values (Added: 0, Modified: 0, Removed: 0)
Summary:
Added FilesNo added files Modified FilesNo modified files Removed FilesNo removed files Scenario: ingress-values (Added: 0, Modified: 0, Removed: 0)
Summary:
Added FilesNo added files Modified FilesNo modified files Removed FilesNo removed files Scenario: legacy-monitoring-values (Added: 0, Modified: 0, Removed: 0)
Summary:
Added FilesNo added files Modified FilesNo modified files Removed FilesNo removed files Scenario: simple-scalable-aws-kube-irsa-values (Added: 0, Modified: 1, Removed: 0)
Summary:
Added FilesNo added files Modified Filesloki/templates/gateway/deployment-gateway-enterprise.yaml*** /home/runner/work/loki/loki/output/base/simple-scalable-aws-kube-irsa-values/loki/templates/gateway/deployment-gateway-enterprise.yaml 2024-12-14 15:44:11.325879722 +0000
--- /home/runner/work/loki/loki/output/pr/simple-scalable-aws-kube-irsa-values/loki/templates/gateway/deployment-gateway-enterprise.yaml 2024-12-14 15:44:14.403896602 +0000
***************
*** 28,34 ****
app.kubernetes.io/component: gateway
annotations:
checksum/config: 8cfa6bbdb9a4dbb7b195fc7162e2fddac7836db36a03aa494881c1fc01f7c3ac
! spec:
serviceAccountName: enterprise-logs
securityContext:
fsGroup: 10001
--- 28,34 ----
app.kubernetes.io/component: gateway
annotations:
checksum/config: 8cfa6bbdb9a4dbb7b195fc7162e2fddac7836db36a03aa494881c1fc01f7c3ac
! spec:
serviceAccountName: enterprise-logs
securityContext:
fsGroup: 10001 Removed FilesNo removed files |
What this PR does / why we need it:
This PR attempts to add configurable health probes to all long-lived containers where they are currently missing, to improve Kubernetes' lifecycle management of Loki's pods.
Which issue(s) this PR fixes:
Fixes #13967
Partially addresses #13678
Special notes for your reviewer:
readinessProbe
) is done in slightly different ways. I think it might be a good thing to eventually streamline the code style, but this too belongs in another PR..Values.bloomGateway.livenessProbe
before this PR). It's probably worth auditing the values and attempt to catch any such cases (in another PR).Checklist
CONTRIBUTING.md
guide (required)feat
PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.docs/sources/setup/upgrade/_index.md
deprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR