Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add liveness and readiness probes to koperator's manager pod #1050

Merged
merged 21 commits into from
Aug 31, 2023

Conversation

matewolf
Copy link
Contributor

Description

Implementation of readiness and liveness probe of koperators' manager pod. Referring to #661.

Type of Change

  • New Feature

Checklist

  • I have read the contributing guidelines
  • Existing issues have been referenced (where applicable)
  • I have verified this change is not present in other open pull requests
  • Functionality is documented
  • All code style checks pass

@matewolf matewolf requested a review from a team as a code owner August 24, 2023 13:11
Kuvesz
Kuvesz previously approved these changes Aug 24, 2023
Copy link
Contributor

@Kuvesz Kuvesz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left a comment

Copy link
Member

@pregnor pregnor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, there was one misplaced import line and a couple questions.

config/base/manager/manager.yaml Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
@matewolf matewolf requested review from pregnor and panyuenlau August 24, 2023 14:50
@szykes
Copy link
Contributor

szykes commented Aug 25, 2023

I did some investigation around the readiness and liveness probes, specifically on the implementation side, so within the manager.

I did not know, when the controller will report readyz for the first time. Luckily, the kubebuilder did a good work here I think. The HTTP endpoints registration will happen in mgr.Start(...) of main.go of koperator and if any error happens in that function, the koperator process will exit due to error check in the main function. This means you can do whatever you want in the main function because the koperator will not report the ready state wrongly. The koperator process can even hang and if it cannot continue its operation, the kubernetes will restart the pod due to missing readyz.

On the other hand the liveness probe may not do what you think. The healthyz probe is registered this way: go cm.httpServe("health probe", cm.logger, server, cm.healthProbeListener) from
sigs.k8s.io/[email protected]/pkg/manager/internal.go:362. So, if the kopeartor hangs due to whatever reason, it will report healthz because the healthz.Ping is just a { return nil }. Or at least this is my understanding.

I have found the original request: Support health and liveness probes #297 and the correspoding PR: added health probes
#419
. This is nice but as I understand this implementation does not check the controllers. I miss that function that is described here Need a way to notify contoller-manager from each controller #387.

Do you know what happens when the reconcile function hangs?

@matewolf
Copy link
Contributor Author

Thank you for the detailed investigation @szykes! :))
I also checked the detailed implementation of bult-in readiness and liveness function and I raised this question to myself. I was also considering that the liveness probe should be failed if one of the controller stop running. After that I consulted @bartam1 about the desired operation of liveness probe and I think it's a correct solution if the liveness probe answers if the manager is running because @bartam1 told me there were no case when manager was running but one of the reconcile functions was hanging.

@matewolf matewolf requested a review from Kuvesz August 30, 2023 13:00
@matewolf matewolf merged commit a6f7678 into master Aug 31, 2023
@matewolf matewolf deleted the feat_liveness_readiness branch August 31, 2023 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants