Description
Logging is crucial for understanding controller behavior and troubleshooting issues.
While onboarding to CAPV we identified a few quick wins which should improve log quality significantly:
- It is a best practice in Kubernetes to add k/v pairs of the involved objects to a log line (https://cluster-api.sigs.k8s.io/developer/logging.html#keyvalue-pairs)
- We should also add k/v pairs for all involved objects (and their owner hierarchy) that we retrieve at the beginning of the Reconcile
- We should also consistently use contextual logging (i.e. taking the logger from the ctx). This allows propagation of k/v pairs across the entire call stack
- This also includes using the ctx that controller-runtime passes into the Reconcile func instead of some global ctx that is the same across Reconcile calls.
The overall goal of this issue is to improve the logs so troubleshooting becomes easier. Adding k/v pairs across the board will make it very easy to correlate logs e.g. for a specific Machine across controllers.
Additional notes:
- Let's take a look at how event recorders are set up in core CAPI vs CAPV
- Take a look at controller logger setup (e.g. regarding name). IIRC we shouldn't need any logger on the controllers though
- Audit all log calls for additional k/v pairs
- Ensure "Named" is set correctly on all controllers
- Let's double check the k/v pairs we add in core CAPI to ensure we can cross-reference everything
Prior art in core CAPI:
- 🌱 Improve key value pairs consistency in logging cluster-api#6150
- ✨ Improve key value pairs consistency in logging (II) cluster-api#7075
- 🌱 logging: adjust reconcilers to log object owners cluster-api#7152
- 🐛 logging: Avoid adding multiple objects to the same logger in for loops cluster-api#7534
Tasks
Concrete tasks for now (I have some follow-ups to audit, but let's do this afterwards):
- Wait until context refactoring is done: Refactor CAPV controller context #2295
- Refactor controllers (concrete tasks below):
- vmware/vspherecluster_reconciler.go
- clustermodule_reconciler.go
- serviceaccount_controller.go
- servicediscovery_controller.go
- vspherecluster_controller.go
- vspherecluster_reconciler.go
- vsphereclusteridentity_controller.go @Madhur97
- vspheredeploymentzone_controller.go & vspheredeploymentzone_controller_domain.go @Ankitasw
- vspheremachine_controller.go @Ankitasw
- vspherevm_controller.go & vspherevm_ipaddress_reconciler.go @adityabhatia
- Final audit (@sbueringer)
I took a quick look and there should be no overlap, so every task should ideally be a separate PR.
Tasks per controller:
- Adjust controller setup:
- ControllerContext & Logger fields should be dropped from the Reconciler struct
- Add Client & Recorder fields instead
- If necessary we can add ControllerManagerContext (with a field name, no embedding to make the usage of the context explicit)
- Use logger from context.Context. Drop client & logger fields from structs like ClusterContext, MachineContext, ...
- If there is currently no ctx available to get the logger from, add a ctx parameter to the current func
- This will probably also lead to some compile errors when e.g. a MachineContext is passed into a client.Get. Please then pass in a context.Context instead.
- Add k/v pairs where appropriate and for all related objects we "get" early in Reconcile
For more details see #2352