Improve CAPV logging #2076

@sbueringer

Logging is crucial for understanding controller behavior and troubleshooting issues.

While onboarding to CAPV we identified a few quick wins which should improve log quality significantly:

  • It is a best practice in Kubernetes to add k/v pairs of the involved objects to a log line (https://cluster-api.sigs.k8s.io/developer/logging.html#keyvalue-pairs)
    • We should also add k/v pairs for all involved objects (and their owner hierarchy) that we retrieve at the beginning of the Reconcile
  • We should also consistently use contextual logging (i.e. taking the logger from the ctx). This allows propagation of k/v pairs across the entire call stack
    • This also includes using the ctx that controller-runtime passes into the Reconcile func instead of some global ctx that is the same across Reconcile calls.
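The two points above can be sketched as follows. This is a minimal, standard-library-only illustration, not CAPV code: it uses `log/slog` as a stand-in for logr, and the `LoggerFrom` / `LoggerInto` helpers defined here only mimic the role that controller-runtime's `ctrl.LoggerFrom` and `ctrl.LoggerInto` play. The `Reconcile` signature, `reconcileVM`, `RenderExample`, and the object names are hypothetical.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
)

// loggerKey is the private context key under which the logger is stored.
type loggerKey struct{}

// LoggerInto returns a copy of ctx that carries log.
// (Assumption: a stand-in for controller-runtime's ctrl.LoggerInto.)
func LoggerInto(ctx context.Context, log *slog.Logger) context.Context {
	return context.WithValue(ctx, loggerKey{}, log)
}

// LoggerFrom returns the logger stored in ctx, or the default logger if none is set.
// (Assumption: a stand-in for controller-runtime's ctrl.LoggerFrom.)
func LoggerFrom(ctx context.Context) *slog.Logger {
	if log, ok := ctx.Value(loggerKey{}).(*slog.Logger); ok {
		return log
	}
	return slog.Default()
}

// reconcileVM is a hypothetical helper further down the call stack.
// It adds no k/v pairs itself; it inherits them through ctx.
func reconcileVM(ctx context.Context) {
	LoggerFrom(ctx).Info("Reconciling VM")
}

// Reconcile adds k/v pairs for the objects retrieved early on and stores
// the enriched logger back into the per-call ctx.
func Reconcile(ctx context.Context, namespace, vsphereMachine, machine, cluster string) {
	log := LoggerFrom(ctx).With(
		"VSphereMachine", namespace+"/"+vsphereMachine,
		"Machine", namespace+"/"+machine,
		"Cluster", namespace+"/"+cluster,
	)
	ctx = LoggerInto(ctx, log)
	reconcileVM(ctx)
}

// RenderExample runs one Reconcile call and returns the emitted log line.
// The timestamp is dropped so the output is deterministic.
func RenderExample() string {
	var buf bytes.Buffer
	handler := slog.NewTextHandler(&buf, &slog.HandlerOptions{
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if a.Key == slog.TimeKey && len(groups) == 0 {
				return slog.Attr{}
			}
			return a
		},
	})
	ctx := LoggerInto(context.Background(), slog.New(handler))
	Reconcile(ctx, "default", "vm-0", "machine-0", "cluster-0")
	return buf.String()
}

func main() {
	fmt.Print(RenderExample())
}
```

Because the logger travels in the ctx of a single Reconcile call, k/v pairs from one reconciliation cannot leak into another, which is the risk with a global ctx shared across calls.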

The overall goal of this issue is to improve the logs so troubleshooting becomes easier. Adding k/v pairs across the board will make it very easy to correlate logs e.g. for a specific Machine across controllers.

Additional notes:

  • Let's take a look at how event recorders are set up in core CAPI vs CAPV
  • Take a look at controller logger setup (e.g. regarding name). IIRC we shouldn't need any logger on the controllers though
  • Audit all log calls for additional k/v pairs
  • Ensure "Named" is set correctly on all controllers
  • Let's double check the k/v pairs we add in core CAPI to ensure we can cross-reference everything

Prior art in core CAPI:

Tasks

Concrete tasks for now (I have some follow-ups to audit, but let's do this afterwards):

I took a quick look and there should be no overlap, so every task should ideally be a separate PR.

Tasks per controller:

  • Adjust controller setup:
    • ControllerContext & Logger fields should be dropped from the Reconciler struct
    • Add Client & Recorder fields instead
    • If necessary we can add ControllerManagerContext (with a field name, no embedding to make the usage of the context explicit)
  • Use logger from context.Context. Drop client & logger fields from structs like ClusterContext, MachineContext, ...
    • If there is currently no ctx available to get the logger from, add a ctx parameter to the current func
    • This will probably also lead to some compile errors when e.g. a MachineContext is passed into a client.Get. Please then pass in a context.Context instead.
  • Add k/v pairs where appropriate and for all related objects we "get" early in Reconcile

For more details see #2352
