Add metrics support #217

Merged: 4 commits, Oct 8, 2021


@viveklak (Contributor) commented on Oct 8, 2021

Proposed changes

Fixes: #123

The current implementation explicitly tracks the following metrics:

  1. stacks_active - gauge that tracks the number of stacks currently registered with, and managed by, the operator
  2. stacks_failing - gauge vector (GaugeVec), labeled by stack name and namespace, that reports 1 for each registered stack whose last reconcile failed

Running curl against port 8383 on the pod returns output like the following:

# HELP stacks_active Number of stacks currently tracked by the Pulumi Kubernetes Operator
# TYPE stacks_active gauge
stacks_active 3

...
# HELP stacks_failing Number of stacks currently registered where the last reconcile failed
# TYPE stacks_failing gauge
stacks_failing{name="stack-test-aws-s3-commit-change-5wuczx",namespace="default"} 1

In addition to the above, users have access to the following metrics emitted by controller-runtime:

  1. controller_runtime_active_workers{controller="stack-controller"} - gauge that tracks the number of concurrent stacks being processed
  2. controller_runtime_max_concurrent_reconciles{controller="stack-controller"} - gauge that reports the configured maximum number of concurrent stack reconciles. This defaults to 10 and can be changed with the MAX_CONCURRENT_RECONCILES environment variable added in #213 (Make max reconciles configurable).
  3. controller_runtime_reconcile_errors_total{controller="stack-controller"} - counter of errored reconciles
  4. controller_runtime_reconcile_time_seconds_*{controller="stack-controller"} - histogram providing latency information for reconciles
  5. controller_runtime_reconcile_total{controller="stack-controller",result="error"} - counter for errored reconciles
  6. controller_runtime_reconcile_total{controller="stack-controller",result="requeue"} - counter for requeued reconciles
  7. controller_runtime_reconcile_total{controller="stack-controller",result="success"} - counter for successful reconciles
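
Reading MAX_CONCURRENT_RECONCILES with its default of 10 might look like the sketch below. The fallback for invalid or non-positive values is an assumption; #213 may handle them differently (for example, by failing at startup). The resulting value would presumably be passed to controller-runtime as controller.Options.MaxConcurrentReconciles, which is the setting controller_runtime_max_concurrent_reconciles reports. The getenv parameter is a hypothetical seam so tests can inject values.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

const defaultMaxConcurrentReconciles = 10

// maxConcurrentReconciles reads MAX_CONCURRENT_RECONCILES and falls back to
// the default when the variable is unset, not an integer, or less than 1.
func maxConcurrentReconciles(getenv func(string) string) int {
	v := getenv("MAX_CONCURRENT_RECONCILES")
	if v == "" {
		return defaultMaxConcurrentReconciles
	}
	n, err := strconv.Atoi(v)
	if err != nil || n < 1 {
		return defaultMaxConcurrentReconciles
	}
	return n
}

func main() {
	fmt.Println(maxConcurrentReconciles(os.Getenv))
}
```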

Together these should provide sufficient coverage for monitoring basic operation of the controller. Additional metrics can be added as necessary.
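
As one example of using these counters for monitoring, the sketch below computes the fraction of stack reconciles that ended in an error between two scrapes of controller_runtime_reconcile_total. The reconcileTotals type and errorRatio function are hypothetical; in Prometheus itself the same signal would normally be expressed as a PromQL ratio of rate() over the result="error" series to rate() over all results.

```go
package main

import "fmt"

// reconcileTotals holds controller_runtime_reconcile_total{controller="stack-controller"}
// values keyed by the result label ("success", "error", "requeue", ...).
type reconcileTotals map[string]float64

// errorRatio returns the fraction of reconciles with result="error" between two
// scrapes. Counters only grow, so deltas give per-interval counts. ok is false
// when no reconciles happened in the interval.
func errorRatio(prev, cur reconcileTotals) (ratio float64, ok bool) {
	var total, errs float64
	for result, v := range cur {
		d := v - prev[result]
		if d < 0 {
			// Counter reset (for example, the operator pod restarted): count from zero.
			d = v
		}
		total += d
		if result == "error" {
			errs += d
		}
	}
	if total == 0 {
		return 0, false
	}
	return errs / total, true
}

func main() {
	prev := reconcileTotals{"success": 40, "error": 2, "requeue": 8}
	cur := reconcileTotals{"success": 70, "error": 8, "requeue": 22}
	if r, ok := errorRatio(prev, cur); ok {
		fmt.Printf("%.2f\n", r) // 6 errors out of 50 reconciles: 0.12
	}
}
```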

Related issues (optional)

#123

@@ -195,11 +208,15 @@ func (r *ReconcileStack) Reconcile(ctx context.Context, request reconcile.Reques

	// Step 2. If there are extra environment variables, read them in now and use them for subsequent commands.
	if err = sess.SetEnvs(stack.Envs, request.Namespace); err != nil {
		reqLogger.Error(err, "Could not find ConfigMap for Envs")
@viveklak (Contributor Author) commented on Oct 8, 2021:

I realized we weren't marking the stack as failed in all the places we should. This change improves that somewhat. I'm happy to move it to a separate PR if necessary.

@viveklak (Contributor Author) commented on Oct 8, 2021:

Opened #218 to track adding docs.

@lblackstone (Member) left a comment:

It would be nice to check the expected metrics in a test, but we can defer that in the interest of time if you've checked it manually.
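
A test along the lines suggested here could scrape the endpoint and check that each expected metric is declared. The helper below is a hypothetical sketch that only looks for "# TYPE <name>" lines in the text format and does not validate values. Note that client_golang typically omits a vector such as stacks_failing from the scrape until at least one label set has been recorded, so a test for it would need to induce a failing stack first.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// missingMetrics returns the names from want that have no "# TYPE <name> <kind>"
// line in a Prometheus text-format scrape.
func missingMetrics(exposition string, want []string) []string {
	declared := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		// A TYPE line looks like: # TYPE stacks_active gauge
		fields := strings.Fields(sc.Text())
		if len(fields) == 4 && fields[0] == "#" && fields[1] == "TYPE" {
			declared[fields[2]] = true
		}
	}
	var missing []string
	for _, name := range want {
		if !declared[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	scrape := "# TYPE stacks_active gauge\nstacks_active 3\n"
	fmt.Println(missingMetrics(scrape, []string{"stacks_active", "stacks_failing"})) // [stacks_failing]
}
```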

@@ -17,6 +17,10 @@ import (
	"strings"
	"time"

	"github.com/operator-framework/operator-lib/handler"
@lblackstone (Member):

nit: import sorting

@viveklak (Contributor Author):

Fixed.

@viveklak viveklak merged commit 04d5ab0 into master Oct 8, 2021
@viveklak viveklak deleted the vl/Metrics branch October 8, 2021 23:46
Linked issue: Expose metrics
2 participants