Skip to content

Bug 1780170: Expose metrics#270

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
benjaminapetersen:monitoring/expose-metrics
Sep 19, 2019
Merged

Bug 1780170: Expose metrics#270
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
benjaminapetersen:monitoring/expose-metrics

Conversation

@benjaminapetersen
Copy link
Copy Markdown
Contributor

@benjaminapetersen benjaminapetersen commented Aug 14, 2019

We should expose /metrics to cluster monitor for both console & console-operator. Working out the details of this in this PR.

  • console-operator via library-go already has a /metrics endpoint running, its just a matter of wiring it up
  • console does not have a /metrics endpoint at this time. This may require a follow on set of PRs, one to the console to enable a prom-client for the console server & collect at least basic heap metrics

Validate that the operator is responding to the /metrics endpoint via:

curl --insecure -H "Authorization: Bearer $(oc whoami --show-token)" https://$(oc get route metrics --template "{{.spec.host}}"/metrics) | grep console_url

Validate that the console_url metric is being reported in the monitoring UI:

open $(oc get route prometheus-k8s -n openshift-monitoring -o jsonpath="{.spec.host}")

Console URL

The console URL metric is tracked in the form of:
console_url{url="https://<url>"} 1

Queried in the prometheus dashboard, will look like:
console_url{endpoint="https",instance="<ip>:8443",job="metrics",namespace="openshift-console-operator",pod="console-operator-<id>",service="metrics",url="https://console-openshift-console.apps.<domain>.devcluster.openshift.com"}

Prometheus UI:

Screen Shot 2019-09-18 at 4 39 03 PM

Our UI:

Screen Shot 2019-09-18 at 4 47 12 PM

Related PRs to enable telemetry:

Note that this process will be simplified into a simpler config for 4.3.

Gist for the RBAC/manifest items to add in the future when we enable console itself:

Gist tracking steps for getting an metric whitelisted for telemetry

@benjaminapetersen benjaminapetersen changed the title Expose metrics [WIP] Expose metrics Aug 14, 2019
@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 14, 2019
Comment thread manifests/0000_90_console-operator_02_servicemonitor.yaml Outdated
@benjaminapetersen
Copy link
Copy Markdown
Contributor Author

/retest

1 similar comment
@benjaminapetersen
Copy link
Copy Markdown
Contributor Author

/retest

Comment thread manifests/05-config.yaml
Comment thread pkg/console/operator/sync_v400.go
Comment thread pkg/console/operator/sync_v400.go
Comment thread manifests/04-rbac-rolebinding.yaml
kind: Role
metadata:
name: prometheus-k8s
namespace: openshift-console-operator
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enable metrics to scrape in console-operator and console namespaces.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont actually have a /metrics endpoint exposed for console, yet.

Comment thread manifests/0000_90_console-operator_02_servicemonitor.yaml
Comment thread manifests/0000_90_console-operator_02_servicemonitor.yaml Outdated
Comment thread manifests/05-service.yaml
@benjaminapetersen benjaminapetersen force-pushed the monitoring/expose-metrics branch 2 times, most recently from 413aa05 to a041171 Compare September 17, 2019 14:25
Comment thread manifests/07-operator.yaml
Comment thread manifests/03-rbac-role-cluster.yaml Outdated
Comment thread manifests/0000_90_console-operator_02_servicemonitor.yaml Outdated
Comment thread manifests/04-rbac-rolebinding.yaml
Comment thread manifests/04-rbac-rolebinding-cluster.yaml
@benjaminapetersen benjaminapetersen changed the title [WIP] Expose metrics Expose metrics Sep 18, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 18, 2019
@benjaminapetersen
Copy link
Copy Markdown
Contributor Author

@jhadvig @spadgett ready for review, we are wired up.

A snippet of the other metrics we get automatically:

# HELP Console_adds (Deprecated) Total number of adds handled by workqueue: Console
# TYPE Console_adds counter
Console_adds 17
...
# HELP Console_queue_latency (Deprecated) How long an item stays in workqueueConsole before being requested.
# TYPE Console_queue_latency summary
Console_queue_latency{quantile="0.5"} 9
Console_queue_latency{quantile="0.9"} 83151
Console_queue_latency{quantile="0.99"} 84765
Console_queue_latency_sum 432139
Console_queue_latency_count 17
...
# TYPE ManagementStateController_console_queue_latency summary
ManagementStateController_console_queue_latency{quantile="0.5"} 5
ManagementStateController_console_queue_latency{quantile="0.9"} 5
...
# TYPE ManagementStateController_console_retries counter
ManagementStateController_console_retries 0
...
workqueue_work_duration_seconds_sum{name="ManagementStateController_console"} 0.0017626229999999998
workqueue_work_duration_seconds_count{name="ManagementStateController_console"} 3
...
etc
etc

Copy link
Copy Markdown
Member

@spadgett spadgett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be creating role bindings for things we're not using yet. Let's remove any references to metrics in openshift-console until it's needed.

Comment thread manifests/02-namespace.yaml Outdated
Comment thread manifests/0000_90_console-operator_01_prometheusrbac.yaml Outdated
Copy link
Copy Markdown
Member

@jhadvig jhadvig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of questions, otherwise looking good 👍

Comment thread pkg/console/operator/sync_v400.go
Comment thread pkg/console/operator/sync_v400.go
@brancz
Copy link
Copy Markdown

brancz commented Sep 19, 2019

Metrics should be all lower case and word boundaries separated by underscore. In general we adhere to the Kubernetes Instrumentation Guidelines in OpenShift, so we should do the same here.

@benjaminapetersen
Copy link
Copy Markdown
Contributor Author

@brancz I see that, so I did:

	// metric: console_url{url="https://<url>"} 1
	consoleURLMetric = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Namespace: "console",
		Name:      "url",
		Help:      "URL of the console exposed on the cluster",
		// one label
	}, []string{"url"})

to give us console_url (using the Namespace). Do you prefer we skip the Namespace & just hard-code console_url as Name?

@benjaminapetersen
Copy link
Copy Markdown
Contributor Author

@brancz disregard, i'll update to that per reading the doc you shared, thx!

- Add manifest for ServiceMonitor object
- Authorize prometheus to scrape metrics from console & console-operator
  - NOTE: console disabled until we support the /metrics endpoint
- RBAC
  - reenable delegated auth
  - add rolebinding to kube-system for
extension-apiserver-authentication-reader for console-operatoro
  - add clusterrole system:auth-delegator to console-operator
  - allows authentication (tokenreview) and authorization
(subjectaccessreview)
  - required as the /metrics endpoint is protected
- mount serving-cert in operator deployment
- add labels to namespaces to tell openshift monitoring to begin
monitoring
@spadgett
Copy link
Copy Markdown
Member

/retest

Copy link
Copy Markdown
Member

@spadgett spadgett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 19, 2019
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: benjaminapetersen, spadgett

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [benjaminapetersen,spadgett]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@benjaminapetersen
Copy link
Copy Markdown
Contributor Author

/cherry-pick release-4.1

@openshift-cherrypick-robot
Copy link
Copy Markdown

@benjaminapetersen: #270 failed to apply on top of branch "release-4.1":

.git/rebase-apply/patch:80: new blank line at EOF.
+
.git/rebase-apply/patch:174: new blank line at EOF.
+
warning: 2 lines add whitespace errors.
error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
A	manifests/04-rbac-rolebinding-cluster.yaml
M	manifests/04-rbac-rolebinding.yaml
M	manifests/07-operator.yaml
M	pkg/console/operator/sync_v400.go
Falling back to patching base and 3-way merge...
Auto-merging vendor/k8s.io/apiextensions-apiserver/artifacts/example/auth-delegator.yaml
CONFLICT (content): Merge conflict in vendor/k8s.io/apiextensions-apiserver/artifacts/example/auth-delegator.yaml
Auto-merging pkg/console/operator/sync_v400.go
CONFLICT (content): Merge conflict in pkg/console/operator/sync_v400.go
Auto-merging manifests/07-operator.yaml
Auto-merging manifests/04-rbac-rolebinding.yaml
CONFLICT (content): Merge conflict in manifests/04-rbac-rolebinding.yaml
Patch failed at 0001 Expose /metrics endpoint for monitoring

Details

In response to this:

/cherry-pick release-4.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@benjaminapetersen
Copy link
Copy Markdown
Contributor Author

kinda figured.

@spadgett spadgett changed the title Expose metrics Bug 1780170: Expose metrics Dec 5, 2019
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@benjaminapetersen: All pull requests linked via external trackers have merged. Bugzilla bug 1780170 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1780170: Expose metrics

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

zonggen pushed a commit to zonggen/console-operator that referenced this pull request Oct 5, 2021
Adds a new ServiceMonitor to allow the Helm metrics being scraped from
console /metrics endpoint by prometheus-k8s.

Closes: https://issues.redhat.com/browse/HELM-235
Reference: openshift#270
Signed-off-by: Allen Bai <abai@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants