add grafana views for tekton chains controller workqueue and latency #992
yeah, I've never seen a problem like that with reopening a PR before @jkhelil ... the same thing shows up for me. No idea what is going on. In any event, yes, the screenshot at #989 (comment) was along the lines of what I was talking about. Thanks.
the queries look OK, but some of the titles etc. that you copied over from the results panel were not updated to say tekton chains @jkhelil
| ], | ||
| "title": "Tekton Chains CPU Fraction Use", | ||
| "type": "timeseries" | ||
| } |
any idea why this Tekton Chains CPU Fraction Use was repeated @jkhelil ... perhaps the title is wrong and it needs to change to reflect what the new panel is?
duplicate, removed now
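A repeated title like this one can also be caught mechanically before review. The following is a minimal sketch, assuming the dashboard uses the usual Grafana layout in which row panels nest their children under a `"panels"` key; the `sample` dashboard and both helper names are hypothetical and only loosely modelled on the fragments in this review.

```python
import json
from collections import Counter


def iter_panels(panels):
    """Yield every panel, descending into row panels that nest a "panels" list."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def duplicate_titles(dashboard):
    """Return the sorted panel titles that occur more than once."""
    counts = Counter(
        panel["title"]
        for panel in iter_panels(dashboard.get("panels", []))
        if panel.get("title")
    )
    return sorted(title for title, n in counts.items() if n > 1)


# Hypothetical miniature dashboard, not the real file from this PR.
sample = json.loads("""
{
  "panels": [
    {
      "id": 79,
      "type": "row",
      "title": "Tekton Chains",
      "panels": [
        {"type": "timeseries", "title": "Tekton Chains CPU Fraction Use"},
        {"type": "timeseries", "title": "Tekton Chains CPU Fraction Use"},
        {"type": "timeseries", "title": "Tekton Chains Work Queue Depth"}
      ]
    }
  ]
}
""")

print(duplicate_titles(sample))
```

Running the same function against the real dashboard file would only need `json.load` on that file in place of the inline sample.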
| "id": 79, | ||
| "panels": [ | ||
| { | ||
| "description": "Watcher work queue depth", |
| "description": "Watcher work queue depth", | |
| "description": "Tekton Chains work queue depth", |
You still have to update this one @jkhelil
fixed
| "refId": "work queue depth" | ||
| } | ||
| ], | ||
| "title": "Watcher Work Queue Depth", |
| "title": "Watcher Work Queue Depth", | |
| "title": "Tekton Chains Work Queue Depth", |
| "refId": "reconcile latency" | ||
| } | ||
| ], | ||
| "title": "Watcher Reconcile Latency", |
| "title": "Watcher Reconcile Latency", | |
| "title": "Tekton Chains Reconcile Latency", |
closer, but a couple more items @jkhelil
also, I suggest you try some of the queries from Konflux prod or stage, if you have not already, for practice
| { | ||
| "editorMode": "code", | ||
| "exemplar": true, | ||
| "expr": "sum(watcher_workqueue_depth{container='tekton-chains-controller',app='tekton-chains-controller'})", |
I validated this query works in prod-rh01 from the OCP metrics console.
In theory you should have permissions to do that as well @jkhelil
If you get a sec, try logging into the OCP console using the Red Hat SSO and try your query over at https://console-openshift-console.apps.stone-prd-rh01.pg1f.p1.openshiftapps.com/monitoring/query-browser?query0=
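For trying a query this way, a small helper can build the link. This is only a sketch: the base URL is copied from the comment above, and it assumes the console accepts a URL-encoded PromQL expression as the value of `query0`; the function name `query_browser_url` is hypothetical.

```python
from urllib.parse import quote

# Base URL copied from the review comment; query0 is left empty there.
BASE = (
    "https://console-openshift-console.apps.stone-prd-rh01.pg1f.p1."
    "openshiftapps.com/monitoring/query-browser?query0="
)


def query_browser_url(promql, base=BASE):
    """Append the percent-encoded PromQL expression to the query-browser URL.

    Assumption: the console reads the expression from the query0 parameter.
    """
    return base + quote(promql, safe="")


# Expression taken from the dashboard diff under review.
expr = (
    "sum(watcher_workqueue_depth{container='tekton-chains-controller',"
    "app='tekton-chains-controller'})"
)
url = query_browser_url(expr)
print(url)
```

Passing `safe=""` makes `quote` encode braces, quotes, and commas as well, so the expression cannot be mistaken for part of the URL structure.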
| "id": 79, | ||
| "panels": [ | ||
| { | ||
| "description": "Watcher work queue depth", |
You still have to update this one @jkhelil
| { | ||
| "editorMode": "code", | ||
| "exemplar": true, | ||
| "expr": "histogram_quantile(0.99, sum(rate(watcher_reconcile_latency_bucket{job=\"tekton-chains\"}[30m])) by (le) ) / 1000", |
I had to convert \" to just " @jkhelil but histogram_quantile(0.99, sum(rate(watcher_reconcile_latency_bucket{job="tekton-chains"}[30m])) by (le) ) / 1000 also works from the OCP metrics console ... see if you can do it as well
it is just a way to escape a double quote inside a double-quoted string; it is used all over the JSON file
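To illustrate the escaping, here is a short sketch using Python's standard `json` module on a one-key fragment written the same way as the dashboard file. The fragment is a cut-down stand-in, not the real file.

```python
import json

# The PromQL expression is stored as a JSON string, so its inner double
# quotes are written as \" in the file. A raw string keeps the backslashes.
raw = r'''{"expr": "histogram_quantile(0.99, sum(rate(watcher_reconcile_latency_bucket{job=\"tekton-chains\"}[30m])) by (le) ) / 1000"}'''

# Parsing removes the escapes: this is the form to paste into the console.
expr = json.loads(raw)["expr"]
print(expr)

# Serializing again puts the backslashes back, as seen in the diff.
print(json.dumps({"expr": expr}))
```

The first line printed is the plain query with `job="tekton-chains"`, which is what the metrics console expects; the second shows the escaped form that belongs in the JSON file.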
| { | ||
| "editorMode": "builder", | ||
| "expr": "rate(watcher_go_gc_cpu_fraction{app=\"tekton-chains-controller\"}[5m])", | ||
| "expr": "rate(watcher_workqueue_longest_running_processor_seconds_count{app=\"tekton-chains-controller\"}[5m])", |
Not that I'm necessarily opposed to this change, but this was not one we discussed @jkhelil ... explain to me why you went in this direction
there was no change, I had just swapped the order of the two metrics; it is now ok
| { | ||
| "editorMode": "builder", | ||
| "expr": "rate(watcher_workqueue_longest_running_processor_seconds_count{app=\"tekton-chains-controller\"}[5m])", | ||
| "expr": "rate(watcher_go_gc_cpu_fraction{app=\"tekton-chains-controller\"}[5m])", |
Not that I'm necessarily opposed to this change, but this was not one we discussed @jkhelil ... explain to me why you went in this direction
looks good diff-wise @jkhelil

A followup to #989 (closed accidentally, and not able to reopen it)