metrics: gather cluster_installer series #189
|
/cc @smarterclayton |
|
@crawford do you have a running cluster you tested this on? If so, could you also run `make test/timeseries.txt` and `make generate` to get the new metric in the If you don't have one on hand, don't worry, we can do a follow-up :) |
|
@squat my generated timeseries data is significantly different than the last. Is that a concern? |
|
@crawford not an issue at all. It’s expected that pretty much every line should be different as timestamps and values will all change. |
|
@squat updated. |
|
Sorry to complicate things. The makefile has recently changed. Plz try |
|
@squat Okay, |
|
There is an internal approval chain for that. Currently, the monitoring team working on Telemeter and I approve new metrics that people would like to push back. The engineering team makes sure that the metric is OK to send (specifically taking cardinality into account), and I just want to understand what the use cases are and whether they align with Telemeter's goal. TL;DR: you need two LGTMs: 1) a Telemeter engineer and 2) me :) |
|
/hold From telemeter engineering team, this looks good to me. The new time series do not present a burden and are well within our current capacity. |
|
/lgtm |
|
I am actually not really sure what the following means:
What information are we sending off-cluster? Do you have an example? |
|
Hmm, let me take a closer look. I'll update the commit message with an example as well. |
|
@squat Okay, this should be ready for review now. |
|
/retest |
|
/retest |
|
/approve This metric is approved at a product level (high value in both triage and identification of unique issues) |
squat
left a comment
/lgtm
Spoke with Alex about this and looks good :) I’m glad we got this in
|
/hold cancel |
Rather than hardcoding the path to bash (which is not always in /bin), use `env` to find it.
This series will allow us to determine whether a cluster was created
using IPI or UPI. Once Hive and CI are updated, we'll also be able to
determine if a cluster was created by CI or if it was created by Hive.
The series captures the install type, the version of the tool, and its
invoker (in addition to the standard labels):
cluster_installer{
type="openshift-install",
version="unreleased-master-1209-gfd08f44181f2111",
invoker="alex",
} 0 1562168623759
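As a rough illustration of how the labels in the sample above could be read back out, here is a minimal Python sketch. It is not part of this PR or of Telemeter: the `parse_sample` helper and its regular expressions are hypothetical, they only handle the simple shape shown in the commit message (no escaped quotes inside label values), and the sketch does not attempt to decide IPI vs. UPI, since the thread does not say how the labels map to that.

```python
import re

# The sample exactly as printed in the commit message above.
SAMPLE = '''cluster_installer{
type="openshift-install",
version="unreleased-master-1209-gfd08f44181f2111",
invoker="alex",
} 0 1562168623759'''

# Hypothetical patterns for illustration only; they assume label values
# contain no escaped quotes and that a timestamp is always present.
SERIES_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)\s+(\d+)$', re.DOTALL)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(text):
    """Split one sample into (name, labels, value, timestamp_ms)."""
    match = SERIES_RE.match(text.strip())
    if match is None:
        raise ValueError("text does not look like a labelled sample")
    name, raw_labels, value, timestamp = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels))
    return name, labels, float(value), int(timestamp)


if __name__ == "__main__":
    name, labels, value, timestamp = parse_sample(SAMPLE)
    print(name, labels["type"], labels["version"], labels["invoker"])
```

Anything beyond the three labels listed in the commit message (the "standard labels") is not shown in the sample, so the sketch makes no assumptions about them.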
This new sample includes the new cluster_installer series.
|
Rebased onto master. @squat can you take one more look? |
|
ahh we had another conflict |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: crawford, smarterclayton, squat. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@smarterclayton yes it does. Where did the server manifests go? |
|
It looks like last week the telemeter-server manifests changed home to their new v2 location in github.com/observatorium/configuration. The important thing is to update the metrics.json file. This is the source of truth and will get picked up in the new v2 configuration repo (including the telemeter-server whitelist) the next time we bump. However, it seems that the implications of this change were not completely considered. We’ll need to either revert or adjust the CI job to reflect this. |
|
@squat cool. Do you know what's up with those two failed tests? Did I do something wrong in the PR or should I just kick them again? |
|
@crawford let me dig and take a looksie |
|
/retest |
|
/cherrypick release-4.1 |
|
@crawford: #189 failed to apply on top of branch "release-4.1": |