stability: git event watch and visualization #272
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: deads2k
The full list of commands accepted by this bot can be found here. The pull request process is described here.
When debugging a failed e2e test (or a string of them), one common question is, "what is the status of clusteroperator/foo
when this particular test was running".
While we could consider one-off solutions to this, we have a solution for storing this information inside of a local
Could you link to the tools mentioned here? This document as written is so vague as to not be understandable :|
> Could you link to the tools mentioned here? This document as written is so vague as to not be understandable :|
It captured the idea for @damemi who I think has found the tool and has a PR to add it.
@stevekuznetsov the tool being referred to is https://github.com/mfojtik/ci-monitor-operator and we are working on adding it in openshift/origin#24845
This has inspired a longer-term goal for me to add distributed tracing throughout our components
### Goals

1. Know the state of clusteroperators, events, and pod at any given time.
> Know the state of clusteroperators, events, and pod at any given time

What kind of state do you need to know? I'm curious which metrics are missing that should be sent from CI clusters, as we plan to add the ability to search through CI cluster metrics in the near future. Would it be useful to connect the different traces, the metrics we send, and what you are proposing here?
## Proposal

1. Install Michal's tool in every cluster
Currently we plan on enabling Prometheus remote write for CI clusters to send some metrics and alerts in pending state where they can be queried in a timeline. Would love to get your feedback on which metrics should be included in the first batch to send out, thanks!
https://docs.google.com/document/d/1_ILVUYNBC07EHaIlqel9EL1UCWLQlKlMJtTz2Xq9Tmo/edit
approvers:
creation-date: yyyy-mm-dd
last-updated: yyyy-mm-dd
status: provisional|implementable|implemented|deferred|rejected|withdrawn|replaced
You need to pick values for these headers, right? Or should we just drop them from the template?
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale

Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten

Rotten issues close after 30d of inactivity. Reopen the issue by commenting /close
@openshift-bot: Closed this PR.
This helps solve problems with flakes in CI tests that can be due to operator problems and re-uses existing and valuable visualization for the run.
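The first goal in the proposal (knowing the state of a clusteroperator while a particular test was running) amounts to a point-in-time lookup over recorded observations. The following is only a minimal sketch of that idea, not how ci-monitor-operator works: the `snapshots` list, its timestamps and statuses, and the `status_at` helper are all hypothetical and made up for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical observations for one clusteroperator, sorted by time:
# (time the status was observed, observed status). Illustrative data only.
snapshots = [
    (datetime(2020, 4, 1, 12, 0, tzinfo=timezone.utc), "Available"),
    (datetime(2020, 4, 1, 12, 7, tzinfo=timezone.utc), "Degraded"),
    (datetime(2020, 4, 1, 12, 15, tzinfo=timezone.utc), "Available"),
]


def status_at(snapshots, when):
    """Return the most recent status recorded at or before `when`, or None."""
    times = [observed for observed, _ in snapshots]
    # Index of the first snapshot strictly after `when`.
    i = bisect_right(times, when)
    return snapshots[i - 1][1] if i else None


# Hypothetical time at which a failing test was running.
test_time = datetime(2020, 4, 1, 12, 10, tzinfo=timezone.utc)
print(status_at(snapshots, test_time))
```

Any store that keeps timestamped changes in order can answer the same question; the binary search is just one straightforward way to do it once the history is in memory.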