stability: git event watch and visualization #272
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,201 @@ | ||
| --- | ||
| title: stability-history-of-resources | ||
| authors: | ||
| - "@deads2k" | ||
| reviewers: | ||
| approvers: | ||
| creation-date: yyyy-mm-dd | ||
| last-updated: yyyy-mm-dd | ||
| status: provisional|implementable|implemented|deferred|rejected|withdrawn|replaced | ||
| see-also: | ||
| - "/enhancements/this-other-neat-thing.md" | ||
| replaces: | ||
| - "/enhancements/that-less-than-great-idea.md" | ||
| superseded-by: | ||
| - "/enhancements/our-past-effort.md" | ||
| --- | ||
|
|
||
| # Stability: Keep History of Resources | ||
|
|
||
| ## Release Signoff Checklist | ||
|
|
||
| - [ ] Enhancement is `implementable` | ||
| - [ ] Design details are appropriately documented from clear requirements | ||
| - [ ] Test plan is defined | ||
| - [ ] Graduation criteria for dev preview, tech preview, GA | ||
| - [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/) | ||
|
|
||
| ## Open Questions [optional] | ||
|
|
||
| This is where to call out areas of the design that require closure before deciding | ||
| to implement the design. For instance, | ||
| > 1. This requires exposing previously private resources which contain sensitive | ||
| information. Can we do this? | ||
|
|
||
| ## Summary | ||
|
|
||
| When debugging a failed e2e test (or a string of them), one common question is, "what was the status of clusteroperator/foo | ||
| when this particular test was running?" | ||
| While we could consider one-off solutions to this, we have a solution for storing this information inside of a local | ||
|
Could you link to the tools mentioned here? This document as written is so vague as to not be understandable :|
Contributor
Author
It captured the idea for @damemi, who I think has found the tool and has a PR to add it.
@stevekuznetsov the tool being referred to is https://github.com/mfojtik/ci-monitor-operator, and we are working on adding it in openshift/origin#24845. This has inspired a longer-term goal for me to add distributed tracing throughout our components.
||
| git repo that Michal Fojtik developed almost a year ago. | ||
| We should use this solution so we can effectively go back in time and see what happened. | ||
|
|
||
| Combined with the tooling that Seth Jennings built a year ago to visualize various rollout flows, this gives us powerful | ||
| correlative graphs. | ||
|
|
||
| ## Motivation | ||
|
|
||
| David's clairvoyance isn't strong enough to see the past with clarity. | ||
|
|
||
| ### Goals | ||
|
|
||
| 1. Know the state of clusteroperators, events, and pods at any given time. | ||
|
Contributor
What kind of state do you need to know? I'm curious which metrics are missing that should be sent from CI clusters, as we plan on adding the ability to search through CI cluster metrics at some point in the near future. I'm also curious whether it would be useful to connect the different traces, the metrics we send, and what you are proposing.
||
| 2. Visualize that state with Seth's tools via a simple link like promecieus. | ||
|
|
||
| ### Non-Goals | ||
|
|
||
| 1. Make a durable record intended for production, though that would be a really cool product. | ||
|
|
||
| ## Proposal | ||
|
|
||
| 1. Install Michal's tool in every cluster | ||
|
Contributor
Currently we plan on enabling Prometheus remote write for CI clusters to send some metrics and alerts in pending state where they can be queried in a timeline. Would love to get your feedback on which metrics should be included in the first batch to send out, thanks! https://docs.google.com/document/d/1_ILVUYNBC07EHaIlqel9EL1UCWLQlKlMJtTz2Xq9Tmo/edit
||
| 2. Harvest the git repo in CI artifacts | ||
| 3. Build a git integration for Seth's tool | ||
| 4. Wire up a way to have a simple link for timeline visualization. | ||
| 5. Profit. | ||
|
|
||
| ### User Stories [optional] | ||
|
|
||
| Detail the things that people will be able to do if this is implemented. | ||
| Include as much detail as possible so that people can understand the "how" of | ||
| the system. The goal here is to make this feel real for users without getting | ||
| bogged down. | ||
|
|
||
| #### Story 1 | ||
|
|
||
| #### Story 2 | ||
|
|
||
| ### Implementation Details/Notes/Constraints [optional] | ||
|
|
||
| What are the caveats to the implementation? What are some important details that | ||
| didn't come across above? Go into as much detail as necessary here. This might | ||
| be a good place to talk about core concepts and how they relate. | ||
|
|
||
| ### Risks and Mitigations | ||
|
|
||
| What are the risks of this proposal and how do we mitigate them? Think broadly. For | ||
| example, consider both security and how this will impact the larger OKD | ||
| ecosystem. | ||
|
|
||
| How will security be reviewed and by whom? How will UX be reviewed and by whom? | ||
|
|
||
| Consider including folks that also work outside your immediate sub-project. | ||
|
|
||
| ## Design Details | ||
|
|
||
| ### Test Plan | ||
|
|
||
| **Note:** *Section not required until targeted at a release.* | ||
|
|
||
| Consider the following in developing a test plan for this enhancement: | ||
| - Will there be e2e and integration tests, in addition to unit tests? | ||
| - How will it be tested in isolation vs with other components? | ||
|
|
||
| No need to outline all of the test cases, just the general strategy. Anything | ||
| that would count as tricky in the implementation and anything particularly | ||
| challenging to test should be called out. | ||
|
|
||
| All code is expected to have adequate tests (eventually with coverage | ||
| expectations). | ||
|
|
||
| ### Graduation Criteria | ||
|
|
||
| **Note:** *Section not required until targeted at a release.* | ||
|
|
||
| Define graduation milestones. | ||
|
|
||
| These may be defined in terms of API maturity, or as something else. Initial proposal | ||
| should keep this high-level with a focus on what signals will be looked at to | ||
| determine graduation. | ||
|
|
||
| Consider the following in developing the graduation criteria for this | ||
| enhancement: | ||
| - Maturity levels - `Dev Preview`, `Tech Preview`, `GA` | ||
| - Deprecation | ||
|
|
||
| Clearly define what graduation means. | ||
|
|
||
| #### Examples | ||
|
|
||
| These are generalized examples to consider, in addition to the aforementioned | ||
| [maturity levels][maturity-levels]. | ||
|
|
||
| ##### Dev Preview -> Tech Preview | ||
|
|
||
| - Ability to utilize the enhancement end to end | ||
| - End user documentation, relative API stability | ||
| - Sufficient test coverage | ||
| - Gather feedback from users rather than just developers | ||
|
|
||
| ##### Tech Preview -> GA | ||
|
|
||
| - More testing (upgrade, downgrade, scale) | ||
| - Sufficient time for feedback | ||
| - Available by default | ||
|
|
||
| **For non-optional features moving to GA, the graduation criteria must include | ||
| end to end tests.** | ||
|
|
||
| ##### Removing a deprecated feature | ||
|
|
||
| - Announce deprecation and support policy of the existing feature | ||
| - Deprecate the feature | ||
|
|
||
| ### Upgrade / Downgrade Strategy | ||
|
|
||
| If applicable, how will the component be upgraded and downgraded? Make sure this | ||
| is in the test plan. | ||
|
|
||
| Consider the following in developing an upgrade/downgrade strategy for this | ||
| enhancement: | ||
| - What changes (in invocations, configurations, API use, etc.) is an existing | ||
| cluster required to make on upgrade in order to keep previous behavior? | ||
| - What changes (in invocations, configurations, API use, etc.) is an existing | ||
| cluster required to make on upgrade in order to make use of the enhancement? | ||
|
|
||
| ### Version Skew Strategy | ||
|
|
||
| How will the component handle version skew with other components? | ||
| What are the guarantees? Make sure this is in the test plan. | ||
|
|
||
| Consider the following in developing a version skew strategy for this | ||
| enhancement: | ||
| - During an upgrade, we will always have skew among components, how will this impact your work? | ||
| - Does this enhancement involve coordinating behavior in the control plane and | ||
| in the kubelet? How does an n-2 kubelet without this feature available behave | ||
| when this feature is used? | ||
| - Will any other components on the node change? For example, changes to CSI, CRI | ||
| or CNI may require updating that component before the kubelet. | ||
|
|
||
| ## Implementation History | ||
|
|
||
| Major milestones in the life cycle of a proposal should be tracked in `Implementation | ||
| History`. | ||
|
|
||
| ## Drawbacks | ||
|
|
||
| The idea is to find the best form of an argument why this enhancement should _not_ be implemented. | ||
|
|
||
| ## Alternatives | ||
|
|
||
| Similar to the `Drawbacks` section the `Alternatives` section is used to | ||
| highlight and record other possible approaches to delivering the value proposed | ||
| by an enhancement. | ||
|
|
||
| ## Infrastructure Needed [optional] | ||
|
|
||
| Use this section if you need things from the project. Examples include a new | ||
| subproject, repos requested, github details, and/or testing infrastructure. | ||
|
|
||
| Listing these here allows the community to get the process for these resources | ||
| started right away. | ||
You need to pick values for these headers, right? Or should we just drop them from the template?