Conversation

@MarSik MarSik commented Jun 4, 2021

No description provided.

openshift-ci bot commented Jun 4, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign spadgett after the PR has been reviewed.
You can assign the PR to them by writing /assign @spadgett in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested review from crawford and smarterclayton June 4, 2021 14:59
@MarSik MarSik changed the title PAO render initial draft WIP: PAO render initial draft Jun 4, 2021
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 4, 2021
@dhellmann
Contributor

/priority important-soon

@openshift-ci openshift-ci bot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jun 4, 2021

### Goals

1) PAO can execute outside of a cluster via podman/docker, consume user manifests from a directory and generate additional manifests to another (or the same) directory
Contributor

I don't have an issue with saying that we would use a container image, but it feels like a level of detail that fits better in the Implementation section, rather than in the Goals. Placing it here narrows our options in a way that doesn't feel necessary. I might rewrite this as

PAO can be run outside of a cluster to generate manifests to pass to the OpenShift installer.

Contributor Author

Well, you are right, but I think it should be part of the design. How would the doc writers know what the expected user flow looks like?

Contributor

If we talk about the delivery mechanism down in either the "Proposal" or "Implementation Details/Notes/Constraints" section, all of the information is still available for the tech writers.

1) PAO takes all PerformanceProfile yamls from the input directory and generates all the usual manifests as yaml files to the output directory
1) When a command line option `--enable-workload-partitioning` is passed to PAO in render mode, it also generates the necessary configuration for the `management` partition to work properly
1) PAO at day 2 must not make any changes to the existing generated objects it owns
1) PAO sets the ownership references on the generated yamls so the generated files are properly linked to the PerformanceProfiles
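The ownership-reference goal can be illustrated with a minimal Go sketch. The struct below mirrors only the `metav1.OwnerReference` fields relevant here, and the API version, profile name, and empty UID are assumptions of this sketch, not the actual PAO code (which uses the apimachinery types):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OwnerReference mirrors only the metav1.OwnerReference fields relevant
// here; the real PAO code uses the Kubernetes apimachinery types.
type OwnerReference struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Name       string `json:"name"`
	UID        string `json:"uid,omitempty"`
}

// ownerRef builds the reference each generated manifest would carry back
// to its PerformanceProfile, so the in-cluster reconcile loop can adopt
// the objects later. UID is left empty: an offline render has no live
// object to point at (an assumption of this sketch).
func ownerRef(profileName string) OwnerReference {
	return OwnerReference{
		APIVersion: "performance.openshift.io/v2",
		Kind:       "PerformanceProfile",
		Name:       profileName, // hypothetical profile name
	}
}

func main() {
	out, err := json.MarshalIndent(ownerRef("example-profile"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```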
Contributor

Lines 89-92 are implementation details. They could go into the general proposal section around line 101 with a little bit of text to tie them together into a paragraph, or they could go down to line 110.

Contributor Author

Well, I would treat these as acceptance criteria, because they are the "API contract" of the flow.

Contributor

I just think you're being too specific, too early.

We usually try to set enough context about the requirements that a reviewer reading the details later can evaluate the choices we're making against the goals to see if they match or if we have gaps. If we set as a goal to implement something a specific way, with specific interfaces, then it's like we're not asking for feedback on the design.

So, for lines 89 and 90 we could say something like

  • The PAO running outside of a cluster can provide input to the OpenShift installer to enable workload partitioning during cluster deployment in a way that does not require changes to the OpenShift installer.

Then for 91 maybe

  • The configuration from the PAO that is passed to the installer is compatible with the configuration the PAO generates when running inside the cluster.

And after re-reading 92 with those proposed changes, I think it's OK to keep it as it is.

1) [optional] The cluster administrator will collect PAO must-gather output from a test cluster with no partitioning enabled
1) [optional] The cluster administrator will execute the Performance profile creator to get well formed PerformanceProfiles
1) The PerformanceProfiles are placed into a directory
1) PAO is executed using podman run -v /input/dir:/input -v /output/dir:/output registry/performance-addon-operator render [optional --enable-workload-partitioning]
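The directory-based flow above can be sketched in Go. Everything here is illustrative: the extension-based filtering, the `_generated.yaml` naming, and the placeholder file contents are assumptions, not the actual PAO implementation, which would decode each YAML and run its controllers against the profiles:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// isProfileManifest reports whether a file in the input directory should
// be treated as a PerformanceProfile manifest. A real implementation
// would decode the YAML and check the kind; this sketch filters on the
// extension only.
func isProfileManifest(name string) bool {
	return strings.HasSuffix(name, ".yaml") || strings.HasSuffix(name, ".yml")
}

// renderDir walks inputDir and writes one placeholder manifest per
// profile into outputDir, standing in for the real rendering step.
func renderDir(inputDir, outputDir string) ([]string, error) {
	entries, err := os.ReadDir(inputDir)
	if err != nil {
		return nil, err
	}
	var generated []string
	for _, e := range entries {
		if e.IsDir() || !isProfileManifest(e.Name()) {
			continue
		}
		base := strings.TrimSuffix(e.Name(), filepath.Ext(e.Name()))
		// Placeholder for the real rendering: PAO would write the
		// MachineConfig, KubeletConfig, Tuned, etc. objects here.
		out := filepath.Join(outputDir, base+"_generated.yaml")
		if err := os.WriteFile(out, []byte("# generated from "+e.Name()+"\n"), 0o644); err != nil {
			return nil, err
		}
		generated = append(generated, out)
	}
	return generated, nil
}

func main() {
	in, _ := os.MkdirTemp("", "input")
	out, _ := os.MkdirTemp("", "output")
	defer os.RemoveAll(in)
	defer os.RemoveAll(out)
	os.WriteFile(filepath.Join(in, "profile.yaml"), []byte("kind: PerformanceProfile\n"), 0o644)
	files, err := renderDir(in, out)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(files), "manifest(s) generated")
}
```

In the proposed flow, `/input/dir` and `/output/dir` would be the host paths bind-mounted into the container by the `podman run -v` options.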
Contributor

Suggested change
1) PAO is executed using podman run -v /input/dir:/input -v /output/dir:/output registry/performance-addon-operator render [optional --enable-workload-partitioning]
1) PAO is executed using `podman run -v /input/dir:/input -v /output/dir:/output registry/performance-addon-operator render [optional --enable-workload-partitioning]`


This feature is activated locally by the user and follows the lifecycle of PAO itself. There is no upgrade procedure to perform.

### Version Skew Strategy
Contributor

Do we need to worry about matching the PAO version and OCP version? I think we talked about that, but don't remember whether we came to a conclusion.


Maybe the desired version could be passed into the render command so there could be validation against the running image.

Contributor Author

Well, the desired version is part of the PAO pull spec, so the validation would basically compare whether a command line argument matches the compiled-in version string. We do not support generating older versions; you need to use an older PAO for that.
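The compiled-in version check described here could look roughly like the Go sketch below. The `version` variable, its build-time injection via `-ldflags`, the flag name, and the error wording are all assumptions for illustration:

```go
package main

import "fmt"

// version is the PAO version compiled into the binary, normally injected
// at build time, e.g. with -ldflags "-X main.version=...". The value
// here is hypothetical.
var version = "4.9.0"

// checkVersion validates a desired version passed on the command line
// against the compiled-in version. Generating manifests for any other
// version is not supported; the matching PAO image must be used instead.
func checkVersion(requested string) error {
	if requested != "" && requested != version {
		return fmt.Errorf("render requested for %s, but this image is PAO %s; use the matching PAO image",
			requested, version)
	}
	return nil
}

func main() {
	if err := checkVersion("4.8.0"); err != nil {
		fmt.Println("validation failed:", err)
	}
	if err := checkVersion("4.9.0"); err == nil {
		fmt.Println("validation passed for 4.9.0")
	}
}
```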

Contributor

We should list the skew as a risk. We're going to want to make sure the documentation at least explains that the versions have to match.


Similar to the `Drawbacks` section the `Alternatives` section is used to
highlight and record other possible approaches to delivering the value proposed
by an enhancement.
Contributor

We did identify a couple of alternatives.

  1. Add PAO to the release payload so the render step could be run during bootstrapping. This was rejected because the PAO is not useful in all, or even most, clusters.
  2. Wait to enable partitioning until the PAO can be installed into the cluster on day 2. We may yet take this approach in the future, but we still consider rolling out the changes to enable partitioning on a running cluster potentially dangerous and certainly disruptive.

1) PAO is executed using podman run -v /input/dir:/input -v /output/dir:/output registry/performance-addon-operator render [optional --enable-workload-partitioning]
1) The files from both input and output directories are passed over to the installer

### Implementation Details/Notes/Constraints [optional]

Suggest we document the required inputs (performance profile(s), workload_partitioning CR) and the output manifests (kubeconfig, tuned, etc.). The performance profile CR will also need to be an output of running the render command.
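The suggested input/output contract could be captured roughly as follows. The manifest kinds (MachineConfig, KubeletConfig, Tuned, RuntimeClass) and the file-naming scheme are assumptions based on what PAO generates in-cluster, not a confirmed interface:

```go
package main

import (
	"fmt"
	"strings"
)

// renderContract lists, for one PerformanceProfile, the output manifests
// the render command would emit, keyed by kind. The kinds and file names
// are illustrative assumptions.
func renderContract(profile string) map[string]string {
	kinds := []string{"MachineConfig", "KubeletConfig", "Tuned", "RuntimeClass"}
	out := make(map[string]string, len(kinds)+1)
	// The PerformanceProfile CR itself is echoed to the output, as
	// suggested above, so the installer receives a complete set.
	out["PerformanceProfile"] = profile + "_performanceprofile.yaml"
	for _, k := range kinds {
		out[k] = fmt.Sprintf("%s_%s.yaml", profile, strings.ToLower(k))
	}
	return out
}

func main() {
	m := renderContract("example-profile")
	fmt.Println(len(m), "output manifests for example-profile")
}
```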

This feature is about adding a `render` mode to PAO that allows an admin to pre-compute all the OpenShift manifests that are needed, in a way that prevents typing mistakes.
These generated manifests can then be passed to the installer and later taken over by the PAO reconcile loop.

## Motivation

In addition to providing a mechanism to enable the workload partitioning feature, applying the manifests at install time will reduce the end-to-end deployment time for solutions that use PAO, since there will be fewer reboots. Suggest we capture that as a motivation as well.

Contributor

Based on some of the review feedback on #753, are we sure this will be true immediately? We may need to do some additional work in the MCO to get that benefit. We should still capture it as a goal, but not lose sight of the gap.


## Proposal

### User Stories

Integrating into an external orchestration system

openshift-ci bot commented Jun 28, 2021

@MarSik: The following test failed, say /retest to rerun all failed tests:

Test name: ci/prow/markdownlint
Commit: 6769743
Rerun command: /test markdownlint

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dhellmann dhellmann removed the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jun 30, 2021
@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 30d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 21, 2021
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 28, 2021
@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Oct 5, 2021
openshift-ci bot commented Oct 5, 2021

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
