
Added kustomization for performance #4

Merged
openshift-merge-robot merged 4 commits into openshift-kni:master from slintes:performance
Jan 14, 2020

Conversation

@slintes
Member

@slintes slintes commented Jan 10, 2020

Initial Implementation of the feature-deploy.sh script along with manifests for e2e and demo envs.

Deployment on a test cluster with FEATURES_ENVIRONMENT=demo make feature-deploy succeeded.

Heads up: since we don't have official public images of the performance operator yet, the demo env uses images from my quay.io account, which is why I consider this WIP. Also, that image contains unmerged code from openshift-kni/performance-addon-operators#34.

Also, the feature-deploy.sh script contains more imperative parts than we are aiming for. So consider this a first iteration.

Member

@davidvossel davidvossel left a comment

we've got to figure out where this wait logic belongs in the blueprint workflow. i don't think we're following standard practice there.

other than that, this looks pretty solid. i just made some comments throughout.

kind: PerformanceProfile
metadata:
name: performance
namespace: openshift-performance-addon
Member

this isn't a namespaced resource anymore.

Member Author

fixed

@@ -0,0 +1,7 @@
apiVersion: v1
kind: Namespace
Member

i think we'll have to put the namespace in the performance-operator path. since the profiles aren't namespaced, i'm not aware of any value in making this external to the operator's path.

I know kustomize now respects creating namespaces before other resources, but i don't know if that ordering still holds when the dependent resources aren't in the same subdir. kubernetes-sigs/kustomize#65

in the future, if we need namespaces for multiple features, we might potentially have to define the same namespace in multiple subdirs.

Member Author

done

@@ -0,0 +1,10 @@
apiVersion: kustomize.config.k8s.io/v1beta1
Member

can we pick a more descriptive name for the "demo" environment? it's possible we might have more than one demo environment in the future

Member Author

sure, if you have an idea? ;)

spec:
channel: alpha
name: performance-addon-operators
source: performance-addon-operators-catalogsource
Member

i don't know how kustomize handles ordering here. Is it possible for the subscription to be posted before the operatorgroup? I think that would fail. we need to understand how these manifests that depend on one another are handled.

All i could find is that some well-known resource types (like namespaces) are posted before other resources.

Member

Yeah, they seem to have a priority list of objects to post first: kubernetes-sigs/kustomize#202 (comment)

Member

Generally, they don't, and rely on loose coupling and retries... :-/

Member Author

how can "they" (= kustomize?) do retries? kustomize just creates manifests locally; you need to post them to the cluster yourself 🤔

Member

right, so kustomize isn't going to do the retry loop for us. it's something we'll have to do.

for ordering, let's just see what order they get posted in. If this is an issue, maybe there's a way to use a transformer to influence ordering.

@@ -0,0 +1,8 @@
#!/bin/sh
Member

looking through the kustomize examples and docs, this wait pattern doesn't appear to ever be used. That gives me the impression we're doing something that doesn't belong in the kustomize logic itself.

This operator resource is tricky though, because it has to come online before we can post the performance profile CR, otherwise that CRD isn't even registered.

any thoughts on how we can handle this without using a wait?

Maybe two sets of kustomize configs, one that installs all the preconditions like operators and another that contains all the actual configs that mutate the infrastructure? Then we'd have to build logic outside of kustomize to wait between those two steps.

i'm unsure what the best practice is here
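
A minimal shell sketch of that staged idea (the directory names and the CRD name are illustrative assumptions, not part of this PR):

# Stage 1: install the operator preconditions (catalogsource, operatorgroup, subscription).
oc apply -k feature-configs/operators
# Wait until the operator's CRD is registered before posting any CRs against it.
# The CRD name below is assumed for illustration only.
oc wait --for=condition=established --timeout=120s crd/performanceprofiles.performance.openshift.io
# Stage 2: apply the actual configs that mutate the infrastructure.
oc apply -k feature-configs/configs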

Member

I think the staged approach is exactly what is being used indeed.

Member

And as Yuval just commented above, it seems the whole thing is just executed in a loop until the errors disappear.

Member Author

the whole thing is just executed in a loop until the errors disappear

by whom? By the deploy script?

Member

yes

if [ -n "${OPENSHIFT_BUILD_NAMESPACE}" ]; then
    FULL_REGISTRY_IMAGE="registry.svc.ci.openshift.org/${OPENSHIFT_BUILD_NAMESPACE}/stable:performance-addon-operators-registry"
    echo "[INFO]: Openshift CI detected, deploying using image $FULL_REGISTRY_IMAGE"
    cp feature-configs/e2e-gcp/performance-operator/operator_catalogsource.patch.yaml.in feature-configs/e2e-gcp/performance-operator/operator_catalogsource.patch.yaml
Member

ha, wow. that's kind of annoying. There's really no reasonable way around this using variables at build time that I can see. The approach you've done here is about the best we can do if we don't know the image name up front.

They have ways of transforming images for containers in pods, deployments, etc... but not for this custom catalog source resource.

Member Author

@slintes slintes Jan 13, 2020

yes this is very annoying

They have ways of transforming images for containers in pods, deployments

if you mean kustomize edit, that also just modifies manifests as we do here, so not much better.

But maybe I have a less ugly approach than this: using ${OPENSHIFT_BUILD_NAMESPACE} in the e2e manifest patch and piping it through envsubst before oc apply. Will give it a try.

Member Author

wait...
We don't need to handle ${OPENSHIFT_BUILD_NAMESPACE} at all. We don't build the image here, we want to test an existing one :)
So due to the lack of an existing upstream image I use my own here as well for now, until we have official images available.

Member

we still need the ability to pass in an image via an ENV VAR during make feature-deploy though.

Member Author

can you elaborate on which use case still needs env vars? I'm asking because if we don't need to support them, we could remove the download and usage of kustomize and use oc -k ... instead. But that would hinder usage of envsubst or similar.

Member

what other options do we have for providing an image override for an image in a private repo that we don't want to link to in a public repo?

Member Author

good use case indeed. I'm not aware of any other option to do that.
So I'll keep the kustomize | oc apply flow and not replace it with oc -k, so we can simply put envsubst in between.
fyi @MarSik
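
A rough sketch of that flow (the image value is only an example, and this is not the exact script in this PR):

# Render with kustomize, substitute the image placeholder, then apply.
# FULL_REGISTRY_IMAGE is expected to be exported by the caller.
export FULL_REGISTRY_IMAGE="quay.io/example/performance-addon-operators-registry:latest"
kustomize build feature-configs/e2e-gcp/performance-operator | envsubst | oc apply -f -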

kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker-rt
Member

We should create a separate role for sctp as it is orthogonal to worker-rt. So we need manifests for an MCP worker-sctp that references the worker-sctp node role, and we should update the MC here to use it.

Member

@MarSik is it fine if I do the sctp stuff in a separate PR?

Member Author

removed

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 13, 2020
Member

@davidvossel davidvossel left a comment

/lgtm

we'll want to follow up with logic that waits for the cluster to stabilize after deploying features.

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 13, 2020
@davidvossel
Member

/hold

waiting on some manual test results first.

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed lgtm Indicates that a PR is ready to be merged. labels Jan 13, 2020
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
…dling (we don't build here at all)

Signed-off-by: Marc Sluiter <msluiter@redhat.com>
@slintes
Member Author

slintes commented Jan 13, 2020

rebased


# Label 1 worker node
echo "[INFO]:labeling 1 worker node with worker-rt"
node=$(${OC_TOOL} get nodes --selector='node-role.kubernetes.io/worker' -o name | head -1)
Contributor

Small question: if my master nodes also have the "worker" label, the first master will be chosen. Is that ok?

[root@dell-r730-012 dev-scripts]# oc get nodes --selector='node-role.kubernetes.io/worker' -o name | head -1
node/master-0

[root@dell-r730-012 dev-scripts]# oc get node
NAME STATUS ROLES
master-0 Ready master,worker
master-1 Ready master,worker
master-2 Ready master,worker
worker-0 Ready worker
worker-1 Ready worker
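
(Aside: one possible way to avoid picking a master on clusters like this is to exclude the master role in the selector; the standard node-role labels are assumed here, this is only a sketch.)

# Select a node that has the worker role but not the master role.
node=$(oc get nodes --selector='node-role.kubernetes.io/worker,!node-role.kubernetes.io/master' -o name | head -1)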

Signed-off-by: Marc Sluiter <msluiter@redhat.com>
@slintes slintes changed the title WIP Added kustomization for performance Added kustomization for performance Jan 14, 2020
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
fi
set -e

done
Member

@yuvalk yuvalk Jan 14, 2020

I think this logic can be better:

  1. afaik apply will return 0 regardless of other errors that might occur
  2. even if such errors occur, there's no need to re-apply, just wait for them to settle.
  3. which this loop doesn't do (something like kubectl rollout status, which isn't available in oc[??] not sure what's the alternative). ie we would probably be out of that loop before the feature declarations are really "done"

other than that, we can use a single oc apply -k, if we create a folder referencing all the wanted features.
which IMHO would be 'cleaner' than running multiple kustomize commands. we can even have an overlay dir with some supported/most relevant variations (all, networking, performance, etc)

Member Author

Thx for review!

  1. no, this works: as long as the CR can't be posted because the CRD isn't there yet, it returns an error
  2. we don't want custom wait logic, that's why we iterate until everything succeeds (see the sketch after this list). Custom wait logic would be "check if the CRD exists already"
  3. yes, we might want to have a sanity check that everything works as expected. I'd like to leave that out of scope of this PR, in order to get it merged asap, so that others can add more features on top of this
  4. about oc -k, see Added kustomization for performance #4 (comment)
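
For context, a minimal sketch of that retry-apply pattern (not the exact feature-deploy.sh code; the directory name is illustrative):

# Re-apply until everything, including CRs whose CRDs appear late, is accepted.
attempt=0
until kustomize build feature-configs/demo | oc apply -f -; do
  attempt=$((attempt + 1))
  if [ "${attempt}" -ge 30 ]; then
    echo "[ERROR]: feature deployment did not converge"
    exit 1
  fi
  echo "[INFO]: apply not complete yet (attempt ${attempt}), retrying in 10s..."
  sleep 10
done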

@MarSik
Member

MarSik commented Jan 14, 2020

/lgtm

@slintes
Member Author

slintes commented Jan 14, 2020

let's see what happens

/lgtm
/approve
/hold cancel

@openshift-ci-robot
Collaborator

@slintes: you cannot LGTM your own PR.


In response to this:

let's see what happens

/lgtm
/approve
/hold cancel

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 14, 2020
Member

@davidvossel davidvossel left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 14, 2020
@openshift-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidvossel, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@slintes
Member Author

slintes commented Jan 14, 2020

Seems the merge hangs because the test status wasn't reported back to GitHub. Let's try again...

/test all

@davidvossel
Member

/test all

@openshift-merge-robot openshift-merge-robot merged commit 6f54dca into openshift-kni:master Jan 14, 2020
abraham2512 added a commit to abraham2512/cnf-features-deploy that referenced this pull request Sep 21, 2025
abraham2512 added a commit to abraham2512/cnf-features-deploy that referenced this pull request Sep 22, 2025
abraham2512 added a commit to abraham2512/cnf-features-deploy that referenced this pull request Nov 28, 2025
abraham2512 added a commit to abraham2512/cnf-features-deploy that referenced this pull request Nov 28, 2025
