
Added kustomization for performance #4

Merged
openshift-merge-robot merged 4 commits into openshift-kni:master from slintes:performance
Jan 14, 2020

Conversation

@slintes
Member

@slintes slintes commented Jan 10, 2020

Initial Implementation of the feature-deploy.sh script along with manifests for e2e and demo envs.

Deployment on a test cluster with FEATURES_ENVIRONMENT=demo make feature-deploy succeeded.

Heads up: since we don't have official public images of the performance operator yet, the demo env uses images from my quay.io account, which is why I consider this WIP. Also, that image contains unmerged code from openshift-kni/performance-addon-operators#34.

Also, the feature-deploy.sh script contains more imperative parts than we are aiming for. So consider this a first iteration.

Member

@davidvossel davidvossel left a comment

we've got to figure out where this wait logic belongs in the blueprint workflow. i don't think we're following standard practice there.

other than that, this looks pretty solid. i just made some comments throughout.

kind: PerformanceProfile
metadata:
name: performance
namespace: openshift-performance-addon
Member

this isn't a namespaced resource anymore.

Member Author

fixed

@@ -0,0 +1,7 @@
apiVersion: v1
kind: Namespace
Member

i think we'll have to put the namespace in the performance-operator path. since the profiles aren't namespaced, i'm not aware of any value in making this external to the operator's path.

I know kustomize now respects creating namespaces before other resources, but i don't know if that ordering still holds when the dependent resources aren't in the same subdir. kubernetes-sigs/kustomize#65

in the future, if we need namespaces for multiple features, we might potentially have to define the same namespace in multiple subdirs.

Member Author

done

@@ -0,0 +1,10 @@
apiVersion: kustomize.config.k8s.io/v1beta1
Member

can we pick a more descriptive name for the "demo" environment? it's possible we might have more than one demo environment in the future

Member Author

sure, if you have an idea? ;)

spec:
channel: alpha
name: performance-addon-operators
source: performance-addon-operators-catalogsource
Member

i don't know how kustomize handles ordering here. Is it possible for the subscription to be posted before the operatorgroup? I think that would fail. we need to understand how these manifests that depend on one another are handled.

All i could find is that some well-known resource types (like namespaces) are posted before other resources.

Member

Yeah, they seem to have a priority list of objects to post first: kubernetes-sigs/kustomize#202 (comment)

Member

Generally, they don't, and rely on loose coupling and retries... :-/

Member Author

how can "they" (= kustomize?) do retries? kustomize just creates manifests locally; you need to post them to the cluster yourself 🤔

Member

right, so kustomize isn't going to do the retry loop for us. it's something we'll have to do.

for ordering, let's just see what order they get posted in. If this is an issue, maybe there's a way to use a transformer to influence ordering.

@@ -0,0 +1,8 @@
#!/bin/sh
Member

looking through the kustomize examples and docs, this wait pattern doesn't appear to ever be used. That gives me the impression we're doing something that doesn't belong in the kustomize logic itself.

This operator resource is tricky though, because it has to come online before we can post the performance profile CR, otherwise that CRD isn't even registered.

any thoughts on how we can handle this without using a wait?

Maybe two sets of kustomize configs, one that installs all the preconditions like operators and another that contains all the actual configs that mutate the infrastructure? Then we'd have to build logic outside of kustomize to wait between those two steps.

i'm unsure what the best practice is here
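
A minimal shell sketch of that staged idea (the directory names and the CRD name are illustrative assumptions, not part of this PR):

# Stage 1: install the operator preconditions (catalogsource, operatorgroup, subscription).
oc apply -k feature-configs/operators
# Wait until the operator's CRD is registered before posting any CRs against it.
# The CRD name below is assumed for illustration only.
oc wait --for=condition=established --timeout=120s crd/performanceprofiles.performance.openshift.io
# Stage 2: apply the actual configs that mutate the infrastructure.
oc apply -k feature-configs/configs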

Member

I think the staged approach is exactly what is being used indeed.

Member

And as Yuval just commented above, it seems the whole thing is just executed in a loop until the errors disappear.

Member Author

the whole thing is just executed in a loop until the errors disappear

by whom? By the deploy script?

Member

yes

if [ -n "${OPENSHIFT_BUILD_NAMESPACE}" ]; then
    FULL_REGISTRY_IMAGE="registry.svc.ci.openshift.org/${OPENSHIFT_BUILD_NAMESPACE}/stable:performance-addon-operators-registry"
    echo "[INFO]: Openshift CI detected, deploying using image $FULL_REGISTRY_IMAGE"
    cp feature-configs/e2e-gcp/performance-operator/operator_catalogsource.patch.yaml.in feature-configs/e2e-gcp/performance-operator/operator_catalogsource.patch.yaml
Member

ha, wow. that's kind of annoying. There's really no reasonable way around this using variables at build time that I can see. The approach you've done here is about the best we can do if we don't know the image name up front.

They have ways of transforming images for containers in pods, deployments, etc... but not for this custom catalog source resource.

Member Author

@slintes slintes Jan 13, 2020

yes this is very annoying

They have ways of transforming images for containers in pods, deployments

if you mean kustomize edit, that also just modifies manifests as we do here, so not much better.

But maybe I have a less ugly approach than this: using ${OPENSHIFT_BUILD_NAMESPACE} in the e2e manifest patch and piping it through envsubst before oc apply. Will give it a try.

Member Author

wait...
We don't need to handle ${OPENSHIFT_BUILD_NAMESPACE} at all. We don't build the image here, we want to test an existing one :)
So due to the lack of an existing upstream image I use my own here as well for now, until we have official images available.

Member

we still need the ability to pass in an image via an ENV VAR during make feature-deploy though.

Member Author

can you elaborate on which use case still needs env vars? I'm asking because if we don't need to support them, we could remove the download and usage of kustomize and use oc -k ... instead. But that would hinder usage of envsubst or similar.

Member

what other options do we have for providing an image override for an image in a private repo that we don't want to link to in a public repo?

Member Author

good use case indeed. I'm not aware of any other option to do that.
So I'll keep the kustomize | oc apply flow and not replace it with oc -k, so we can simply put envsubst in between.
fyi @MarSik
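
A rough sketch of that flow (the image value is only an example, and this is not the exact script in this PR):

# Render with kustomize, substitute the image placeholder, then apply.
# FULL_REGISTRY_IMAGE is expected to be exported by the caller.
export FULL_REGISTRY_IMAGE="quay.io/example/performance-addon-operators-registry:latest"
kustomize build feature-configs/e2e-gcp/performance-operator | envsubst | oc apply -f -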

kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker-rt
Member

We should create a separate role for sctp as it is orthogonal to worker-rt. So we need manifests for an MCP worker-sctp that references the worker-sctp node role, and we should update the MC here to use it.

Member

@MarSik is it fine if I do the sctp stuff in a separate PR?

Member Author

removed

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 13, 2020
Member

@davidvossel davidvossel left a comment

/lgtm

we'll want to follow up with logic that waits for the cluster to stabilize after deploying features.

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 13, 2020
@davidvossel
Member

/hold

waiting on some manual test results first.

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed lgtm Indicates that a PR is ready to be merged. labels Jan 13, 2020
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
…dling (we don't build here at all)

Signed-off-by: Marc Sluiter <msluiter@redhat.com>
@slintes
Member Author

slintes commented Jan 13, 2020

rebased


# Label 1 worker node
echo "[INFO]:labeling 1 worker node with worker-rt"
node=$(${OC_TOOL} get nodes --selector='node-role.kubernetes.io/worker' -o name | head -1)
Contributor

Small question: if my master nodes also have the "worker" label, the first master will be chosen. Is that ok?

[root@dell-r730-012 dev-scripts]# oc get nodes --selector='node-role.kubernetes.io/worker' -o name | head -1
node/master-0

[root@dell-r730-012 dev-scripts]# oc get node
NAME STATUS ROLES
master-0 Ready master,worker
master-1 Ready master,worker
master-2 Ready master,worker
worker-0 Ready worker
worker-1 Ready worker
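
(Aside: one possible way to avoid picking a master on clusters like this is to exclude the master role in the selector; the standard node-role labels are assumed here, this is only a sketch.)

# Select a node that has the worker role but not the master role.
node=$(oc get nodes --selector='node-role.kubernetes.io/worker,!node-role.kubernetes.io/master' -o name | head -1)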

Signed-off-by: Marc Sluiter <msluiter@redhat.com>
@slintes slintes changed the title WIP Added kustomization for performance Added kustomization for performance Jan 14, 2020
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
fi
set -e

done
Member

@yuvalk yuvalk Jan 14, 2020

I think this logic can be better:

  1. afaik apply will return 0 regardless of other errors that might occur
  2. even if such errors occur, there's no need to re-apply, just wait for them to settle.
  3. which this loop doesn't do (something like kubectl rollout status, which isn't available in oc[??] not sure what's the alternative). ie we would probably be out of that loop before the feature declarations are really "done"

other than that, we can use a single oc apply -k, if we create a folder referencing all the wanted features.
which IMHO would be 'cleaner' than running multiple kustomize commands. we can even have an overlay dir with some supported/most relevant variations (all, networking, performance, etc)

Member Author

Thx for review!

  1. no, this works: as long as the CR can't be posted because the CRD isn't there yet, it returns an error
  2. we don't want custom wait logic, that's why we iterate until everything succeeds (see the sketch after this list). Custom wait logic would be "check if the CRD exists already"
  3. yes, we might want to have a sanity check that everything works as expected. I'd like to leave that out of scope of this PR, in order to get it merged asap, so that others can add more features on top of this
  4. about oc -k, see Added kustomization for performance #4 (comment)
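
For context, a minimal sketch of that retry-apply pattern (not the exact feature-deploy.sh code; the directory name is illustrative):

# Re-apply until everything, including CRs whose CRDs appear late, is accepted.
attempt=0
until kustomize build feature-configs/demo | oc apply -f -; do
  attempt=$((attempt + 1))
  if [ "${attempt}" -ge 30 ]; then
    echo "[ERROR]: feature deployment did not converge"
    exit 1
  fi
  echo "[INFO]: apply not complete yet (attempt ${attempt}), retrying in 10s..."
  sleep 10
done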

@MarSik
Member

MarSik commented Jan 14, 2020

/lgtm

@slintes
Member Author

slintes commented Jan 14, 2020

let's see what happens

/lgtm
/approve
/hold cancel

@openshift-ci-robot
Collaborator

@slintes: you cannot LGTM your own PR.


In response to this:

let's see what happens

/lgtm
/approve
/hold cancel

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 14, 2020
Member

@davidvossel davidvossel left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 14, 2020
@openshift-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: davidvossel, slintes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@slintes
Member Author

slintes commented Jan 14, 2020

Seems the merge hangs because the test status wasn't reported back to GitHub. Let's try again...

/test all

@davidvossel
Member

/test all

@openshift-merge-robot openshift-merge-robot merged commit 6f54dca into openshift-kni:master Jan 14, 2020
abraham2512 added a commit to abraham2512/cnf-features-deploy that referenced this pull request Sep 21, 2025
abraham2512 added a commit to abraham2512/cnf-features-deploy that referenced this pull request Sep 22, 2025
abraham2512 added a commit to abraham2512/cnf-features-deploy that referenced this pull request Nov 28, 2025
abraham2512 added a commit to abraham2512/cnf-features-deploy that referenced this pull request Nov 28, 2025
