Added kustomization for performance #4
openshift-merge-robot merged 4 commits into openshift-kni:master from slintes:performance
Conversation
davidvossel left a comment
we've got to figure out where this wait logic belongs in the blueprint workflow. i don't think we're following standard practice there.
other than that, this looks pretty solid. i just made some comments throughout.
kind: PerformanceProfile
metadata:
  name: performance
  namespace: openshift-performance-addon
this isn't a namespaced resource anymore.
@@ -0,0 +1,7 @@
apiVersion: v1
kind: Namespace
i think we'll have to put namespace in the performance-operator path. since the profiles aren't namespaced, i'm not aware of any value in making this external to the operator's path.
I know they respect creating namespaces before other resources now, but i don't know if that's respected if the dependent resources aren't in the same subdir. kubernetes-sigs/kustomize#65
in the future, if we need namespaces for multiple features, we might have to define potentially the same namespace in multiple subdirs.
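For illustration, a minimal sketch of keeping the Namespace inside the operator's path (directory and file names are assumptions, not necessarily the PR's actual layout); kustomize orders Namespace objects ahead of other kinds within the same kustomization:

```sh
# Hypothetical layout: the Namespace lives next to the operator manifests so it
# is applied from the same kustomization and therefore ordered first by kustomize.
mkdir -p feature-configs/demo/performance-operator
cat <<'EOF' > feature-configs/demo/performance-operator/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- namespace.yaml
- operator_group.yaml
- operator_subscription.yaml
EOF
```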
@@ -0,0 +1,10 @@
apiVersion: kustomize.config.k8s.io/v1beta1
can we pick a more descriptive name for the "demo" environment? it's possible we might have more than one demo environment in the future
sure, if you have an idea? ;)
spec:
  channel: alpha
  name: performance-addon-operators
  source: performance-addon-operators-catalogsource
i don't know how kustomize handles ordering here. Is it possible for the subscription to be posted before the operatorgroup? I think that would fail. we need to understand how these manifests that depend on one another are handled.
All i could find is that some well known resource types (like namespaces) would be posted before other resources.
Yeah, they seem to have a priority list of objects to post first: kubernetes-sigs/kustomize#202 (comment)
Generally, they don't, and rely on loose coupling and retries... :-/
how can "they" (=kustomize?) do retries? kustomize just creates manifests locally. You need to post them to the cluster yourself 🤔
right, so kustomize isn't going to do the retry loop for us. it's something we'll have to do.
for ordering, let's just see what order they get posted in. If this is an issue, maybe there's a way to use a transformer to influence ordering.
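For illustration, a minimal sketch of such a retry loop in the deploy script (the feature-configs/demo path is an assumption): it simply re-applies until the CRs no longer fail because their CRDs are missing.

```sh
# Keep re-applying until everything is accepted; early iterations may fail
# because the PerformanceProfile CRD is not registered yet.
until kustomize build feature-configs/demo | oc apply -f -; do
  echo "[INFO]: apply failed (CRDs probably not registered yet), retrying in 10s..."
  sleep 10
done
```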
@@ -0,0 +1,8 @@
#!/bin/sh
looking through the kustomize examples and docs, this wait pattern doesn't appear to ever be used. That gives me the impression we're doing something that doesn't belong in the kustomize logic itself.
This operator resource is tricky though, because it has to come online before we can post the performance profile CR, otherwise that CRD isn't even registered.
any thoughts on how we can handle this without using a wait?
Maybe two sets of kustomize configs, one that installs all the preconditions like operators and another that contains all the actual configs that mutate the infrastructure? Then we'd have to build logic outside of kustomize to wait between those two steps
i'm unsure what the best practice is here
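For illustration, a rough sketch of the two-stage idea (paths and the CRD name are assumptions): stage one installs the operator, a wait outside of kustomize blocks until the CRD is registered, and stage two applies the actual profile.

```sh
# Stage 1: operator preconditions (namespace, catalog source, operator group, subscription).
kustomize build feature-configs/demo/performance-operator | oc apply -f -

# Wait outside of kustomize until the CRD served by the operator is registered.
oc wait --for condition=established --timeout=120s \
  crd/performanceprofiles.performance.openshift.io

# Stage 2: the configuration that actually mutates the infrastructure.
kustomize build feature-configs/demo/performance | oc apply -f -
```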
I think the staged approach is indeed exactly what's being used.
And as Yuval just commented above, it seems the whole thing is just executed in a loop until the errors disappear.
> the whole thing is just executed in a loop until the errors disappear
by whom? By the deploy script?
hack/feature-deploy.sh (Outdated)
if [ -n "${OPENSHIFT_BUILD_NAMESPACE}" ]; then
  FULL_REGISTRY_IMAGE="registry.svc.ci.openshift.org/${OPENSHIFT_BUILD_NAMESPACE}/stable:performance-addon-operators-registry"
  echo "[INFO]: Openshift CI detected, deploying using image $FULL_REGISTRY_IMAGE"
  cp feature-configs/e2e-gcp/performance-operator/operator_catalogsource.patch.yaml.in feature-configs/e2e-gcp/performance-operator/operator_catalogsource.patch.yaml
ha, wow. that's kind of annoying. There's really no reasonable way around this using variables at build time that I can see. The approach you've done here is about the best we can do if we don't know the image name up front.
They have ways of transforming images for containers in pods, deployments, etc... but not for this custom catalog source resource.
yes this is very annoying
> They have ways of transforming images for containers in pods, deployments
if you mean kustomize edit, that also just modifies manifests as we do here, so not much better.
But maybe I have a less ugly approach than this: using ${OPENSHIFT_BUILD_NAMESPACE} in the e2e manifest patch, and piping it through envsubst before oc apply. Will give it a try.
wait...
We don't need to handle ${OPENSHIFT_BUILD_NAMESPACE} at all. We don't build the image here, we want to test an existing one :)
So due to the lack of an existing upstream image I use my own here as well for now, until we have official images available.
we still need the ability to pass in an image via an ENV VAR during make feature-deploy though.
can you elaborate on which use case still needs env vars? I'm asking because if we don't need to support them, we could remove the download and usage of kustomize and just use oc -k .... But that would hinder usage of envsubst or similar.
what other options do we have for providing an image override for an image in a private repo that we don't want to link to in a public repo?
good use case indeed. I'm not aware of any other option to do that.
So I'll leave the kustomize | oc apply flow and not replace it with oc -k, so we can simply put envsubst in between.
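For illustration, a sketch of that flow (paths and the example image are assumptions; OPENSHIFT_BUILD_NAMESPACE, FULL_REGISTRY_IMAGE and FEATURES_ENVIRONMENT appear elsewhere in this PR): the manifests keep a ${FULL_REGISTRY_IMAGE} placeholder, the caller exports the override, and envsubst fills it in between kustomize and oc.

```sh
# An env-var override keeps private image names out of the public repo.
export FULL_REGISTRY_IMAGE="${FULL_REGISTRY_IMAGE:-quay.io/example/performance-addon-operators-registry:latest}"

# Restrict envsubst to this one variable so other $-signs in the manifests survive.
kustomize build "feature-configs/${FEATURES_ENVIRONMENT}" \
  | envsubst '${FULL_REGISTRY_IMAGE}' \
  | oc apply -f -
```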
fyi @MarSik
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker-rt
We should create a separate role for sctp as it is orthogonal to worker-rt. So we need manifests for an MCP worker-sctp that references the worker-sctp node role, and we need to update the MC here to use it.
@MarSik is it fine if I do the sctp stuff in a separate PR?
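An untested sketch of what such a separate pool could look like in that follow-up (names and label values are assumptions, not the actual manifests):

```sh
# Hypothetical MachineConfigPool for a dedicated worker-sctp role: it picks up
# MachineConfigs labeled worker or worker-sctp and targets worker-sctp nodes.
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-sctp
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-sctp]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-sctp: ""
EOF
```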
davidvossel left a comment
/lgtm
we'll want to follow up with logic that waits for the cluster to stabilize after deploying features.
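A possible shape for that follow-up (a sketch; the pool name is an assumption): wait for the affected MachineConfigPool to finish rolling out before declaring the deploy done.

```sh
# Block until the worker-rt pool has rolled out the new MachineConfigs.
oc wait mcp/worker-rt --for condition=Updated --timeout=30m
```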
/hold waiting on some manual test results first.
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
…dling (we don't build here at all) Signed-off-by: Marc Sluiter <msluiter@redhat.com>
rebased
hack/feature-deploy.sh (Outdated)
# Label 1 worker node
echo "[INFO]:labeling 1 worker node with worker-rt"
node=$(${OC_TOOL} get nodes --selector='node-role.kubernetes.io/worker' -o name | head -1)
Small question: if my master nodes also have the "worker" label, the first master will be chosen. Is that OK?
[root@dell-r730-012 dev-scripts]# oc get nodes --selector='node-role.kubernetes.io/worker' -o name | head -1
node/master-0
[root@dell-r730-012 dev-scripts]# oc get node
NAME STATUS ROLES
master-0 Ready master,worker
master-1 Ready master,worker
master-2 Ready master,worker
worker-0 Ready worker
worker-1 Ready worker
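One possible way to avoid picking a schedulable master (a sketch, assuming masters carry the node-role.kubernetes.io/master label) is to exclude masters in the label selector:

```sh
# Pick the first node that has the worker role but not the master role.
node=$(oc get nodes \
  --selector='node-role.kubernetes.io/worker,!node-role.kubernetes.io/master' \
  -o name | head -1)
echo "[INFO]: labeling ${node} with worker-rt"
oc label --overwrite "${node}" node-role.kubernetes.io/worker-rt=""
```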
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
  fi
  set -e

done
I think this logic can be better:
- afaik apply will return 0 regardless of other errors that might occur
- even if such errors occur, there's no need to re-apply, just wait for them to settle.
- which this loop doesn't do (something like kubectl rollout status, which isn't available in oc[??], not sure what's the alternative). i.e. we would probably be out of that loop before the feature declarations are really "done"
other than that, we can use a single oc apply -k, if we create a folder referencing all the wanted features.
which IMHO would be 'cleaner' than running multiple kustomize commands. we can even have an overlay dir with some supported/most relevant variations (all, networking, performance, etc)
Thx for review!
- no, this works: as long as the CR can't be posted because the CRD isn't there yet, it returns an error
- we don't want custom wait logic, that's why we iterate until everything succeeds. Custom wait would be "check if the CRD exists already"
- yes, we might want to have a sanity check that everything works as expected. I'd like to leave that out of scope of this PR, in order to get it merged asap, so that others can add more features on top of this
- about oc -k, see Added kustomization for performance #4 (comment)
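For completeness, the single-apply variant suggested above would look roughly like this (the layout is an assumption), at the cost of losing the envsubst step in between:

```sh
# A top-level kustomization that pulls in the wanted feature directories.
# Note: newer kustomize accepts directories under resources; older versions used bases.
cat <<'EOF' > feature-configs/demo/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- performance-operator
- performance
EOF

# One apply for everything referenced above.
oc apply -k feature-configs/demo
```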
/lgtm
let's see what happens
/lgtm
@slintes: you cannot LGTM your own PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: davidvossel, slintes
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Seems the merge hangs because the test status wasn't reported back to GitHub. Let's try again...
/test all
/test all
Initial implementation of the feature-deploy.sh script, along with manifests for e2e and demo envs.
Deployment on a test cluster with FEATURES_ENVIRONMENT=demo make feature-deploy succeeded.
Heads up: since we don't have official public images of the performance operator yet, the demo env uses images from my quay.io account, which is why I consider this WIP. Also, that image contains unmerged code from openshift-kni/performance-addon-operators#34.
Also, feature-deploy.sh contains more imperative parts than we are aiming for. So consider this a first iteration.