Skip to content

Conversation

@runcom
Copy link
Member

@runcom runcom commented Feb 11, 2019

Close #259

First and third commits are auxiliary, the second one is the one actually adding overrides for the CVO in the hack/cluster-push-* scripts.

Could you test it out? It works fine on AWS.

Signed-off-by: Antonio Murdaca <runcom@linux.com>
Signed-off-by: Antonio Murdaca <runcom@linux.com>
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 11, 2019
HACKING.md Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But...we want to be able to develop on the operator too right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setting unmanaged for the operator itself is already enough to change the image for the operator and and the deployment itself w/o the CVO stomping on us. If you read some lines above, this is just for the server,controller and daemon (unless I didn't make that clear)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm retesting this flow on a new cluster though)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright, worked as I described, if you overrides the Deployment for the operator, you can edit it w/o doing anything else (nor scaling) as the CVO ignores any change to it. e.g you can make deploy-operator just fine or oc edit deployment/machine-config-operator and the CVO doesn't stomp on you

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of using overrides to be clear. My objection is to recommending scaling the operator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, it makes sense indeed, I just grabbed that from the section below as it's really the way to actually develop at least the daemon w/o the operator messing with it

HACKING.md Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that's why the current make deploy- model bounces (scales down+up) the operator - because we don't wait to wait for it to notice the images change. (Though the slowness there must really be a bug right?)

I think this text is incorrect - we shouldn't recommend disabling the operator when developing things as some functionality depends on it...my recent osImageURL work depended on patching both the operator and daemon for example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that indeed, we need to remove that wording from the section below as well though and since i.e. the daemon isn't CVO managed there's no real way to edit it without the operator replacing it with its own pristine copy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the daemon isn't CVO managed

The daemon is managed by the operator.

there's no real way to edit it without the operator replacing it with its own pristine copy

That's not true; the current make deploy- scripts have been doing this just fine right?

Again I think the problem is some sort of race condition where even changing images.json the operator doesn't notice the change right away and tries to change the deployment back based on its old cached copy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no real way to edit it without the operator replacing it with its own pristine copy

That's not true; the current make deploy- scripts have been doing this just fine right?

I need to dig deeper but I think, from my testing, the current make deploy-* works just because you also patch the images and the operator re-syncing still result in the new built image but if you try editing something else (with the overrides) the daemon will be replaced again

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to jump in, but I'm confused by what you wrote @runcom I dont currently use the hack/* tools but when I work on MC*, I only scale down the CVO, I've never had to do anything to the MCO. Once I scale, my changes to each image "stick" and I have no problems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I was talking about moving away from scaling the CVO to zero and just set some objects to unmanaged!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha ty!

@runcom
Copy link
Member Author

runcom commented Feb 11, 2019

Ok, I've dropped the last commit while I keep testing - rest should be ok to review and test I guess

@runcom
Copy link
Member Author

runcom commented Feb 11, 2019

I want to be sure that overrides vs disabling the CVO doesn't result in any development regression for any of you, so please, take this PR :P

@runcom
Copy link
Member Author

runcom commented Feb 11, 2019

flake

/test e2e-aws

# XXX: --type merge completely overrides any previous "overrides" array
# find a way to just append? json op: add isn't working at all
# if there's not an overrides array already, that's why we use merge
oc patch clusterversions.config.openshift.io/version --type merge -p '{"spec":{"overrides": [{"kind": "Deployment","name": "machine-config-operator", "namespace": "openshift-machine-config-operator", "unmanaged": true}, {"kind": "ConfigMap","name": "machine-config-operator-images", "namespace": "openshift-machine-config-operator", "unmanaged": true}]}}'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we don't need to add the Deployment for the operator here or we're going to miss the wiring with the CVO (status report and whatnot). @cgwalters is this your understanding as well?

@runcom
Copy link
Member Author

runcom commented Feb 11, 2019

Still poking with this

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 11, 2019
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 3, 2019
@kikisdeliveryservice
Copy link
Contributor

/skip

@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 27, 2019
@openshift-ci-robot
Copy link
Contributor

@runcom: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws 6e50ddb link /test e2e-aws
ci/prow/e2e-aws-disruptive 6e50ddb link /test e2e-aws-disruptive
ci/prow/e2e-gcp-op 6e50ddb link /test e2e-gcp-op
ci/prow/e2e-aws-upgrade 6e50ddb link /test e2e-aws-upgrade
ci/prow/e2e-gcp-upgrade 6e50ddb link /test e2e-gcp-upgrade
ci/prow/e2e-vsphere 6e50ddb link /test e2e-vsphere

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kikisdeliveryservice
Copy link
Contributor

In an effort to clean up the MCO repo, closing old open PRs with no recent activity.

Feel free to reopen.

@kikisdeliveryservice kikisdeliveryservice added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tweak hack/cluster-push.sh to use CVO overrides

4 participants