@kikisdeliveryservice
Contributor

@kikisdeliveryservice kikisdeliveryservice commented Jun 12, 2019

This is the 3rd pr in a series to fold all MC* images into one.
PR does the following as step 3:

  • update Dockerfile.machine-config-operator to generate all component
    images (mco/mcc/mcd/mcs/setup-etcd-env)
  • update Dockerfile.machine-config-operator.rhel7 the same as above
  • update image-references, configmap and manifests to use MCO image
  • update MCO image flag in bootstrap.go to be required

TO-DO in follow-on prs:

  • remove the other Dockerfiles so that a single Dockerfile generates the image
  • make another PR in release to only reference the above dockerfile
  • make another PR in installer later to remove individual refs in bootkube
  • make another PR here to remove the component image refs
  • remove component image refs in ART(??)

Related-to: #847 (merged)
Requires: openshift/installer#1847 (merged)
Related-to: #739 (issue will only close when final installer pr merges)
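
Editor's sketch for the last step in the list above (making the MCO image flag required in bootstrap.go). This is a minimal illustration using only the standard library `flag` package; the flag name `machine-config-operator-image` and the function `parseBootstrapFlags` are hypothetical and are not taken from this PR, which may wire the flag differently.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// parseBootstrapFlags parses args and rejects a missing image flag.
// The flag name is an assumption made for this sketch only.
func parseBootstrapFlags(args []string) (string, error) {
	fs := flag.NewFlagSet("bootstrap", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the demo output
	image := fs.String("machine-config-operator-image", "", "MCO image pull spec (required)")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	// The flag package has no built-in "required" notion, so an empty
	// value is treated as "not provided".
	if *image == "" {
		return "", errors.New("required flag --machine-config-operator-image not set")
	}
	return *image, nil
}

func main() {
	// Flag provided: parsing succeeds.
	image, err := parseBootstrapFlags([]string{"--machine-config-operator-image=example.com/mco:latest"})
	fmt.Println(image, err)

	// Flag omitted: the caller gets an error instead of an empty pull spec.
	_, err = parseBootstrapFlags(nil)
	fmt.Println(err)
}
```

Failing early on a missing flag surfaces the problem at startup rather than later, when a manifest is rendered with an empty image.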

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 12, 2019
@kikisdeliveryservice
Contributor Author

kikisdeliveryservice commented Jun 12, 2019

Ahhh:
go run hack/e2e.go -v -test --test_args='--ginkgo.focus=operator\sBuild\simage\smachine\-config\-server\sfrom\sthe\srepository$

So I think I have to fix the test, since those images don't build anymore.

Update: I think this is from the Makefile and/or I need openshift/installer#1847

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 12, 2019
@kikisdeliveryservice
Contributor Author

/retest

@kikisdeliveryservice
Contributor Author

kikisdeliveryservice commented Jun 14, 2019

I don't really understand the e2e so well, but it seems to be running through expecting all of the dockerfiles that I deleted. So I changed my Dockerfile back to Dockerfile.machine-config-operator and it built:

2019/06/14 20:57:52 Build machine-config-operator succeeded after 1m0s
2019/06/14 20:57:52 Tagging machine-config-operator into stable
2019/06/14 20:57:53 Ran for 1m48s

Perhaps I'll re-add the other ones and see if this gets me past this part of CI

@runcom
Member

runcom commented Jun 14, 2019

Yep, we have jobs setup in openshift/release that must be changed as well, let's find a non breaking way to do that.

@kikisdeliveryservice
Contributor Author

> Yep, we have jobs setup in openshift/release that must be changed as well, let's find a non breaking way to do that.

sounds good!

@runcom
Member

runcom commented Jun 17, 2019

so the non-breaking way to land support in openshift/release would be to modify https://github.com/openshift/release/blob/master/ci-operator/config/openshift/machine-config-operator/openshift-machine-config-operator-master.yaml#L13 to add the new Dockerfile while leaving the others here and there till we're able to make the switch

@runcom
Member

runcom commented Jun 17, 2019

Also, I suggest building a custom payload from this PR and installing it on a cluster to debug why the bootstrap is failing.

@kikisdeliveryservice
Contributor Author

> Also, I suggest building a custom payload from this PR and installing it on a cluster to debug why the bootstrap is failing.

I have a few leads on this; going to investigate and post an update! 😃

@kikisdeliveryservice kikisdeliveryservice force-pushed the e-pluribus-unum branch 2 times, most recently from 25b1cca to 2f651b9 Compare June 18, 2019 01:40
@kikisdeliveryservice
Contributor Author

kikisdeliveryservice commented Jun 18, 2019

So bootstrap is failing because (from bootkube.log):

Jun 17 17:47:35 ip-10-0-8-130 bootkube.sh[22762]: F0617 17:47:35.618796       1 image.go:32] error: error: Unknown name requested, could not find machine-config-controller in UpdatePayload
Jun 17 17:47:35 ip-10-0-8-130 systemd[1]: bootkube.service: Main process exited, code=exited, status=255/n/a
Jun 17 17:47:35 ip-10-0-8-130 systemd[1]: bootkube.service: Failed with result 'exit-code'.
Jun 17 17:47:40 ip-10-0-8-130 systemd[1]: bootkube.service: Service RestartSec=5s expired, scheduling restart.
Jun 17 17:47:40 ip-10-0-8-130 systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 60.
Jun 17 17:47:40 ip-10-0-8-130 systemd[1]: Stopped Bootstrap a Kubernetes cluster.

I'm pretty sure this is because I overzealously removed all of the other images from image-references (in favor of machine-config-operator) while the installer still requires them, and I underestimated how many ping-pong PRs I'd need to get this done.
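
Editor's sketch of the failure mode described above: a consumer asks a payload for an image by name, and the lookup fails once that name has been dropped from the references. The `payload` type, its methods, and the pull specs are hypothetical stand-ins for illustration; they are not the code behind image.go.

```go
package main

import "fmt"

// payload maps component names to pull specs. It stands in for the
// image references carried by a release payload (hypothetical type).
type payload map[string]string

// imageFor returns the pull spec for name, or an error if the payload
// no longer carries a reference under that name.
func (p payload) imageFor(name string) (string, error) {
	spec, ok := p[name]
	if !ok {
		return "", fmt.Errorf("could not find %s in payload", name)
	}
	return spec, nil
}

// missing lists the names in required that are absent from p,
// preserving the order of required.
func (p payload) missing(required []string) []string {
	var out []string
	for _, name := range required {
		if _, ok := p[name]; !ok {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	// Only the operator image remains after the other references were removed.
	p := payload{"machine-config-operator": "example.com/mco:latest"}

	// Names a consumer might still request (illustrative list).
	required := []string{"machine-config-operator", "machine-config-controller"}

	for _, name := range p.missing(required) {
		fmt.Println("still requested but missing:", name)
	}
}
```

A check like `missing` run before removing references would show which names consumers still depend on, which is why the references had to stay until the installer stopped asking for them.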

@kikisdeliveryservice
Contributor Author

kikisdeliveryservice commented Jun 18, 2019

Ok cool, I got past that error by reverting the image references.
Onto my new error:
Jun 18 02:19:42 ip-10-0-10-57 bootkube.sh[28210]: F0618 02:19:42.380734 1 bootstrap.go:104] error rendering bootstrap manifests: failed to execute template: template: manifests/bootstrap-pod-v2.yaml:9:20: executing "manifests/bootstrap-pod-v2.yaml" at <.Images.MachineConfigOperator>: can't evaluate field MachineConfigOperator in type *operator.Images

Pushing a fix for that MCO image in the manifests...
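
Editor's sketch of why the template error above occurs: Go's `text/template` fails at execution time when a template references a struct field that the data type does not have. The type names `imagesBefore` and `imagesAfter`, the one-line template, and the pull spec are hypothetical; the real `operator.Images` struct and manifest are larger.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// imagesBefore lacks the field the template asks for (hypothetical type).
type imagesBefore struct {
	MachineConfigController string
}

// imagesAfter adds the missing field (hypothetical type).
type imagesAfter struct {
	MachineConfigController string
	MachineConfigOperator   string
}

// podTemplate is a one-line stand-in for a manifest template.
const podTemplate = "image: {{.Images.MachineConfigOperator}}\n"

// render executes podTemplate against data and returns the output.
func render(data interface{}) (string, error) {
	tmpl, err := template.New("bootstrap-pod").Parse(podTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// The struct has no MachineConfigOperator field, so Execute fails.
	_, err := render(struct{ Images *imagesBefore }{&imagesBefore{}})
	fmt.Println("before:", err)

	// With the field present, the same template renders.
	out, err := render(struct{ Images *imagesAfter }{
		&imagesAfter{MachineConfigOperator: "example.com/mco:latest"},
	})
	fmt.Print("after: ", out)
	if err != nil {
		fmt.Println(err)
	}
}
```

Parsing succeeds in both cases; the mismatch only shows up in `Execute`, which is why this kind of error tends to appear at bootstrap rather than at build time.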

@kikisdeliveryservice
Contributor Author

kikisdeliveryservice commented Jun 18, 2019

So that fixed the images!

Jun 18 03:09:47 ip-10-0-10-128 bootkube.sh[1407]: I0618 03:09:47.451667       1 bootstrap.go:83] Version: 4.0.0-alpha.0-537-g00a2fb42-dirty (00a2fb4249e10f5e6b30f2fd583969bdc7c7e23b)
Jun 18 03:09:47 ip-10-0-10-128 bootkube.sh[1407]: I0618 03:09:47.454182       1 bootstrap.go:142] manifests/machineconfigcontroller/controllerconfig.yaml
Jun 18 03:09:47 ip-10-0-10-128 bootkube.sh[1407]: I0618 03:09:47.457724       1 bootstrap.go:142] manifests/master.machineconfigpool.yaml
Jun 18 03:09:47 ip-10-0-10-128 bootkube.sh[1407]: I0618 03:09:47.458122       1 bootstrap.go:142] manifests/worker.machineconfigpool.yaml
Jun 18 03:09:47 ip-10-0-10-128 bootkube.sh[1407]: I0618 03:09:47.458483       1 bootstrap.go:142] manifests/bootstrap-pod-v2.yaml
Jun 18 03:09:47 ip-10-0-10-128 bootkube.sh[1407]: I0618 03:09:47.458949       1 bootstrap.go:142] manifests/machineconfigserver/csr-bootstrap-role-binding.yaml
Jun 18 03:09:47 ip-10-0-10-128 bootkube.sh[1407]: I0618 03:09:47.459366       1 bootstrap.go:142] manifests/machineconfigserver/kube-apiserver-serving-ca-configmap.yaml

Not sure about moving the setup-etcd-env into the MCO image, reverting that configmap change now bc I see:

Jun 18 03:09:52 ip-10-0-10-128 bootkube.sh[1407]: cca47d3c28b0b8ed8c87d66ab62f1f1dd575ac91b85d720ca4a915e6adf94586
Jun 18 03:09:52 ip-10-0-10-128 bootkube.sh[1407]: Waiting for etcd cluster...
Jun 18 03:19:55 ip-10-0-10-128 bootkube.sh[1407]: https://etcd-0.ci-op-vlhiz6r0-57a9f.origin-ci-int-aws.dev.rhcloud.com:2379 is unhealthy: failed to connect: dial tcp 10.0.133.36:2379: connect: connection refused
Jun 18 03:19:55 ip-10-0-10-128 bootkube.sh[1407]: https://etcd-1.ci-op-vlhiz6r0-57a9f.origin-ci-int-aws.dev.rhcloud.com:2379 is unhealthy: failed to connect: dial tcp 10.0.158.181:2379: connect: connection refused
Jun 18 03:19:55 ip-10-0-10-128 bootkube.sh[1407]: https://etcd-2.ci-op-vlhiz6r0-57a9f.origin-ci-int-aws.dev.rhcloud.com:2379 is unhealthy: failed to connect: dial tcp 10.0.135.171:2379: connect: connection refused
Jun 18 03:19:55 ip-10-0-10-128 bootkube.sh[1407]: Error: unhealthy cluster
Jun 18 03:19:55 ip-10-0-10-128 bootkube.sh[1407]: etcdctl failed. Retrying in 5 seconds...

Which eventually causes a timeout. Will pick this up tomorrow.

@kikisdeliveryservice kikisdeliveryservice force-pushed the e-pluribus-unum branch 2 times, most recently from 587e35e to 08cede1 Compare June 18, 2019 18:15
@cgwalters
Member

I was testing out make deploy-server e.g. with this PR and it still seems to work fine, because...we still have all of the previous Dockerfiles there. And we need to do that to "ratchet" this change - i.e. we land this then change release/ART to stop building the others, then we drop them?

At that point it will also require changes to the Makefile but it's fine by me to keep it as is. So...
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 20, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, kikisdeliveryservice, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,kikisdeliveryservice,runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kikisdeliveryservice
Contributor Author

> I was testing out make deploy-server e.g. with this PR and it still seems to work fine, because...we still have all of the previous Dockerfiles there. And we need to do that to "ratchet" this change - i.e. we land this then change release/ART to stop building the others, then we drop them?
>
> At that point it will also require changes to the Makefile but it's fine by me to keep it as is. So...
> /lgtm

Right! I'm going to pull together the next phase of WIP PRs across repos, link them to the issue and we can decide what else we need before pulling the trigger on any of them since they will be breaking changes. 😄

@kikisdeliveryservice
Contributor Author

trying this upgrade test (that should work and has worked) again:
/test e2e-aws-upgrade

@openshift-merge-robot openshift-merge-robot merged commit 88147c9 into openshift:master Jun 20, 2019
kikisdeliveryservice added a commit to kikisdeliveryservice/installer that referenced this pull request Jun 21, 2019
Remove references to mcc/mcd/mcs and setupetcd images now that the MCO has 1
super-image containing all of the sub-images

Requires: openshift/machine-config-operator#850
Related-to: openshift/machine-config-operator#739
@kikisdeliveryservice kikisdeliveryservice deleted the e-pluribus-unum branch June 26, 2019 17:28
kikisdeliveryservice added a commit to kikisdeliveryservice/machine-config-operator that referenced this pull request Jul 2, 2019
Template & bootstrap.go need to reference the MCO image instead of the
old setupetcdenv image.

Required-for: openshift/installer#1875
Related-to: openshift#850
Related-to: openshift#739 (issue will only close when final installer pr merges)
kikisdeliveryservice added a commit to kikisdeliveryservice/machine-config-operator that referenced this pull request Jul 3, 2019
Template & bootstrap.go need to reference the MCO image instead of the
old setupetcdenv image.

Required-for: openshift/installer#1875
Related-to: openshift#850
Related-to: openshift#739 (issue will only close when final installer pr merges)
kikisdeliveryservice added a commit to kikisdeliveryservice/machine-config-operator that referenced this pull request Jul 3, 2019
Operator.go & bootstrap.go need to reference the MCO image instead of the
old setupetcdenv image.

Update template to add setupetcdenv entrypoint for MCO image.

Required-for: openshift/installer#1875
Related-to: openshift#850
Related-to: openshift#739 (issue will only close when final installer pr merges)
mandre added a commit to shiftstack/shiftstack-ci that referenced this pull request Jul 5, 2019
Since openshift/machine-config-operator#850 and
openshift/installer#1875 the installer will
take a single image for the machine config operator.
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

7 participants