
Conversation

@stbenjam (Member) commented May 9, 2019

@russellb (Member) left a comment:

I wonder if this is still the right registry, even for a public release. Do you know?

@russellb (Member) commented May 9, 2019

looks like it should be on quay.io/openshift-release-dev/ocp-release

@stbenjam (Member, Author) commented May 9, 2019

It does work with this URL, but it seems to end up redirecting to quay.io in the end.

@russellb (Member) left a comment:

Switching will drop the need for the extra CI registry pull secret

@intlabs commented May 9, 2019

> Switching will drop the need for the extra CI registry pull secret

Confirmed:
$ docker pull quay.io/openshift-release-dev/ocp-release:4.1.0-rc.1
4.1.0-rc.1: Pulling from openshift-release-dev/ocp-release
5dfdb3f0bcc0: Pull complete
99f178453a43: Pull complete
f9fa6d5c56a6: Pull complete
d252d93af3e8: Pull complete
a9607047a865: Pull complete
Digest: sha256:be61a9ec132118e41a417b347242361d9cc96b1a73753e121dc7a74a1905baea
Status: Downloaded newer image for quay.io/openshift-release-dev/ocp-release:4.1.0-rc.1

@russellb (Member) commented:

We should also look at #401 - if that's ready, it would be good to base our next update on that new approach.

@stbenjam (Member, Author) commented:

I've been watching the work on that. I agree; I'd like to use that process. It looks like there are a few things left to do on the checklist at the top of that PR before it's ready.

@stbenjam stbenjam changed the title Pin release image to 4.1.0-rc1 Pin release image to 4.1.0-rc3 May 13, 2019
@stbenjam stbenjam changed the title Pin release image to 4.1.0-rc3 Changes required for 4.1.0-rc3 May 13, 2019
- ip: "{{ baremetal_network_cidr | nthhost(5) }}"
  hostnames:
  - "api"
  - "api-int"
@stbenjam (Member, Author) left a comment:

This change is still needed, at least for now - the masters are pointing at dnsmasq when they come up and they get their ignition config from api-int

A reviewer replied:

I guess this is fine as an interim solution in the VM case, but how do we avoid requiring a new DNS record for the API VIP in the baremetal case? cc @celebdor and @cybertron

A Collaborator replied:

The solution is to get the VIPs from install config and serve them in coredns (and make ignition fetch from VIP instead of api)
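For illustration, the install-config section being referred to might look something like this; the field names are an assumption about the kni-installer fork of this era, not confirmed here, and the addresses are placeholders:

platform:
  baremetal:
    apiVIP: 192.168.111.5      # answer for api/api-int from coredns
    ingressVIP: 192.168.111.4

coredns would then answer queries for api-int with the apiVIP value, with no upstream DNS record required.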

A Contributor replied:

openshift-metal3/kni-installer#77 makes api-int available from coredns. We'll still need the change to point ignition at the VIP instead of a DNS name though.

@stbenjam (Member, Author) replied:

If I'm looking in the right place, doesn't ignition happen using HTTPS, verifying the hostname and certificate? The endpoint is also not platform specific: how do we make that work?

https://github.com/openshift/installer/blob/master/pkg/asset/ignition/machine/node.go#L23-L25

A Contributor replied:

We also need to tweak the certificate so it includes the VIP in addition to the DNS name. Some work has been done in that direction by other teams and I need to look at getting it integrated into our fork of the installer.
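Once that lands, a quick way to sanity-check the serving certificate could be to inspect its SANs; a sketch, assuming the VIP is 192.168.111.5 and the standard machine-config server port 22623:

$ echo | openssl s_client -connect 192.168.111.5:22623 2>/dev/null |
    openssl x509 -noout -text | grep -A1 "Subject Alternative Name"

A fixed certificate should list an IP Address entry for the VIP alongside the DNS names.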

@stbenjam (Member, Author) commented:

I was able to get a successful 4.1.0-rc.3 install using the PRs listed below. I didn't do any deep testing, but my cluster looks functional.

@hardys commented May 15, 2019

Ah, sorry, this will need a rebase since vm-setup moved to metal3-dev-env. I wonder if we should parameterize some of vm-setup/roles/common/tasks/main.yml so that we can control the OpenShift-specific DNS entries via vm_setup_vars.yml in dev-scripts?

@markmc (Contributor) commented May 15, 2019

@stbenjam want to add a checklist at the top listing what needs to be done before we can merge? Finding it hard to follow ...

@hardys commented May 15, 2019

> @stbenjam want to add a checklist at the top listing what needs to be done before we can merge? Finding it hard to follow ...

Sounds good, but I think the main remaining item is to resolve the api-int DNS solution. What's proposed here will only work for VMs AFAICS, and we want to avoid requiring those testing baremetal to create another upstream DNS record for the "internal" VIP.

So we need to figure out how ignition can resolve the IP in both the VM and baremetal cases. As mentioned by @stbenjam, that's tricky: I don't think we can just replace the hostname with an IP, and the resolv.conf additions don't happen until after the ignition/pivot happens, so we have a chicken-and-egg problem?

@stbenjam (Member, Author) commented:

I updated the description with the checklist. I'll rebase this PR when I'm at my workstation. Essentially, nothing's changed since #523 (comment), but there are a number of open and WIP PRs related to the api-int change. It's not clear to me whether @cybertron and @celebdor want to get something else in, or if they're OK with only the 3 listed above, which worked for me.

@markmc (Contributor) commented May 15, 2019

> > @stbenjam want to add a checklist at the top listing what needs to be done before we can merge? Finding it hard to follow ...

> Sounds good, but I think the main remaining item is to resolve the api-int DNS solution. What's proposed here will only work for VMs AFAICS, and we want to avoid requiring those testing baremetal to create another upstream DNS record for the "internal" VIP.

> So we need to figure out how ignition can resolve the IP in both the VM and baremetal cases. As mentioned by @stbenjam, that's tricky: I don't think we can just replace the hostname with an IP, and the resolv.conf additions don't happen until after the ignition/pivot happens, so we have a chicken-and-egg problem?

Thanks for the explanation.

I suggest we proceed with rebasing ASAP and add "remove the need for an api-int DNS entry on baremetal" to our backlog?

@stbenjam (Member, Author) commented:

@hardys I rebased, and moved the api-int change here: metal3-io/metal3-dev-env#8

That's a bit specific to OpenShift, isn't it? Maybe we should parameterize those records in that role?

@hardys commented May 15, 2019

> @hardys I rebased, and moved the api-int change here: metal3-io/metal3-dev-env#8

> That's a bit specific to OpenShift, isn't it? Maybe we should parameterize those records in that role?

Thanks! And yes, as mentioned in my earlier comment, we probably should parameterize/sanitize some of that, but I missed it in my earlier refactoring. I can take a look unless you've already started something?

@hardys commented May 15, 2019

I just pushed #536, which would enable us to add api-int via a variable only in dev-scripts. It will need to be merged together with metal3-io/metal3-dev-env#9.
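As a sketch of what the dev-scripts side could then carry in vm_setup_vars.yml (the variable name here is hypothetical; the real one comes from #536):

# Hypothetical override: extra DNS records supplied by dev-scripts
# instead of being hard-coded in metal3-dev-env.
dns_extrahosts:
  - ip: "{{ baremetal_network_cidr | nthhost(5) }}"
    hostnames:
    - "api"
    - "api-int"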

stbenjam added 2 commits May 15, 2019 09:47
  - update structure of cluster-image-registry-operator config object to match openshift/release#3548
  - host-etcd moved from kube-system namespace to openshift-etcd
@markmc (Contributor) commented May 16, 2019

We should be able to proceed with registry.svc.ci.openshift.org/kni/release:4.1.0-rc.3-kni.0 as a temporary workaround for #537 - see https://gist.github.com/markmc/f8a78d7cea7252a9e0f29dadcfaa1253 for how it was built
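The gist has the exact steps; broadly it is the standard oc adm release new flow, along these lines (the component override is a placeholder, not a literal value):

$ oc adm release new \
    --from-release=quay.io/openshift-release-dev/ocp-release:4.1.0-rc.3 \
    --to-image=registry.svc.ci.openshift.org/kni/release:4.1.0-rc.3-kni.0 \
    <component-name>=<replacement-image-pullspec>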

@stbenjam (Member, Author) commented:

Updated the pin, and I'm testing a new deployment right now. Thanks!

@stbenjam (Member, Author) commented:

It comes up fine, but I'm hitting an issue on worker deployments. Based on discussion in Slack, it sounds like it's pre-existing on 4.0 as well. The worker node shows "Node not found hook failed: Port 00:f6:ff:4d:bb:31 already exists" when it boots.

@russellb (Member) commented:

> It comes up fine, but I'm hitting an issue on worker deployments. Based on discussion in Slack, it sounds like it's pre-existing on 4.0 as well. The worker node shows "Node not found hook failed: Port 00:f6:ff:4d:bb:31 already exists" when it boots.

This isn't the real error. You see this because we haven't actually provisioned the worker host at all, and the baremetal-operator powers on hosts that aren't provisioned if we have online: true set on them. It's questionable whether that behavior makes sense, but it's behaving as expected.

The real error is up in the cluster-api integration that prevented the worker from being provisioned. See #537 (comment)
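For reference, the online flag in question lives on the BareMetalHost resource; a minimal hypothetical manifest (the API group follows the metal3 project naming, the name and BMC details are placeholders, and the MAC is taken from the error above):

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: openshift-worker-0
spec:
  online: true   # baremetal-operator powers the host on even while unprovisioned
  bootMACAddress: 00:f6:ff:4d:bb:31
  bmc:
    address: ipmi://192.168.111.1:6230
    credentialsName: openshift-worker-0-bmc-secret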

@russellb russellb merged commit 0bca6a6 into openshift-metal3:master May 16, 2019