Switch to using cluster-hosted Ironic for worker deployments #659

hardys wants to merge 4 commits into openshift-metal3:master from
Conversation
Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/864/
Switch to using the new ironic + baremetal operator pod.

This means we can keep the OCP-specific pieces out of the upstream repo, and prototype the integration which will ultimately be handled via openshift/machine-api-operator#302. Also set the RHCOS image URL in the configmap based on the variable from common.sh.

Co-Authored-By: Ian Main <imain@redhat.com>
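For context, a minimal sketch of how the image URL might be templated into the configmap from dev-scripts. The variable name RHCOS_IMAGE_URL, the configmap name ironic-config, and the key rhcos_image_url are illustrative assumptions, not names taken from this PR:

```sh
#!/bin/bash
# Hypothetical sketch only: push the RHCOS image URL defined in common.sh
# into the configmap the ironic pod reads. All names here are assumed.
source common.sh   # assumed to define RHCOS_IMAGE_URL

oc --config ocp/auth/kubeconfig create configmap ironic-config \
  -n openshift-machine-api \
  --from-literal=rhcos_image_url="${RHCOS_IMAGE_URL}" \
  --dry-run -o yaml | oc --config ocp/auth/kubeconfig apply -f -
```

The create/apply pipeline keeps the step idempotent, so re-running the deploy script updates the value in place rather than failing on an existing configmap.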
Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/866/
```sh
# Kill the dnsmasq container on the host since it is performing DHCP and doesn't
# allow our pod in openshift to take over.
for name in dnsmasq ironic-inspector ; do
```
What about the rest of the ironic containers?
If we kill all the containers we lose the ironic database, and cleanup will not work; e.g. `make clean` in dev-scripts will fail to remove the ironic-managed masters.
OK, expanding the comment would help. I imagine all of this is going away shortly anyway once the bootstrap VM ironic work lands.
Yeah, that's correct, but then we'll also need to modify `make clean` to not rely on the terraform cleanup (until we figure out how to reimplement destroy).
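To make the suggestion concrete, here is one way the expanded comment might read. The loop body with the podman invocation is an assumption about how dev-scripts manages these containers, not code from this PR:

```sh
# Kill only the dnsmasq and ironic-inspector containers: dnsmasq is doing DHCP
# on the host and would race with our pod in openshift, and inspection moves
# into the cluster. Deliberately leave the other ironic containers (and with
# them the ironic database) running, so that `make clean` can still ask
# ironic to tear down the ironic-managed masters.
for name in dnsmasq ironic-inspector ; do
    if sudo podman ps --format "{{.Names}}" | grep -q -x "$name"; then
        sudo podman kill "$name"
    fi
done
```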
```sh
POD_NAME=$(oc --config ocp/auth/kubeconfig get pods -n openshift-machine-api | grep metal3-baremetal-operator | cut -f 1 -d ' ')
```

```sh
# Make sure our pod is running.
echo "Waiting for baremetal-operator pod to become ready"
```
Why do we need to wait? It shouldn't be required to block here
Yeah it's probably optional. I haven't tested the whole thing without the wait yet though. I'll give it a go. On the other hand we could use this to catch errors early.
FWIW in my testing this was helpful when one of the containers went into CrashLoopBackoff, it meant I could start investigating when it became clear the pod was wedged and not starting correctly.
I don't have a strong opinion but for development having some verbose monitoring of the pod startup is probably no bad thing?
yeah I'm 50/50 on it. I do kinda like how it can catch an issue with the pod, but for CI it's probably not useful.
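One possible middle ground, sketched below, is a bounded wait: verbose enough to catch a wedged pod during development, but unable to hang a CI run forever. `oc wait` is a real subcommand; the name=metal3-baremetal-operator label selector is an assumption about how the deployment labels its pod:

```sh
# Block until the operator pod reports Ready, but give up after 5 minutes
# rather than wedging the run; the label selector is assumed.
oc --config ocp/auth/kubeconfig wait pod \
  -l name=metal3-baremetal-operator \
  -n openshift-machine-api \
  --for=condition=Ready --timeout=300s
```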
```sh
oc --config ocp/auth/kubeconfig adm --as system:admin policy add-scc-to-user privileged system:serviceaccount:openshift-machine-api:baremetal-operator
oc --config ocp/auth/kubeconfig apply -f ocp/deploy/operator_ironic.yaml -n openshift-machine-api
```
```sh
# Sadly I don't see a way to get this from the json..
```

```sh
oc get pod -l name=metal3-baremetal-operator -n openshift-machine-api -o jsonpath="{.items[0].metadata.name}"
```
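Applied to the POD_NAME line above, that suggestion would replace the grep/cut pipeline with a structured query, assuming the pod carries the name=metal3-baremetal-operator label used in the reviewer's command:

```sh
# jsonpath reads the pod name from the API object instead of parsing the
# human-readable table output, so it can't break on column layout changes.
POD_NAME=$(oc --config ocp/auth/kubeconfig get pod \
  -l name=metal3-baremetal-operator \
  -n openshift-machine-api \
  -o jsonpath="{.items[0].metadata.name}")
```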
@imain how do you want to proceed with this PR vs #635? I pushed this mainly to prove things in CI, but if you're happy with the approach of breaking the dependency on metal3-io/baremetal-operator#212, we could go ahead with this one? wdyt?
It's fine. It can stay for the dev scripts. A worker couldn't deploy until this is done anyway.
Testing with some additional patches to prove #635 in CI, cc @imain.