Switch to using cluster-hosted Ironic for worker deployments #659

Closed
hardys wants to merge 4 commits into openshift-metal3:master from hardys:pr635

Conversation

@hardys

@hardys hardys commented Jul 9, 2019

Testing with some additional patches to prove #635 in CI, cc @imain

@metal3ci

metal3ci commented Jul 9, 2019

Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/864/

@hardys hardys removed the CI check this PR with CI label Jul 9, 2019
imain and others added 3 commits July 9, 2019 16:28
Switch to using the new ironic + baremetal operator pod.
This means we can keep the OCP specific pieces out of the upstream
repo, and prototype the integration which will ultimately be handled
via openshift/machine-api-operator#302

Also set the RHCOS image URL in the configmap based on the variable from
common.sh

Co-Authored-By: Ian Main <imain@redhat.com>
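
As a rough illustration of that substitution (a sketch only: the template filename, placeholder token, and RHCOS_IMAGE_URL variable name are assumptions, not necessarily what the scripts use):

# Render the ironic/BMO configmap with the RHCOS image URL sourced from common.sh
# and apply it into the namespace the operator pod runs in.
source common.sh
sed "s|__RHCOS_IMAGE_URL__|${RHCOS_IMAGE_URL}|g" ocp/deploy/ironic_bmo_configmap.yaml.tpl \
    | oc --config ocp/auth/kubeconfig apply -n openshift-machine-api -f -
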
@hardys hardys changed the title from "WIP test of pr635 with additions" to "Switch to using cluster-hosted Ironic for worker deployments" Jul 9, 2019
@hardys hardys requested review from russellb and sadasu July 9, 2019 15:29
@hardys hardys added the CI check this PR with CI label Jul 9, 2019
@hardys hardys requested a review from derekhiggins July 9, 2019 15:58
@metal3ci

metal3ci commented Jul 9, 2019

Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/866/

Comment thread 08_deploy_bmo.sh

# Kill the dnsmasq container on the host since it is performing DHCP and doesn't
# allow our pod in openshift to take over.
for name in dnsmasq ironic-inspector ; do
Member

What about the rest of the ironic containers?

Contributor

If we kill all the containers we lose the ironic database and cleanup will not work, e.g. 'make clean' in dev-scripts will fail to remove the ironic-managed masters.

Member

OK, expanding the comment would help. I imagine all of this is going away shortly anyway once the bootstrap VM ironic work lands.

Author

Yeah that's correct, but then we'll also need to modify make clean to not rely on the terraform cleanup (until we figure out how to reimplement destroy)
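
For context, a minimal sketch of the selective teardown being discussed, assuming the host-level services run as podman containers; only the two names from the quoted loop are stopped, and everything else (including the ironic API/conductor and its database) is left alone so 'make clean' keeps working:

# Stop only the host services that conflict with the in-cluster pod (DHCP and
# inspection); keep the rest so the ironic database survives for cleanup.
for name in dnsmasq ironic-inspector ; do
    sudo podman stop "$name" 2>/dev/null || true
done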

Comment thread 08_deploy_bmo.sh
POD_NAME=$(oc --config ocp/auth/kubeconfig get pods -n openshift-machine-api | grep metal3-baremetal-operator | cut -f 1 -d ' ')

# Make sure our pod is running.
echo "Waiting for baremetal-operator pod to become ready"
Member

Why do we need to wait? It shouldn't be required to block here.

Contributor

Yeah it's probably optional. I haven't tested the whole thing without the wait yet though. I'll give it a go. On the other hand we could use this to catch errors early.

Member

Fair enough.

Author

FWIW in my testing this was helpful when one of the containers went into CrashLoopBackoff, it meant I could start investigating when it became clear the pod was wedged and not starting correctly.

I don't have a strong opinion but for development having some verbose monitoring of the pod startup is probably no bad thing?

Contributor

Yeah, I'm 50/50 on it. I do kinda like how it can catch an issue with the pod, but for CI it's probably not useful.
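
For reference, the readiness gate being debated could also be expressed with oc wait instead of a polling loop (a sketch under assumptions: the name=metal3-baremetal-operator label selector and the timeout value are not taken from the script):

# Block until the baremetal-operator pod reports Ready; a container stuck in
# CrashLoopBackOff surfaces here as a timeout instead of a silent hang.
oc --config ocp/auth/kubeconfig wait pod -l name=metal3-baremetal-operator \
    -n openshift-machine-api --for=condition=Ready --timeout=300s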

Comment thread 08_deploy_bmo.sh
oc --config ocp/auth/kubeconfig adm --as system:admin policy add-scc-to-user privileged system:serviceaccount:openshift-machine-api:baremetal-operator
oc --config ocp/auth/kubeconfig apply -f ocp/deploy/operator_ironic.yaml -n openshift-machine-api

# Sadly I don't see a way to get this from the json..
Member

oc get pod -l name=metal3-baremetal-operator -n openshift-machine-api -o jsonpath="{.items[0].metadata.name}"
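
Applied to the POD_NAME line quoted in the earlier thread, the suggestion would read roughly as follows (assuming the deployment really labels its pods name=metal3-baremetal-operator, as the suggested selector implies):

POD_NAME=$(oc --config ocp/auth/kubeconfig get pod -n openshift-machine-api \
    -l name=metal3-baremetal-operator -o jsonpath="{.items[0].metadata.name}")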

@hardys
Author

hardys commented Jul 10, 2019

@imain how do you want to proceed with this PR vs #635?

I pushed this mainly to prove things in CI, but if you're happy with the approach of breaking the dependency on metal3-io/baremetal-operator#212 we could go ahead with this one? wdyt?

@russellb
Member

russellb commented Jul 11, 2019 via email

@hardys
Author

hardys commented Jul 11, 2019

@imain is updating #635 so I'll abandon this one

@hardys hardys closed this Jul 11, 2019