Commit 3013859

ci-operator/templates/openshift/installer/cluster-launch-installer-upi-e2e: Gather on bootstrap failure
Currently UPI bootstrap failures die with [1]:

  time="2019-08-13T20:38:56Z" level=debug msg="Still waiting for the Kubernetes API: the server could not find the requested resource"
  time="2019-08-13T20:39:12Z" level=info msg="Use the following commands to gather logs from the cluster"
  time="2019-08-13T20:39:12Z" level=info msg="openshift-install gather bootstrap --help"
  time="2019-08-13T20:39:12Z" level=fatal msg="waiting for Kubernetes API: context deadline exceeded"

but don't actually gather those recommended logs [2]. With this commit, I've added a setup-script global GATHER_BOOTSTRAP_ARGS which the various per-platform flows can populate as they create resources. Then if the 'wait-for bootstrap-complete' command dies and that variable is non-empty, we run the gather to store the logs in the installer's artifact directory.

We can't use:

  --master ${CONTROL_PLANE_0_IP},${CONTROL_PLANE_1_IP},${CONTROL_PLANE_2_IP}

because the backing installer code [3] uses StringArrayVar [4], which does not perform StringSliceVar's [5] comma-splitting.

The GATHER_BOOTSTRAP_ARGS approach is a bit of a kludge, because the expansion in gather_bootstrap_and_fail is not quoted, relying instead on a lack of shell-sensitive characters in the IP arguments. That's likely fine in practice, but if we wanted to tighten it down we could switch the script from sh to Bash and use an array variable. For now, I'm punting that to future work.

There's also crufty Terraform business around this in the teardown container, which I've left alone for now.

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/4719/rehearse-4719-pull-ci-openshift-installer-master-e2e-aws-proxy/5/artifacts/e2e-aws-proxy/installer/.openshift_install.log
[2]: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_release/4719/rehearse-4719-pull-ci-openshift-installer-master-e2e-aws-proxy/5/artifacts/e2e-aws-proxy/installer/
[3]: https://github.com/openshift/installer/blob/8f972b45987a32cc91bc61c39a727e9a1224693d/cmd/openshift-install/gather.go#L71
[4]: https://godoc.org/github.com/spf13/pflag#FlagSet.StringArrayVar
[5]: https://godoc.org/github.com/spf13/pflag#FlagSet.StringSliceVar
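
The commit message leaves quoting-safe argument handling as future work and suggests a Bash array. As a rough illustration only, the same effect may be achievable without leaving sh by accumulating the arguments in the positional parameters, which is a different technique from the Bash array the message mentions. The addresses are placeholders, and the final openshift-install call is shown only in a comment, not executed:

```shell
#!/bin/sh
# Sketch: build the gather arguments in the positional parameters ("$@"),
# the one list-like structure POSIX sh offers, so each argument stays quoted.
# The addresses are placeholders from the documentation range, not CI values.

BOOTSTRAP_IP="203.0.113.10"
CONTROL_PLANE_IPS="203.0.113.11,203.0.113.12,203.0.113.13"

set --                                   # start with an empty argument list
set -- "$@" --bootstrap "${BOOTSTRAP_IP}"

for FIELD in 1 2 3
do
  IP="$(echo "${CONTROL_PLANE_IPS}" | cut -d, -f"${FIELD}")"
  set -- "$@" --master "${IP}"           # one --master per address, never a comma list
done

ARG_COUNT="$#"
GATHER_ARGS="$*"                         # joined with spaces, for display only

# The real call would pass the list through unchanged, for example:
#   openshift-install --dir=/tmp/artifacts/installer gather bootstrap \
#     --key "${SSH_PRIVATE_KEY_PATH}" "$@"
echo "gather bootstrap ${GATHER_ARGS}"
```

Because "$@" expands to one word per stored argument, this form would not depend on the IP values being free of shell-sensitive characters. The trade-off is that the positional parameters are shared script-wide state, so a function such as gather_bootstrap_and_fail would need them passed in explicitly.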
1 parent be9e978 commit 3013859

ci-operator/templates/openshift/installer/cluster-launch-installer-upi-e2e.yaml

Lines changed: 14 additions & 1 deletion
@@ -340,6 +340,8 @@ objects:
 value: ${BASE_DOMAIN}
 - name: SSH_PUB_KEY_PATH
   value: /etc/openshift-installer/ssh-publickey
+- name: SSH_PRIVATE_KEY_PATH
+  value: /etc/openshift-installer/ssh-privatekey
 - name: PULL_SECRET_PATH
   value: /etc/openshift-installer/pull-secret
 - name: TFVARS_PATH
@@ -363,6 +365,15 @@ objects:
 trap 'rc=$?; if test "${rc}" -eq 0; then touch /tmp/setup-success; else touch /tmp/exit; fi; exit "${rc}"' EXIT
 trap 'CHILDREN=$(jobs -p); if test -n "${CHILDREN}"; then kill ${CHILDREN} && wait; fi' TERM
 
+GATHER_BOOTSTRAP_ARGS=
+
+function gather_bootstrap_and_fail() {
+  if test -n "${GATHER_BOOTSTRAP_ARGS}"; then
+    openshift-install --dir=/tmp/artifacts/installer gather bootstrap --key "${SSH_PRIVATE_KEY_PATH}" ${GATHER_BOOTSTRAP_ARGS}
+  fi
+  return 1
+}
+
 while true; do
   if [[ -f /tmp/exit ]]; then
     echo "Another process exited" 2>&1
@@ -591,6 +602,7 @@ objects:
 
 BOOTSTRAP_IP="$(aws cloudformation describe-stacks --stack-name "${CLUSTER_NAME}-bootstrap" \
   --query 'Stacks[].Outputs[?OutputKey == `BootstrapPublicIp`].OutputValue' --output text)"
+GATHER_BOOTSTRAP_ARGS="${GATHER_BOOTSTRAP_ARGS} --bootstrap ${BOOTSTRAP_IP}"
 
 aws cloudformation create-stack \
   --stack-name "${CLUSTER_NAME}-control-plane" \
@@ -622,6 +634,7 @@ objects:
 CONTROL_PLANE_0_IP="$(echo "${CONTROL_PLANE_IPS}" | cut -d, -f1)"
 CONTROL_PLANE_1_IP="$(echo "${CONTROL_PLANE_IPS}" | cut -d, -f2)"
 CONTROL_PLANE_2_IP="$(echo "${CONTROL_PLANE_IPS}" | cut -d, -f3)"
+GATHER_BOOTSTRAP_ARGS="${GATHER_BOOTSTRAP_ARGS} --master ${CONTROL_PLANE_0_IP} --master ${CONTROL_PLANE_1_IP} --master ${CONTROL_PLANE_2_IP}"
 
 for INDEX in 0 1 2
 do
@@ -708,7 +721,7 @@ objects:
 
 echo "Waiting for bootstrap to complete"
 openshift-install --dir=/tmp/artifacts/installer wait-for bootstrap-complete &
-wait "$!"
+wait "$!" || gather_bootstrap_and_fail
 
 echo "Bootstrap complete, destroying bootstrap resources"
 if [[ "${CLUSTER_TYPE}" == "aws" ]]; then
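
The last hunk relies on `wait "$!"` returning the exit status of the backgrounded command, so the handler after `||` runs only on failure and then reports failure itself. A minimal sketch of that control flow, using `false` as a stand-in for a failing `openshift-install wait-for bootstrap-complete` and a hypothetical handler named gather_and_fail that records that it ran instead of gathering logs:

```shell
#!/bin/sh
# Sketch of the `wait "$!" || handler` pattern from the diff.
# `false` stands in for the failing installer command; gather_and_fail is a
# hypothetical placeholder for gather_bootstrap_and_fail.

GATHERED=no

gather_and_fail() {
  GATHERED=yes      # the real handler would run `openshift-install gather bootstrap` here
  return 1          # keep reporting failure after the gather attempt
}

false &             # background job that exits non-zero

# The if/else wrapper exists only so this sketch can print the outcome;
# the diff uses the bare `wait "$!" || gather_bootstrap_and_fail` form.
if wait "$!" || gather_and_fail
then
  RESULT=success
else
  RESULT=failure
fi

echo "gathered=${GATHERED} result=${RESULT}"
```

Whether the non-zero return then aborts the setup script depends on shell options such as errexit, which this diff does not show, so treat that part as an assumption about the surrounding template.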
