@@ -144,9 +144,44 @@ objects:
    oc create -f /tmp/cluster/insights-live.yaml || true
fi

# set up cloud-provider-specific env vars
export KUBE_SSH_BASTION="$( oc --insecure-skip-tls-verify get node -l node-role.kubernetes.io/master -o 'jsonpath={.items[0].status.addresses[?(@.type=="ExternalIP")].address}' ):22"
# set up SSH for the e2e tests + for this script
function setup_ssh_bastion() {
    echo "Setting up ssh bastion"
    mkdir -p ~/.ssh
    cp "${KUBE_SSH_KEY_PATH}" ~/.ssh/id_rsa
    chmod 0600 ~/.ssh/id_rsa
    # The CI container may run with a random UID that has no passwd entry;
    # ssh needs one to resolve a user name, so add it if /etc/passwd is writable.
    if ! whoami &> /dev/null; then
        if [[ -w /etc/passwd ]]; then
            echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:${HOME}:/sbin/nologin" >> /etc/passwd
        fi
    fi
    curl https://raw.githubusercontent.com/eparis/ssh-bastion/master/deploy/deploy.sh | bash
    for i in $(seq 0 60)
    do
        # AWS fills only .hostname of a service
        BASTION_HOST=$(oc get service -n "${SSH_BASTION_NAMESPACE}" ssh-bastion -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
        if [[ -n "${BASTION_HOST}" ]]; then break; fi
Contributor:

This logic should be in the script; let's get Eric to fix his bastion.

        # Azure fills only .ip of a service. Use it as the bastion host.
        BASTION_HOST=$(oc get service -n "${SSH_BASTION_NAMESPACE}" ssh-bastion -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
        if [[ -n "${BASTION_HOST}" ]]; then break; fi
        echo "Waiting for SSH bastion load balancer service"
        sleep 10
    done
}
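
A possible consolidation of the hostname/ip fallback above (an illustration, not part of this PR): oc's jsonpath accepts multiple expressions and concatenates their output, and a cloud load balancer populates only one of the two fields, so a single query returns whichever is set.

# Hypothetical alternative to the two lookups above: only one of
# .hostname/.ip is populated per cloud, so the concatenation yields
# whichever field is set.
BASTION_HOST=$(oc get service -n "${SSH_BASTION_NAMESPACE}" ssh-bastion \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{.status.loadBalancer.ingress[0].ip}')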

function bastion_ssh() {
    retry 60 \
        ssh -o LogLevel=error -o ConnectionAttempts=100 -o ConnectTimeout=30 -o StrictHostKeyChecking=no \
        -o ProxyCommand="ssh -A -o StrictHostKeyChecking=no -o LogLevel=error -o ServerAliveInterval=30 -o ConnectionAttempts=100 -o ConnectTimeout=30 -W %h:%p core@${BASTION_HOST} 2>/dev/null" \
        "$@"
}
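
For illustration only (not part of the diff), the helper takes user@host followed by the remote command; the ProxyCommand makes ssh hop through the bastion's public address to reach nodes on the private network.

# Hypothetical usage; 10.0.0.4 stands in for a node's internal IP.
bastion_ssh core@10.0.0.4 sudo systemctl is-active kubelet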

export SSH_BASTION_NAMESPACE="testing-ssh-bastion"
export KUBE_SSH_KEY_PATH=/tmp/cluster/ssh-privatekey
setup_ssh_bastion
export KUBE_SSH_BASTION="${BASTION_HOST}:22"

# set up cloud-provider-specific env vars
if [[ "${CLUSTER_TYPE}" == "gcp" ]]; then
    export GOOGLE_APPLICATION_CREDENTIALS="/tmp/cluster/gce.json"
    export KUBE_SSH_USER=cloud-user
@@ -212,32 +247,6 @@ objects:
if [ "${RETRY_IGNORE_EXIT_CODE}" != "" ]; then return 0; else return "${rc}"; fi
}
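
Only this final line of retry is visible in the hunk. A minimal sketch of a helper consistent with that tail and with call sites such as retry 60 ssh ... (the sleep interval is an assumption):

function retry() {
    # Hypothetical reconstruction: usage is retry <attempts> <command...>;
    # honors RETRY_IGNORE_EXIT_CODE like the visible tail above.
    local attempts=$1; shift
    local rc=0
    for _ in $(seq 1 "${attempts}"); do
        "$@" && return 0
        rc=$?
        sleep 10
    done
    if [ "${RETRY_IGNORE_EXIT_CODE}" != "" ]; then return 0; else return "${rc}"; fi
}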

function setup_ssh_bastion() {
    echo "Setting up ssh bastion"
    mkdir -p ~/.ssh || true
    cp "${KUBE_SSH_KEY_PATH}" ~/.ssh/id_rsa
    chmod 0600 ~/.ssh/id_rsa
    if ! whoami &> /dev/null; then
        if [ -w /etc/passwd ]; then
            echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:${HOME}:/sbin/nologin" >> /etc/passwd
        fi
    fi
    curl https://raw.githubusercontent.com/eparis/ssh-bastion/master/deploy/deploy.sh | bash
    for i in $(seq 0 60)
    do
        BASTION_HOST=$(oc get service -n openshift-ssh-bastion ssh-bastion -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
        if [ ! -z "${BASTION_HOST}" ]; then break; fi
        sleep 10
    done
}

function bastion_ssh() {
    retry 60 \
        ssh -o LogLevel=error -o ConnectionAttempts=100 -o ConnectTimeout=30 -o StrictHostKeyChecking=no \
        -o ProxyCommand="ssh -A -o StrictHostKeyChecking=no -o LogLevel=error -o ServerAliveInterval=30 -o ConnectionAttempts=100 -o ConnectTimeout=30 -W %h:%p core@${BASTION_HOST} 2>/dev/null" \
        "$@"
}

function restore-cluster-state() {
echo "Placing file /etc/rollback-test with contents A"
cat > /tmp/machineconfig.yaml <<'EOF'
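# NOTE: the heredoc body is collapsed in the diff view; the manifest below
# is a hypothetical reconstruction (role, name, and Ignition version are
# assumptions) of a MachineConfig writing /etc/rollback-test with contents "A".
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-rollback-test
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
      - contents:
          source: data:,A
        filesystem: root
        mode: 420
        path: /etc/rollback-test
EOF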
Expand Down Expand Up @@ -268,7 +277,6 @@ objects:

    wait_for_machineconfigpool_to_apply

    setup_ssh_bastion

echo "Make etcd backup on first master - /usr/local/bin/etcd-snapshot-backup.sh"
FIRST_MASTER=$(oc get node -l node-role.kubernetes.io/master= -o name | head -n1 | cut -d '/' -f 2)
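    # The lines that actually run the backup are collapsed below; a
    # hypothetical sketch of this step using the helpers above (the
    # internal-IP lookup and the snapshot path are assumptions):
    FIRST_MASTER_IP=$(oc get node "${FIRST_MASTER}" -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
    bastion_ssh "core@${FIRST_MASTER_IP}" sudo /usr/local/bin/etcd-snapshot-backup.sh /root/assets/backup/snapshot.db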
@@ -343,17 +351,12 @@ objects:

if [[ "${rc}" == "1" ]]; then exit 1; fi

echo "Removing ssh-bastion"
Contributor:

It needs to be removed before the tests start; otherwise the e2e tests would fail, since we're pulling an image from outside of the release image.

Contributor Author:

The whole point of this PR is to run the tests with the bastion running.

Contributor:

Right, in that case we would have to use some other image in the eparis/ssh-bastion repo. See https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_release/3595/rehearse-3595-pull-ci-openshift-installer-master-e2e-restore-cluster-state/67: the test [Feature:Platform][Smoke] Managed cluster should ensure pods use images from our release image with proper ImagePullPolicy [Suite:openshift/conformance/parallel] fails with:

fail [github.com/openshift/origin/test/extended/operators/images.go:112]: May 10 09:33:15.658: Pods found with invalid container images not present in release payload: openshift-ssh-bastion/ssh-bastion-f4d5bbcbd-xcx68/ssh-bastion image=quay.io/eparis/ssh:latest

Contributor Author:

Indeed, that one is going to fail. I think the test could skip the openshift-ssh-bastion namespace if $KUBE_SSH_BASTION is set.

openshift/origin#23252

Contributor:

Later on we should vendor the manifests in bindata and run it in the ssh-bastion namespace in BeforeAll.

Once openshift/origin#23208 is merged, I can take care of updating this.

    oc delete project openshift-ssh-bastion

echo "Remove existing openshift-apiserver pods"
# This would ensure "Pod 'openshift-apiserver/apiserver-xxx' is not healthy: container openshift-apiserver has restarted more than 5 times" test won't fail
oc delete pod --all -n openshift-apiserver
}

function recover-from-etcd-quorum-loss() {
    setup_ssh_bastion

    # Machine API won't let the user destroy the node that runs the controller
    echo "Finding two masters to destroy"
    MAPI_POD=$(oc get pod -l k8s-app=controller -n openshift-machine-api --no-headers -o name)
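    # The selection that follows is collapsed in this view; a hypothetical
    # sketch of excluding the controller's node (variable names beyond
    # MAPI_POD are assumptions):
    MAPI_NODE=$(oc get "${MAPI_POD}" -n openshift-machine-api -o jsonpath='{.spec.nodeName}')
    MASTERS_TO_DESTROY=$(oc get node -l node-role.kubernetes.io/master= -o name | grep -v "${MAPI_NODE}" | head -n 2)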
@@ -516,9 +519,6 @@ objects:
        retry 10 oc wait pod/etcd-member-${master} -n openshift-etcd --for condition=Ready
    done

echo "Removing ssh-bastion"
retry 10 oc delete project openshift-ssh-bastion

echo "Scale etcd-quorum guard"
retry 10 oc scale --replicas=3 deployment.apps/etcd-quorum-guard -n openshift-machine-config-operator

@@ -850,6 +850,9 @@ objects:
mkdir -p /tmp/artifacts/must-gather
queue /tmp/artifacts/must-gather/must-gather.log oc --insecure-skip-tls-verify adm must-gather --dest-dir /tmp/artifacts/must-gather
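
queue is defined outside the visible hunks; a minimal sketch, assuming it takes a target file followed by a command and backgrounds the command with its output redirected there so artifact gathering runs in parallel:

function queue() {
    # Hypothetical sketch of the helper used above.
    local target="${1}"
    shift
    "${@}" >"${target}" &
}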

echo "Removing ssh-bastion ..."
queue /dev/null oc --insecure-skip-tls-verify --request-timeout=5s delete project testing-ssh-bastion
Contributor:

Why are you doing this? Why isn't this being torn down by the cluster teardown?

Contributor Author:

It was requested in #4161 (review)

Contributor:

Service load balancers aren't leaked.


echo "Waiting for logs ..."
wait
