Run ssh bastion during openshift-tests #4161
It needs to be removed before the tests start; otherwise the e2e test would fail, because we're grabbing an image from outside of the release image.
The whole point of this PR is to run the tests with the bastion running.
Right, in that case we have to use some other images in eparis/ssh-bastion repo. See https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_release/3595/rehearse-3595-pull-ci-openshift-installer-master-e2e-restore-cluster-state/67, test [Feature:Platform][Smoke] Managed cluster should ensure pods use images from our release image with proper ImagePullPolicy [Suite:openshift/conformance/parallel] would fail with fail [github.com/openshift/origin/test/extended/operators/images.go:112]: May 10 09:33:15.658: Pods found with invalid container images not present in release payload: openshift-ssh-bastion/ssh-bastion-f4d5bbcbd-xcx68/ssh-bastion image=quay.io/eparis/ssh:latest
Indeed, that one is going to fail. I think the test could skip openshift-ssh-bastion namespace if $KUBE_SSH_BASTION is set.
Later on we should be vendoring the manifests in bindata and running them in the ssh-bastion namespace in BeforeAll.
Once openshift/origin#23208 is merged I could take care of updating this.
|
/hold The PR looks good, I doubt it would make tests pass though. Could you add a space at the end of release/ci-operator/jobs/openshift/installer/openshift-installer-master-presubmits.yaml Line 757 in 53e161b
|
Not exactly empty space, but done. |
9de3374 to
a92564c
Compare
|
I am a bit confused. How does ssh access bring the control plane down? There are tests that simply need to access the OS on nodes. Upstream has chosen ssh as the access method. We can either disable those tests and leave the corresponding parts untested or, as I am trying here, run the tests with an ssh bastion. Access to nodes is mostly required by *) To be honest, I don't understand why we don't run |
do we need to verify that the project is fully removed before we continue? is it possible that we will leak a loadbalancer otherwise?
IMO, oc delete waits for the namespace to be fully deleted.
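If the cleanup should not rely on default behaviour, the wait can be spelled out. The following is only a sketch: it deletes the namespace rather than the project, `delete_namespace_and_wait` and the `OC` override are hypothetical names, and the 120s default timeout is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# Sketch only, not the code from this PR.
# OC is a hypothetical override so the function can be tried without a cluster.
OC="${OC:-oc}"

# Delete a namespace and block until it is gone or the timeout expires.
delete_namespace_and_wait() {
    local ns="$1" timeout="${2:-120s}"
    # --wait=true asks the client to wait for the resource (including its
    # finalizers) to disappear; --timeout bounds how long it waits.
    "${OC}" delete namespace "${ns}" --wait=true --timeout="${timeout}"
}
```

In the teardown this could be called as `delete_namespace_and_wait testing-ssh-bastion`, so that a namespace stuck in termination surfaces as a non-zero exit status instead of being silently ignored.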
|
@abhinavdahiya I am not following your comment, we should make every effort to run upstream tests. |
|
With openshift 4 we don't ship with ssh access to the clusters by default. And my understanding was that we stopped tests that required ssh access in the hope that we can move them off needing ssh access. |
a92564c to
cab5e61
Compare
|
/retest |
|
I am updating Eric's ssh bastion to have configurable namespace: eparis/ssh-bastion#10
|
358eede to
8676108
Compare
|
/hold cancel Looks fine to me |
|
Removed WIP patches for rehearsals. |
|
@jsafrane: The following test failed, say
|
/assign @vrutkovs |
|
/test pj-rehearse |
vrutkovs
left a comment
LGTM
|
/lgtm waiting for rehearses to pass |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: jsafrane, vrutkovs |
|
Rehearsal went OK. /assign @wking @smarterclayton |
|
/hold cancel |
|
/hold I don't want this on for everyone. Let me review. |
| queue /tmp/artifacts/must-gather/must-gather.log oc --insecure-skip-tls-verify adm must-gather --dest-dir /tmp/artifacts/must-gather | ||
| echo "Removing ssh-bastion ..." | ||
| queue /dev/null oc --insecure-skip-tls-verify --request-timeout=5s delete project testing-ssh-bastion |
Why are you doing this? Why isn't this being torn down by the cluster tear down?
It was requested in #4161 (review)
Service load balancers aren't leaked.
This statement is not true: upstream has been moving away from SSH for a long time. In general, tests should not use SSH unless there is no other alternative. Disruptive (most of which won't work on OpenShift anyway because they encode assumptions about the shape of the OS) is the biggest place where that happens, and even most of those can be done via pods. I would prefer to see the remaining tests moved away from SSH rather than turning this on globally for all test frameworks. I'm certainly ok with having this option be available for certain test suites (like disruptive), but opt-in is probably better, not on by default. I think for this PR you could remove the automatic opt-in and leave the function there, so that specific tests can use it by adding it before the test runs (specifically disruptive, which needs to get enabled). |
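The opt-in idea could look roughly like the sketch below. It is not the template's actual code: `SSH_BASTION_SUITES`, `suite_wants_ssh_bastion` and `setup_ssh_bastion` are hypothetical names, and the suite name `openshift/disruptive` is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: start the bastion only for suites that opt in.
# SSH_BASTION_SUITES is a hypothetical, space-separated list of suite names.
SSH_BASTION_SUITES="${SSH_BASTION_SUITES:-openshift/disruptive}"

# Return 0 if the given suite is on the opt-in list, 1 otherwise.
suite_wants_ssh_bastion() {
    local suite="$1" candidate
    for candidate in ${SSH_BASTION_SUITES}; do
        if [[ "${suite}" == "${candidate}" ]]; then
            return 0
        fi
    done
    return 1
}

# Usage (setup_ssh_bastion stands in for the setup function this PR adds,
# TEST_SUITE for whatever variable names the suite being run):
#   if suite_wants_ssh_bastion "${TEST_SUITE}"; then setup_ssh_bastion; fi
```

With this shape the bastion stays off by default, and enabling it for another suite is a change to one list rather than to the test flow.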
I haven't noticed any coordinated effort, either as a KEP, an issue, or an announcement on kubernetes-dev. We (sig-storage) are still adding new tests that use ssh. I don't remember anyone reporting issues with that (with one exception: we use kubernetes-in-docker to test CSI stuff and ssh does not work there; kubernetes/kubernetes#81751)
Node tests run as part of e2e-aws, I don't think it's a good idea to clutter the tests just for https://bugzilla.redhat.com/show_bug.cgi?id=1711600. |
|
You haven't noticed it but it has repeatedly been enforced at test level. You can't have conformance tests that require SSH. Large sets of tests have had SSH access removed. |
Any references? Again, no KEP, no announcement; how are we supposed to know about SSH deprecation? |
|
We run almost every test in the suite. A year ago there were a hundred or more that depend on SSH. Now there are 3 or 4 in the core path. |
|
I am not arguing that the tests were not reworked; my point is about the communication around it. How are we supposed to know about SSH deprecation? |
|
/close |
|
@jsafrane: Closed this PR. |
| do | ||
| # AWS fills only .hostname of a service | ||
| BASTION_HOST=$(oc get service -n "${SSH_BASTION_NAMESPACE}" ssh-bastion -o jsonpath='{.status.loadBalancer.ingress[0].hostname}') | ||
| if [[ -n "${BASTION_HOST}" ]]; then break; fi |
This logic should be in the script, let's get Eric to fix his bastion
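Wherever this logic ends up, the wait for the load balancer address can be bounded so that a missing address fails the job instead of hanging it. This is a minimal sketch under stated assumptions: `get_bastion_address` and `wait_for_bastion_host` are hypothetical helpers, the retry count and delay are arbitrary, and the fallback to `.ip` assumes that providers other than AWS fill that field instead of `.hostname`.

```shell
#!/usr/bin/env bash
# Sketch only, not the code from this PR.

# Print the external address of the ssh-bastion service, or nothing if unset.
get_bastion_address() {
    local ns="$1" addr
    # AWS fills only .hostname of a service.
    addr=$(oc get service -n "${ns}" ssh-bastion \
        -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null)
    if [[ -z "${addr}" ]]; then
        # Assumption: other providers may fill .ip instead.
        addr=$(oc get service -n "${ns}" ssh-bastion \
            -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
    fi
    echo "${addr}"
}

# Poll until an address appears; give up after a fixed number of attempts.
wait_for_bastion_host() {
    local ns="$1" retries="${2:-30}" delay="${3:-10}" host="" i
    for ((i = 0; i < retries; i++)); do
        host=$(get_bastion_address "${ns}")
        if [[ -n "${host}" ]]; then
            echo "${host}"
            return 0
        fi
        sleep "${delay}"
    done
    echo "timed out waiting for ssh-bastion address in ${ns}" >&2
    return 1
}

# Usage (port 22 is an assumption about how the bastion is exposed):
#   BASTION_HOST=$(wait_for_bastion_host "${SSH_BASTION_NAMESPACE}") || exit 1
#   export KUBE_SSH_BASTION="${BASTION_HOST}:22"
```

Keeping the lookup in its own function makes it easy to move into the bastion's deploy script later, as suggested above, without touching the retry logic.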
|
We’re about to start running disruptive tests from e2e and that suite does need SSH |
|
We also might end up with an “other” suite for some of the disabled tests - that’s a good spot to include SSH as well |
|
I'm going to pull this code out and fix it up as part of getting the disruptive suite going. |
Run ssh bastion and set up its env. variables before running openshift-tests in the openshift_installer template. Some upstream e2e tests require SSH access to nodes.
Partly fixes https://bugzilla.redhat.com/show_bug.cgi?id=1711600
cc @vrutkovs to double check restore-cluster-state and recover-from-etcd-quorum-loss changes.
Tested with:
Where jsafrane/origin@mount-propagation-test is this PR: openshift/origin#22966
Mount propagation tests (= which need ssh to nodes) succeed.