@kikisdeliveryservice
Contributor

@kikisdeliveryservice kikisdeliveryservice commented Mar 11, 2019

Test verifies that MCD updates MCP, daemons and
writes new ssh keys to node filesystems.

Closes #546

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress and approved labels Mar 11, 2019
@openshift-ci-robot openshift-ci-robot added the size/M label Mar 11, 2019
@kikisdeliveryservice
Contributor Author

hmm, this passed before I added the done check, but I'm seeing lots of timeouts in CI that seem unrelated to the MCO.

/retest

@kikisdeliveryservice
Contributor Author

/retest

@cgwalters
Member

This is probably OK, but we could go the extra mile and actually verify that the SSH key ends up on the node's filesystem. Basically the API equivalent of oc rsh pods/machine-config-daemon-xyz ls /rootfs/var/roothome/.ssh/authorized_keys.

@kikisdeliveryservice
Contributor Author

@cgwalters Yep, I was thinking doing a real check would make sense too! First pass was me poking around with the e2e for the first time. Ty!

@kikisdeliveryservice
Contributor Author

kikisdeliveryservice commented Mar 14, 2019

Ok so I think I was checking the Done annotation incorrectly before. Trying something new.

Update: Ok cool, that worked!

@ashcrow
Member

ashcrow commented Mar 14, 2019

level=info msg="Waiting up to 30m0s for the cluster to initialize..."
level=fatal msg="failed to initialize the cluster: timed out waiting for the condition"

@kikisdeliveryservice
Contributor Author

@ashcrow we are hitting aws issues all over openshift rn.

@kikisdeliveryservice kikisdeliveryservice force-pushed the e2e-ssh branch 2 times, most recently from dd77dc6 to a52163b Compare March 19, 2019 02:02
@openshift-ci-robot openshift-ci-robot added the needs-rebase and size/L labels and removed the size/M label Mar 19, 2019
@openshift-ci-robot openshift-ci-robot added the size/M label and removed the size/L label Mar 19, 2019
@openshift-ci-robot openshift-ci-robot removed the needs-rebase label Mar 19, 2019
@kikisdeliveryservice
Contributor Author

For tomorrow: need to figure out a way to do this: (oc debug isn't working on my local cluster)

  • oc get pods -n openshift-machine-config-operator --field-selector spec.nodeName=node.Name
  • oc rsh -n openshift-machine-config-operator machine-config-daemon-zt85f
  • cat /rootfs/home/core/.ssh/authorized_keys to search file for test key

@kikisdeliveryservice
Contributor Author

kikisdeliveryservice commented Mar 19, 2019

basically i have a worker node name. so i need to get the daemon pod name that goes along with it, then rsh into that daemon and grab the authorized_keys file and check that the test key is inside.

the way to get a list of all the mcd is here:

mcdList, err := cs.Pods("openshift-machine-config-operator").List(listOptions)

i should be able to somehow filter that list to get the daemon pod whose spec.nodeName matches the worker's node.Name, without having to use exec.Command to execute oc get pods -n openshift-machine-config-operator --field-selector ...

maybe some sort of filter for spec.nodeName on cs.Pods().List or cs.Pods().Get()?

@cgwalters
Member

You want to use https://godoc.org/k8s.io/apimachinery/pkg/apis/meta/v1#ListOptions
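
A rough sketch of the idea, using only the standard library so it runs on its own. The Pod struct, nodeFieldSelector, and daemonPodOnNode are hypothetical names made up for illustration, not code from this PR. With a real client, the selector string built here is what would presumably go into the FieldSelector field of ListOptions so the API server does the filtering; the loop shows the equivalent client-side filter over an already-fetched list.

```go
package main

import (
	"fmt"
	"strings"
)

// Pod is a hypothetical, pared-down stand-in for the Kubernetes Pod type;
// only the two fields this sketch needs are modeled.
type Pod struct {
	Name     string
	NodeName string // corresponds to spec.nodeName
}

// nodeFieldSelector builds a selector string of the form accepted by
// `oc get pods --field-selector`, e.g. "spec.nodeName=worker-0".
func nodeFieldSelector(nodeName string) string {
	return "spec.nodeName=" + nodeName
}

// daemonPodOnNode filters an already-fetched pod list on the client side.
// It returns the first pod scheduled on nodeName whose name starts with
// "machine-config-daemon-", since the namespace also holds other pods.
func daemonPodOnNode(pods []Pod, nodeName string) (Pod, bool) {
	for _, p := range pods {
		if p.NodeName == nodeName && strings.HasPrefix(p.Name, "machine-config-daemon-") {
			return p, true
		}
	}
	return Pod{}, false
}

func main() {
	// Made-up pod names and node names, for demonstration only.
	pods := []Pod{
		{Name: "machine-config-daemon-aaaaa", NodeName: "worker-0"},
		{Name: "machine-config-server-ccccc", NodeName: "worker-1"},
		{Name: "machine-config-daemon-bbbbb", NodeName: "worker-1"},
	}

	fmt.Println(nodeFieldSelector("worker-1"))
	if p, ok := daemonPodOnNode(pods, "worker-1"); ok {
		fmt.Println(p.Name)
	}
}
```

Server-side filtering via the selector is usually preferable because it avoids listing every pod in the namespace; the client-side loop is mainly useful when the list is already in hand.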

@kikisdeliveryservice kikisdeliveryservice force-pushed the e2e-ssh branch 2 times, most recently from 30c1796 to 7923d26 Compare March 20, 2019 16:28
@runcom
Member

runcom commented Mar 20, 2019

/approve

pending LGTM until this works and passes and others review this

(great work with this!)

@kikisdeliveryservice
Contributor Author

thanks for your suggestions @runcom - they are really helpful!!!

I'll squash once i get this working.

@kikisdeliveryservice
Contributor Author

level=info msg="Waiting up to 30m0s for the cluster at https://api.ci-op-jfqwkc5d-57a9f.origin-ci-int-aws.dev.rhcloud.com:6443 to initialize..."
level=fatal msg="failed to initialize the cluster: timed out waiting for the condition"

looking through the logs the timeout might be due to InstallerControllerFailed (kube-apiserver), so going to try retest

/test e2e-aws-op

@kikisdeliveryservice
Contributor Author

pretty sure grep is failing bc it's being done in a different shell than the oc rsh.
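
One way to sidestep the shell question entirely, sketched under the assumption that the contents of authorized_keys have already been captured as a string (for example from the output of the remote cat): do the search in Go instead of piping to grep. Go's os/exec runs a program directly rather than through a shell, so a `| grep` passed as arguments would not be interpreted as a pipe. hasAuthorizedKey is a hypothetical helper name, and the key strings below are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasAuthorizedKey reports whether key appears as a complete line in the
// given authorized_keys contents. Surrounding whitespace is ignored, and
// an empty key never matches.
func hasAuthorizedKey(contents, key string) bool {
	key = strings.TrimSpace(key)
	if key == "" {
		return false
	}
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		if strings.TrimSpace(sc.Text()) == key {
			return true
		}
	}
	return false
}

func main() {
	// Made-up file contents standing in for the captured command output.
	contents := "ssh-rsa AAAAexisting core@example\nssh-rsa AAAAtestkey e2e@example\n"

	fmt.Println(hasAuthorizedKey(contents, "ssh-rsa AAAAtestkey e2e@example"))
	fmt.Println(hasAuthorizedKey(contents, "ssh-rsa AAAAmissing nobody@example"))
}
```

Matching whole lines rather than substrings avoids a false positive when the test key happens to be a prefix of another key.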

@kikisdeliveryservice
Contributor Author

kikisdeliveryservice commented Mar 20, 2019

cc: @runcom

GOCACHE=off go test -timeout 50m -v${WHAT:+ -run="$WHAT"} ./test/e2e/
=== RUN   TestMCDToken
--- PASS: TestMCDToken (0.45s)
=== RUN   TestMCDeployed
--- PASS: TestMCDeployed (2418.86s)
=== RUN   TestUpdateSSH
panic: test timed out after 50m0s

goroutine 2196 [running]:
testing.(*M).startAlarm.func1()
	/usr/local/go/src/testing/testing.go:1240 +0xfc
created by time.goFunc
	/usr/local/go/src/time/sleep.go:172 +0x44

goroutine 1 [chan receive, 9 minutes]:

Because TestMCDeployed takes 40 minutes, it's blocking my test from ever passing bc only 10 minutes are left for all of the other tests in our suite to run.

I believe that my test might be working correctly as I see no error output on this pass. I will change TestMCDeployed from testing 10 MCs to 2 in my branch's test file to allow my PR more time to run. If it works, we'll have to decide what to do bc I've seen TestMCDeployed take ~40 min for 10 MCs consistently.

cc:@runcom

@kikisdeliveryservice
Contributor Author

/retest

@kikisdeliveryservice kikisdeliveryservice changed the title [WIP] e2e: first pass adding e2e for ssh via mcd [WIP] e2e: add ssh mcd test Mar 20, 2019
@kikisdeliveryservice kikisdeliveryservice force-pushed the e2e-ssh branch 3 times, most recently from 9011194 to 01f5991 Compare March 21, 2019 05:32
@kikisdeliveryservice
Contributor Author

Awesome!! Got my e2e ssh test working! Will clean up commits tomorrow and pick up #563 to see if it fixes the time out issue.

@runcom
Member

runcom commented Mar 21, 2019

Awesome!!!!!!

@kikisdeliveryservice kikisdeliveryservice changed the title [WIP] e2e: add ssh mcd test e2e: add ssh mcd test Mar 21, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress label Mar 21, 2019
@kikisdeliveryservice
Contributor Author

@runcom timeout issues fixed, I think your PR did the trick!

Member

@runcom runcom left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm label Mar 21, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kikisdeliveryservice, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice,runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit bca2c1f into openshift:master Mar 21, 2019
@kikisdeliveryservice kikisdeliveryservice deleted the e2e-ssh branch March 21, 2019 21:49

Labels

approved, lgtm, size/L

6 participants