Skip to content

Conversation

@vrutkovs
Copy link
Contributor

@vrutkovs vrutkovs commented Jun 7, 2019

This test is equivalent to bash function in CI's installer template.
It runs in a dedicated subcommand and ensure cluster state is properly
being restored after etcd snapshot restore procedure.

TODO:

@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 7, 2019
@vrutkovs vrutkovs force-pushed the disaster-recovery-tests branch from 0e86dec to 71f5c03 Compare June 7, 2019 10:50
@vrutkovs
Copy link
Contributor Author

vrutkovs commented Jun 7, 2019

/cc @hexfusion @runcom

Copy link
Contributor

@hexfusion hexfusion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow this looks great, just few notes on first pass. I will test a bit and let you know if anything else pops up.

_, err = oc.Run("create").Args("-n", ns, "secret", "generic", "ssh-host-keys", "--from-file", secretKeyArgs).Output()
o.Expect(err).NotTo(o.HaveOccurred())
} else {
o.Expect(err).NotTo(o.HaveOccurred())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the scope of err just making sure you are intended to report on err from

_, err = oc.AdminKubeClient().CoreV1().Secrets(ns).Get("ssh-host-keys", metav1.GetOptions{}) on line 137

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's intended. "not found" error would be skipped and the secret would be created, thus two expects in different scopes.

Perhaps in-scope err should be renamed for clarity, wdyt?

}

func constructEtcdConnectionString(masters []string, proxy string) string {
//TODO vrutkovs: replace this nonsense with `etcdctl member list -w json ...`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will hack on this.

@vrutkovs vrutkovs changed the title tests: run restore etcd from snapshot test using a new subcommand WIP tests: run restore etcd from snapshot test using a new subcommand Jun 7, 2019
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 7, 2019
This test is equivalent to bash function in CI's installer template.
It runs in a dedicated subcommand and ensure cluster state is properly
being restored after etcd snapshot restore procedure.
@vrutkovs vrutkovs force-pushed the disaster-recovery-tests branch from 7b66bdf to 65bf6f1 Compare June 10, 2019 07:14
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vrutkovs
To complete the pull request process, please assign smarterclayton
You can assign the PR to them by writing /assign @smarterclayton in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vrutkovs vrutkovs changed the title WIP tests: run restore etcd from snapshot test using a new subcommand tests: run restore etcd from snapshot test using a new subcommand Jun 10, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 10, 2019
}

func initDRSnapshotRestore(value string) error {
if len(value) == 0 {
Copy link
Member

@runcom runcom Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit if value == ""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit if value == ""

project standard is to use len. If you have instances of the == "" please update to match the kube/openshift standard.

@vrutkovs
Copy link
Contributor Author

/cc @smarterclayton

Rewrote DR restore from snapshot scenario test, PTAL

"sudo -i install -o core -g core /root/assets/backup/snapshot.db /tmp/snapshot.db")
setMachineConfig("rollback-B.yaml", oc, mcps)

scpFileToHost(os.Getenv("KUBE_SSH_KEY_PATH"), proxy, "/home/core/.ssh/id_rsa", firstMaster)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this with @deads2k - the code in lines L78-95 is running when the cluster is considered to be in a bad state already.
This should be moved to a bash script, as it would be a great example for the customers of how DR is being tested in CI.

@vrutkovs
Copy link
Contributor Author

See #23208

@vrutkovs vrutkovs closed this Jun 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants