Skip to content

Conversation

@hexfusion
Copy link
Contributor

@hexfusion hexfusion commented Nov 1, 2019

NOTE cluster-etcd-operator is out for 4.3

In 4.3 cluster-etcd-operator will take over the process of bootstrapping the etcd cluster. To provide a clear path to disable/revert these changes we have setup the following conditional logic.

MCO: The MCO render command invoked in bootkube has a new optional flag to pass the value of the cluster-etcd-operator image[1]. The availability of this flags value[2] is used to conditionally adjust the etcd-member static pod spec allowing it to use the new bootstrapping method via the operator or fall back to the 4.2 SRV method.

Installer: The installer in 4.3 has a few notable changed introduced by this PR. First of all the render command populates a static pod manifest which creates a single member etcd cluster. After we have the single node cluster we can progress and cluster-operator can be deployed. This speeds up the time it takes for the operators to begin to reconcile as we are no longer waiting for all 3 etcds to bootstrap before we progress the operators.

cluster-etcd-operator: CEO is currently set as Unmanaged[3]. This allows us to include the CEO in CVO operator payload while setting the controllers to perform noop. This short term phase allows us to merge this PR proving that we can at the same time have CEO included in CVO but still use the old SRV bootstrap.

Revert Plan: If a case existed where we had a design error and the operator needed to be pulled from 4.3.
:

[1] https://github.com/openshift/installer/pull/2608/files#diff-ce82c1d8a44f7dfc41dfc024085ccfeeR298
[2] https://github.com/openshift/machine-config-operator/blob/bd846958bc95d049547164046a962054fca093df/templates/master/00-master/_base/files/etc-kubernetes-manifests-etcd-member.yaml#L22
[3] https://github.com/openshift/cluster-etcd-operator/blob/master/manifests/0000_12_etcd-operator_01_operator.cr.yaml#L8

Depends on:

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 1, 2019
@hexfusion
Copy link
Contributor Author

/test e2e-aws

@hexfusion
Copy link
Contributor Author

adding DNS for other cloud providers

@hexfusion hexfusion force-pushed the 4.3-cluster-etcd-operator-v1 branch 4 times, most recently from ffd10f8 to 58545f4 Compare November 1, 2019 18:20
@patrickdillon
Copy link
Contributor

patrickdillon commented Nov 1, 2019

In general, installer repo expects commit messages. See formatting here: https://github.com/patrickdillon/installer/blob/master/CONTRIBUTING.md#commit-message-format

Just reread your PR:

I will write up a full changelog soon.

@hexfusion hexfusion force-pushed the 4.3-cluster-etcd-operator-v1 branch from 2a4900a to a7e55c1 Compare November 1, 2019 20:01
@hexfusion
Copy link
Contributor Author

@patrickdillon I will clean this all up before we get to merging time.

@hexfusion
Copy link
Contributor Author

hexfusion commented Nov 1, 2019

/cc @abhinavdahiya @crawford

Curious about general thoughts. The idea for this PR is to set CEO up for full testing. But we are testing with CEO disabled which would be the result if we had to revert.

records = [var.etcd_ip_addresses[count.index]]
}

resource "aws_route53_record" "bootstrap_a_node" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, we try to keep all the bootstrap resources ie resources that are used by bootstrap-host and should be deleted when bootstrapping is complete, in the bootstrap module.

So I think a better way would be to pass the internal_zone_id, cluster_domain to the bootstrap module and create the dns record there??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would like to resolve this in a patch after this PR if possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would like to resolve this in a patch after this PR if possible.

hmm, this shouldn't be too much change..
just want to make sure we don't delete resources that aren't meant to be deleted..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

@hexfusion hexfusion changed the title WIP *: add logic for cluster-etcd-operator toggle *: add logic for cluster-etcd-operator toggle Nov 4, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 4, 2019
@hexfusion hexfusion changed the title *: add logic for cluster-etcd-operator toggle WIP *: add logic for cluster-etcd-operator toggle Nov 4, 2019
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 4, 2019
@hexfusion
Copy link
Contributor Author

/retest

@hexfusion
Copy link
Contributor Author

/skip

@hexfusion hexfusion changed the title WIP *: add logic for cluster-etcd-operator toggle *: add logic for cluster-etcd-operator toggle Nov 5, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 5, 2019
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

["${var.bootstrap_ip_address}"] -> var.bootstrap_ip_address

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

["${var.bootstrap_ip_address}"] -> [var.bootstrap_ip_address] didn't realize that the records needed a list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/openshift/installer/pull/2608/files#diff-1dd6be44797de18d5c235fc324f8ee2aR92

i think this should be something like https://github.com/openshift/installer/pull/2608/files#diff-764adb23dcc0edbbebc09192eb233e9aR3

Comment on lines 121 to 122
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why you needs these volumes?

Copy link
Contributor

@abhinavdahiya abhinavdahiya Nov 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we start the etcd member here?
https://github.com/openshift/installer/pull/2608/files#diff-ce82c1d8a44f7dfc41dfc024085ccfeeR200
but the bootstrap-apiserver puts the manifests to be run when bootstrapping the cluster..

Is it because we think the lifecycle of that pod is separate from the bootstrap-control-plane

@hexfusion hexfusion force-pushed the 4.3-cluster-etcd-operator-v1 branch 3 times, most recently from f46adcc to d364191 Compare November 26, 2019 11:02
@hexfusion
Copy link
Contributor Author

/retest

@hexfusion hexfusion force-pushed the 4.3-cluster-etcd-operator-v1 branch from d364191 to 3158e81 Compare November 26, 2019 14:22
@hexfusion hexfusion force-pushed the 4.3-cluster-etcd-operator-v1 branch from 3158e81 to b7e62e4 Compare November 26, 2019 14:40
@hexfusion
Copy link
Contributor Author

FTR will rebase this into something reasonable when I am done with the review.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Nov 26, 2019

@hexfusion: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws-scaleup-rhel7 b7e62e4 link /test e2e-aws-scaleup-rhel7
ci/prow/e2e-aws b7e62e4 link /test e2e-aws
ci/prow/e2e-aws-fips b7e62e4 link /test e2e-aws-fips

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@hexfusion
Copy link
Contributor Author

We have reworked this into a simpler solution, continued here #2730

@hexfusion hexfusion closed this Nov 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants