Documentation/design: add launch design #51
Merged
openshift-merge-robot
merged 1 commit into
openshift:master
from
abhinavdahiya:design_launch
Jul 17, 2018
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,90 @@ | ||
| # Launch | ||
|
|
||
| ## Goals | ||
|
|
||
| 1. Validate the cloud credentials. | ||
| 2. Create a functional cluster. | ||
| 3. Validate that the cluster is ready to be used by a user. | ||
|
|
||
| ## Overview | ||
|
|
||
| The launch phase creates a cluster based on the assets generated in the prepare phase. It performs the following operations: | ||
|
|
||
| 1. Pre-flight checks: Verify that credentials being passed to the installer are sufficient to bring up the cluster. | ||
| 2. Launch all the platform-specific resources required for the cluster. For example, on AWS this includes DNS in Route53, VPCs, ELBs, IAM roles, security groups, etc. | ||
| 3. Bootstrap the cluster. | ||
| 4. Wait for the cluster to be ready for use by a user. | ||
|
|
||
| ## Detailed design | ||
|
|
||
| ### Idempotency | ||
|
|
||
| Launch is **NOT** idempotent. Re-running the `installer launch` command should exit with a failure. | ||
|
|
||
| ### Pre-flight checks | ||
|
|
||
| ### Platform-specific checks | ||
|
|
||
| 1. AWS | ||
|
|
||
| The launch phase performs the following checks: | ||
|
|
||
| * Credentials are sufficient to create all the resources. | ||
| * If VPC / Route53 zones were supplied, ensure they are valid. | ||
| * TODO: add more | ||
|
|
||
| 2. Libvirt | ||
|
|
||
| The launch phase performs the following checks: | ||
|
|
||
| * The QEMU URI is reachable. | ||
| * The OS image path is valid. | ||
| * TODO: add more | ||
|
|
||
| ### Launch platform-specific resources | ||
|
|
||
| Use Terraform to create the platform-specific resources. | ||
|
|
||
| ### Bootstrapping cluster | ||
|
|
||
| **[Various options are discussed here](https://docs.google.com/document/d/17sTJ1mdWtPTFkaHLENX2aeEYw1-o4aDS2NbWnZRgOlY/edit#heading=h.r9how0eg6txs)** *This link might be private* | ||
|
|
||
| The goal is to eliminate external coordination steps and keep the "special" steps constrained to a single throw-away node. | ||
|
|
||
| a. Launch a bootstrap node from the `bootstrap.ign` that was generated in the prepare step. The user-data of the bootstrap node either contains `bootstrap.ign` or points to a remote location that contains the `bootstrap.ign` file. | ||
|
|
||
| b. Launch 3 master nodes with `api.example.com` as their Ignition endpoint. | ||
|
|
||
| c. Launch an ELB or equivalent resource that fronts the bootstrap node and the masters. | ||
|
|
||
| d. ALIAS `api.example.com` → ELB | ||
|
|
||
| e. Start the bootstrap MachineConfig Server to serve Ignition configs for the masters. Also start the `etcd-signer-server` to serve TLS assets for the etcd members. | ||
|
|
||
| f. When etcd on the masters has formed quorum, stop the local MachineConfig Server and the `etcd-signer-server`. | ||
|
|
||
| g. Run bootkube to launch the self-hosted control plane. | ||
|
|
||
| * Bootkube starts `bootstrap-*` static pods on the bootstrap node to create a temporary control plane. | ||
| * Bootkube then uses the temporary control plane to bootstrap the self-hosted control plane. | ||
| * When bootkube validates that the self-hosted control plane is ready, it shuts down. | ||
| * When bootkube shuts down, it tears down the bootstrap-apiserver. This causes the ELB health check to fail, so the bootstrap node is removed from the backend pool. | ||
|
|
||
| h. Destroy the bootstrap node. If `bootstrap.ign` was served from a remote location, destroy that location as well, because it stores secrets. | ||
|
|
||
| ### Bootstrapping etcd | ||
|
|
||
| Etcd is co-located with the master nodes, so the bootstrap MachineConfig Server serves an Ignition file that includes the etcd static pods. The etcd nodes, however, need TLS assets to communicate with each other. | ||
|
|
||
| 1. The bootstrap node runs an [`etcd-signer-server`](https://github.com/coreos/kubecsr/tree/master/cmd/kube-etcd-signer-server) docker container, which mimics the kube-apiserver's `CertificateSigningRequest` endpoint. | ||
| 2. The etcd systemd service defines a `PreStartHook` that runs an [`etcd-client`](https://github.com/coreos/kubecsr/tree/master/cmd/kube-client-agent). The `etcd-client` requests certificates from the API server endpoint, which is currently served by `etcd-signer-server`. | ||
| 3. After the `PreStartHook` succeeds, each etcd member has all the TLS assets needed to create the etcd cluster. | ||
|
|
||
| ### Verify cluster is ready | ||
|
|
||
| The launch step must exit only when the cluster is ready for use by a user. | ||
|
|
||
| TODO: more information is needed to decide when the cluster is up. The options are: | ||
|
|
||
| a. The installer is done when the control plane is ready. | ||
| b. The installer is done when all the second-level operators report the `Done` condition. | ||
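
Whichever readiness option from the TODO above is chosen, the launch step ends up polling some probe until it passes or a deadline expires. The following is a minimal sketch of that loop, not the installer's implementation. The names `operators_done` and `wait_until_ready`, the timeout values, and the shape of the operator status map are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict


def operators_done(conditions: Dict[str, bool]) -> bool:
    """Option (b): ready only when every known operator reports Done.

    `conditions` is a hypothetical map of operator name -> Done flag.
    An empty map is treated as "not ready" so that a cluster with no
    reporting operators is never declared ready by accident.
    """
    return bool(conditions) and all(conditions.values())


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 1800.0,   # assumed value, not taken from the design
    interval_s: float = 5.0,     # assumed value, not taken from the design
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or the deadline passes.

    Returns True if the cluster became ready, False on timeout.
    `clock` and `sleep` are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For option (a), the probe would be a control-plane health check instead. For option (b), it would wrap `operators_done` around whatever status source the operators end up exposing. In either case the launch command would exit with a failure when `wait_until_ready` returns `False`.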
Any reason why? As of today, the tectonic installer is idempotent, right?
today
installer/installer/pkg/workflow/install.go
Lines 129 to 134 in ae41b0a
It is not a requirement as we discussed previously. If the launch errors on re-run, it is acceptable.
It is most definitely not idempotent. Running the installer a second time is one of the best ways to screw up your cluster.
I did it a couple of times when I came back to my terminal after an interruption, and I don't remember seeing errors. Anyway, whatever is easier is fine with me.