Skip to content
This repository has been archived by the owner on Jan 5, 2022. It is now read-only.

Merge with upstream Typhoon #3

Merged
merged 81 commits into from
Apr 10, 2018
Merged

Merge with upstream Typhoon #3

merged 81 commits into from
Apr 10, 2018

Conversation

bendrucker
Copy link

Merges our changes with Typhoon.

New features

Removals

  • Container Linux snippets allow us to pass in additional files, systemd units, and users. var.ssh_authorized_keys (list) is reverted to var.ssh_authorized_key (single string). I will generate a CL config with users/keys in the kubernetes project.
  • OIDC support is replaced with custom apiserver args. Again, I'll set these args in the kubernetes project.

pms1969 and others added 30 commits February 22, 2018 16:10
* Set Kubelet search path for flexvolume plugins
to /var/lib/kubelet/volumeplugins
* Add support for flexvolume plugins on AWS, GCE, and DO
* See 9548572 which added flexvolume support for bare-metal
* Author no longer works for CoreOS / Red Hat
* Typhoon development continues as usual
* Fix digital-ocean module to pass ssh_fingerprints
as a list since the module accepts a list
* Upcoming releases may begin to use features that require
the `terraform-provider-ct` plugin v0.2.1
* New users should use `terraform-provider-ct` v0.2.1. Existing
users can safely drop-in replace their v0.2.0 plugin with v0.2.1
as well (location referenced in ~/.terraformrc).
* See poseidon#145
* Template terraform-render-bootkube's multi-line kubeconfig
output using the right indentation
* Add `kubeconfig` variable to google-cloud controllers and
workers Terraform submodules
* Remove `kubeconfig_*` variables from google-cloud controllers
and workers Terraform submodules
* etcd_service_ip dates back to deprecated self-hosted etcd
* Don't need to define a specific dated image. Managed
instance groups do not delete instances when new images
are released to a channel
* Set defaults for internal worker module's count,
machine_type, and os_image
* Allow "pools" of homogeneous workers to be created
using the google-cloud/kubernetes/workers module
* Allow groups of workers to be defined and joined to
a cluster (i.e. worker pools)
* Move worker resources into a Terraform submodule
* Output variables needed for passing to worker pools
* Add usage docs for AWS worker pools (advanced)
* This reverts commit cce4537.
* Provider passing to child modules is complex and the behavior
changed between Terraform v0.10 and v0.11. We're continuing to
allow both versions so this change should be reverted. For the
time being, those using our internal Terraform modules will have
to be aware of the minimum version for AWS and GCP providers,
there is no good way to do enforcement.
* Fix issue where worker firewall rules didn't apply to
additional workers attached to a GCP cluster using the new
"worker pools" feature (unreleased, poseidon#148). Solves host
connection timeouts and pods not being scheduled to attached
worker pools.
* Add `name` field to GCP internal worker module to represent
the unique name of of the worker pool
* Use `cluster_name` field of GCP internal worker module for
passing the name of the cluster to which workers should be
attached
* Ensure consistency between AWS and GCP platforms
* Annotate Prometheus service to scrape metrics from
Prometheus itself (enables Prometheus* alerts)
* Update kube-state-metrics addon-resizer to 1.7
* Use port 8080 for kube-state-metrics
* Add PrometheusNotIngestingSamples alert rule
* Change K8SKubeletDown alert rule to fire when 10%
of kubelets are down, not 1%
  * prometheus-operator/prometheus-operator#1032
dghubble and others added 28 commits March 25, 2018 20:41
* Calico isn't viable on Digital Ocean because their firewalls
do not support IP-IP protocol. Its not viable to run a cluster
without firewalls just to use Calico.
* Remove the caveat note. Don't allow users to shoot themselves
in the foot
* Remove optional machine_type variable on Google Cloud
* Use controller_type and worker_type instead
* Previously, etcd secrets were erroneously distributed to worker
nodes (permissions 500, ownership etc:etcd).
* AWS and Google Cloud make use of auto-scaling groups
and managed instance groups, respectively. As such, the
kubeconfig is already held in cloud user-data
* Controller instances are provisioned with a kubeconfig
from user-data. Its redundant to use a Terraform remote
file copy step for the kubeconfig.
* Change EBS volume type from `standard` ("prior generation)
 to `gp2`. Prometheus alerts are tuned for SSDs
* Other platforms have fast enough disks by default
* Use etcd v3.3 --listen-metrics-urls to expose only metrics
data via http://0.0.0.0:2381 on controllers
* Add Prometheus discovery for etcd peers on controller nodes
* Temporarily drop two noisy Prometheus alerts
* Expose etcd metrics to workers so Prometheus can
run on a worker, rather than a controller
* Drop temporary firewall rules allowing Prometheus
to run on a controller and scrape targes
* Related to poseidon#175
* Kubernetes recommends using the alias to fetch images
from the nearest GCR regional mirror, to abstract the use
of GCR, and to drop names containing 'google'
* https://groups.google.com/forum/#!msg/kubernetes-dev/ytjk_rNrTa0/3EFUHvovCAAJ
* Terraform v0.11.4 introduced changes to remote-exec
that mean Typhoon bare-metal clusters require multiple
runs of terraform apply to ssh and bootstrap.
* Bare-metal installs PXE boot a live instance to install
to disk and then reboot from disk as controllers/workers.
Terraform remote-exec has no way to "know" to wait until
the reboot has occurred to kickoff Kubernetes bootstrap.
Previously Typhoon created a "debug" user during this
install phase to allow an admin to SSH, but remote-exec
would hang, trying to connect as user "core". Terraform
v0.11.4 changes this behavior so remote-exec fails and
a user must re-run terraform apply until succeeding.
* A new way to "trick" remote-exec into waiting for the
reboot into the disk install is to run SSH on a non-standard
port during the disk install. This retains the ability
for an admin to SSH during install (most distros don't have
this) and fixes the issue so only a single run of terraform
apply is needed.
* hashicorp/terraform#17359 (comment)
* poseidon#145
* Additional users can be easily added upstream
@bendrucker bendrucker merged commit e62801d into master Apr 10, 2018
@bendrucker bendrucker deleted the merge-upstream-b8656fd branch April 10, 2018 22:52
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants