Merge with upstream Typhoon #3

bendrucker · 2018-04-10T21:59:14Z

Merges our changes with Typhoon.

New features

HTTP etcd metrics endpoint
Separate worker module for creating multiple pools
Kubernetes v1.10

Removals

Container Linux snippets allow us to pass in additional files, systemd units, and users. var.ssh_authorized_keys (list) is reverted to var.ssh_authorized_key (single string). I will generate a CL config with users/keys in the kubernetes project.
OIDC support is replaced with custom apiserver args. Again, I'll set these args in the kubernetes project.

* Upcoming releases may begin to use features that require the `terraform-provider-ct` plugin v0.2.1
* New users should use `terraform-provider-ct` v0.2.1. Existing users can safely drop-in replace their v0.2.0 plugin with v0.2.1 as well (location referenced in ~/.terraformrc).
* See poseidon#145

* Template terraform-render-bootkube's multi-line kubeconfig output using the right indentation
* Add `kubeconfig` variable to google-cloud controllers and workers Terraform submodules
* Remove `kubeconfig_*` variables from google-cloud controllers and workers Terraform submodules

* This reverts commit cce4537.
* Provider passing to child modules is complex and the behavior changed between Terraform v0.10 and v0.11. We're continuing to allow both versions, so this change should be reverted. For the time being, those using our internal Terraform modules will have to be aware of the minimum version for AWS and GCP providers; there is no good way to enforce it.

* Fix issue where worker firewall rules didn't apply to additional workers attached to a GCP cluster using the new "worker pools" feature (unreleased, poseidon#148). Solves host connection timeouts and pods not being scheduled to attached worker pools.
* Add `name` field to GCP internal worker module to represent the unique name of the worker pool
* Use `cluster_name` field of GCP internal worker module for passing the name of the cluster to which workers should be attached

* Annotate Prometheus service to scrape metrics from Prometheus itself (enables Prometheus* alerts)
* Update kube-state-metrics addon-resizer to 1.7
* Use port 8080 for kube-state-metrics
* Add PrometheusNotIngestingSamples alert rule
* Change K8SKubeletDown alert rule to fire when 10% of kubelets are down, not 1%
* prometheus-operator/prometheus-operator#1032
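
For reviewers who want to see what the K8SKubeletDown threshold change means in practice, here is a minimal Python sketch. It is not the actual Prometheus rule. The function name `kubelets_down_alert` is hypothetical, and the strict greater-than comparison is an assumption.

```python
def kubelets_down_alert(down: int, total: int, threshold: float = 0.10) -> bool:
    """Return True when the fraction of down kubelets exceeds the threshold.

    Hypothetical helper: the real alert is a Prometheus rule, and whether it
    compares with > or >= is an assumption made for this sketch.
    """
    if total <= 0:
        # No kubelets known, so there is nothing to alert on.
        return False
    return down / total > threshold


if __name__ == "__main__":
    # With the new 10% threshold, one kubelet down out of 100 stays quiet.
    print(kubelets_down_alert(1, 100))
    # The old 1% threshold would have fired for two down out of 100.
    print(kubelets_down_alert(2, 100, threshold=0.01))
```

The higher default should make the alert quieter on small clusters, where a single restarting kubelet is a large share of the fleet.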

* Calico isn't viable on Digital Ocean because their firewalls do not support the IP-IP protocol. It's not viable to run a cluster without firewalls just to use Calico.
* Remove the caveat note. Don't allow users to shoot themselves in the foot

* AWS and Google Cloud make use of auto-scaling groups and managed instance groups, respectively. As such, the kubeconfig is already held in cloud user-data
* Controller instances are provisioned with a kubeconfig from user-data. It's redundant to use a Terraform remote file copy step for the kubeconfig.

* Kubernetes recommends using the alias to fetch images from the nearest GCR regional mirror, to abstract the use of GCR, and to drop names containing 'google'
* https://groups.google.com/forum/#!msg/kubernetes-dev/ytjk_rNrTa0/3EFUHvovCAAJ

* Terraform v0.11.4 introduced changes to remote-exec that mean Typhoon bare-metal clusters require multiple runs of terraform apply to ssh and bootstrap.
* Bare-metal installs PXE boot a live instance to install to disk and then reboot from disk as controllers/workers. Terraform remote-exec has no way to "know" to wait until the reboot has occurred to kick off Kubernetes bootstrap. Previously Typhoon created a "debug" user during this install phase to allow an admin to SSH, but remote-exec would hang, trying to connect as user "core". Terraform v0.11.4 changes this behavior so remote-exec fails and a user must re-run terraform apply until it succeeds.
* A new way to "trick" remote-exec into waiting for the reboot into the disk install is to run SSH on a non-standard port during the disk install. This retains the ability for an admin to SSH during install (most distros don't have this) and fixes the issue so only a single run of terraform apply is needed.
* hashicorp/terraform#17359 (comment)
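
The port trick described above amounts to "keep retrying until the standard SSH port starts accepting connections". A rough Python sketch of that wait loop follows. It is an illustration only, not how Terraform's remote-exec is implemented. The function name `wait_for_port` and its default timings are assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 600.0,
                  interval: float = 5.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Hypothetical helper: while the installer runs SSH on a non-standard
    port, connections to the standard port fail and the loop keeps waiting.
    After the reboot from disk, the connection succeeds and bootstrap can start.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed TCP handshake is enough to treat the port as open.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller would run something like `wait_for_port("node1.example.com", 22)` before attempting the bootstrap step; the hostname here is a placeholder.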

pms1969 and others added 30 commits February 22, 2018 16:10

Switch apiserver from ELB to a network load balancer

ceb5555

Update CHANGES.md with AWS ELB to NLB change

461fd46

Update bootkube and terraform-render-bootkube to v0.11.0

c4914c3

Add kubelet --volume-plugin-dir flag

13f3745

* Set Kubelet search path for flexvolume plugins to /var/lib/kubelet/volumeplugins
* Add support for flexvolume plugins on AWS, GCE, and DO
* See 9548572 which added flexvolume support for bare-metal

List addons below platforms in CHANGES

66c64b4

Remove author employment disclosure note

92600ef

* Author no longer works for CoreOS / Red Hat
* Typhoon development continues as usual

Mention the command that applies the changes

04c6613

Pass Digital Ocean ssh_fingerprints as a list

0da7757

* Fix digital-ocean module to pass ssh_fingerprints as a list since the module accepts a list

Update the Digital Ocean SSH fingerprint docs

3d9683b

Update Calico from v3.0.2 to v3.0.3

a44cf0e

* https://github.com/projectcalico/calico/releases/tag/v3.0.3

Improve links in tutorials and changelog notes

ea6bf9c

Remove unused etcd_service_ip template variable

98985e5

* etcd_service_ip dates back to deprecated self-hosted etcd

Show os_image coreos-stable on Google Cloud

06d40c5

* Don't need to define a specific dated image. Managed instance groups do not delete instances when new images are released to a channel

Add support for worker pools on google-cloud

160ae34

* Set defaults for internal worker module's count, machine_type, and os_image
* Allow "pools" of homogeneous workers to be created using the google-cloud/kubernetes/workers module

Add support for worker pools on AWS

73126eb

* Allow groups of workers to be defined and joined to a cluster (i.e. worker pools)
* Move worker resources into a Terraform submodule
* Output variables needed for passing to worker pools
* Add usage docs for AWS worker pools (advanced)

Add module version requirements to internal workers modules

cce4537

Rename cluster_name to name in internal module

c112ee3

* Ensure consistency between AWS and GCP platforms

addons: Update from Grafana v4.6.3 to v5.0.0

9dcc255

Update CHANGES.md changelog with monitoring updates

0e688ef

Update Grafana from v5.0.0 to 5.0.1

d54709f

* https://github.com/grafana/grafana/releases/tag/v5.0.1

Update Prometheus from v2.2.0-rc.1 to v2.2.0

42708f9

* https://github.com/prometheus/prometheus/releases/tag/v2.2.0

Add ignore_changes for AWS worker image_id

b61d637

Update etcd from v3.3.1 to v3.3.2

9fb1e1a

* https://github.com/coreos/etcd/releases/tag/v3.3.2

Enable AWS NLB cross-zone load balancing

35f3b1b

* hashicorp/terraform-provider-aws#3537
* https://aws.amazon.com/about-aws/whats-new/2018/02/network-load-balancer-now-supports-cross-zone-load-balancing/

Normalize Terraform configs with terraform fmt

8e7e6b9

dghubble and others added 28 commits March 25, 2018 20:41

Improve cluster definition examples in docs

455a4af

Organize and cleanup variable descriptions

e43cf9f

Remove unmaintained pxe-worker internal module

ba9daf4

Add disk_size variable on Google Cloud

8d3d422

Add optional controller_type and worker_type vars on GCP

fdb543e

* Remove optional machine_type variable on Google Cloud
* Use controller_type and worker_type instead

Ensure etcd secrets are only distributed to controller hosts

cfd603b

* Previously, etcd secrets were erroneously distributed to worker nodes (permissions 500, ownership etcd:etcd).

Use consistent naming of remote provision steps

de4d907

addons: Update from Grafana v4.6.3 to v5.0.4

b1e41dc

This reverts commit c59a9c6.

Add disk_type variable for EBS volume type on AWS

f8e9bfb

* Change EBS volume type from `standard` ("prior generation") to `gp2`. Prometheus alerts are tuned for SSDs
* Other platforms have fast enough disks by default

Update Kubernetes from v1.9.6 to v1.10.0

1cc043d

Update CHANGES.md with Kubernetes link

642f7ec

Add etcd metrics, Prometheus scrapes, and Grafana dash

d770393

* Use etcd v3.3 --listen-metrics-urls to expose only metrics data via http://0.0.0.0:2381 on controllers
* Add Prometheus discovery for etcd peers on controller nodes
* Temporarily drop two noisy Prometheus alerts
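
To make the new endpoint concrete, here is a small Python sketch that builds the metrics URLs a scraper would hit on each controller. It is not the Prometheus discovery config added in this commit. The helper name `etcd_metrics_urls` and the example hostnames are hypothetical. The port comes from the note above, and `/metrics` is the path etcd serves metrics on.

```python
from typing import Iterable, List


def etcd_metrics_urls(controller_hosts: Iterable[str], port: int = 2381) -> List[str]:
    """Build plain-HTTP etcd metrics URLs, one per controller host.

    Hypothetical helper for illustration. Port 2381 is the metrics-only
    listener described in the commit message.
    """
    return [f"http://{host}:{port}/metrics" for host in controller_hosts]


if __name__ == "__main__":
    # Placeholder hostnames, not taken from any real cluster.
    for url in etcd_metrics_urls(["controller-0.example.com", "controller-1.example.com"]):
        print(url)
```

Because the listener serves only metrics over HTTP, a scraper should not need etcd client certificates to read it.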

Update etcd from v3.3.2 to v3.3.3

ce001e9

* https://github.com/coreos/etcd/releases/tag/v3.3.3

Update kube-dns from v1.14.8 to v1.14.9

18dbaf7

* kubernetes/kubernetes#61908

Update kube-state-metrics from v1.2.0 to v1.3.0

7186aa4

* kubernetes/kube-state-metrics#412
* kubernetes/kube-state-metrics#413

Update docs builder and material theme

b76126d

Return Prometheus deployment to be a worker workload

f4b2396

* Expose etcd metrics to workers so Prometheus can run on a worker, rather than a controller
* Drop temporary firewall rules allowing Prometheus to run on a controller and scrape targets
* Related to poseidon#175

Clarify bare-metal SSH instructions

b8656fd

Merge upstream @ b8656fd

d3cac94

Remove built-in OIDC support, push to upstream via custom apiserver args

2aab565

pass custom ami value to new worker module

97cdf5e

launch workers into private subnets

5131075

Clarify comment

4a54dd6

Remove ability to set multiple authorized keys for "core" user

0091040

* poseidon#145
* Additional users can be easily added upstream

bendrucker merged commit e62801d into master Apr 10, 2018

bendrucker deleted the merge-upstream-b8656fd branch April 10, 2018 22:52
