This repository was archived by the owner on Feb 5, 2020. It is now read-only.

modules/ignition: introduce runtime-mappings.yaml #2039

Merged
squat merged 2 commits into coreos:master from lucab:ups/runtime-mapping on Nov 17, 2017

Conversation

@lucab
Contributor

lucab commented Oct 4, 2017

This introduces a runtime-mappings.yaml to be used by tectonic-torcx.

In particular, this file is consumed by k8s-node-bootstrap.service
as a fallback to determine the docker version to install when the
api-server is unreachable (e.g. while installing a brand-new cluster).
If an api-server is available, an up-to-date version of this data
is instead sourced from a config-map as the primary source.
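
For reference, a minimal sketch of the mapping format, extrapolated from the hunk reviewed below; the exact schema consumed by tectonic-torcx may differ:

# runtime-mappings.yaml: maps a Kubernetes minor version to the
# docker versions usable with it, in order of preference
1.8:
  docker: ["1.13", "1.12"]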

Part of OST-79

/cc @squeed

lucab force-pushed the ups/runtime-mapping branch from 29e8234 to f7f4004 on October 5, 2017 at 07:39
lucab changed the title from "[WIP] modules/ignition: introduce runtime-mappings.yaml" to "modules/ignition: introduce runtime-mappings.yaml" on Oct 5, 2017
lucab force-pushed the ups/runtime-mapping branch from f7f4004 to d6f0188 on October 5, 2017 at 08:23
@alexsomesan
Contributor

run smokes

alexsomesan closed this on Oct 5, 2017
alexsomesan reopened this on Oct 5, 2017
@lucab
Contributor Author

lucab commented Oct 5, 2017

@robszumski this is the secondary source of version-mappings for the bootstrapper. The primary source is at https://github.com/coreos-inc/tectonic-cluo-operator/pull/13. Both of them will need to be owned by somebody involved in the release process and sync'd.

/cc @Quentin-M @diegs

@lucab
Contributor Author

lucab commented Oct 5, 2017

Bikeshedding question: is the name specific enough? This thing used to be called version-manifest, but that felt a bit too generic, hence the current name. I'm happy with it as is, but still open to better proposals.

@coreosbot

Can one of the admins verify this patch?

@diegs
Contributor

diegs commented Oct 16, 2017

cc @derekparker

@derekparker
Contributor

One thing we have discussed offline is using this configmap to express both Docker and Kubelet versions, since those are so tightly coupled. This means that the KVO Node Agent will also read from this configmap, and KVO will write to this configmap with the versions of docker / kubelet that are defined in the KVO upgrade spec.

The upgrade flow for Nodes then becomes:

  1. KVO updates this configmap with correct versions
  2. KVO applies needs-reboot to all Nodes
  3. CLUO applies before-update label on those Nodes
  4. Torcx and Node-agent run as before reboot hooks to do their work
  5. KVO waits for all Nodes to reboot, then proceeds
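
A rough sketch of what such a configmap could look like; the name, namespace, keys, and versions below are hypothetical, not an agreed spec:

apiVersion: v1
kind: ConfigMap
metadata:
  # hypothetical object written by KVO and read by both the
  # KVO Node Agent and tectonic-torcx
  name: node-runtime-versions
  namespace: tectonic-system
data:
  # versions come from the KVO upgrade spec; values are illustrative
  kubelet: "1.8.4"
  docker: "17.03"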

cc @diegs @lucab @squeed @crawford @aaronlevy

@squeed

squeed commented Oct 17, 2017

The proposed flow makes sense - it's definitely an improvement - but I'm not sure that it's necessary, given that most kubelet upgrades will be non-disruptive and not involve tectonic-torcx.

1.8:
  # As per k8s issue 42926, 1.11, 1.12, 1.13, and 17.03 are supported
  # see https://github.com/kubernetes/kubernetes/issues/42926#issuecomment-325733231
  docker: ["1.13", "1.12"]
Contributor

We don't ship 1.13. We should instead just jump straight to 17.03.
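
Concretely, the suggestion would amount to something like this, assuming the same schema and keeping "1.12" as a fallback:

1.8:
  docker: ["17.03", "1.12"]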

Contributor Author

This was just an optimistic forecast made around the 1.7 timeframe. This file is not yet consumed by the bootstrapper anyway, so I'm keeping it in sync with the old one for the moment and will bump them all at the same time once testing is green. We'll also need a forecast for 1.9.

@lucab
Contributor Author

lucab commented Oct 19, 2017

@diegs so it looks like this can already be merged as is? (I'll decouple and take care of the new-version follow-up.)

diegs previously approved these changes Oct 19, 2017
Contributor

diegs left a comment


@lucab I'm good with merging this.

@mxinden
Contributor

mxinden commented Nov 1, 2017

We made some changes (#2082) to the testing process. Please rebase onto current master so that the basic-tests PR status is reported correctly.

@Quentin-M
Contributor

@lucab

Both of them will need to be owned by somebody involved in the release process and sync'd.

May I ask why/how? It does not look like this changes anything about the way Tectonic is released.

@diegs
Contributor

diegs commented Nov 2, 2017

@Quentin-M he means that this needs to be kept in sync with the one shipped in the KVO or TCLUO.

lucab force-pushed the ups/runtime-mapping branch 2 times, most recently from 2b58dc8 to 0a2c0b2 on November 15, 2017 at 15:21
@@ -1,25 +0,0 @@
# This file is supposed to be symlinked in consuming modules
Contributor Author

@enxebre it looks like you copied this file instead of symlinking it. I'm fixing GCP here, as it's making my PR fail. Thanks to @squat for quickly figuring this out.

@lucab
Contributor Author

lucab commented Nov 15, 2017

Rebased, PTAL. It is ready for review and targeted at Helium as per https://github.com/coreos-inc/kube-version-operator/blob/master/Documentation/tectonic_torcx.md#tectonic-install-flow. I think this needs additional labels in order to be tested on all platforms.

@squat
Contributor

squat commented Nov 15, 2017

ok to test

 tectonic_prometheus_operator = "quay.io/coreos/tectonic-prometheus-operator:v1.7.1"
 tectonic_cluo_operator = "quay.io/coreos/tectonic-cluo-operator:v0.2.4"
-tectonic_torcx = "quay.io/coreos/tectonic-torcx:installer-latest"
+tectonic_torcx = "quay.io/coreos/tectonic-torcx:v0.2.0"
Contributor Author

@s-urbaniak @sym3tri this gets rid of the mutable tag.

lucab force-pushed the ups/runtime-mapping branch from 0a2c0b2 to f0e972f on November 15, 2017 at 16:03
@meghnagala

@cpanato What is the ETA on these tests passing?

lucab force-pushed the ups/runtime-mapping branch from f0e972f to 6f36d18 on November 15, 2017 at 18:21
diegs previously approved these changes Nov 15, 2017
Contributor

diegs left a comment


LGTM

@lucab
Contributor Author

lucab commented Nov 15, 2017

@meghnagala I'm re-spawning my local metal cluster to double-check I didn't miss anything obvious. I've also re-pushed this PR, re-kicking the CI to ensure I didn't hit some odd flakes. If neither of those helps, I'm going to grab @cpanato tomorrow morning to dig further.

@cpanato
Contributor

cpanato commented Nov 15, 2017

@lucab looks like all tests failed due to a timeout during the bootstrap process; we have all the logs from docker and the journals, which we can dig into tomorrow if you don't mind.

lucab force-pushed the ups/runtime-mapping branch from fc770dd to 4772a76 on November 16, 2017 at 07:02
@lucab
Contributor Author

lucab commented Nov 16, 2017

This is still failing due to an unrelated problem in the metal testing environment. We tracked it down to broken bridge connectivity across VMs on the Packet machine, which should hopefully be resolved soon. I'm going to rebase and push this once the metal testing environment is fixed.

@squat
Contributor

squat commented Nov 16, 2017

ok to test

@squat
Contributor

squat commented Nov 16, 2017

Some Azure tests failed; running those selectively so we can run just the bare-metal tests once that CI infrastructure is fixed.

@cpanato
Contributor

cpanato commented Nov 16, 2017

retest this please

@cpanato
Contributor

cpanato commented Nov 16, 2017

Now all Azure tests are green.

@cpanato
Contributor

cpanato commented Nov 17, 2017

retest this please

@cpanato
Contributor

cpanato commented Nov 17, 2017

@lucab tests passed

Contributor

squat left a comment

PR was already LGTM'd. Reapproving so we can merge.

squat merged commit 86a6add into coreos:master on Nov 17, 2017