This repository was archived by the owner on Feb 5, 2020. It is now read-only.

modules/ignition: introduce runtime-mappings.yaml #2039

Merged
squat merged 2 commits into coreos:master from lucab:ups/runtime-mapping on Nov 17, 2017

Conversation

@lucab
Contributor

lucab commented Oct 4, 2017

This introduces a runtime-mappings.yaml to be used by tectonic-torcx.

In particular, this file is consumed by k8s-node-bootstrap.service
as a fallback to determine the docker version to install when the
api-server is unreachable (e.g. while installing a brand-new cluster).
If an api-server is available, an up-to-date version of this data
is instead sourced from a config-map as the primary source.
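
For reference, a minimal sketch of the mapping format, extrapolated from the hunk reviewed below; the exact schema consumed by tectonic-torcx may differ:

# runtime-mappings.yaml: maps a Kubernetes minor version to the
# docker versions usable with it, in order of preference
1.8:
  docker: ["1.13", "1.12"]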

Part of OST-79

/cc @squeed

lucab force-pushed the ups/runtime-mapping branch from 29e8234 to f7f4004 on October 5, 2017 at 07:39
lucab changed the title from "[WIP] modules/ignition: introduce runtime-mappings.yaml" to "modules/ignition: introduce runtime-mappings.yaml" on Oct 5, 2017
lucab force-pushed the ups/runtime-mapping branch from f7f4004 to d6f0188 on October 5, 2017 at 08:23
@alexsomesan
Contributor

run smokes

alexsomesan closed this on Oct 5, 2017
alexsomesan reopened this on Oct 5, 2017
@lucab
Contributor Author

lucab commented Oct 5, 2017

@robszumski this is the secondary source of version-mappings for the bootstrapper. The primary source is at https://github.com/coreos-inc/tectonic-cluo-operator/pull/13. Both of them will need to be owned by somebody involved in the release process and sync'd.

/cc @Quentin-M @diegs

@lucab
Contributor Author

lucab commented Oct 5, 2017

Bikeshedding question: is the name specific enough? This thing used to be called version-manifest, but that felt a bit too generic, hence the current name. I'm happy with it as is, but still open to better proposals.

@coreosbot

Can one of the admins verify this patch?

@diegs
Contributor

diegs commented Oct 16, 2017

cc @derekparker

@derekparker
Contributor

One thing we have discussed offline is using this configmap to express both Docker and Kubelet versions, since those are so tightly coupled. This means that the KVO Node Agent will also read from this configmap, and KVO will write to this configmap with the versions of docker / kubelet that are defined in the KVO upgrade spec.

The upgrade flow for Nodes then becomes:

  1. KVO updates this configmap with correct versions
  2. KVO applies needs-reboot to all Nodes
  3. CLUO applies before-update label on those Nodes
  4. Torcx and Node-agent run as before reboot hooks to do their work
  5. KVO waits for all Nodes to reboot, then proceeds
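
A rough sketch of what such a configmap could look like; the name, namespace, keys, and versions below are hypothetical, not an agreed spec:

apiVersion: v1
kind: ConfigMap
metadata:
  # hypothetical object written by KVO and read by both the
  # KVO Node Agent and tectonic-torcx
  name: node-runtime-versions
  namespace: tectonic-system
data:
  # versions come from the KVO upgrade spec; values are illustrative
  kubelet: "1.8.4"
  docker: "17.03"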

cc @diegs @lucab @squeed @crawford @aaronlevy

@squeed

squeed commented Oct 17, 2017

The proposed flow makes sense - it's definitely an improvement - but I'm not sure that it's necessary, given that most kubelet upgrades will be non-disruptive and not involve tectonic-torcx.

1.8:
  # As per k8s issue 42926, 1.11, 1.12, 1.13, and 17.03 are supported
  # see https://github.com/kubernetes/kubernetes/issues/42926#issuecomment-325733231
  docker: ["1.13", "1.12"]
Contributor

We don't ship 1.13. We should instead just jump straight to 17.03.
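
Concretely, the suggestion would amount to something like this, assuming the same schema and keeping "1.12" as a fallback:

1.8:
  docker: ["17.03", "1.12"]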

Contributor Author

This was just an optimistic forecast made around the 1.7 timeframe. This file is not yet consumed by the bootstrapper anyway, so I'm keeping it in sync with the old one for the moment and will bump them all at the same time once testing is green. We'll also need a forecast for 1.9.

@lucab
Contributor Author

lucab commented Oct 19, 2017

@diegs so it looks like this can already be merged as is? (I'll decouple and take care of the new-version follow-up.)

diegs previously approved these changes Oct 19, 2017
Contributor

diegs left a comment


@lucab I'm good with merging this.

@mxinden
Contributor

mxinden commented Nov 1, 2017

We made some changes (#2082) to the testing process. Please rebase onto current master so that the basic-tests PR status is reported correctly.

@Quentin-M
Contributor

@lucab

Both of them will need to be owned by somebody involved in the release process and sync'd.

May I ask why/how? It does not look like this changes anything about the way Tectonic is released.

@diegs
Contributor

diegs commented Nov 2, 2017

@Quentin-M he means that this needs to be kept in sync with the one shipped in the KVO or TCLUO.

lucab force-pushed the ups/runtime-mapping branch 2 times, most recently from 2b58dc8 to 0a2c0b2 on November 15, 2017 at 15:21
@@ -1,25 +0,0 @@
# This file is supposed to be symlinked in consuming modules
Contributor Author

@enxebre it looks like you copied this file instead of symlinking it. I'm fixing GCP here, as it's making my PR fail. Thanks to @squat for quickly figuring this out.

@lucab
Contributor Author

lucab commented Nov 15, 2017

Rebased, PTAL. It is ready for review and targeted at Helium as per https://github.com/coreos-inc/kube-version-operator/blob/master/Documentation/tectonic_torcx.md#tectonic-install-flow. I think this needs additional labels in order to be tested on all platforms.

@squat
Contributor

squat commented Nov 15, 2017

ok to test

 tectonic_prometheus_operator = "quay.io/coreos/tectonic-prometheus-operator:v1.7.1"
 tectonic_cluo_operator = "quay.io/coreos/tectonic-cluo-operator:v0.2.4"
-tectonic_torcx = "quay.io/coreos/tectonic-torcx:installer-latest"
+tectonic_torcx = "quay.io/coreos/tectonic-torcx:v0.2.0"
Contributor Author

@s-urbaniak @sym3tri this gets rid of the mutable tag.

lucab force-pushed the ups/runtime-mapping branch from 0a2c0b2 to f0e972f on November 15, 2017 at 16:03
@meghnagala

@cpanato What is the ETA on these tests passing?

lucab force-pushed the ups/runtime-mapping branch from f0e972f to 6f36d18 on November 15, 2017 at 18:21
diegs previously approved these changes Nov 15, 2017
Contributor

diegs left a comment


LGTM

@lucab
Contributor Author

lucab commented Nov 15, 2017

@meghnagala I'm re-spawning my local metal cluster to double-check I didn't miss anything obvious. I've also re-pushed this PR, re-kicking the CI to ensure I didn't hit some odd flakes. If neither of those helps, I'm going to grab @cpanato tomorrow morning to dig further.

@cpanato
Contributor

cpanato commented Nov 15, 2017

@lucab looks like all tests failed due to a timeout during the bootstrap process; we have all the logs from docker and the journals, which we can dig into tomorrow if you don't mind.

lucab force-pushed the ups/runtime-mapping branch from fc770dd to 4772a76 on November 16, 2017 at 07:02
@lucab
Contributor Author

lucab commented Nov 16, 2017

This is still failing due to an unrelated problem in the metal testing environment. We tracked it down to broken bridge connectivity across VMs on the Packet machine, which should hopefully be resolved soon. I'm going to rebase and push this once the metal testing environment is fixed.

@squat
Contributor

squat commented Nov 16, 2017

ok to test

@squat
Contributor

squat commented Nov 16, 2017

Some Azure tests failed; running those selectively so we can run just the bare-metal tests once that CI infrastructure is fixed.

@cpanato
Contributor

cpanato commented Nov 16, 2017

retest this please

@cpanato
Contributor

cpanato commented Nov 16, 2017

Now all Azure tests are green.

@cpanato
Contributor

cpanato commented Nov 17, 2017

retest this please

@cpanato
Contributor

cpanato commented Nov 17, 2017

@lucab tests passed

Contributor

squat left a comment

PR was already LGTM'd. Reapproving so we can merge.

squat merged commit 86a6add into coreos:master on Nov 17, 2017