[fcos] Bug 1802534: gcp-routes: move to MCO, implement downfile, tweak timing #1741
This is required for F31
This would disable cgroupsv2 and Spectre mitigations
Zincati would hit FCOS servers and update machines
with `make go-deps`
This is necessary to prevent races
EtcdInformer is only used by the MCO pod to reconcile image names. This moves the logic that creates the etcd informer out of the generic controller context and into the MCO pod's start method.
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
On an upgrade from a 4.3 cluster to a cluster-etcd-operator enabled cluster, discovery fails because the expected operator environment is not created.
[fcos] Cherry-pick commits to enable cluster-etcd-operator
The new cluster etcd operator flow is:
1) start bootstrap mcs
2) start etcd on bootstrap
3) wait for bootstrapping to finish, i.e. at least one control-plane is ready and there is an MCS running on the cluster
4) turn down bootstrap mcs

This gives workers a chance to grab the ignition config from the bootstrap server, which now stays up longer. However, by the time they attempt to create a CSR, the kube-apiserver has rotated that bootstrap chain of trust out, which causes the workers to error out with:

Jan 29 19:55:20 ip-10-0-130-205 hyperkube[2623]: E0129 19:55:20.869251 2623 certificate_manager.go:421] Failed while requesting a signed certificate from the master: cannot create certificate signing request: Unauthorized

As a result, workers are ultimately unable to join the cluster. This patch makes the bootstrap server refuse to serve configuration to every pool except master, which keeps workers from grabbing the wrong config from the wrong server. Workers keep polling for configuration and eventually grab the correct one from the server running within the new cluster.

Signed-off-by: Antonio Murdaca <runcom@linux.com>
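The pool filter described in this commit message could be sketched roughly as follows. This is a minimal illustration, not the MCO's actual server code: `bootstrapConfigFor`, `errPoolNotServed`, and the string-keyed config map are hypothetical names introduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// errPoolNotServed is a hypothetical sentinel error for this sketch.
var errPoolNotServed = errors.New("bootstrap server serves only the master pool")

// bootstrapConfigFor returns the stored config for the master pool and
// refuses every other pool, so that workers keep polling until the
// in-cluster server is available.
func bootstrapConfigFor(pool string, configs map[string]string) (string, error) {
	if pool != "master" {
		return "", errPoolNotServed
	}
	cfg, ok := configs[pool]
	if !ok {
		return "", fmt.Errorf("no config found for pool %q", pool)
	}
	return cfg, nil
}

func main() {
	// Placeholder config bodies; real ones would be Ignition documents.
	configs := map[string]string{"master": "master-ignition", "worker": "worker-ignition"}
	for _, pool := range []string{"master", "worker"} {
		cfg, err := bootstrapConfigFor(pool, configs)
		if err != nil {
			fmt.Printf("%s: refused (%v)\n", pool, err)
			continue
		}
		fmt.Printf("%s: served %s\n", pool, cfg)
	}
}
```

Returning an error rather than an empty config means a polling client sees a failed request and retries, which matches the behavior the commit message relies on.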
…rry-pick-1421-to-fcos [fcos] pkg/server: serve config only to master in bootstrap server
This would ensure masters / workers would report as healthy to GCP LBs
[fcos] GCP: add a script and a service which ensure GCP routes are set correctly
Make sure daemon doesn't panic when mask or contents is nil
[FCOS] pkg/daemon/update.go: check for nil pointers
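A nil guard of the kind this commit describes might look like the sketch below. The `unit` and `file` types are hypothetical stand-ins with optional pointer fields; they are not the real Ignition types used by the daemon.

```go
package main

import "fmt"

// unit and file are hypothetical stand-ins for config entries whose
// optional fields are pointers and may therefore be nil.
type unit struct {
	Name string
	Mask *bool
}

type file struct {
	Path     string
	Contents *string
}

// isMasked treats a missing Mask field as "not masked" instead of
// dereferencing a nil pointer.
func isMasked(u unit) bool {
	return u.Mask != nil && *u.Mask
}

// contentsOrEmpty returns the file contents, or an empty string when
// the Contents field is absent.
func contentsOrEmpty(f file) string {
	if f.Contents == nil {
		return ""
	}
	return *f.Contents
}

func main() {
	masked := true
	body := "hello"
	fmt.Println(isMasked(unit{Name: "a.service"}))
	fmt.Println(isMasked(unit{Name: "b.service", Mask: &masked}))
	fmt.Printf("%q\n", contentsOrEmpty(file{Path: "/etc/a"}))
	fmt.Printf("%q\n", contentsOrEmpty(file{Path: "/etc/b", Contents: &body}))
}
```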
[fcos] remove the etcd-member pod because we no longer need it
Avoid running python scripts on host and use `podman run --net=host` instead.
Cherrypicked to master as openshift#1521
…ontainer [FCOS] non-virtual-ip: replace the script with podman wrapper
node-ip is a subcommand that lets the user see which IP the node should use when it has multiple interfaces and multiple addresses. This is useful to prevent cases where Container Runtime related services bind to an interface that is not reachable in the control plane. It has two commands:
* show: Takes one or more Virtual IPs of the control plane and prints one eligible IP on stdout.
* set: Takes one or more Virtual IPs of the control plane and sets systemd service configuration for services like CRI-O or Kubelet that need to bind to the control plane.

Signed-off-by: Antoni Segura Puimedon <antoni@redhat.com>
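One way to picture what `show` does is the sketch below. It assumes that an "eligible" IP is a local address whose subnet contains one of the VIPs; that criterion, the function name `eligibleIP`, and the CIDR-string inputs are assumptions made for illustration and are not taken from the actual subcommand.

```go
package main

import (
	"fmt"
	"net"
)

// eligibleIP returns the first local address (given in CIDR notation)
// whose subnet contains at least one of the virtual IPs.
func eligibleIP(localCIDRs []string, vips []string) (string, error) {
	for _, cidr := range localCIDRs {
		ip, subnet, err := net.ParseCIDR(cidr)
		if err != nil {
			return "", fmt.Errorf("parsing %q: %w", cidr, err)
		}
		for _, v := range vips {
			vip := net.ParseIP(v)
			if vip != nil && subnet.Contains(vip) {
				return ip.String(), nil
			}
		}
	}
	return "", fmt.Errorf("no local address shares a subnet with VIPs %v", vips)
}

func main() {
	// Example addresses only; a real tool would read them from the
	// node's interfaces.
	local := []string{"10.0.0.5/24", "192.168.1.10/24"}
	ip, err := eligibleIP(local, []string{"192.168.1.100"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ip)
}
```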
[fcos] Use mcd subcommand to determine node ip
This patch adds the workaround suggested in [1] to make nodeport work. Instead of ethtool, we use NM to apply the fix for each connection before it is up.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1794714
Signed-off-by: Gal Zaidman <gzaidman@redhat.com>
Signed-off-by: Antonio Murdaca <runcom@linux.com>
as it's difficult to debug with cri-o only set to log level error Signed-off-by: Peter Hunt <pehunt@redhat.com>
When keepalived sets the VIP, it triggers a connection event in NetworkManager. Because the hostname we set from DHCP was transient, NM does a reverse lookup on the connection address. Unfortunately, NM does not filter out the deprecated VIP for the reverse lookup and ends up overriding the hostname with DNS names that map to VIPs configured in the system.

This fix is a workaround: in environments where DHCP provides the hostname, it prevents the erroneous NM behavior by setting DHCP-provided FQDNs as static, which stops NM from doing further address lookups. In hostname-less DHCP environments we'd want to hook into the hostname event in NM and make sure that the first hostname that does not map to a VIP is the one that gets set as static.

Signed-off-by: Antoni Segura Puimedon <antoni@redhat.com>
In PR [1] we added a workaround for bug [2]. It fails when the worker starts for the first time, since openshift-sdn is created only when the sdn pod is starting. Instead, we will disable by default and leave as is only when running with OVNKubernetes.
[1] openshift#1606
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1794714
Signed-off-by: Gal Zaidman <gzaidman@redhat.com>
This patch adds the workaround suggested in [1] to make nodeport work. Instead of ethtool, we use NM to apply the fix for each connection before it is up.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1794714
Brings the following ovirt fixes to the openstack platform:
- openshift#1606
- openshift#1621
This is inferred by the golang-1.13 container now.
Remove the now-erroring copying of scripts and add a workaround for k8s.io/code-generator not playing nice with go modules.
Restore code-generator's go.mod file and remove the vendor/k8s.io/code-generator/vendor directory after running verification in `make verify`
A map doesn't guarantee order when we are creating new ignitions. When we update the image CR with blocked registries, the ctrcfg controller needs to update two files, registries.conf and policy.json. Since we get an update from the image CR about every 20 minutes, we compare the semantics to see if anything has changed. But since the order is not guaranteed, the controller might think that the semantics are not equal even if nothing in the data changed. Hence another MC is created, and every time we get an update, the MC applied to the nodes keeps flipping back and forth between the 2 possible orders, causing the nodes to reboot a bunch of times.

So move to using a struct array to ensure the order is always the same and we don't have two similar MCs being created.

Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
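The ordering problem described here can be illustrated with a short sketch. It is not the ctrcfg controller's code: `registryEntry` and `orderedEntries` are hypothetical names, and sorting by key is just one simple way to obtain a stable slice from map-shaped input.

```go
package main

import (
	"fmt"
	"sort"
)

// registryEntry is a hypothetical record for one registry setting.
type registryEntry struct {
	Location string
	Blocked  bool
}

// orderedEntries converts a map into a slice sorted by location, so
// repeated calls with the same data always produce the same sequence.
// Ranging over the map directly would yield an unspecified order.
func orderedEntries(m map[string]bool) []registryEntry {
	entries := make([]registryEntry, 0, len(m))
	for loc, blocked := range m {
		entries = append(entries, registryEntry{Location: loc, Blocked: blocked})
	}
	sort.Slice(entries, func(i, j int) bool {
		return entries[i].Location < entries[j].Location
	})
	return entries
}

func main() {
	// Example registry names only.
	regs := map[string]bool{"b.example": true, "a.example": false, "c.example": true}
	for _, e := range orderedEntries(regs) {
		fmt.Printf("%s blocked=%v\n", e.Location, e.Blocked)
	}
}
```

With a stable sequence, rendering the same data twice produces identical file contents, so an equality check no longer reports a spurious change.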
[fcos] rebase on recent master
Ensure `curl` command has NSS_SDB_USE_CACHE env var set
…rry-pick-1648-to-fcos etcdquorumguard_deployment: pass NSS_SDB_USE_CACHE=no to curl
MCO should apply the etcd-quorum-guard deployment instead of CVO. It also controls the number of replicas in this deployment: it scales to 1 replica if CEO's useUnsupportedUnsafeNonHANonProductionUnstableEtcd option is enabled. This allows creating single-node clusters.
[FCOS] Control amount of replicas in etcd-quorum-guard deployment
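The replica decision could be sketched as below. The function name and the default of 3 replicas are assumptions for illustration; the commit message only states that 1 replica is used when the unsupported non-HA option is enabled.

```go
package main

import "fmt"

// defaultQuorumGuardReplicas is an assumed default for an HA control
// plane; the source does not state the number.
const defaultQuorumGuardReplicas int32 = 3

// quorumGuardReplicas returns 1 when the unsupported non-HA etcd
// option is enabled and the assumed default otherwise.
func quorumGuardReplicas(unsupportedNonHA bool) int32 {
	if unsupportedNonHA {
		return 1
	}
	return defaultQuorumGuardReplicas
}

func main() {
	fmt.Println(quorumGuardReplicas(false))
	fmt.Println(quorumGuardReplicas(true))
}
```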
They're missing a lot of things, but let's just add a reasonable Infra object for now.
A new mode for gcp-routes allows for marking individual VIPs as down. If that's available, use it.
OpenShift wants a pretty different version of gcp-routes.service compared to the generic case. So we really should just control it ourselves. This creates a service, openshift-gcp-routes.service, that conflicts with the gcp-routes.service from the RHCOS overlay. It also picks up some pending improvements to it, namely downfile support.
784081c to 6f5b3c5
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale
/close
|
@LorbusChris: Closed this PR.
This is an automated cherry-pick of #1670
/assign LorbusChris