Conversation

@openshift-cherrypick-robot

This is an automated cherry-pick of #1670

/assign LorbusChris

Vadim Rutkovsky and others added 30 commits January 18, 2020 06:13
This would disable cgroupsv2 and Spectre mitigations
Zincati would hit FCOS servers and update machines
with `make go-deps`
EtcdInformer is only used by the MCO pod to reconcile image names.
This pulls out the logic of creating etcd informer from the generic
controller context to MCO pod start method.
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
On an upgrade from a 4.3 cluster to a cluster-etcd-operator-enabled
cluster, discovery fails because the expected operator environment
is not getting created.
[fcos] Cherry-pick commits to enable cluster-etcd-operator
The new cluster etcd operator flow is:

1) start bootstrap mcs
2) start etcd on bootstrap
3) wait for bootstrapping to finish, i.e. at least one control plane node is ready and there is an MCS running in the cluster
4) turn down bootstrap mcs

The above gives workers a chance to grab the Ignition config from the
bootstrap server, which now stays up longer.
However, by the time they attempt to create a CSR, the kube-apiserver has
rotated out that bootstrap chain of trust, which causes the workers to error out with:

Jan 29 19:55:20 ip-10-0-130-205 hyperkube[2623]: E0129 19:55:20.869251    2623 certificate_manager.go:421] Failed while requesting a signed certificate from the master: cannot create certificate signing request: Unauthorized

As a result, workers are eventually unable to join the cluster.

This patch makes the bootstrap server refuse to serve the configuration to
any pool but master, which keeps workers from grabbing the wrong config
from the wrong server. Workers will keep polling for configuration and
eventually grab the correct one from the server running inside the new cluster.

Signed-off-by: Antonio Murdaca <runcom@linux.com>
…rry-pick-1421-to-fcos

[fcos] pkg/server: serve config only to master in bootstrap server
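To make the pool gate above concrete, here is a minimal sketch of the idea, assuming a bootstrap server handler keyed by the pool name in the request path; the handler name, paths, and placeholder Ignition payload are illustrative, not the actual pkg/server code:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"strings"
)

func bootstrapHandler(w http.ResponseWriter, r *http.Request) {
	// Requests look like /config/master or /config/worker.
	pool := strings.TrimPrefix(r.URL.Path, "/config/")
	if pool != "master" {
		// Refuse every pool except master so workers keep polling and
		// eventually fetch their config from the in-cluster MCS instead
		// of the stale bootstrap one.
		http.NotFound(w, r)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	fmt.Fprint(w, `{"ignition":{"version":"2.2.0"}}`) // placeholder Ignition config
}

func main() {
	http.HandleFunc("/config/", bootstrapHandler)
	log.Fatal(http.ListenAndServe(":22624", nil)) // 22624 is the MCS port
}
```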
This ensures masters and workers report as healthy to GCP LBs
[fcos] GCP: add a script and a service which ensure GCP routes are set correctly
Make sure daemon doesn't panic when mask or contents is nil
[FCOS] pkg/daemon/update.go: check for nil pointers
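A small sketch of the nil-guard pattern this refers to, with illustrative types standing in for the real Ignition/MCO ones in pkg/daemon/update.go:

```go
package daemon

// Illustrative types and nil guards of the kind this commit adds; the
// real types differ, this only shows the pattern of checking optional
// pointer fields before dereferencing them.

type unit struct {
	Name string
	Mask *bool // optional: nil means "not specified"
}

type fileContents struct {
	Source string
}

type file struct {
	Path     string
	Contents *fileContents // optional: nil means "no contents given"
}

// shouldMask dereferences Mask only after the nil check, so a missing
// mask field no longer panics the daemon.
func shouldMask(u unit) bool {
	return u.Mask != nil && *u.Mask
}

// fileSource treats missing contents as an empty source instead of
// dereferencing a nil pointer.
func fileSource(f file) string {
	if f.Contents == nil {
		return ""
	}
	return f.Contents.Source
}
```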
[fcos] remove the etcd-member pod because we no longer need it
Avoid running python scripts on host and use `podman run --net=host` instead

Cherry-picked to master as openshift#1521
…ontainer

[FCOS] non-virtual-ip: replace the script with podman wrapper
node-ip is a subcommand that lets the user see which IP the node should
use on nodes with multiple interfaces and multiple addresses. This is
useful to prevent cases where Container Runtime related services bind
to an interface that is not reachable from the control plane.

It has two commands:

* show: Takes one or more Virtual IPs of the control plane and prints
  one eligible IP on stdout.

* set: Takes one or more Virtual IPs of the control plane and sets
  systemd service configuration for services like CRI-O or Kubelet that
  need to bind to the control plane.

Signed-off-by: Antoni Segura Puimedon <antoni@redhat.com>
[fcos] Use mcd subcommand to determine node ip
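As a rough illustration of the `show` selection logic, picking a local address whose subnet contains one of the control-plane VIPs, here is a hedged sketch; it is not the actual mcd node-ip implementation:

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// eligibleNodeIP returns a local address whose subnet contains one of the
// given control-plane VIPs, i.e. an address reachable from the control plane.
func eligibleNodeIP(vips []net.IP) (net.IP, error) {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return nil, err
	}
	for _, a := range addrs {
		ipnet, ok := a.(*net.IPNet)
		if !ok || ipnet.IP.IsLoopback() {
			continue
		}
		for _, vip := range vips {
			if ipnet.Contains(vip) {
				return ipnet.IP, nil
			}
		}
	}
	return nil, fmt.Errorf("no local address shares a subnet with the given VIPs")
}

func main() {
	var vips []net.IP
	for _, arg := range os.Args[1:] {
		if ip := net.ParseIP(arg); ip != nil {
			vips = append(vips, ip)
		}
	}
	ip, err := eligibleNodeIP(vips)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(ip) // "show": print one eligible IP on stdout
}
```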
yuqi-zhang and others added 21 commits April 13, 2020 10:55
This patch adds the workaround suggested in [1] to make NodePort work:
instead of ethtool, we use NM to apply the fix for each connection
before it is brought up.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1794714

Signed-off-by: Gal Zaidman <gzaidman@redhat.com>
Signed-off-by: Antonio Murdaca <runcom@linux.com>
as it's difficult to debug with cri-o only set to log level error

Signed-off-by: Peter Hunt <pehunt@redhat.com>
When keepalived sets the VIP, it triggers a connection event in
NetworkManager. Since the name we set from DHCP was transient, NM does a
reverse lookup on the connection address. Unfortunately, NM does not
filter out the deprecated VIP from the reverse lookup and ends up
overriding the hostname with DNS names that map to VIPs configured on
the system.

This fix is a workaround: in environments where DHCP provides the
hostname, it prevents the erroneous NM behavior by setting the
DHCP-provided FQDN as static, which keeps NM from doing further
address lookups.

In hostname-less DHCP environments we would want to hook into the
hostname event in NM and make sure that the first hostname that does not
map to a VIP is the one that gets set as static.

Signed-off-by: Antoni Segura Puimedon <antoni@redhat.com>
In PR [1] we added a workaround for Bug [2]; it fails when the worker
starts for the first time, since openshift-sdn is created only when the
sdn pod is starting. Instead, we disable it by default and leave it as
is only when running with OVNKubernetes.

[1] openshift#1606
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1794714

Signed-off-by: Gal Zaidman <gzaidman@redhat.com>
This patch adds the workaround suggested in [1] to make NodePort work:
instead of ethtool, we use NM to apply the fix for each connection
before it is brought up.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1794714

Brings the following ovirt fixes to openstack platform:
- openshift#1606
- openshift#1621
This is inferred by the golang-1.13 container now

Remove the now-failing copying of scripts and add a workaround for
k8s.io/code-generator not playing nicely with go modules.
Restore code-generator's go.mod file and
remove the vendor/k8s.io/code-generator/vendor directory
after running verification in `make verify`.
A map doesn't guarantee order when we are creating new Ignition configs.
When we update the image CR with blocked registries, the ctrcfg
controller needs to update two files, registries.conf and policy.json.
Since we get an update from the image CR about every 20 minutes, we
compare the semantic content to see if anything has changed. But because
map order is not guaranteed, the controller may decide the contents are
not equal even though nothing in the data changed. Another MC is then
created, and every time we get an update, the MC applied to the nodes
keeps flipping back and forth between the two possible orders, causing
the nodes to reboot a bunch of times. So move to using a struct array to
ensure the order is always the same and we don't end up with two similar
MCs being created.

Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
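The ordering problem is easy to reproduce in Go: ranging over a map yields a non-deterministic order, so two renders of identical data can compare as different. The commit fixes this with a struct array in a fixed order; the sketch below (with illustrative names) shows the same idea by sorting the map keys:

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

type igFile struct {
	Path     string
	Contents string
}

// renderFromMap is non-deterministic: Go map iteration order changes
// between range loops, so the resulting slice order varies.
func renderFromMap(files map[string]string) []igFile {
	var out []igFile
	for path, contents := range files {
		out = append(out, igFile{Path: path, Contents: contents})
	}
	return out
}

// renderSorted iterates the keys in sorted order, so the output is the
// same every time for the same input.
func renderSorted(files map[string]string) []igFile {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	out := make([]igFile, 0, len(paths))
	for _, p := range paths {
		out = append(out, igFile{Path: p, Contents: files[p]})
	}
	return out
}

func main() {
	files := map[string]string{
		"/etc/containers/registries.conf": "blocked...",
		"/etc/containers/policy.json":     "{...}",
	}
	// May print false on some runs: same data, different order.
	fmt.Println(reflect.DeepEqual(renderFromMap(files), renderFromMap(files)))
	// Always true: order is fixed, so no spurious new MachineConfig.
	fmt.Println(reflect.DeepEqual(renderSorted(files), renderSorted(files)))
}
```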
Ensure `curl` command has NSS_SDB_USE_CACHE env var set
…rry-pick-1648-to-fcos

etcdquorumguard_deployment: pass NSS_SDB_USE_CACHE=no to curl
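For reference, the env entry looks like the following when expressed with the Kubernetes API types; the real change edits the etcd-quorum-guard deployment manifest (YAML) rather than Go code, so this is only a sketch:

```go
package quorumguard

import corev1 "k8s.io/api/core/v1"

// withNSSCacheDisabled appends the env var this commit passes to the
// container whose readiness probe shells out to curl.
func withNSSCacheDisabled(c corev1.Container) corev1.Container {
	c.Env = append(c.Env, corev1.EnvVar{Name: "NSS_SDB_USE_CACHE", Value: "no"})
	return c
}
```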
MCO should apply the etcd-quorum-guard deployment instead of CVO.
It also controls the number of replicas in this deployment: it scales
down to 1 replica if CEO's useUnsupportedUnsafeNonHANonProductionUnstableEtcd
option is enabled.

This allows creating single-node clusters.
[FCOS] Control amount of replicas in etcd-quorum-guard deployment
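A minimal sketch of that replica logic, with an illustrative function name and without showing how the MCO actually reads the CEO option:

```go
package operator

// quorumGuardReplicas sketches the replica count described above. The
// flag mirrors CEO's useUnsupportedUnsafeNonHANonProductionUnstableEtcd
// option.
func quorumGuardReplicas(unsupportedNonHAEtcd bool) int32 {
	if unsupportedNonHAEtcd {
		// Single-node, non-HA cluster: three guard pods spread across
		// control-plane nodes could never all schedule, so run just one.
		return 1
	}
	return 3 // default: one quorum-guard replica per control-plane node
}
```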
They're missing a lot of things, but let's just add a reasonable Infra
object for now.
A new mode for gcp-routes allows for marking individual VIPs as down. If
that's available, use it.
OpenShift wants a pretty different version of gcp-routes.service
compared to the generic case. So we really should just control it
ourselves.

This creates a service, openshift-gcp-routes.service, that conflicts
with the gcp-routes.service from the RHCOS overlay. It also picks up
some pending improvements to it, namely downfile support.
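The downfile mechanism amounts to checking a per-VIP marker file before treating the VIP as live. A hedged sketch of that check follows; the directory and file naming here are hypothetical, not what the openshift-gcp-routes script actually uses:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// vipIsDown reports whether a marker ("down") file exists for the given VIP.
func vipIsDown(runDir, vip string) bool {
	_, err := os.Stat(filepath.Join(runDir, vip+".down")) // hypothetical marker path
	return err == nil
}

func main() {
	const runDir = "/run/gcp-routes" // hypothetical directory
	for _, vip := range os.Args[1:] {
		if vipIsDown(runDir, vip) {
			fmt.Printf("skipping %s: marked down\n", vip)
			continue
		}
		fmt.Printf("accepting traffic for %s\n", vip)
	}
}
```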
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label on Oct 20, 2020
@LorbusChris

/close

@openshift-ci-robot

@LorbusChris: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
