Skip to content

Conversation

@umohnani8
Copy link
Contributor

A resync happens about every 20 minutes, which sends an updated
event to the image informer even if nothing in the image CR has
changed. Adding and extra filter that checks if there has been
any changes to the registries part of the image CR before syncing
the image handler again.

Helps fix #453 where changes to the image config will not be applied unless and actual change happened in the CR.

Signed-off-by: Urvashi Mohnani [email protected]

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 19, 2019
@umohnani8
Copy link
Contributor Author

@umohnani8
Copy link
Contributor Author

/test e2e-aws
/test e2e-aws-op
/test images

@mrunalp
Copy link
Member

mrunalp commented Feb 19, 2019

/test e2e-aws

@mrunalp
Copy link
Member

mrunalp commented Feb 19, 2019

/test e2e-aws-op

@mrunalp
Copy link
Member

mrunalp commented Feb 19, 2019

/test images

@kikisdeliveryservice
Copy link
Contributor

kikisdeliveryservice commented Feb 19, 2019

FTR: Lots of CI flakes with 504 gateway timeouts hence the retests. Also these seem to be known flakes being worked on. See https://github.com/openshift/release/issues/2905

Taking a look/testing locally.

@kikisdeliveryservice
Copy link
Contributor

kikisdeliveryservice commented Feb 19, 2019

@umohnani8 : is there something I should be testing for other than watching the logs? If so, lmk!

Given the below comment, disregard

@runcom
Copy link
Member

runcom commented Feb 19, 2019

/hold

discussed offline, we'll need something lighter and more state oriented

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 19, 2019
@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 20, 2019
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woops, dropping.

@umohnani8
Copy link
Contributor Author

@runcom updated code as discussed on slack.
@kikisdeliveryservice yeah you shouldn't see "Applied Image..." in the logs unless you actually go update the image cr (oc edit image.config.openshift.io cluster)

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 20, 2019
@runcom
Copy link
Member

runcom commented Feb 20, 2019

/hold cancel
/approve

/assign kikisdeliveryservice

@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Feb 20, 2019
@kikisdeliveryservice
Copy link
Contributor

hi @umohnani8 since the docs are pretty thin on using this feature and im less familiar with it, can you please add step by step instructions on how to test this?

@umohnani8
Copy link
Contributor Author

@kikisdeliveryservice this is what an example image CR looks like with insecure and blocked registries in it

apiVersion: config.openshift.io/v1
kind: Image
metadata:
  creationTimestamp: 2019-02-19T16:01:32Z
  generation: 6
  name: cluster
  resourceVersion: "282342"
  selfLink: /apis/config.openshift.io/v1/images/cluster
  uid: 9ff69432-345f-11e9-8ea5-0a4bee075008
spec:
  additionalTrustedCA:
    name: ""
  registrySources:
    insecureRegistries:
    - blah.io
    blockedRegistries:
    - test.io
status:
  internalRegistryHostname: image-registry.openshift-image-registry.svc:5000

You can add as many or as little you want there and should see the changes on all the nodes once the controller does its magic :)
The CR usually exists already in a new cluster (so I have noticed with all my clusters) so all you need to do is oc edit image.config.openshift.io cluster. Since it is a cluster wide config, we only care for the CR called "cluster".

@umohnani8
Copy link
Contributor Author

Looks like the test flakes being fixed in #457
/test unit

@kikisdeliveryservice
Copy link
Contributor

thanks @umohnani8 :) will try this out.

@kikisdeliveryservice
Copy link
Contributor

/assign kikisdeliveryservice

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this applied

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

A resync happens about every 20 minutes, which sends an updated
event to the image informer even if nothing in the image CR has
changed. Adding and extra filter that checks if there has been
any changes to the registries part of the image CR before syncing
the image handler again.

Signed-off-by: Urvashi Mohnani <[email protected]>
@kikisdeliveryservice
Copy link
Contributor

Testing results:

  1. in a cluster running for a while, no more erroneous logging/syncing
  2. editing oc edit image.config.openshift.io cluster and adding insecure and secure registries:
I0220 00:47:17.343506       1 container_runtime_config_controller.go:600] Applied ImageConfig cluster on MachineConfigPool worker
I0220 00:47:18.649066       1 container_runtime_config_controller.go:600] Applied ImageConfig cluster on MachineConfigPool master
I0220 00:47:22.941391       1 render_controller.go:456] Generated machineconfig worker-cac98dc94bdf965eb43a5bddfb46eb6e from 5 configs: [{MachineConfig  00-worker  machineconfiguration.openshift.io/v1  } {MachineConfig  00-worker-ssh  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-container-runtime  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-kubelet  machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-9446842e-33cd-11e9-8221-02427fa2e0b0-registries  machineconfiguration.openshift.io/v1  }]
I0220 00:47:24.241613       1 render_controller.go:456] Generated machineconfig master-3745275ffcaf2c65dec35fc28e164976 from 5 configs: [{MachineConfig  00-master  machineconfiguration.openshift.io/v1  } {MachineConfig  00-master-ssh  machineconfiguration.openshift.io/v1  } {MachineConfig  01-master-container-runtime  machineconfiguration.openshift.io/v1  } {MachineConfig  01-master-kubelet  machineconfiguration.openshift.io/v1  } {MachineConfig  99-master-94454f94-33cd-11e9-8221-02427fa2e0b0-registries  machineconfiguration.openshift.io/v1  }]
I0220 00:47:28.041363       1 node_controller.go:432] Setting node ip-10-0-138-64.us-west-2.compute.internal to desired config worker-cac98dc94bdf965eb43a5bddfb46eb6e
I0220 00:47:29.441259       1 node_controller.go:432] Setting node ip-10-0-150-226.us-west-2.compute.internal to desired config master-3745275ffcaf2c65dec35fc28e164976
I0220 00:48:46.870972       1 node_controller.go:432] Setting node ip-10-0-160-154.us-west-2.compute.internal to desired config master-3745275ffcaf2c65dec35fc28e164976

Checking MCD:

I0220 00:52:50.479085    4777 update.go:137] Checking reconcilable for config worker-f2f49ab8040cc5907aa3d81afa0be316 to worker-cac98dc94bdf965eb43a5bddfb46eb6e
...
I0220 00:52:51.972143    4777 update.go:663] machine-config-daemon initiating reboot: Node will reboot into config worker-cac98dc94bdf965eb43a5bddfb46eb6e

new config was successfully applied and daemons rebooted.

@kikisdeliveryservice
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 20, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kikisdeliveryservice, runcom, umohnani8

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice,runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kikisdeliveryservice
Copy link
Contributor

Flakes: The bootstrap user should successfully login with password decoded from kubeadmin secret & cluster network timeouts.

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@runcom
Copy link
Member

runcom commented Feb 20, 2019

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@runcom
Copy link
Member

runcom commented Feb 20, 2019

CI errors tracked in slack

/retest

@openshift-merge-robot openshift-merge-robot merged commit 10520e9 into openshift:master Feb 20, 2019
runcom added a commit to runcom/machine-config-operator that referenced this pull request Feb 24, 2019
The CRC (container runtime config) controller recently added a check to
avoid resyncing and recreating the very same registries config if
nothing has changed on the image crd side [1]. While that's correct,
during an upgrade, the controllers need to generate the MC fragments
using the controller version they're at. Since we weren't checking the
versions of the controller that generated the registries config, we
wrongly assumed the configurations were equal and never generated a new
one with the new controller.
This patch fixes that by adding a version check before skipping a
regeneration on equal content in the registries configs.
Fixes: openshift#487

[1] openshift#461

Signed-off-by: Antonio Murdaca <[email protected]>
@umohnani8 umohnani8 mentioned this pull request Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

"Applied ImageConfig cluster on MachineConfigPool" logs too frequently?

7 participants