Skip to content

Conversation

@cheesesashimi
Copy link
Member

@cheesesashimi cheesesashimi commented Oct 7, 2021

This enables the MCD to check if on-disk configuration has drifted from the currently applied MachineConfig. This work is being tracked in https://issues.redhat.com/browse/MCO-69

- What I did
I added a goroutine to the MCD to run a Config Drift Monitor. Under the hood, the Config Drift Monitor uses fsnotify to wire up handlers for all of the directories referenced in a given MachineConfig. If a write event is detected for any file (or files relating to a Systemd unit) defined in a MachineConfig, the Config Drift Monitor will run the validateOnDiskState function. If the on-disk configuration has drifted from what is specified in the MachineConfig, the node will be marked Degraded. Additionally, I added a validateOnDiskState check to the syncNode routine so that if the config has drifted prior to an update, it will prevent the cluster from getting into an inconsistent state.

Care was taken to ensure that the Config Drift Monitor only runs whenever no updates are occurring. When the MCD is booting, updates may still be pending. To avoid spurious config drift errors (since technically, the config will "drift" from one config to another), the Config Drift Monitor will only run when the MCD has finished booting. Additionally, prior to applying an update, the Config Drift Monitor is shut down. Again, this is done because the config will "drift" from one config to another while the update is being applied. If no reboot is required, the Config Drift Monitor will be started once the update is complete.

- How to verify it

  1. Within a cluster, create a new MachineConfig, e.g.:
---
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: etc-file
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:,hello%20world%0A
        mode: 420
        path: /etc/etc-file
  1. Verify that the Config Drift Monitor has started by reviewing the MCD logs:
I1203 15:28:31.366342    1883 daemon.go:647] Starting MachineConfigDaemon
I1203 15:28:31.366426    1883 daemon.go:654] Enabling Kubelet Healthz Monitor
I1203 15:28:38.461070    1883 daemon.go:1343] Updating Node ip-10-0-132-121.ec2.internal
I1203 15:28:38.461091    1883 daemon.go:1343] Updating Node ip-10-0-132-86.ec2.internal
I1203 15:28:38.461111    1883 daemon.go:1343] Updating Node ip-10-0-150-37.ec2.internal
I1203 15:28:38.461116    1883 daemon.go:1343] Updating Node ip-10-0-151-158.ec2.internal
I1203 15:28:38.461121    1883 daemon.go:1343] Updating Node ip-10-0-164-151.ec2.internal
I1203 15:28:38.461125    1883 daemon.go:1343] Updating Node ip-10-0-170-92.ec2.internal
I1203 15:28:39.462097    1883 daemon.go:402] Started syncing node "ip-10-0-132-121.ec2.internal" (2021-12-03 15:28:39.462086333 +0000 UTC m=+9.130821879)
I1203 15:28:39.462153    1883 daemon.go:404] Finished syncing node "ip-10-0-132-121.ec2.internal" (63.876µs)
I1203 15:28:39.462162    1883 daemon.go:402] Started syncing node "ip-10-0-132-86.ec2.internal" (2021-12-03 15:28:39.462160974 +0000 UTC m=+9.130896512)
I1203 15:28:39.462215    1883 daemon.go:394] Node ip-10-0-132-86.ec2.internal is not labeled node-role.kubernetes.io/master
I1203 15:28:39.468002    1883 daemon.go:910] Current config: rendered-infra-e40714a84270cd41458ddf52dd8c465a
I1203 15:28:39.468060    1883 daemon.go:911] Desired config: rendered-infra-fcc7352d9598d0d4441fdd00aac05739
I1203 15:28:39.474917    1883 update.go:1994] Disk currentConfig rendered-infra-fcc7352d9598d0d4441fdd00aac05739 overrides node's currentConfig annotation rendered-infra-e40714a84270cd41458ddf52dd8c465a
I1203 15:28:39.477776    1883 daemon.go:1193] Validating against pending config rendered-infra-fcc7352d9598d0d4441fdd00aac05739
I1203 15:28:39.495045    1883 daemon.go:1211] Validated on-disk state
I1203 15:28:39.495129    1883 event.go:282] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-132-86.ec2.internal", UID:"b9afb880-9b75-45ef-b454-a89eb37b6d6f", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'NodeDone' Setting node ip-10-0-132-86.ec2.internal, currentConfig rendered-infra-fcc7352d9598d0d4441fdd00aac05739 to Done
I1203 15:28:39.508451    1883 daemon.go:1343] Updating Node ip-10-0-132-86.ec2.internal
I1203 15:28:39.509827    1883 daemon.go:1262] Completing pending config rendered-infra-fcc7352d9598d0d4441fdd00aac05739
I1203 15:28:39.509839    1883 drain.go:44] Initiating uncordon on node (currently schedulable: false)
I1203 15:28:39.523698    1883 drain.go:62] RunCordonOrUncordon() succeeded but node is still not in uncordon state, retrying
I1203 15:28:39.524490    1883 daemon.go:1343] Updating Node ip-10-0-132-86.ec2.internal
I1203 15:28:39.558790    1883 daemon.go:1343] Updating Node ip-10-0-132-86.ec2.internal
I1203 15:28:49.523812    1883 drain.go:44] Initiating uncordon on node (currently schedulable: true)
I1203 15:28:49.523838    1883 drain.go:66] uncordon succeeded on node (currently schedulable: true)
I1203 15:28:49.523846    1883 update.go:1994] Update completed for config rendered-infra-fcc7352d9598d0d4441fdd00aac05739 and node has been successfully uncordoned
I1203 15:28:49.526114    1883 daemon.go:1278] In desired config rendered-infra-fcc7352d9598d0d4441fdd00aac05739
I1203 15:28:49.536226    1883 config_drift_monitor.go:240] Config Drift Monitor started
  1. Once the new MachineConfig has been applied to all pool nodes, modify the contents on-disk of this file:
$ oc debug node/<node-name> -- printf "not-the-data" > /host/etc/etc-file
  1. Check the MCD logs for the node this targeted. You should see something that resembles:
E1203 15:40:28.247846    1883 on_disk_validation.go:221] content mismatch for file "/etc/etc-file" (-want +got):
  bytes.Join({
- 	"hello world\n",
+ 	"not-the-data",
  }, "")
E1203 15:40:28.247952    1883 daemon.go:733] content mismatch for file "/etc/etc-file"
E1203 15:40:28.247984    1883 writer.go:135] Marking Degraded due to: content mismatch for file "/etc/etc-file"
I1203 15:40:28.248939    1883 event.go:282] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-132-86.ec2.internal", UID:"b9afb880-9b75-45ef-b454-a89eb37b6d6f", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'ConfigDriftDetected' content mismatch for file "/etc/etc-file"
I1203 15:40:28.272864    1883 daemon.go:1343] Updating Node ip-10-0-132-86.ec2.internal
I1203 15:40:29.273087    1883 daemon.go:402] Started syncing node "ip-10-0-132-86.ec2.internal" (2021-12-03 15:40:29.27307442 +0000 UTC m=+718.941809966)
E1203 15:40:29.286249    1883 on_disk_validation.go:221] content mismatch for file "/etc/etc-file" (-want +got):
  bytes.Join({
- 	"hello world\n",
+ 	"not-the-data",
  }, "")
E1203 15:40:29.286289    1883 daemon.go:506] Preflight config drift check failed: content mismatch for file "/etc/etc-file"
I1203 15:40:29.286296    1883 daemon.go:404] Finished syncing node "ip-10-0-132-86.ec2.internal" (13.220356ms)

To recover, either change the file contents back or oc debug node/<node-name> -- touch /host/run/machine-config-daemon-force. The latter will force the MCD to apply the update and will cause a reboot which may or may not be desirable.

- Description for the changelog
Proactively detect on-disk config drift

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 7, 2021

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 7, 2021
@cheesesashimi
Copy link
Member Author

/test ?

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 12, 2021

@cheesesashimi: The following commands are available to trigger required jobs:

  • /test cluster-bootimages
  • /test e2e-agnostic-upgrade
  • /test e2e-aws
  • /test e2e-gcp-op
  • /test images
  • /test unit
  • /test verify

The following commands are available to trigger optional jobs:

  • /test e2e-aws-disruptive
  • /test e2e-aws-proxy
  • /test e2e-aws-serial
  • /test e2e-aws-single-node
  • /test e2e-aws-techpreview-featuregate
  • /test e2e-aws-upgrade
  • /test e2e-aws-upgrade-single-node
  • /test e2e-aws-workers-rhel7
  • /test e2e-aws-workers-rhel8
  • /test e2e-azure
  • /test e2e-azure-upgrade
  • /test e2e-gcp-op-single-node
  • /test e2e-gcp-single-node
  • /test e2e-gcp-upgrade
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-ipv6
  • /test e2e-openstack
  • /test e2e-openstack-parallel
  • /test e2e-ovirt
  • /test e2e-ovirt-upgrade
  • /test e2e-ovn-step-registry
  • /test e2e-vsphere
  • /test e2e-vsphere-upgrade
  • /test e2e-vsphere-upi
  • /test okd-e2e-aws
  • /test okd-e2e-gcp-op
  • /test okd-e2e-upgrade
  • /test okd-e2e-vsphere
  • /test okd-images

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-machine-config-operator-master-e2e-agnostic-upgrade
  • pull-ci-openshift-machine-config-operator-master-e2e-aws
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-disruptive
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-serial
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-techpreview-featuregate
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-upgrade-single-node
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-workers-rhel7
  • pull-ci-openshift-machine-config-operator-master-e2e-aws-workers-rhel8
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op
  • pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-single-node
  • pull-ci-openshift-machine-config-operator-master-e2e-metal-ipi
  • pull-ci-openshift-machine-config-operator-master-e2e-ovn-step-registry
  • pull-ci-openshift-machine-config-operator-master-e2e-vsphere-upgrade
  • pull-ci-openshift-machine-config-operator-master-images
  • pull-ci-openshift-machine-config-operator-master-okd-e2e-aws
  • pull-ci-openshift-machine-config-operator-master-okd-images
  • pull-ci-openshift-machine-config-operator-master-unit
  • pull-ci-openshift-machine-config-operator-master-verify
Details

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cheesesashimi
Copy link
Member Author

/test e2e-aws

@cheesesashimi
Copy link
Member Author

/test e2e-gcp-op

@cheesesashimi
Copy link
Member Author

There are still some TODOs:

  • General code cleanup and deduplication in pkg/daemon/daemon.go.
  • Better error handling for pkg/daemon/daemon.go. Specifically, how we wrap validateOnDiskState errors and detect them. Also, figure out a better error message for the cases when this does occur.
  • Wire up the poll period in pkg/daemon/daemon.go so that it can be artificially lowered for testing purposes.
  • Emit events using the dn.recorder object. TBD: Name(s) of events.

@cheesesashimi
Copy link
Member Author

/test e2e-gcp-op
/test e2e-gcp-op-single-node

@cheesesashimi cheesesashimi force-pushed the zzlotnik/periodically-check-config-drift branch from cf98999 to ddb275c Compare October 14, 2021 21:42
@cheesesashimi
Copy link
Member Author

/test e2e-gcp-op
/test e2e-gcp-op-single-node

2 similar comments
@cheesesashimi
Copy link
Member Author

/test e2e-gcp-op
/test e2e-gcp-op-single-node

@cheesesashimi
Copy link
Member Author

/test e2e-gcp-op
/test e2e-gcp-op-single-node

Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think the direction looks good, although I have a few concerns, namely on what we do in the non-booting sync loops and how we handle "no-change forcefile updates". More details in comments below, and please do let me know if I am misunderstanding any of this, I am not sure if I have the details correct

@sinnykumari
Copy link
Contributor

Overall logic looks fine, although I feel that there is potential to reuse existing code. Not replying to inline as there are already lot of inline comments and it can get confusing to understand. Trying to summarize here my chain of thought to keep things a bit simple and reusing already existing code wherever possible (helps to minimize new regression) :

This should be ideally sufficient unless I missed any corner case. We don't need mutex because existing checkStateOnFirstRun() -> validateOnDiskState() is called only when dn.booting is set to true and that happens only when MCD pod is restarted or created. Once validateOnDiskState() finishes, it sets dn.booting to false

@cheesesashimi
Copy link
Member Author

Here's where I am:

  • I reverted validateOnDiskState back to it's original scope of just validating the on-disk state with the addition of the diskValidatorLock. It will not emit any events on success or failure. That responsibility will fall on the caller.
  • I'm decoupling the forcefile detection logic from validateOnDiskState to make what's happening more explicit. That means no more maybeValidateOnDiskState.
  • I will keep the forceFileExists function because I think it's a bit cleaner and easier to read though I'm open to arguments against.
  • To be more explicit, I'm creating a function called checkForConfigDrift which runs validateOnDiskState against the current on-disk config and will only emit an event when it finds an error. However, it will log whenever it runs.
  • I've also added a runPeriodicConfigDriftCheck which executes the aforementioned function but emits events upon error with the specific context of being run periodically as opposed to on-demand.

To respond to @sinnykumari's comments:

  • Using dn.booting would only prevent the periodic check from running when the MCD is still booting. It won't prevent the periodic check from running while an update is in progress. If the MCD is performing an update and the periodic check runs, it could cause unexpected behavior.
  • I'm not sure if the additional logic found in https://github.com/openshift/machine-config-operator/blob/master/pkg/daemon/daemon.go#L1071-L1104 is applicable for the context of checking for drift relative to the current on-disk configuration in both the periodic case as well as prior to applying an update.

@kikisdeliveryservice
Copy link
Contributor

Just as a note, when this is closer to ready, please rebase and get this down to ~3-4 commits.

Can you pull this out of draft (leaving WIP/hold), feels like we can have ci run at least to get it running your tests. :)

@cheesesashimi cheesesashimi marked this pull request as ready for review November 2, 2021 22:07
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 2, 2021
@cheesesashimi
Copy link
Member Author

cheesesashimi commented Nov 2, 2021

Update:

  • I adopted pretty much all of the changes @kikisdeliveryservice requested (thanks BTW!)
  • I also pulled this PR out of draft so all the tests can run.
  • I realized that I introduced an import cycle, so I fixed that. However, I'm not completely happy with my fix, so I'm completely open to feedback and suggestions.

TODO:

  • Get the import cycle fix into a state I and others are happy with.
  • Address lint / test failures.
  • Squash commits.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

20 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 11, 2021

@cheesesashimi: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-op-single-node 930cbe0 link false /test e2e-gcp-op-single-node
ci/prow/e2e-aws-disruptive 930cbe0 link false /test e2e-aws-disruptive
ci/prow/e2e-vsphere-upgrade 930cbe0 link false /test e2e-vsphere-upgrade
ci/prow/e2e-aws-single-node 930cbe0 link false /test e2e-aws-single-node
ci/prow/e2e-aws-upgrade-single-node 930cbe0 link false /test e2e-aws-upgrade-single-node
ci/prow/e2e-aws-workers-rhel8 930cbe0 link false /test e2e-aws-workers-rhel8

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 4b98289 into openshift:master Dec 11, 2021
@cheesesashimi cheesesashimi deleted the zzlotnik/periodically-check-config-drift branch December 13, 2021 20:30
cheesesashimi added a commit to cheesesashimi/machine-config-operator that referenced this pull request Dec 15, 2021
The Config Drift Monitor
(openshift#2795) was
previously unaware of compressed files. What would happen is the MCD
would unzip a compressed file payload and write that to disk. However,
the Config Drift Monitor was unaware that the file was compressed, so it
was comparing the compressed contents of the MachineConfig against the
uncompressed contents that were written to disk. Because of that, the
Config Drift Monitor would erroneously degrade the node / MCP.

Fixes: #2032565
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants