Bug 1764116: templates: rename our dropins to include the mco string #1203

runcom · 2019-10-22T09:34:21Z

Mainly to avoid ppl to ship something which could override the MCO files.

The situation can still happen, in which case, the bad MC must be deleted and in order to rollback the maxUnavailable for the pool must be increased by at least 1

Signed-off-by: Antonio Murdaca [email protected]

openshift-ci-robot · 2019-10-22T09:34:25Z

@runcom: This pull request references Bugzilla bug 1764001, which is invalid:

expected the bug to target the "4.3.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1764001: pkg/daemon: validate on-disk when in desired config

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot · 2019-10-22T09:34:49Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2019-10-22T09:35:51Z

@runcom: This pull request references Bugzilla bug 1764116, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Bug 1764116: pkg/daemon: validate on-disk when in desired config

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

runcom · 2019-10-22T09:48:56Z

@cgwalters wondering why this wasn't the case till the beginning :/

runcom · 2019-10-22T09:52:13Z

@cgwalters wondering why this wasn't the case till the beginning :/

the actual issue here is that the rendered machineconfigs can contain duplicate entries for e.g. a service or unit. If that's the case, the validate routine can validate and fail only the first entry but what we have written on disk is the second one - should we change the validate routine to always check the last entry if there's a duplicate? This PR just makes sure we can rollback but maybe the fix to validation is needed as well. This is related to the common templates fix that went in #1202

@ajeddeloh since a rendered mc can contain duplicates entries, should we validate only against the last one? right now, we're validating and bailing at the first entry, but the MCD applies the last when writing to disk.

The reproducer is as simple as creating this MC:

$ cat testmc.yaml
# This example MachineConfig replaces /etc/chrony.conf
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-examplecorp-chrony
spec:
  config:
    ignition:
      version: 2.2.0
    systemd:
      units:
      - name: "crio.service"
        dropins:
        - name: "10-default-env.conf"
          contents: "#foo"

The issue arises since we're already shipping 10-deafault-env.conf for the crio.service (and it's empty by default). So, the MCD applies the above, but when it reboots, it validates against the empty, first entry file but the ondisk one contains #foo now.

EDIT

This is trickier than what I've initilly thought - the situation here is:

we ship a dropin file directly as a file
we create a dropin systemd unit

When we validate on disk, the first file is checked (empty in our case) but it's written by the dropin systemd

runcom · 2019-10-22T11:43:37Z

(This PR isn't working as intended also, so keep holding)

runcom · 2019-10-22T11:45:33Z

maybe we should completely avoid having users override what we ship?

cgwalters · 2019-10-22T13:35:48Z

@ajeddeloh since a rendered mc can contain duplicates entries, should we validate only against the last one? right now, we're validating and bailing at the first entry, but the MCD applies the last when writing to disk.

I believe it's a hard error to have duplicate files in Ignition spec 3. We should also indeed disallow this in the MCO I'd say.

But the clear shorter term fix is to change our validation to check the last one for consistency with what we actually write.

runcom · 2019-10-22T13:39:14Z

I believe it's a hard error to have duplicate files in Ignition spec 3. We should also indeed disallow this in the MCO I'd say.

uhm, so then the MCC has to learn to generate rendered MCs by always using the last entry in alphabetical order? 🤔 or just error out maybe

cgwalters · 2019-10-22T20:25:10Z

I think we should error out.

ajeddeloh · 2019-10-22T20:37:44Z

@ajeddeloh since a rendered mc can contain duplicates entries, should we validate only against the last one? right now, we're validating and bailing at the first entry, but the MCD applies the last when writing to disk.

I believe it's a hard error to have duplicate files in Ignition spec 3. We should also indeed disallow this in the MCO I'd say.

Correct that duplicates are disallowed in spec 3.0.0+. Spec 2.x is more complicated. The last one listed "wins" but only if overwrite is true. Also they're created in the order: directories, then files, then links. This means a file can never "win" against a link. This is also fixed with spec 3.

runcom · 2019-10-22T21:15:58Z

The last one listed "wins" but only if overwrite is true

here the scenario is "we write a file at a dropin location" then "a dropin writes on that location again" ouch - does spec v3 disallow this?

kikisdeliveryservice · 2019-10-22T23:12:39Z

I think we should error out.

Erroring out seems to be the cleanest choice, imo - it's predictable and clear to a user what they need to do going forward.

ajeddeloh · 2019-10-23T17:52:45Z

here the scenario is "we write a file at a dropin location" then "a dropin writes on that location again" ouch - does spec v3 disallow this?

Can you clarify? Do you mean a systemd drop in or an appended config?

runcom · 2019-10-23T18:51:33Z

Can you clarify? Do you mean a systemd drop in or an appended config?

yep, a systemd dropin. This is also not fully related to Ignition but rather, to how the MCD interprets the ignition config. So, in this bug case, we first write a normal file (at the dropin location) and then we overwirte it with a systemd dropin (from igniton pov). So, effectively, the both end up in the same directory causing this whole confusion and bug.

ajeddeloh · 2019-10-23T21:57:44Z

Hmm yeah I think we ought to check for that in Ignition as well actually. Failing seems like the right thing to do since the user is asking for the impossible.

Filed: coreos/ignition#881

runcom · 2019-10-25T12:29:53Z

But the clear shorter term fix is to change our validation to check the last one for consistency with what we actually write.

I don't believe this is the right thing to do now as a short term hack - the reason here writing a file at a location and writing a dropin config which later writes there. Allowing the validation to pass means that any configuration already shipped (for things like crio and kubelet) can be firstly overridden and secondly skipped from validation.
I believe this a broader issue also, how does someone ship a crio dropin? should we allow that since crio is controlled by the CRC crd? hence it makes sense to just error out when someone provides a configuration that would overwrite anything specified before.

What do you all think?

cgwalters · 2019-10-25T12:59:01Z

I believe this a broader issue also, how does someone ship a crio dropin?

I think the problem here is that the user chose the same name for the dropin 10-default-env.conf as we did. Perhaps it'd be clearer if we named our drop-ins like 10-mco-default-env.conf and hopefully users would not include -mco in their names.

runcom · 2019-10-25T13:46:28Z

and hopefully users would not include -mco in their names.

so if they do instead, I think we need to avoid rendering and communicate that, how does that sound?

runcom · 2019-10-25T13:50:42Z

I'm updating this PR to move to -mco- for our dropins meanwhile - still doesn't solve the issue tho :(

runcom · 2019-10-25T14:08:26Z

About this PR and Bug tho, the issue was mainly the inability to roll back and I think that's the case because the pools are hitting their maxUnavailable so bumping that to 2 will reconcile the cluster, I'm verifying that, we can then get the rename in and postpone any later discussion about validation when spec 3 will be in MCO maybe (?)

Mainly to avoid ppl to ship something which could override the MCO files. Signed-off-by: Antonio Murdaca <[email protected]>

openshift-ci-robot · 2019-10-25T15:22:01Z

@runcom: This pull request references Bugzilla bug 1764116, which is valid.

Details

In response to this:

Bug 1764116: templates: rename our dropins to include the mco string

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Signed-off-by: Antonio Murdaca <[email protected]>

kikisdeliveryservice · 2019-11-07T00:06:29Z

/skip

kikisdeliveryservice · 2019-11-07T00:06:35Z

/retest

kikisdeliveryservice · 2019-11-27T20:11:42Z

/skip

ashcrow · 2019-12-09T15:46:26Z

/retest

openshift-ci-robot · 2019-12-09T17:09:33Z

@runcom: The following tests failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
ci/prow/e2e-aws-scaleup-rhel7	`fef2281`	link	`/test e2e-aws-scaleup-rhel7`
ci/prow/e2e-vsphere	`fef2281`	link	`/test e2e-vsphere`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

runcom · 2020-05-06T09:02:49Z

Closed by #1715

runcom added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 22, 2019

openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Oct 22, 2019

openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 22, 2019

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 22, 2019

openshift-ci-robot requested review from cgwalters and kikisdeliveryservice October 22, 2019 09:35

runcom changed the title ~~Bug 1764001: pkg/daemon: validate on-disk when in desired config~~ Bug 1764116: pkg/daemon: validate on-disk when in desired config Oct 22, 2019

openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Oct 22, 2019

runcom force-pushed the validate-indesired branch from c2260d0 to bf106a8 Compare October 22, 2019 09:45

williamcaban mentioned this pull request Oct 22, 2019

automatically reconcile early config drift #726

Closed

templates: rename our dropins to include the mco string

0afa0d4

Mainly to avoid ppl to ship something which could override the MCO files. Signed-off-by: Antonio Murdaca <[email protected]>

runcom force-pushed the validate-indesired branch from 7ef9636 to 0afa0d4 Compare October 25, 2019 15:20

openshift-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 25, 2019

runcom changed the title ~~Bug 1764116: pkg/daemon: validate on-disk when in desired config~~ Bug 1764116: templates: rename our dropins to include the mco string Oct 25, 2019

pkg/daemon: rollback dropins

fef2281

Signed-off-by: Antonio Murdaca <[email protected]>

openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 25, 2019

runcom mentioned this pull request May 6, 2020

Bug 1764001: pkg/daemon: rollback dropins and units #1715

Merged

runcom closed this May 6, 2020

marmijo mentioned this pull request Sep 1, 2022

config/types: prevent files/links/dirs from conflicting with systemd units or dropins coreos/ignition#1456

Merged

Bug 1764116: templates: rename our dropins to include the mco string #1203

Bug 1764116: templates: rename our dropins to include the mco string #1203

Uh oh!

Conversation

runcom commented Oct 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openshift-ci-robot commented Oct 22, 2019

Uh oh!

openshift-ci-robot commented Oct 22, 2019

Uh oh!

openshift-ci-robot commented Oct 22, 2019

Uh oh!

runcom commented Oct 22, 2019

Uh oh!

runcom commented Oct 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

runcom commented Oct 22, 2019

Uh oh!

runcom commented Oct 22, 2019

Uh oh!

cgwalters commented Oct 22, 2019

Uh oh!

runcom commented Oct 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cgwalters commented Oct 22, 2019

Uh oh!

ajeddeloh commented Oct 22, 2019

Uh oh!

runcom commented Oct 22, 2019

Uh oh!

kikisdeliveryservice commented Oct 22, 2019

Uh oh!

ajeddeloh commented Oct 23, 2019

Uh oh!

runcom commented Oct 23, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ajeddeloh commented Oct 23, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

runcom commented Oct 25, 2019

Uh oh!

cgwalters commented Oct 25, 2019

Uh oh!

runcom commented Oct 25, 2019

Uh oh!

runcom commented Oct 25, 2019

Uh oh!

runcom commented Oct 25, 2019

Uh oh!

openshift-ci-robot commented Oct 25, 2019

Uh oh!

kikisdeliveryservice commented Nov 7, 2019

Uh oh!

kikisdeliveryservice commented Nov 7, 2019

Uh oh!

kikisdeliveryservice commented Nov 27, 2019

Uh oh!

ashcrow commented Dec 9, 2019

Uh oh!

openshift-ci-robot commented Dec 9, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

runcom commented May 6, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

runcom commented Oct 22, 2019 •

edited

Loading

runcom commented Oct 22, 2019 •

edited

Loading

runcom commented Oct 22, 2019 •

edited

Loading

runcom commented Oct 23, 2019 •

edited

Loading

ajeddeloh commented Oct 23, 2019 •

edited

Loading

openshift-ci-robot commented Dec 9, 2019 •

edited

Loading