Bug 1764116: templates: rename our dropins to include the mco string #1203
@runcom: This pull request references Bugzilla bug 1764001, which is invalid:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: runcom. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@runcom: This pull request references Bugzilla bug 1764116, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. |
c2260d0 to bf106a8
|
@cgwalters wondering why this wasn't the case from the beginning :/ |
The actual issue here is that the rendered MachineConfigs can contain duplicate entries for e.g. a service or unit. If that's the case, the validate routine validates (and fails on) only the first entry, but what we have written on disk is the second one. Should we change the validate routine to always check the last entry if there's a duplicate? This PR just makes sure we can roll back, but maybe the fix to validation is needed as well. This is related to the common templates fix that went in #1202.
@ajeddeloh since a rendered MC can contain duplicate entries, should we validate only against the last one? Right now, we're validating and bailing at the first entry, but the MCD applies the last when writing to disk. The reproducer is as simple as creating this MC:
$ cat testmc.yaml
# This example MachineConfig replaces /etc/chrony.conf
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 50-examplecorp-chrony
spec:
  config:
    ignition:
      version: 2.2.0
    systemd:
      units:
      - name: "crio.service"
        dropins:
          - name: "10-default-env.conf"
            contents: "#foo"
The issue arises since we're already shipping 10-default-env.conf for the crio.service (and it's empty by default). So, the MCD applies the above, but when it reboots, it validates against the empty, first entry file but the on-disk one contains
EDIT: This is trickier than what I initially thought. The situation here is:
When we validate on disk, the first file is checked (empty in our case) but it's written by the dropin systemd |
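To make the "validate the last entry" idea from this comment concrete, here is a minimal Python sketch (the MCO itself is written in Go; this is not its code). The function names `effective_dropins` and `validate_on_disk`, the plain-dict shape of the units, and the `/etc/systemd/system/<unit>.d/<dropin>` path layout are assumptions made for illustration only.

```python
def effective_dropins(units):
    """Map (unit name, dropin name) to contents, letting later entries win.

    `units` is a list of dicts shaped like the `systemd.units` list of an
    Ignition spec 2 config (hypothetical in-memory form).
    """
    effective = {}
    for unit in units:
        for dropin in unit.get("dropins", []):
            # A later duplicate replaces the earlier one, mirroring what
            # ends up on disk when entries are written in order.
            effective[(unit["name"], dropin["name"])] = dropin.get("contents", "")
    return effective


def validate_on_disk(units, read_file):
    """Return the dropin paths whose on-disk contents differ from the last entry.

    `read_file` is any callable mapping a path to its contents (assumed helper).
    """
    mismatches = []
    for (unit, dropin), expected in effective_dropins(units).items():
        path = f"/etc/systemd/system/{unit}.d/{dropin}"  # assumed layout
        if read_file(path) != expected:
            mismatches.append(path)
    return mismatches
```

With the reproducer above (an empty shipped dropin followed by the user's `#foo` one), comparing against the last entry reports no mismatch, whereas comparing against the first entry would.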
|
(This PR isn't working as intended also, so keep holding) |
|
maybe we should completely avoid having users override what we ship? |
I believe it's a hard error to have duplicate files in Ignition spec 3. We should also indeed disallow this in the MCO I'd say. But the clear shorter term fix is to change our validation to check the last one for consistency with what we actually write. |
uhm, so then the MCC has to learn to generate rendered MCs by always using the last entry in alphabetical order? 🤔 or just error out maybe |
|
I think we should error out. |
Correct that duplicates are disallowed in spec 3.0.0+. Spec 2.x is more complicated. The last one listed "wins" but only if overwrite is true. Also they're created in the order: directories, then files, then links. This means a file can never "win" against a link. This is also fixed with spec 3. |
here the scenario is "we write a file at a dropin location" then "a dropin writes on that location again" ouch - does spec v3 disallow this? |
Erroring out seems to be the cleanest choice, imo - it's predictable and clear to a user what they need to do going forward. |
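A rough Python sketch of what "erroring out" at render time could look like follows. It is only an illustration of the idea discussed here, not the MCC's behaviour; `DuplicateDropinError` and `check_no_duplicate_dropins` are hypothetical names, and the units are assumed to be plain dicts shaped like the `systemd.units` list.

```python
from collections import Counter


class DuplicateDropinError(ValueError):
    """Raised when the same dropin is defined more than once (hypothetical)."""


def check_no_duplicate_dropins(units):
    """Refuse to render if any (unit, dropin) pair appears more than once."""
    counts = Counter(
        (unit["name"], dropin["name"])
        for unit in units
        for dropin in unit.get("dropins", [])
    )
    duplicates = sorted(key for key, count in counts.items() if count > 1)
    if duplicates:
        names = ", ".join(f"{unit}/{dropin}" for unit, dropin in duplicates)
        raise DuplicateDropinError(f"duplicate systemd dropins: {names}")
```

Failing early like this tells the user which entry to rename instead of silently picking a winner.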
Can you clarify? Do you mean a systemd drop in or an appended config? |
yep, a systemd dropin. This is also not fully related to Ignition but rather to how the MCD interprets the Ignition config. So, in this bug's case, we first write a normal file (at the dropin location) and then we overwrite it with a systemd dropin (from Ignition's pov). So, effectively, they both end up in the same directory, causing this whole confusion and bug. |
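The collision described here (a regular file written at the same path a systemd dropin resolves to) can be detected with a short check. The Python sketch below assumes a config dict shaped like Ignition spec 2 (`storage.files[].path`, `systemd.units[].dropins[].name`) and assumes dropins land under `/etc/systemd/system/<unit>.d/`; `dropin_path` and `colliding_paths` are hypothetical helpers, not MCD functions.

```python
def dropin_path(unit_name, dropin_name):
    """Path a dropin is assumed to be written to (illustrative layout)."""
    return f"/etc/systemd/system/{unit_name}.d/{dropin_name}"


def colliding_paths(config):
    """List dropin paths that are also written as plain files in the same config."""
    file_paths = {f["path"] for f in config.get("storage", {}).get("files", [])}
    collisions = []
    for unit in config.get("systemd", {}).get("units", []):
        for dropin in unit.get("dropins", []):
            path = dropin_path(unit["name"], dropin["name"])
            if path in file_paths:
                collisions.append(path)
    return collisions
```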
|
Hmm yeah I think we ought to check for that in Ignition as well actually. Failing seems like the right thing to do since the user is asking for the impossible. Filed: coreos/ignition#881 |
I don't believe this is the right thing to do now as a short-term hack. The problem here is that we write a file at a location and then a dropin config later writes there as well. Allowing the validation to pass means that any configuration already shipped (for things like crio and kubelet) can be firstly overridden and secondly skipped from validation. What do you all think? |
I think the problem here is that the user chose the same name for the dropin |
so if they do instead, I think we need to avoid rendering and communicate that, how does that sound? |
|
I'm updating this PR to move to |
|
About this PR and bug tho, the issue was mainly the inability to roll back, and I think that's the case because the pools are hitting their maxUnavailable, so bumping that to 2 will reconcile the cluster. I'm verifying that; we can then get the rename in and postpone any later discussion about validation until spec 3 is in the MCO maybe (?) |
Mainly to avoid people shipping something which could override the MCO files. Signed-off-by: Antonio Murdaca <[email protected]>
7ef9636 to 0afa0d4
|
@runcom: This pull request references Bugzilla bug 1764116, which is valid. |
Signed-off-by: Antonio Murdaca <[email protected]>
|
/skip |
|
/retest |
|
/skip |
|
/retest |
|
@runcom: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. |
|
Closed by #1715 |
Mainly to avoid people shipping something which could override the MCO files.
The situation can still happen, in which case the bad MC must be deleted and, in order to roll back, the maxUnavailable for the pool must be increased by at least 1.
Signed-off-by: Antonio Murdaca [email protected]