Skip to content

Conversation

@cgwalters
Copy link
Member

Prep for using Butane APIs more directly as part of the layering
work.

(Only compile tested locally)

The `filesystem: "root"` kv pair is a remnant of the past from when
the template format was based on the Container Linux Config spec.
Nowadays, it's based on the Butane config spec (formerly known as
Fedora CoreOS Config) where this property node does not exist, and is
thus ignored when transpiling templates to Ignition config.
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 28, 2022
Comment on lines -570 to +610
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I had the patient open, I changed this to generate a single butane config, then translate it once to ignition which is way simpler than generating a tiny butane config and then continually merging ignition.

@cgwalters
Copy link
Member Author

/assign mkenigs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is CoreOSConfig a subset of Butane? Could we rename this TranspileButaneToIgn? Or if there are differences maybe add a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CoreOSConfig came from https://github.com/coreos/container-linux-config-transpiler - it's very similar/related to butane, but there is no strict subset or superset relationship.

I have now clarified that these templates are using butane specifically variant: fcos version: 1.2 which is defined to generate Ignition 3.2 which is what was output before.

(Really though, all we care about is the ability to conveniently define files inline)

@cgwalters cgwalters force-pushed the bump-butane branch 2 times, most recently from 98ca97b to 669378f Compare March 1, 2022 16:12
@cgwalters
Copy link
Member Author

Hmm, ok, still failing some unit tests; I fixed some but apparently not all. Will dig at some point but need to context switch.

@cgwalters cgwalters marked this pull request as draft March 2, 2022 15:06
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 2, 2022
@cgwalters
Copy link
Member Author

This will depend on #2980

(The issue here is that butane's ignition generation will compress, whereas prior fcct code didn't, so we need to consistently handle compression)

But even with that fixed, there's still yet another regression in TestReconcilable that I don't yet understand.

@kikisdeliveryservice
Copy link
Contributor

kikisdeliveryservice commented Mar 2, 2022

This will depend on #2980

(The issue here is that butane's ignition generation will compress, whereas prior fcct code didn't, so we need to consistently handle compression)

But even with that fixed, there's still yet another regression in TestReconcilable that I don't yet understand.

The failure there says devices required which I think comes from butane where TestReconcilable only declares old config with it:

oldIgnCfg.Storage.Raid = []ign3types.Raid{

			Name:  "data",
			Level: "stripe",
		},


So I think that config is invalid in the butane POV and needs to be updated.

@cgwalters
Copy link
Member Author

So I think that config is invalid in the butane POV and needs to be updated.

What was confusing me though is the new butane code shouldn't be involved here. But what's really happening is that butane is pulling in a new vendored ignition version, which has enhanced validation.

I've now reworked this to more cleanly split up the test fix, the bump of vendored ignition, then the switch from fcct to butane.

@cgwalters
Copy link
Member Author

/test unit
/test verify

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 7, 2022
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 7, 2022
@cgwalters
Copy link
Member Author

/test unit
/test verify

@cgwalters
Copy link
Member Author

OK yeah, e.g. this passed PR says:

ok github.com/openshift/machine-config-operator/test/e2e 5219.199s

5219/60 = ~87 - we have a 3 minute budget before the test would arbitrarily time out and fail. That's way too little.

@kikisdeliveryservice
Copy link
Contributor

OK yeah, e.g. this passed PR says:

ok github.com/openshift/machine-config-operator/test/e2e 5219.199s

5219/60 = ~87 - we have a 3 minute budget before the test would arbitrarily time out and fail. That's way too little.

is this happening in other PRs or just this one?

@cgwalters
Copy link
Member Author

cgwalters commented Mar 15, 2022

Spot checking a few others from this query

82.7 minutes
84.9 minutes

templates: Clean out filesystem properties
@mkenigs
Copy link
Contributor

mkenigs commented Mar 15, 2022

I also had a passing run on #2987 but then it hit the timeout after I made a minor tweak which shouldn't have changed anything

Enable unicast keepalived for all on-prem platforms
@cgwalters cgwalters changed the base branch from master to layering March 16, 2022 13:27
@cgwalters cgwalters marked this pull request as ready for review March 16, 2022 13:27
@cgwalters
Copy link
Member Author

/test e2e-aws
/test e2e-metal-ipi

@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 16, 2022
@cgwalters
Copy link
Member Author

OK, this one should be good to go, the CI failures look unrelated. Finally got a passed e2e-gcp-op, but...trying to decide whether I or someone needs to context switch into adding some headroom there or (much harder) parallelizing tests.

@mkenigs
Copy link
Contributor

mkenigs commented Mar 16, 2022

Will we just drop or change the cherry-pick if coreos/ignition#1329 needs changes?

@cgwalters
Copy link
Member Author

Will we just drop or change the cherry-pick if coreos/ignition#1329 needs changes?

Yeah, I think we can easily update the cherry pick. We'll just need to keep it in mind when bumping the vendored Ignition in the future but hopefully we'll have that in our collective memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I missed it but I don't see anything testing DecodeIgnitionFileContents. Would it make sense to move this test to helpers_test.go?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added one, though this function is definitely tested indirectly in a bunch of other unit tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is sorting here (and below) necessary? Can you add a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Done. It may not actually be necessary...I hit an issue that made me think it was, but perhaps it isn't?

But hmm, I think because we md5sum the machineconfigs today, we really have to be sure they're "canonicalized".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Handle drop-ins specially; we can only have a single entry with multiple drop-ins.
// Handle drop-ins specially; we can have a single unit with multiple drop-ins.

I'm also curious how we know there aren't other fields that would have to be merged

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any other merging Ignition would have performed that the function isn't doing now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't really merging fields here, we're merging drop-ins to a single unit right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about the function as a whole. Like what happens if there's two files with the same name? Will Butane handle that the same way as an Ignition merge?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So...in the context of these templates, having duplicate files would generate an error (I am pretty sure).

Now in practice there are duplicates, e.g. kubelet.service but that's handled by generateMachineConfigForName().

(I think we should actually fix this and template the units and avoid duplicating them in the code, because it's caused real problems in the past and likely will continue to do so)

But for now, there shouldn't be any duplicates when we reach here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth commenting that as a function precondition or TODO?

To properly handle compression.  Prep for using butane.
Prep for using Butane APIs more directly as part of the layering
work.

The logic is also a bit reworked to generate a single Butane fragment
which we convert to Ignition in one go, instead of converting
butane into ignition repeatedly and using config merging.

There's only one wrinkle with doing this, which is that today in
the templates we have multiple files which are drop-ins for `crio.service`;
and we need to group those together.

(I think it would be cleaner to have them in a single file in the
 templates, but for clarity let's handle this)
We were testing this indirectly.
@mkenigs
Copy link
Contributor

mkenigs commented Mar 16, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 16, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 16, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, mkenigs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mkenigs
Copy link
Contributor

mkenigs commented Mar 16, 2022

Why does this have the templates and unicast commits?

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 16, 2022

@cgwalters: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-workers-rhel7 669378f0894ec397f8570ae81e9397fac951c5dd link false /test e2e-aws-workers-rhel7
ci/prow/okd-e2e-aws 1690124785bf3577f2088922645d6f9db2539821 link false /test okd-e2e-aws
ci/prow/e2e-metal-ipi 1690124785bf3577f2088922645d6f9db2539821 link false /test e2e-metal-ipi
ci/prow/e2e-agnostic-upgrade 1690124785bf3577f2088922645d6f9db2539821 link true /test e2e-agnostic-upgrade
ci/prow/e2e-vsphere-upgrade 1690124785bf3577f2088922645d6f9db2539821 link false /test e2e-vsphere-upgrade
ci/prow/e2e-aws-disruptive 1690124785bf3577f2088922645d6f9db2539821 link false /test e2e-aws-disruptive
ci/prow/e2e-aws-upgrade-single-node 1690124785bf3577f2088922645d6f9db2539821 link false /test e2e-aws-upgrade-single-node
ci/prow/e2e-aws-single-node 1690124785bf3577f2088922645d6f9db2539821 link false /test e2e-aws-single-node
ci/prow/e2e-gcp-single-node 096e6ff link false /test e2e-gcp-single-node
ci/prow/e2e-gcp-op-single-node 096e6ff link false /test e2e-gcp-op-single-node

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 4616b1a into openshift:layering Mar 16, 2022
@bgilbert
Copy link
Contributor

Note that this vendors coreos/ignition#1329 which was rejected upstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. layering lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants