Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Add bootCommands to cloud-init file generation #11271

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

davidumea
Copy link

@davidumea davidumea commented Oct 8, 2024

What this PR does / why we need it:
Adds the ability to provide bootcmd commands to cloud-init via the KubeadmConfig or KubeadmConfigTemplates custom resources. (Same for KubeadmControlPlane)

One thing that I know that does not work at the time of creating this PR is validating/mutating webhooks for KubeadmConfig and KubeadmConfigTemplates. Some pointers to where to change that would be appreciated.

Which issue(s) this PR fixes
Fixes #

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area PR is missing an area label labels Oct 8, 2024
@k8s-ci-robot k8s-ci-robot requested a review from elmiko October 8, 2024 14:17
@k8s-ci-robot
Copy link
Contributor

Welcome @davidumea!

It looks like this is your first PR to kubernetes-sigs/cluster-api 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot requested a review from enxebre October 8, 2024 14:17
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 8, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @davidumea. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@chrischdi
Copy link
Member

chrischdi commented Oct 8, 2024

Are there version constraints around bootcmd? Does it already exist in cloud-init for a long time?

What happens if we render it but the cloud-init version in the machine does not support it? Does it make cloud-init fail as a whole or is it just ignored?

Edit: looks like it is already there for a long time (from the start?) at 0.7.7 https://cloudinit.readthedocs.io/en/0.7.7/topics/examples.html

@davidumea
Copy link
Author

davidumea commented Oct 9, 2024

Are there version constraints around bootcmd? Does it already exist in cloud-init for a long time?

Edit: looks like it is already there for a long time (from the start?) at 0.7.7 https://cloudinit.readthedocs.io/en/0.7.7/topics/examples.html

Exactly I don't think that should be an issue as bootcmd has been part of cloud-init for a long time.

What happens if we render it but the cloud-init version in the machine does not support it? Does it make cloud-init fail as a whole or is it just ignored?

During my testing, if the bootcmd module fails e.g. due to invalid commands, cloud-init still continues with the following modules. I haven't been able to test running this with a cloud-init version that doesn't support bootcmd.

@davidumea
Copy link
Author

@chrischdi Is there anything I can do to get this reviewed?

@fabriziopandini
Copy link
Member

/ok-to-test

cc @enxebre @vincepri @randomvariable

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 18, 2024
Copy link
Member

@randomvariable randomvariable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bootcmd module has been in cloud-init for as long as I can remember (which is back to 2010), so don't see an issue there.

I would like the documentation to be more explicit though, particularly about how this will have no effect on Ignition. Have suggested some changes.

bootstrap/kubeadm/api/v1beta1/kubeadmconfig_types.go Outdated Show resolved Hide resolved
bootstrap/kubeadm/api/v1beta1/kubeadmconfig_types.go Outdated Show resolved Hide resolved
@fabriziopandini fabriziopandini added the area/provider/bootstrap-kubeadm Issues or PRs related to CAPBK label Oct 18, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-area PR is missing an area label label Oct 18, 2024
// BootCommands specifies extra commands to run very early in the boot process via the cloud-init bootcmd
// module. This is typically run in the cloud-init.service systemd unit. This has no effect in Ignition.
// +optional
BootCommands []BootCommand `json:"bootCommands,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be added only to v1beta1 which is the actively maintained API version?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove this and run make verify, it removes bootCommands from the CRDs.

Might be an issue in the verify logic, what do you think?

diff --git a/bootstrap/kubeadm/config/crd/bases/bootstrap.cluster.x-k8s.io_kubeadmconfigs.yaml b/bootstrap/kubeadm/config/crd/bases/bootstrap.cluster.x-k8s.io_kubeadmconfigs.yaml
index 79c3b19e0..4e1852f77 100644
--- a/bootstrap/kubeadm/config/crd/bases/bootstrap.cluster.x-k8s.io_kubeadmconfigs.yaml
+++ b/bootstrap/kubeadm/config/crd/bases/bootstrap.cluster.x-k8s.io_kubeadmconfigs.yaml
@@ -47,17 +47,6 @@ spec:
               KubeadmConfigSpec defines the desired state of KubeadmConfig.
               Either ClusterConfiguration and InitConfiguration should be defined or the JoinConfiguration should be defined.
             properties:
-              bootCommands:
-                description: |-
-                  BootCommands specifies extra commands to run very early in the boot process via the cloud-init bootcmd
-                  module. This is typically run in the cloud-init.service systemd unit. This has no effect in Ignition.
-                items:
-                  description: BootCommand defines input for each bootcmd command
-                    in cloud-init.
-                  items:
-                    type: string
-                  type: array
-                type: array
               clusterConfiguration:
                 description: ClusterConfiguration along with InitConfiguration are
                   the configurations necessary for the init command

etc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New fields should not be added to older API versions.
Conversions should be implemented instead.
See eg

dst.Files = restored.Files
dst.Users = restored.Users
if restored.Users != nil {
for i := range restored.Users {
if restored.Users[i].PasswdFrom != nil {
dst.Users[i].PasswdFrom = restored.Users[i].PasswdFrom
}
}
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted my changes to the v1alpha3 and v1alpha4 versions and added conversions instead a9c426a

Copy link
Author

@davidumea davidumea Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove this and run make verify, it removes bootCommands from the CRDs.

Disregard this, I didn't realize that all versions of the crds were in the same manifest.

// BootCommands specifies extra commands to run very early in the boot process via the cloud-init bootcmd
// module. This is typically run in the cloud-init.service systemd unit. This has no effect in Ignition.
// +optional
BootCommands []BootCommand `json:"bootCommands,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New fields should not be added to older API versions.
Conversions should be implemented instead.
See eg

dst.Files = restored.Files
dst.Users = restored.Users
if restored.Users != nil {
for i := range restored.Users {
if restored.Users[i].PasswdFrom != nil {
dst.Users[i].PasswdFrom = restored.Users[i].PasswdFrom
}
}
}

@@ -34,6 +34,7 @@ func TestNewInitControlPlaneAdditionalFileEncodings(t *testing.T) {
cpinput := &ControlPlaneInput{
BaseUserData: BaseUserData{
Header: "test",
BootCommands: nil,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a test for BootCommands not nil for init, join CP, join workers?

Copy link
Author

@davidumea davidumea Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added non-nil values for BootCommands in TestNewInitControlPlaneCommands and TestNewJoinControlPlaneAdditionalFileEncodings, did you want me to add more tests or is this enough? I couldn't find one for "join workers" daf4f97

Copy link
Member

@fabriziopandini fabriziopandini Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WRT to changes in TestNewInitControlPlaneCommands, TestNewJoinControlPlaneAdditionalFileEncodings I have dedicated comments.

We still need test coverage for join and join CP (if they are missing, kindly add them so we pay down some tech debt before adding more on top)

@@ -548,7 +548,7 @@ func TestMatchInitOrJoinConfiguration(t *testing.T) {
},
JoinConfiguration: nil,
Files: nil,
... // 10 identical fields
... // 11 identical fields
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to take some time to dig into impact of this change on existing clusters, because we need to make sure this change does not trigger rollouts
However, I'm not sure when I will get to it 😓

Copy link
Author

@davidumea davidumea Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's feasible to have this included in the next minor release?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I missed this notification before.
Unfortunately code freeze is tomorrow, but let's first focus on getting the PR right first, then we can think about what options do we have

I took some time to look into this, and it should not trigger a rollout even if message in the logs could change after we deploy this change.

@chrischdi could you kindly double check this assumption

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fine. Especially clusterctl upgrade tests should catch this if it happens and we are wrong.

Edit: sorry for the long round-trip, forgot to send the review / comment... (was pending)

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 1, 2024
bootstrap/kubeadm/api/v1beta1/kubeadmconfig_types.go Outdated Show resolved Hide resolved
@@ -94,6 +95,9 @@ func TestNewInitControlPlaneCommands(t *testing.T) {

cpinput := &ControlPlaneInput{
BaseUserData: BaseUserData{
BootCommands: []bootstrapv1.BootCommand{
{"echo", "hello"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to make this test to be useful to validate the code actually works/to prevent regressions, you should also modify test assertions down below that boot commands actually show up in the generated output (and possibly, ensuring that command show up in the right section of the generated file)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure how to ensure that the commands show up in the right section of the generated file, but I added a check that the bootcommands show up in the generated file and also that the bootcmd: string is there (which is omitted when bootCommands are unset)

a4f3bdb

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to test if the out put has the expected bootcmd and runcmd block vs testing the output has single lines.

Something like

	expectedRunCmd := `runcmd:
  - "\"echo $(date) ': hello world!'\""
  - 'kubeadm init --config /run/kubeadm/kubeadm.yaml  && echo success > /run/cluster-api/bootstrap-success.complete'
  - "echo $(date) ': hello world!'"`

	g.Expect(out).To(ContainSubstring(expectedRunCmd))

Should work; if you want to make it fancy, you can use regex, but up to you (similar for bootCmd)
Might be, to improve readability, let's also replace hello world! with something more specific like "hello preKubeadmCommands!" and "hello postKubeadmCommands!"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion!

6fb468d

bootstrap/kubeadm/internal/cloudinit/cloudinit_test.go Outdated Show resolved Hide resolved
@@ -34,6 +34,7 @@ func TestNewInitControlPlaneAdditionalFileEncodings(t *testing.T) {
cpinput := &ControlPlaneInput{
BaseUserData: BaseUserData{
Header: "test",
BootCommands: nil,
Copy link
Member

@fabriziopandini fabriziopandini Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WRT to changes in TestNewInitControlPlaneCommands, TestNewJoinControlPlaneAdditionalFileEncodings I have dedicated comments.

We still need test coverage for join and join CP (if they are missing, kindly add them so we pay down some tech debt before adding more on top)

@@ -548,7 +548,7 @@ func TestMatchInitOrJoinConfiguration(t *testing.T) {
},
JoinConfiguration: nil,
Files: nil,
... // 10 identical fields
... // 11 identical fields
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I missed this notification before.
Unfortunately code freeze is tomorrow, but let's first focus on getting the PR right first, then we can think about what options do we have

I took some time to look into this, and it should not trigger a rollout even if message in the logs could change after we deploy this change.

@chrischdi could you kindly double check this assumption

internal/apis/bootstrap/kubeadm/v1alpha3/conversion.go Outdated Show resolved Hide resolved
internal/apis/bootstrap/kubeadm/v1alpha4/conversion.go Outdated Show resolved Hide resolved
@chrischdi
Copy link
Member

/test help

@k8s-ci-robot
Copy link
Contributor

@chrischdi: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-build-main
  • /test pull-cluster-api-e2e-blocking-main
  • /test pull-cluster-api-e2e-conformance-ci-latest-main
  • /test pull-cluster-api-e2e-conformance-main
  • /test pull-cluster-api-e2e-latestk8s-main
  • /test pull-cluster-api-e2e-main
  • /test pull-cluster-api-e2e-mink8s-main
  • /test pull-cluster-api-e2e-upgrade-1-31-1-32-main
  • /test pull-cluster-api-test-main
  • /test pull-cluster-api-test-mink8s-main
  • /test pull-cluster-api-verify-main

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-apidiff-main

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-apidiff-main
  • pull-cluster-api-build-main
  • pull-cluster-api-e2e-blocking-main
  • pull-cluster-api-test-main
  • pull-cluster-api-verify-main

In response to this:

/test help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@chrischdi
Copy link
Member

/test pull-cluster-api-e2e-main

@chrischdi
Copy link
Member

/test pull-cluster-api-e2e-main

@chrischdi
Copy link
Member

/retest

Looks like a flake in creating the docker container and overlapping ports

@chrischdi
Copy link
Member

/retest

Just to make sure its really not a flake

@davidumea
Copy link
Author

@fabriziopandini Can you take another look? I added some new tests and addressed the comments

@@ -0,0 +1,29 @@
/*
Copyright 2019 The Kubernetes Authors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Copyright 2019 The Kubernetes Authors.
Copyright 2025 The Kubernetes Authors.

Probably a nitpick, but it looks a bit outdated.

@@ -208,6 +209,7 @@ func (webhook *KubeadmControlPlane) ValidateUpdate(_ context.Context, oldObj, ne
{spec, kubeadmConfigSpec, joinConfiguration, "discovery"},
{spec, kubeadmConfigSpec, joinConfiguration, "discovery", "*"},
// spec.kubeadmConfigSpec
{spec, kubeadmConfigSpec, "bootCommands"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{spec, kubeadmConfigSpec, "bootCommands"},
{spec, kubeadmConfigSpec, bootCommands},

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding this!

@@ -75,11 +75,20 @@ type KubeadmConfigSpec struct {
// +optional
Mounts []MountPoints `json:"mounts,omitempty"`

// preKubeadmCommands specifies extra commands to run before kubeadm runs
// bootCommands specifies extra commands to run very early in the boot process via the cloud-init bootcmd
// module. This is typically run in the cloud-init.service systemd unit. This has no effect in Ignition.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no equivalent for the ignition format, it is good to prevent user confusion by introducing a validation webhook check on kubeadmConfig.spec.format == “cloud-init” for this field to be set.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unfamiliar with ignition so I don't know if there is an equivalent format 😕. Where would you suggest to put this validation webhook check?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bootstrap/kubeadm/internal/webhooks/kubeadmconfig.go seems to be the place for validation webhooks code

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to validate that format == "clout-init" only if ignition is not used? Is that based on feature.KubeadmBootstrapFormatIgnition or is there another way of checking if ignition should be used?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format field on the config designates the type of configuration used. If it is cloud-init, then it is not ignition :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay thanks, is this how you meant? b7c575a

Also, I believe the default value for the format is "cloud-config", but please let me know if I misunderstood and it should be "cloud-init".

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign justinsb for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

@anmazzotti anmazzotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good to me, beside the minor issue of not using the constant "cloud-config"

@@ -65,6 +65,9 @@ func (webhook *KubeadmConfig) ValidateCreate(_ context.Context, obj runtime.Obje
if !ok {
return nil, apierrors.NewBadRequest(fmt.Sprintf("expected a KubeadmConfig but got a %T", obj))
}
if c.Spec.BootCommands != nil && c.Spec.Format != "cloud-config" {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May you use the constant instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

@davidumea davidumea Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I have now crossed out the problem I mentioned in the initial PR description, since the webhooks now work as expected in my testing.

One thing that I know that does not work at the time of creating this PR is validating/mutating webhooks for KubeadmConfig and KubeadmConfigTemplates. Some pointers to where to change that would be appreciated.

bootstrap/kubeadm/internal/webhooks/kubeadmconfig.go Outdated Show resolved Hide resolved
bootstrap/kubeadm/internal/webhooks/kubeadmconfig_test.go Outdated Show resolved Hide resolved
@davidumea davidumea requested a review from anmazzotti January 24, 2025 14:34
@@ -65,6 +65,9 @@ func (webhook *KubeadmConfig) ValidateCreate(_ context.Context, obj runtime.Obje
if !ok {
return nil, apierrors.NewBadRequest(fmt.Sprintf("expected a KubeadmConfig but got a %T", obj))
}
if c.Spec.BootCommands != nil && c.Spec.Format != bootstrapv1.CloudConfig {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this commands need to be a part of webhook.validate, otherwise the check looks fine.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to update the validateIgnition function so that allErrs wouldn't be empty, which was probably for the best, thanks

3db48fa

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the webhook check isn't needed anymore then? 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/provider/bootstrap-kubeadm Issues or PRs related to CAPBK cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants