@faiq faiq commented Sep 22, 2025

What type of PR is this?
/kind feature

Adds the new nodeadm type to provision EKS nodes that use nodeadm -- mostly used by AL2023 nodes. See KEP for more details and motivations.

The controller code largely resembles the existing code for EKSConfig reconciliation.

New unit tests and e2e tests were added to check the logic, along with a new e2e test that exercises upgrading.

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:


@faiq faiq force-pushed the faiq/adds-new-nodeadm-bootstrap-type branch from cb41c5b to 51ce7b1 Compare September 22, 2025 21:53
@faiq faiq closed this Sep 23, 2025
@faiq faiq reopened this Sep 23, 2025
@faiq faiq force-pushed the faiq/adds-new-nodeadm-bootstrap-type branch from 51ce7b1 to 3b4537a Compare September 23, 2025 16:38
@faiq faiq self-assigned this Sep 23, 2025
faiq commented Sep 23, 2025

How to run this:

1. Check out the branch.
2. Export a registry variable pointing to a registry you have push access to, e.g. export REGISTRY=docker.io/faiq
3. Run make docker-build
4. Run make docker-push
5. Run make release-manifests
6. Apply the release manifests. In my case that was:

$ k apply -f _artifacts/image-patch/infrastructure-components/source-manifest.yaml

7. Run kubectl set image deployment capa-controller-manager cluster-api-aws-controller-amd64=$REGISTRY/cluster-api-aws-controller-amd64:dev -n capa-system
8. Apply some manifests:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    services:
      cidrBlocks:
      - 10.96.0.0/12
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta2
    kind: AWSManagedControlPlane
    name: default-control-plane
  infrastructureRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta2
    kind: AWSManagedControlPlane
    name: default-control-plane
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: AWSManagedControlPlane
metadata:
  name: default-control-plane
spec:
  addons:
  - name: kube-proxy
    version: v1.32.0-eksbuild.2
  network:
    cni:
      cniIngressRules:
      - description: kube-proxy metrics
        fromPort: 10249
        protocol: tcp
        toPort: 10249
      - description: NVIDIA Data Center GPU Manager metrics
        fromPort: 9400
        protocol: tcp
        toPort: 9400
      - description: Prometheus node exporter metrics
        fromPort: 9100
        protocol: tcp
        toPort: 9100
  region: us-west-2
  sshKeyName: ""
  version: v1.33.0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: default2
spec:
  template:
    spec:
      cloudInit:
        insecureSkipSecretsManager: true
      ami:
        eksLookupType: AmazonLinux2023
      instanceMetadataOptions:
        httpTokens: required
        httpPutResponseHopLimit: 2
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: m5a.16xlarge
      rootVolume:
        size: 80
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2 # must match the apiVersion in the MachineDeployment's bootstrap.configRef
kind: NodeadmConfigTemplate
metadata:
  name: default
spec:
  template:
    spec:
      files:
        - path: /etc/yum.repos.d/centos-9.repo
          owner: root:root
          permissions: "0755"
          content: |-
            [baseos]
            name=CentOS Stream 9 - BaseOS
            baseurl=https://mirror.stream.centos.org/9-stream/BaseOS/$basearch/os/
            gpgkey=https://www.centos.org/keys/RPM-GPG-KEY-CentOS-Official
            gpgcheck=1
            repo_gpgcheck=0
            metadata_expire=6h
            countme=1
            enabled=1

            [appstream]
            name=CentOS Stream 9 - AppStream
            baseurl=https://mirror.stream.centos.org/9-stream/AppStream/$basearch/os/
            gpgkey=https://www.centos.org/keys/RPM-GPG-KEY-CentOS-Official
            gpgcheck=1
            repo_gpgcheck=0
            metadata_expire=6h
            countme=1
            enabled=1
      kubelet:
        config:
          evictionHard:
            memory.available: "2000Mi"
      preBootstrapCommands:
        - yum -y update; yum -y install iscsi-initiator-utils nfs-utils lvm2 xfsprogs ipvsadm sysstat lsscsi
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: default
spec:
  clusterName: default
  replicas: 3
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: NodeadmConfigTemplate
          name: default
      clusterName: default
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: default2
      version: v1.33.0

@dkoshkin dkoshkin left a comment

Looking good, just a few questions

@faiq faiq force-pushed the faiq/adds-new-nodeadm-bootstrap-type branch from b37b672 to 3537aa5 Compare September 26, 2025 18:44
@supershal supershal left a comment

Thanks for implementing and driving this. 🙌

@faiq faiq force-pushed the faiq/adds-new-nodeadm-bootstrap-type branch from 76a95a8 to c434fec Compare September 26, 2025 19:05
@faiq faiq merged commit efb53df into main Sep 26, 2025
5 checks passed
@faiq faiq mentioned this pull request Oct 2, 2025
5 tasks
@CharlieR-o-o-t

@faiq, hello! Are you planning to merge this upstream?

{{- end }}
{{- end }}
{{- if .KubeletConfig }}
kubelet:

This will generate an incorrect NodeConfig if only kubelet flags are provided.

Collaborator Author

Have you tried this out? I can give it a try as well.

@CharlieR-o-o-t CharlieR-o-o-t Oct 6, 2025

Yes, it generates the "flags" section under "cluster", but it should be inside "kubelet".

"kubelet:" is rendered only if a config is provided.
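The concern above can be reproduced in isolation. The following is a minimal sketch, not the PR's actual template: only `.KubeletConfig` appears in the quoted fragment, while the `KubeletFlags` field, the `nodeInput` type, the `demo` cluster name, and both template strings are hypothetical stand-ins. It assumes the "kubelet:" header is emitted only inside the config guard, and shows one possible way to guard the header on either field.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// nodeInput is a hypothetical stand-in for the data passed to the template.
type nodeInput struct {
	KubeletConfig string   // single-line kubelet config snippet, empty if unset
	KubeletFlags  []string // kubelet command-line flags, empty if unset
}

// guardedOnConfig emits "kubelet:" only when a config is set (assumed shape of the issue).
const guardedOnConfig = `spec:
  cluster:
    name: demo
{{- if .KubeletConfig }}
  kubelet:
    config:
      {{ .KubeletConfig }}
{{- end }}
{{- if .KubeletFlags }}
    flags:
{{- range .KubeletFlags }}
    - {{ . }}
{{- end }}
{{- end }}
`

// guardedOnEither emits "kubelet:" when either the config or the flags are set.
const guardedOnEither = `spec:
  cluster:
    name: demo
{{- if or .KubeletConfig .KubeletFlags }}
  kubelet:
{{- if .KubeletConfig }}
    config:
      {{ .KubeletConfig }}
{{- end }}
{{- if .KubeletFlags }}
    flags:
{{- range .KubeletFlags }}
    - {{ . }}
{{- end }}
{{- end }}
{{- end }}
`

// render executes the template text against the input and returns the output.
func render(text string, in nodeInput) string {
	var buf bytes.Buffer
	t := template.Must(template.New("nodeconfig").Parse(text))
	if err := t.Execute(&buf, in); err != nil {
		panic(err)
	}
	return buf.String()
}

// indentOf counts the leading spaces of a line.
func indentOf(line string) int {
	return len(line) - len(strings.TrimLeft(line, " "))
}

// parentKey returns the nearest preceding, less-indented line for "key:",
// with surrounding whitespace and a trailing colon removed.
func parentKey(doc, key string) string {
	lines := strings.Split(doc, "\n")
	for i, line := range lines {
		if strings.TrimSpace(line) != key+":" {
			continue
		}
		for j := i - 1; j >= 0; j-- {
			prev := lines[j]
			if strings.TrimSpace(prev) != "" && indentOf(prev) < indentOf(line) {
				return strings.TrimSuffix(strings.TrimSpace(prev), ":")
			}
		}
	}
	return ""
}

func main() {
	in := nodeInput{KubeletFlags: []string{"--register-with-taints=key=value:NoExecute"}}
	fmt.Println(parentKey(render(guardedOnConfig, in), "flags")) // cluster
	fmt.Println(parentKey(render(guardedOnEither, in), "flags")) // kubelet
}
```

Whether the merged template has this shape would need to be checked against the rendered user data rather than the NodeadmConfig spec, since the spec stores the flags under `kubelet` regardless of how the template renders them.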

Collaborator Author

apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: NodeadmConfig
metadata:
  annotations:
    cluster.x-k8s.io/cloned-from-groupkind: NodeadmConfigTemplate.bootstrap.cluster.x-k8s.io
    cluster.x-k8s.io/cloned-from-name: my-eks-cluster3-md-0-lxc6k
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
  creationTimestamp: "2025-10-06T18:16:18Z"
  generation: 1
  labels:
    cluster.x-k8s.io/cluster-name: my-eks-cluster3
    cluster.x-k8s.io/deployment-name: my-eks-cluster3-md-0-ph4nh
    cluster.x-k8s.io/set-name: my-eks-cluster3-md-0-ph4nh-fbpgq
    machine-template-hash: 1201924711-fbpgq
    topology.cluster.x-k8s.io/deployment-name: md-0
    topology.cluster.x-k8s.io/owned: ""
  name: my-eks-cluster3-md-0-ph4nh-fbpgq-dgzhn
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Machine
    name: my-eks-cluster3-md-0-ph4nh-fbpgq-dgzhn
    uid: c9a7a946-21ff-43ee-bf77-d3bf7f925094
  resourceVersion: "24606"
  uid: 59ad882d-9f02-4288-8d03-88bdf7dd0391
spec:
  kubelet:
    flags:
    - --register-with-taints=key=value:NoExecute

This works fine.

@CharlieR-o-o-t

@faiq, if AWS Secrets Manager is used (which is the default), the user data will not be updated when the bootstrap config is changed to a NodeadmConfigTemplate. The user data stored in Secrets Manager needs to be updated as well.
