[Feature] Add option to define toleration on wave-build jobs to allow scheduling on tainted nodes #702

Open
gavinelder opened this issue Oct 17, 2024 · 4 comments · May be fixed by #751

@gavinelder
Contributor

The following duplicates #291.

Background

In the current setup, wave build pods can utilize the nodeSelector functionality as defined in the Wave config map. This allows users to specify node selection criteria, ensuring that certain workloads run on specific nodes. For example, a user can configure the nodeSelector in the Wave config map as follows:

      build:
          nodeSelector:
            # this node selector binds the build pods to a separate cluster node group
            linux/amd64: 'service=wave-build'
            linux/arm64: 'service=wave-build-arm64'

This configuration ensures that wave build pods are scheduled on nodes labeled with service=wave-build. While this approach assigns these pods to specific nodes, it does not prevent other pods from being scheduled on those nodes, which can lead to unrelated workloads being scheduled on node-groups intended for wave.
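As a rough illustration of the mapping above, the sketch below resolves a build platform to a Kubernetes `nodeSelector` map. The helper names `parse_selector` and `selector_for` are hypothetical and are not taken from the Wave code base; the sketch assumes each config entry holds a single `key=value` pair, as in the example.

```python
from typing import Dict, Optional


def parse_selector(expr: str) -> Dict[str, str]:
    """Turn a 'key=value' string into a nodeSelector mapping (hypothetical helper)."""
    key, sep, value = expr.strip().partition("=")
    if not key or not sep:
        raise ValueError(f"invalid selector entry: {expr!r}")
    return {key: value}


def selector_for(platform: str, node_selectors: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the nodeSelector for a platform, or None when none is configured."""
    expr = node_selectors.get(platform)
    return parse_selector(expr) if expr else None


# Values copied from the config map example above.
config = {
    "linux/amd64": "service=wave-build",
    "linux/arm64": "service=wave-build-arm64",
}

print(selector_for("linux/arm64", config))
```

A node selector of this kind only attracts build pods to the labeled nodes; it places no restriction on which other pods may land there.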

Problem

We have seen instances of Wave becoming a noisy neighbor, impacting other pods scheduled on the same node as the wave-build pods. This resulted in degraded performance and, in some instances, node failure when wave-build was configured without shared storage.

Feature Request

By adding support for tolerations as part of the job configuration, we can allow cluster operators to taint nodes with a NoSchedule effect, preventing other workloads from being scheduled on the wave-build nodes.

kubectl taint nodes node-name service=wave-build:NoSchedule

This taint would prevent any pods without the appropriate toleration from being scheduled on the nodes reserved for wave build jobs.

For example, the pod spec of the Wave build jobs would then include the following, allowing the pod to tolerate the node taint.

spec:
  tolerations:
    - key: "service"
      operator: "Equal"
      value: "wave-build"
      effect: "NoSchedule"
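To make the interaction between the taint and the toleration concrete, here is a minimal sketch of the matching rules as described in the Kubernetes Toleration field comments (empty effect or key on the toleration matches any; `Exists` ignores the value; `Equal` compares it). The `Taint` and `Toleration` classes and the `can_schedule` helper are illustrative names for this sketch, not part of Wave or the Kubernetes client libraries.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"
    value: str = ""
    effect: str = ""


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True when the toleration matches the taint."""
    if tol.effect and tol.effect != taint.effect:
        return False  # a non-empty effect must match exactly
    if tol.key and tol.key != taint.key:
        return False  # a non-empty key must match exactly
    if tol.operator == "Exists":
        return True   # wildcard for the value
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return False      # unknown operator


def can_schedule(taints: List[Taint], tolerations: List[Toleration]) -> bool:
    """A pod fits only if every NoSchedule/NoExecute taint is tolerated."""
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


node_taints = [Taint("service", "wave-build", "NoSchedule")]
build_pod = [Toleration("service", "Equal", "wave-build", "NoSchedule")]

print(can_schedule(node_taints, build_pod))  # build pod with the toleration
print(can_schedule(node_taints, []))         # unrelated pod without tolerations
```

With the taint from the `kubectl taint` command above, the build pod is admitted while a pod carrying no tolerations is kept off the node.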

Possible approach

A potential solution would be to allow end users to provide a Kubernetes-compliant list of Tolerations that can be added to the build jobs via the Wave service configuration. These tolerations would then be validated by the Wave service and injected into the build job pods during their creation.

By allowing users to specify tolerations in a format that directly follows the Kubernetes specification, we reduce the need for Wave to enforce specific business logic around pod scheduling, while ensuring that users can define their own scheduling constraints and preferences.

An example config would look like the following.

      build:
          nodeSelector:
            # this node selector binds the build pods to a separate cluster node group
            linux/amd64: 'service=wave-build'
            linux/arm64: 'service=wave-build-arm64'
          tolerations:
            - key: "service"
              operator: "Equal"
              value: "wave-build"
              effect: "NoSchedule"
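The validation step mentioned above could be sketched roughly as follows. This is not Wave's implementation; `validate_toleration` is a hypothetical helper whose rules are derived from the Toleration field comments quoted under Additional Information. Rejecting `tolerationSeconds` on a non-NoExecute effect, rather than ignoring it, is a choice made in this sketch.

```python
from typing import Any, Dict, List

VALID_OPERATORS = {"Equal", "Exists"}
VALID_EFFECTS = {"", "NoSchedule", "PreferNoSchedule", "NoExecute"}


def validate_toleration(tol: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one toleration entry (empty if valid)."""
    errors: List[str] = []
    key = tol.get("key", "")
    operator = tol.get("operator") or "Equal"  # operator defaults to Equal
    value = tol.get("value", "")
    effect = tol.get("effect", "")

    if operator not in VALID_OPERATORS:
        errors.append(f"unsupported operator: {operator!r}")
    if not key and operator != "Exists":
        errors.append("operator must be Exists when key is empty")
    if operator == "Exists" and value:
        errors.append("value must be empty when operator is Exists")
    if effect not in VALID_EFFECTS:
        errors.append(f"unsupported effect: {effect!r}")
    if "tolerationSeconds" in tol and effect != "NoExecute":
        errors.append("tolerationSeconds requires effect NoExecute")
    return errors


# The toleration from the example config above.
example = {
    "key": "service",
    "operator": "Equal",
    "value": "wave-build",
    "effect": "NoSchedule",
}

print(validate_toleration(example))
print(validate_toleration({"value": "wave-build"}))
```

Returning a list of messages rather than raising on the first problem lets the service report every misconfigured field at startup in one pass.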

Alternatives considered

Use of a Kubernetes admission controller

It is possible to use an admission controller such as Kyverno to add the relevant tolerations via mutation.

Mutation as outlined below may be suitable; however, for customer environments, native support within the Wave application removes the need to run and operate additional infrastructure.

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-toleration-and-label-to-wave-build-jobs
spec:
  rules:
    - name: add-toleration-and-label-to-wave-build
      match:
        resources:
          kinds:
            - Job
          selector:
            matchLabels:
              app: wave-build
      mutate:
        patchStrategicMerge:
          spec:
            template:
              # Add the labels to the job's pod template metadata
              metadata:
                labels:
                  environment: production
                  purpose: wave-build
              # Add the toleration to the job's pod template spec
              spec:
                tolerations:
                  - key: "service"
                    operator: "Equal"
                    value: "wave-build"
                    effect: "NoSchedule"

Additional Information

The following is the Toleration specification from kubernetes/pkg/apis/core/types.go

Toleration specification

type Toleration struct {
	// Key is the taint key that the toleration applies to. Empty means match all taint keys.
	// If the key is empty, operator must be Exists; this combination means to match all values and all keys.
	// +optional
	Key string
	// Operator represents a key's relationship to the value.
	// Valid operators are Exists and Equal. Defaults to Equal.
	// Exists is equivalent to wildcard for value, so that a pod can
	// tolerate all taints of a particular category.
	// +optional
	Operator TolerationOperator
	// Value is the taint value the toleration matches to.
	// If the operator is Exists, the value should be empty, otherwise just a regular string.
	// +optional
	Value string
	// Effect indicates the taint effect to match. Empty means match all taint effects.
	// When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
	// +optional
	Effect TaintEffect
	// TolerationSeconds represents the period of time the toleration (which must be
	// of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default,
	// it is not set, which means tolerate the taint forever (do not evict). Zero and
	// negative values will be treated as 0 (evict immediately) by the system.
	// +optional
	TolerationSeconds *int64
}

Pod Specification

type PodSpec struct {
    ...
    Tolerations []Toleration
    ...
}   

Example Pod with Tolerations

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: nginx
      image: nginx:1.14.2
  tolerations:
    - key: "key1"
      operator: "Equal"
      value: "value1"
      effect: "NoSchedule"
    - key: "key2"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 3600
@pditommaso
Collaborator

It would make sense to move this forward as a *replacement* of the existing node selector mechanism

@Property(name='wave.build.k8s.node-selector')
@Nullable
private Map<String, String> nodeSelectorMap

final selector= getSelectorLabel(req.platform, nodeSelectorMap)
k8sService.launchBuildJob(jobName, buildImage, buildCmd, req.workDir, configFile, timeout, selector)

@bebosudo
Member

Note that under some circumstances node-selector may still be useful so I wouldn't replace it entirely: at the end of the day it's a node affinity rule that requires pods to always run on nodes matching a specific label.
Please let us know if more pointers are needed in order to develop this.

@munishchouhan self-assigned this Nov 19, 2024
@munishchouhan
Member

@gavinelder @pditommaso Do we need tolerations for build pods only or for scan, transfer and mirror too?

@pditommaso
Collaborator

Only builds are platform dependent

@munishchouhan linked a pull request Nov 20, 2024 that will close this issue