Commit a2edf36

Revert "Make svclb as simple as possible"
This reverts commit 1befd65.

Signed-off-by: manuelbuil <[email protected]>
1 parent a44cb16 commit a2edf36

File tree: 7 files changed, +162 -37 lines

Diff for: .github/workflows/e2e.yaml (+1, -1)

@@ -38,7 +38,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        etest: [startup, s3, btrfs, externalip, privateregistry, embeddedmirror, wasm, svcpoliciesandfirewall]
+        etest: [startup, s3, btrfs, externalip, privateregistry, embeddedmirror, wasm]
       max-parallel: 3
     steps:
     - name: "Checkout"

Diff for: .github/workflows/unitcoverage.yaml (+1, -1)

@@ -28,7 +28,7 @@ permissions:
 jobs:
   test:
     name: Unit Tests
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     timeout-minutes: 20
     steps:
     - name: Checkout

Diff for: docs/adrs/remove-svclb-klipperlb.md renamed to docs/adrs/remove-svclb-daemonset.md (+7, -19)

@@ -1,10 +1,10 @@
-# Remove svclb klipper-lb
+# Remove svclb daemonset
 
 Date: 2024-09-26
 
 ## Status
 
-Discussing
+Not approved
 
 ## Context
 
@@ -77,32 +77,20 @@ KUBE-SVC-CVG3OEGEH7H5P3HQ all -- 0.0.0.0/0 0.0.0.0/0
 
 As a consequence, the traffic never gets into the svclb daemonset pod. This can be additionally demonstrated by running a tcpdump on the svclb daemonset pod and no traffic will appear. This can also be demonstrated by tracing the iptables flow, where we will see how traffic is following the described path.
 
-In Kubernetes 1.29, a [new related kube-proxy feature](https://kubernetes.io/docs/concepts/services-networking/service/#load-balancer-ip-mode) was added and it became beta in 1.30. This feature allows the load-balancer controller to specify if kube-proxy should include that `KUBE-EXT` rule or not. The selection is controlled by a new field called ipMode that has two values: * VIP (default): Keeps the current behaviour
-* Proxy: Removes the `KUBE-EXT` rule and hence Kube-proxy does not react to changes in the external-ip field
+Therefore, if we replace the logic to find the node IPs of the serviceLB controller by something which does not require the svclb daemonset, we could get rid of that daemonset since traffic is never reaching it. That replacement should be easy because in the end a daemonset means all nodes, so we could basically query kube-api to provide the IPs of all nodes.
 
 
-Therefore, if our k3s load-balancer controller sets ipMode=Proxy, the traffic would finally get into the svclb daemon
-
-
-## Solutions
-
-One quick solution would be to use ipMode=Proxy when setting the status of the LoadBalancer services. However, it must be noted that klipper-lb is using iptables to do exactly what kube-proxy is doing, so there is no real benefit. We could include some extra features such as `X-Forwarded-For` to preserve the source IP, which would give a good reason to forward traffic to the svclb daemonset. Nonetheless, to achive that, we would need to replace klipper-lb with a proper load-balancer, which is out of the scope of this ADR. Note as well, that svclb is not working as expected when using `externalTrafficPolicy: Local`: it allows access to the service via a node IP even if there is no instance of a service pod running on that node
-
-Another solution is to get rid of the daemonset completely. However, that solution will not detect if there is a process already using the port in a node (by another k8s service or by a node server) because kube-proxy does not check this. Unfortunately, this solution will obscure the error to the user and make debugging more difficult.
-
-One simpler solution would be to replace klipper-lb with a tiny image that includes the binary `sleep`. Additionally, we would remove all the extra permissions required to run klipper-lb (e.g. NET_ADMIN). In this case, we would use the daemonset to reserve the port by HostPort and if a node is already using that port, that node's IP will not be included as external-IP. In such a case, the daemonset pod running on that node will clearly warn the user that it can't run because the port is already reserved, hence making it easy to debug.
-
-This commit includes the last solution by setting busybox as the new image for the daemonset instead of klipper-lb.
-
 ## Decision
 
-----
+There is one use case where klipper-lb is used. When deploying in a public cloud and using the publicIP as the --node-external-ip, kube-proxy is expecting the publicIP to be the destination IP. However, public clouds are normally doing a DNAT, so the kube-proxy's rule will never be used because the incoming packet does not have the publicIP anymore. In that case, the packet is capable of reaching the service because of the hostPort functionality on the daemonset svclb pushing the packet to svclb and then, klipper-lb routing the packet to the service. Conclusion: klipper-lb is needed
 
 ## Consequences
 
 ### Positives
+* Less resource consumption as we won't need one daemonset per LoadBalancer type of service
 * One fewer repo to maintain (klipper-lb)
 * Easier to understand flow of traffic
 
 ### Negatives
-* None that I can think of
+* Possible confusion for users that have been using this feature for a long time ("Where is the daemonset?") or users relying on that daemonset for their automation
+* In today's solution, if two LoadBalancer type services are using the same port, it is rather easy to notice that things don't work because the second daemonset will not deploy, as the port is already being used by the first daemonset. Kube-proxy does not check if two services are using the same port and it will create both rules without any error. The service that gets its rules higher in the chain is the one that will be reached when querying $nodeIP:$port. Perhaps we could add some logic in the controller that warns users about a duplication of the pair ip&port
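
The added Context paragraph in this ADR proposes replacing the svclb DaemonSet with a direct API-server query for node addresses. As a rough illustration only (this is not code from the commit; the helper name listNodeIPs and the ExternalIP-before-InternalIP preference are assumptions of the sketch), such a lookup with client-go could look like this:

// listnodeips.go - hypothetical sketch of the node-IP lookup described in the ADR,
// not part of this commit or of the k3s cloud provider.
package main

import (
	"context"
	"fmt"

	core "k8s.io/api/core/v1"
	meta "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// listNodeIPs asks the API server for every node and returns one address per node,
// preferring ExternalIP and falling back to InternalIP (an assumption of this sketch).
func listNodeIPs(ctx context.Context, client kubernetes.Interface) ([]string, error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, meta.ListOptions{})
	if err != nil {
		return nil, err
	}
	var ips []string
	for _, node := range nodes.Items {
		var internal, external string
		for _, addr := range node.Status.Addresses {
			switch addr.Type {
			case core.NodeExternalIP:
				external = addr.Address
			case core.NodeInternalIP:
				internal = addr.Address
			}
		}
		if external != "" {
			ips = append(ips, external)
		} else if internal != "" {
			ips = append(ips, internal)
		}
	}
	return ips, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	ips, err := listNodeIPs(context.Background(), kubernetes.NewForConfigOrDie(cfg))
	if err != nil {
		panic(err)
	}
	fmt.Println(ips) // e.g. [10.0.0.10 10.0.0.11]
}

Since the ADR ended up "Not approved", nothing like this replaces the DaemonSet in the reverted code; it only illustrates the alternative the document sketches.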

Diff for: pkg/cloudprovider/servicelb.go (+77, -16)

@@ -2,12 +2,12 @@ package cloudprovider
 
 import (
 	"context"
-	"encoding/json"
 	"fmt"
 	"sort"
+	"strconv"
 	"strings"
 	"time"
-
+	"encoding/json"
 	"sigs.k8s.io/yaml"
 
 	"github.com/k3s-io/k3s/pkg/util"
@@ -43,7 +43,6 @@ var (
 	daemonsetNodeLabel     = "svccontroller." + version.Program + ".cattle.io/enablelb"
 	daemonsetNodePoolLabel = "svccontroller." + version.Program + ".cattle.io/lbpool"
 	nodeSelectorLabel      = "svccontroller." + version.Program + ".cattle.io/nodeselector"
-	extTrafficPolicyLabel  = "svccontroller." + version.Program + ".cattle.io/exttrafficpolicy"
 	priorityAnnotation     = "svccontroller." + version.Program + ".cattle.io/priorityclassname"
 	tolerationsAnnotation  = "svccontroller." + version.Program + ".cattle.io/tolerations"
 	controllerName         = names.ServiceLBController
@@ -56,7 +55,7 @@ const (
 )
 
 var (
-	DefaultLBImage = "rancher/mirrored-library-busybox:1.36.1"
+	DefaultLBImage = "rancher/klipper-lb:v0.4.9"
 )
 
 func (k *k3s) Register(ctx context.Context,
@@ -436,17 +435,35 @@ func (k *k3s) newDaemonSet(svc *core.Service) (*apps.DaemonSet, error) {
 	oneInt := intstr.FromInt(1)
 	priorityClassName := k.getPriorityClassName(svc)
 	localTraffic := servicehelper.RequestsOnlyLocalTraffic(svc)
+	sourceRangesSet, err := servicehelper.GetLoadBalancerSourceRanges(svc)
+	if err != nil {
+		return nil, err
+	}
+	sourceRanges := strings.Join(sourceRangesSet.StringSlice(), ",")
 	securityContext := &core.PodSecurityContext{}
 
+	for _, ipFamily := range svc.Spec.IPFamilies {
+		switch ipFamily {
+		case core.IPv4Protocol:
+			securityContext.Sysctls = append(securityContext.Sysctls, core.Sysctl{Name: "net.ipv4.ip_forward", Value: "1"})
+		case core.IPv6Protocol:
+			securityContext.Sysctls = append(securityContext.Sysctls, core.Sysctl{Name: "net.ipv6.conf.all.forwarding", Value: "1"})
+			if sourceRanges == "0.0.0.0/0" {
+				// The upstream default load-balancer source range only includes IPv4, even if the service is IPv6-only or dual-stack.
+				// If using the default range, and IPv6 is enabled, also allow IPv6.
+				sourceRanges += ",::/0"
+			}
+		}
+	}
+
 	ds := &apps.DaemonSet{
 		ObjectMeta: meta.ObjectMeta{
 			Name:      name,
 			Namespace: k.LBNamespace,
 			Labels: labels.Set{
-				nodeSelectorLabel:     "false",
-				svcNameLabel:          svc.Name,
-				svcNamespaceLabel:     svc.Namespace,
-				extTrafficPolicyLabel: "Cluster",
+				nodeSelectorLabel: "false",
+				svcNameLabel:      svc.Name,
+				svcNamespaceLabel: svc.Namespace,
 			},
 		},
 		TypeMeta: meta.TypeMeta{
@@ -505,7 +522,6 @@ func (k *k3s) newDaemonSet(svc *core.Service) (*apps.DaemonSet, error) {
 			Name:            portName,
 			Image:           k.LBImage,
 			ImagePullPolicy: core.PullIfNotPresent,
-			Command:         []string{"sleep", "inf"},
 			Ports: []core.ContainerPort{
 				{
 					Name: portName,
@@ -514,7 +530,57 @@ func (k *k3s) newDaemonSet(svc *core.Service) (*apps.DaemonSet, error) {
 					Protocol:      port.Protocol,
 				},
 			},
+			Env: []core.EnvVar{
+				{
+					Name:  "SRC_PORT",
+					Value: strconv.Itoa(int(port.Port)),
+				},
+				{
+					Name:  "SRC_RANGES",
+					Value: sourceRanges,
+				},
+				{
+					Name:  "DEST_PROTO",
+					Value: string(port.Protocol),
+				},
+			},
+			SecurityContext: &core.SecurityContext{
+				Capabilities: &core.Capabilities{
+					Add: []core.Capability{
+						"NET_ADMIN",
+					},
+				},
+			},
+		}
+
+		if localTraffic {
+			container.Env = append(container.Env,
+				core.EnvVar{
+					Name:  "DEST_PORT",
+					Value: strconv.Itoa(int(port.NodePort)),
+				},
+				core.EnvVar{
+					Name: "DEST_IPS",
+					ValueFrom: &core.EnvVarSource{
+						FieldRef: &core.ObjectFieldSelector{
+							FieldPath: getHostIPsFieldPath(),
+						},
+					},
+				},
+			)
+		} else {
+			container.Env = append(container.Env,
+				core.EnvVar{
+					Name:  "DEST_PORT",
+					Value: strconv.Itoa(int(port.Port)),
+				},
+				core.EnvVar{
+					Name:  "DEST_IPS",
+					Value: strings.Join(svc.Spec.ClusterIPs, ","),
+				},
+			)
 		}
+
 		ds.Spec.Template.Spec.Containers = append(ds.Spec.Template.Spec.Containers, container)
 	}
 
@@ -542,11 +608,6 @@ func (k *k3s) newDaemonSet(svc *core.Service) (*apps.DaemonSet, error) {
 	}
 	ds.Spec.Template.Spec.Tolerations = append(ds.Spec.Template.Spec.Tolerations, tolerations...)
 
-	// Change the label to force the DaemonSet to update and call onPodChange if the ExternalTrafficPolicy changes
-	if localTraffic {
-		ds.Spec.Template.Labels[extTrafficPolicyLabel] = "Local"
-	}
-
 	return ds, nil
 }
 
@@ -649,8 +710,8 @@ func (k *k3s) getPriorityClassName(svc *core.Service) string {
 	return k.LBDefaultPriorityClassName
 }
 
-// getTolerations retrieves the tolerations from a service's annotations.
-// It parses the tolerations from a JSON or YAML string stored in the annotations.
+// getTolerations retrieves the tolerations from a service's annotations.
+// It parses the tolerations from a JSON or YAML string stored in the annotations.
 func (k *k3s) getTolerations(svc *core.Service) ([]core.Toleration, error) {
 	tolerationsStr, ok := svc.Annotations[tolerationsAnnotation]
 	if !ok {
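
The restored hunks above wire each klipper-lb container through SRC_PORT, SRC_RANGES, DEST_PROTO, DEST_PORT and DEST_IPS, with the DEST_* values depending on the service's externalTrafficPolicy. The following standalone sketch restates just that branch so the contract is easy to read in isolation; it is not the controller code, and the "status.hostIPs" field path merely stands in for whatever getHostIPsFieldPath() returns in the real implementation:

// klipperenv.go - simplified restatement of the env wiring in the diff above; illustrative only.
package main

import (
	"fmt"
	"strconv"
	"strings"

	core "k8s.io/api/core/v1"
)

// klipperEnv builds the environment for one service port.
// Local traffic policy: forward to the node's own IPs (downward API) on the NodePort.
// Cluster traffic policy: forward to the service ClusterIPs on the service port.
func klipperEnv(svc *core.Service, port core.ServicePort, sourceRanges string, localTraffic bool) []core.EnvVar {
	env := []core.EnvVar{
		{Name: "SRC_PORT", Value: strconv.Itoa(int(port.Port))},
		{Name: "SRC_RANGES", Value: sourceRanges},
		{Name: "DEST_PROTO", Value: string(port.Protocol)},
	}
	if localTraffic {
		env = append(env,
			core.EnvVar{Name: "DEST_PORT", Value: strconv.Itoa(int(port.NodePort))},
			core.EnvVar{Name: "DEST_IPS", ValueFrom: &core.EnvVarSource{
				// Assumption: the real code resolves this path via getHostIPsFieldPath().
				FieldRef: &core.ObjectFieldSelector{FieldPath: "status.hostIPs"},
			}},
		)
	} else {
		env = append(env,
			core.EnvVar{Name: "DEST_PORT", Value: strconv.Itoa(int(port.Port))},
			core.EnvVar{Name: "DEST_IPS", Value: strings.Join(svc.Spec.ClusterIPs, ",")},
		)
	}
	return env
}

func main() {
	svc := &core.Service{Spec: core.ServiceSpec{ClusterIPs: []string{"10.43.100.7"}}}
	port := core.ServicePort{Port: 443, NodePort: 30443, Protocol: core.ProtocolTCP}
	fmt.Println(klipperEnv(svc, port, "0.0.0.0/0", false)) // Cluster policy: DEST_PORT=443, DEST_IPS=10.43.100.7
}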

Diff for: scripts/airgap/image-list.txt (+1)

@@ -1,4 +1,5 @@
 docker.io/rancher/klipper-helm:v0.9.3-build20241008
+docker.io/rancher/klipper-lb:v0.4.9
 docker.io/rancher/local-path-provisioner:v0.0.30
 docker.io/rancher/mirrored-coredns-coredns:1.11.3
 docker.io/rancher/mirrored-library-busybox:1.36.1

Diff for: updatecli/updatecli.d/klipper-lb.yaml (new file, +71)

@@ -0,0 +1,71 @@
+---
+name: "Bump Klipper LB version"
+scms:
+  k3s:
+    kind: "github"
+    spec:
+      user: "{{ .github.user }}"
+      email: "{{ .github.email }}"
+      username: "{{ .github.username }}"
+      token: "{{ requiredEnv .github.token }}"
+      owner: "{{ .k3s.org }}"
+      repository: "{{ .k3s.repo }}"
+      branch: "{{ .k3s.branch }}"
+      commitmessage:
+        title: "Bump Klipper LB version"
+  klipper-lb:
+    kind: "github"
+    spec:
+      user: "{{ .github.user }}"
+      email: "{{ .github.email }}"
+      username: "{{ .github.username }}"
+      token: "{{ requiredEnv .github.token }}"
+      owner: "{{ .k3s.org }}"
+      repository: "{{ .klipper_lb.repo }}"
+      branch: "{{ .klipper_lb.branch }}"
+
+actions:
+  github:
+    title: "Bump Klipper LB version"
+    kind: "github/pullrequest"
+    scmid: "k3s"
+    spec:
+      automerge: false
+      mergemethod: "squash"
+      usetitleforautomerge: true
+      parent: false
+      labels:
+        - "dependencies"
+
+sources:
+  klipper-lb:
+    name: "Get Klipper LB latest release version"
+    kind: "githubrelease"
+    spec:
+      owner: "{{ .klipper_lb.org }}"
+      repository: "{{ .klipper_lb.repo }}"
+      branch: "{{ .klipper_lb.branch }}"
+      token: "{{ requiredEnv .github.token }}"
+      versionfilter:
+        kind: "latest"
+
+conditions:
+  klipper-lb:
+    name: "Check rancher/klipper-lb image version in DockerHub"
+    kind: "dockerimage"
+    sourceid: "klipper-lb"
+    spec:
+      image: "rancher/klipper-lb"
+
+targets:
+  klipper-lb:
+    name: "Update rancher/klipper-lb image versions"
+    kind: "file"
+    scmid: "k3s"
+    sourceid: "klipper-lb"
+    spec:
+      files:
+        - "pkg/cloudprovider/servicelb.go"
+        - "scripts/airgap/image-list.txt"
+      matchpattern: 'rancher/klipper-lb:v\d+\.\d+\.\d+(-\w+)?'
+      replacepattern: 'rancher/klipper-lb:{{ source "klipper-lb" }}'
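
The matchpattern/replacepattern pair at the end of this target is what rewrites the pinned image tag in pkg/cloudprovider/servicelb.go and scripts/airgap/image-list.txt. Here is a quick standalone check of that regex, where v0.4.10 is only a made-up stand-in for whatever {{ source "klipper-lb" }} resolves to:

// patterncheck.go - throwaway check of the updatecli matchpattern above; not part of the repo.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	matchpattern := regexp.MustCompile(`rancher/klipper-lb:v\d+\.\d+\.\d+(-\w+)?`)
	newRef := "rancher/klipper-lb:v0.4.10" // hypothetical value of {{ source "klipper-lb" }}

	lines := []string{
		`DefaultLBImage = "rancher/klipper-lb:v0.4.9"`,      // matched: servicelb.go
		"docker.io/rancher/klipper-lb:v0.4.9",               // matched: image-list.txt
		"docker.io/rancher/mirrored-library-busybox:1.36.1", // not matched, left unchanged
	}
	for _, line := range lines {
		fmt.Println(matchpattern.ReplaceAllString(line, newRef))
	}
}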

Diff for: updatecli/values.yaml (+4)

@@ -11,6 +11,10 @@ klipper_helm:
   org: "k3s-io"
   repo: "klipper-helm"
   branch: "master"
+klipper_lb:
+  org: "k3s-io"
+  repo: "klipper-lb"
+  branch: "master"
 local_path_provisioner:
   org: "rancher"
   repo: "local-path-provisioner"
