CIS ProcessNodeUpdate keeps monitoring node in NotReady state #2677

vincentmli · 2022-12-15T15:28:19Z

Setup Details

CIS Version : 2.10
Build: f5networks/k8s-bigip-ctlr:latest
BIGIP Version: Big IP 16.2
AS3 Version: 3.24
Agent Mode: AS3
Orchestration: K8S/OSCP
Orchestration Version:
Pool Mode: Cluster
Additional Setup details: Cilium CNI

Description

I run a k8s cluster with 3 nodes and ran kubeadm reset on one worker node in an attempt to remove that worker node from the cluster:

[root@cilium-worker ~]# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W1215 10:09:57.336678  203504 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.

Now the worker node is in the NotReady state:

Every 1.0s: kubectl get no -o wide                                                                                                                                  cilium-dev: Thu Dec 15 10:11:14 2022

NAME                     STATUS     ROLES                  AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE          KERNEL-VERSION          CONTAINER-RUNTIME
centos-dev.localdomain   Ready      <none>                 324d   v1.21.3   10.169.72.233   <none>        CentOS Linux 8    6.0.0-rc7+              docker://20.10.7
cilium-dev               Ready      control-plane,master   324d   v1.21.1   10.169.72.239   <none>        CentOS Linux 8    5.18.0+                 docker://20.10.7
cilium-worker            NotReady   <none>                 9h     v1.21.1   10.169.72.238   <none>        CentOS Stream 8   4.18.0-348.el8.x86_64   docker://20.10.10

However, CIS ProcessNodeUpdate is not triggered, and CIS still keeps the NotReady worker node in its watched list. Note that the VXLAN manager's ProcessNodeUpdate is triggered and updates the FDB entry:

2022/12/15 06:06:33 [INFO] [CORE] ProcessNodeUpdate: Change in Node state detected
2022/12/15 06:06:33 [INFO] newNode: centos-dev.localdomain

2022/12/15 06:06:33 [INFO] newNode: cilium-dev

2022/12/15 06:06:33 [INFO] newNode: cilium-worker

2022/12/15 06:06:34 [INFO] [2022-12-15 06:06:34,065 f5_cccl.resource.resource INFO] Updating ApiFDBTunnel: /Common/flannel_vxlan
2022/12/15 06:06:37 [INFO] [2022-12-15 06:06:37,067 f5_cccl.resource.resource INFO] Updating ApiFDBTunnel: /Common/flannel_vxlan
2022/12/15 06:06:47 [INFO] [2022-12-15 06:06:47,228 f5_cccl.resource.resource INFO] Updating ApiFDBTunnel: /Common/flannel_vxlan
2022/12/15 15:10:42 [INFO] [2022-12-15 15:10:42,436 f5_cccl.resource.resource INFO] Updating ApiFDBTunnel: /Common/flannel_vxlan

Steps To Reproduce

Patch CIS to log each newly watched node:

diff --git a/pkg/appmanager/appManager.go b/pkg/appmanager/appManager.go
index 3c597c94..94c8193f 100644
--- a/pkg/appmanager/appManager.go
+++ b/pkg/appmanager/appManager.go
@@ -3296,6 +3296,9 @@ func (appMgr *Manager) ProcessNodeUpdate(
                // Compare last set of nodes with new one
                if !reflect.DeepEqual(newNodes, appMgr.oldNodes) {
                        log.Infof("[CORE] ProcessNodeUpdate: Change in Node state detected")
+                       for _, node := range newNodes {
+                               log.Infof("newNode: %s\n", node.Name)
+                       }
                        // ServiceKey contains a service port in addition to namespace service
                        // name, while the work queue does not use service port. Create a list
                        // of unique work queue keys using a map.
@@ -3318,6 +3321,9 @@ func (appMgr *Manager) ProcessNodeUpdate(
                }
        } else {
                // Initialize appMgr nodes on our first pass through
+               for _, node := range newNodes {
+                       log.Infof("!appMgr.steadyState newNode: %s\n", node.Name)
+               }
                appMgr.oldNodes = newNodes
        }
 }

Run a k8s cluster with kubeadm and remove one worker node with kubeadm reset.

Watch the CIS log to see which nodes are being watched.

Expected Result

CIS ProcessNodeUpdate should only watch healthy (Ready) k8s nodes.

Actual Result

CIS keeps watching the node that is in the NotReady state.
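One plausible reading of why no node change is logged: the patched ProcessNodeUpdate compares node lists with reflect.DeepEqual, and a node that goes NotReady keeps the same name and address. The sketch below is a standalone illustration, not CIS code. It assumes the watched Node type carries only Name and Addr (as in the getNodes excerpt in the follow-up comment), and nodesChanged is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"reflect"
)

// Node is a simplified stand-in for the CIS watched-node type.
// Assumption: it holds only a name and an address, no readiness field.
type Node struct {
	Name string
	Addr string
}

// nodesChanged is a hypothetical helper mirroring the DeepEqual check
// seen in the ProcessNodeUpdate diff.
func nodesChanged(oldNodes, newNodes []Node) bool {
	return !reflect.DeepEqual(newNodes, oldNodes)
}

func main() {
	before := []Node{
		{Name: "centos-dev.localdomain", Addr: "10.169.72.233"},
		{Name: "cilium-dev", Addr: "10.169.72.239"},
		{Name: "cilium-worker", Addr: "10.169.72.238"},
	}

	// cilium-worker is now NotReady, but its name and address are
	// unchanged, so the list is element-for-element identical.
	afterNotReady := []Node{
		{Name: "centos-dev.localdomain", Addr: "10.169.72.233"},
		{Name: "cilium-dev", Addr: "10.169.72.239"},
		{Name: "cilium-worker", Addr: "10.169.72.238"},
	}
	fmt.Println(nodesChanged(before, afterNotReady)) // false

	// Only when the node is actually dropped from the list does the
	// comparison report a change.
	afterRemoved := before[:2]
	fmt.Println(nodesChanged(before, afterRemoved)) // true
}
```

Under that assumption, a readiness change alone can never make the comparison fail, which would match the missing "Change in Node state detected" log line after the reset.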

Diagnostic Information

Observations (if any)


vincentmli · 2022-12-15T15:32:09Z

Note: this happens because getNodes does not check the Ready state or health status of the k8s nodes:

// Get a list of Node addresses
func (appMgr *Manager) getNodes(
        obj interface{},
) ([]Node, error) {
        nodes, ok := obj.([]v1.Node)
        if false == ok {
                return nil,
                        fmt.Errorf("poll update unexpected type, interface is not []v1.Node")
        }

        watchedNodes := []Node{}

        var addrType v1.NodeAddressType
        if appMgr.UseNodeInternal() {
                addrType = v1.NodeInternalIP
        } else {
                addrType = v1.NodeExternalIP
        }

        // Append list of nodes to watchedNodes
        for _, node := range nodes {
                nodeAddrs := node.Status.Addresses
                for _, addr := range nodeAddrs {
                        if addr.Type == addrType {
                                n := Node{
                                        Name: node.ObjectMeta.Name,
                                        Addr: addr.Address,
                                }
                                watchedNodes = append(watchedNodes, n)
                        }
                }
        }
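A minimal sketch of what a readiness check in that loop could look like. This is an illustration only and not necessarily how the fix was implemented. To stay self-contained it uses simplified stand-in types (K8sNode, NodeCondition, NodeAddress) in place of the real k8s.io/api/core/v1 types, and isNodeReady and readyNodes are hypothetical names. The string values "Ready", "True", "Unknown" and "InternalIP" match the Kubernetes constants v1.NodeReady, v1.ConditionTrue, v1.ConditionUnknown and v1.NodeInternalIP.

```go
package main

import "fmt"

// Simplified stand-ins for the k8s.io/api/core/v1 types (assumption:
// only the fields needed for this sketch are modelled).
type NodeCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False" or "Unknown"
}

type NodeAddress struct {
	Type    string // e.g. "InternalIP" or "ExternalIP"
	Address string
}

type K8sNode struct {
	Name       string
	Addresses  []NodeAddress
	Conditions []NodeCondition
}

// Node mirrors the watched-node shape built in getNodes.
type Node struct {
	Name string
	Addr string
}

// isNodeReady (hypothetical helper) reports whether the node has a
// "Ready" condition whose status is "True". A missing condition is
// treated as not ready.
func isNodeReady(n K8sNode) bool {
	for _, c := range n.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

// readyNodes (hypothetical helper) builds the watched list like
// getNodes does, but skips nodes that are not Ready.
func readyNodes(nodes []K8sNode, addrType string) []Node {
	watched := []Node{}
	for _, n := range nodes {
		if !isNodeReady(n) {
			continue
		}
		for _, a := range n.Addresses {
			if a.Type == addrType {
				watched = append(watched, Node{Name: n.Name, Addr: a.Address})
			}
		}
	}
	return watched
}

func main() {
	nodes := []K8sNode{
		{
			Name:       "cilium-dev",
			Addresses:  []NodeAddress{{Type: "InternalIP", Address: "10.169.72.239"}},
			Conditions: []NodeCondition{{Type: "Ready", Status: "True"}},
		},
		{
			// After the kubelet stops reporting, the Ready condition
			// typically becomes "Unknown" and kubectl shows NotReady.
			Name:       "cilium-worker",
			Addresses:  []NodeAddress{{Type: "InternalIP", Address: "10.169.72.238"}},
			Conditions: []NodeCondition{{Type: "Ready", Status: "Unknown"}},
		},
	}
	for _, n := range readyNodes(nodes, "InternalIP") {
		fmt.Println(n.Name, n.Addr)
	}
}
```

With a filter like this, a node going NotReady would also shrink the watched list, so a DeepEqual comparison against the previous list would see a difference.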

trinaths · 2022-12-16T06:52:37Z

Created [CONTCNTR-3696] for internal tracking.

vincentmli added bug untriaged no JIRA created labels Dec 15, 2022

trinaths added JIRA and removed untriaged no JIRA created labels Dec 16, 2022

lavanya-f5 mentioned this issue Jan 20, 2023

handle node notready update #2731

Merged

2 tasks

lavanya-f5 closed this as completed in #2731 Jan 20, 2023
