CIS ProcessNodeUpdate keeps monitoring node in NotReady state #2677

Closed
vincentmli opened this issue Dec 15, 2022 · 2 comments · Fixed by #2731

Comments

@vincentmli
Contributor

Setup Details

CIS Version : 2.10
Build: f5networks/k8s-bigip-ctlr:latest
BIGIP Version: BIG-IP 16.2
AS3 Version: 3.24
Agent Mode: AS3
Orchestration: K8S/OSCP
Orchestration Version:
Pool Mode: Cluster
Additional Setup details: Cilium CNI

Description

I run a Kubernetes cluster with 3 nodes and ran kubeadm reset on one worker node to remove it from the cluster:

[root@cilium-worker ~]# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W1215 10:09:57.336678  203504 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.

The worker node is now in the NotReady state:

Every 1.0s: kubectl get no -o wide    cilium-dev: Thu Dec 15 10:11:14 2022

NAME                     STATUS     ROLES                  AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE          KERNEL-VERSION          CONTAINER-RUNTIME
centos-dev.localdomain   Ready      <none>                 324d   v1.21.3   10.169.72.233   <none>        CentOS Linux 8    6.0.0-rc7+              docker://20.10.7
cilium-dev               Ready      control-plane,master   324d   v1.21.1   10.169.72.239   <none>        CentOS Linux 8    5.18.0+                 docker://20.10.7
cilium-worker            NotReady   <none>                 9h     v1.21.1   10.169.72.238   <none>        CentOS Stream 8   4.18.0-348.el8.x86_64   docker://20.10.10

However, CIS ProcessNodeUpdate is not triggered, and CIS keeps the NotReady worker node in its watched node list. Note that the VXLAN manager's ProcessNodeUpdate is triggered and updates the FDB entry:

2022/12/15 06:06:33 [INFO] [CORE] ProcessNodeUpdate: Change in Node state detected
2022/12/15 06:06:33 [INFO] newNode: centos-dev.localdomain

2022/12/15 06:06:33 [INFO] newNode: cilium-dev

2022/12/15 06:06:33 [INFO] newNode: cilium-worker

2022/12/15 06:06:34 [INFO] [2022-12-15 06:06:34,065 f5_cccl.resource.resource INFO] Updating ApiFDBTunnel: /Common/flannel_vxlan
2022/12/15 06:06:37 [INFO] [2022-12-15 06:06:37,067 f5_cccl.resource.resource INFO] Updating ApiFDBTunnel: /Common/flannel_vxlan
2022/12/15 06:06:47 [INFO] [2022-12-15 06:06:47,228 f5_cccl.resource.resource INFO] Updating ApiFDBTunnel: /Common/flannel_vxlan
2022/12/15 15:10:42 [INFO] [2022-12-15 15:10:42,436 f5_cccl.resource.resource INFO] Updating ApiFDBTunnel: /Common/flannel_vxlan

Steps To Reproduce

Patch CIS to log each watched node:

diff --git a/pkg/appmanager/appManager.go b/pkg/appmanager/appManager.go
index 3c597c94..94c8193f 100644
--- a/pkg/appmanager/appManager.go
+++ b/pkg/appmanager/appManager.go
@@ -3296,6 +3296,9 @@ func (appMgr *Manager) ProcessNodeUpdate(
                // Compare last set of nodes with new one
                if !reflect.DeepEqual(newNodes, appMgr.oldNodes) {
                        log.Infof("[CORE] ProcessNodeUpdate: Change in Node state detected")
+                       for _, node := range newNodes {
+                               log.Infof("newNode: %s\n", node.Name)
+                       }
                        // ServiceKey contains a service port in addition to namespace service
                        // name, while the work queue does not use service port. Create a list
                        // of unique work queue keys using a map.
@@ -3318,6 +3321,9 @@ func (appMgr *Manager) ProcessNodeUpdate(
                }
        } else {
                // Initialize appMgr nodes on our first pass through
+               for _, node := range newNodes {
+                       log.Infof("!appMgr.steadyState newNode: %s\n", node.Name)
+               }
                appMgr.oldNodes = newNodes
        }
 }

Run a Kubernetes cluster with kubeadm and remove one worker node with kubeadm reset.

Watch the CIS log to see which nodes are being watched.

Expected Result

CIS ProcessNodeUpdate should only watch healthy (Ready) Kubernetes nodes.

Actual Result

CIS keeps watching the node that is in the NotReady state.
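A minimal Go sketch of why the NotReady transition goes unnoticed, assuming the watched Node type holds only the Name and Addr fields that getNodes fills in: after kubeadm reset the worker still reports its InternalIP in the API (10.169.72.238 in the kubectl output above), so the rebuilt list is deep-equal to the previous one and the "Change in Node state detected" branch never runs. The helper nodeListChanged is hypothetical; it only mirrors the !reflect.DeepEqual(newNodes, appMgr.oldNodes) check visible in the patch.

```go
package main

import (
	"fmt"
	"reflect"
)

// Node mirrors the two fields CIS records per watched node in getNodes.
// Readiness is not part of it.
type Node struct {
	Name string
	Addr string
}

// nodeListChanged (hypothetical) reproduces the comparison made in
// ProcessNodeUpdate: a change is reported only if the lists differ.
func nodeListChanged(oldNodes, newNodes []Node) bool {
	return !reflect.DeepEqual(newNodes, oldNodes)
}

func main() {
	// Before kubeadm reset: all three nodes are Ready.
	before := []Node{
		{Name: "centos-dev.localdomain", Addr: "10.169.72.233"},
		{Name: "cilium-dev", Addr: "10.169.72.239"},
		{Name: "cilium-worker", Addr: "10.169.72.238"},
	}
	// After kubeadm reset: cilium-worker is NotReady but keeps its
	// InternalIP, so the list built from Name/Addr is identical.
	after := []Node{
		{Name: "centos-dev.localdomain", Addr: "10.169.72.233"},
		{Name: "cilium-dev", Addr: "10.169.72.239"},
		{Name: "cilium-worker", Addr: "10.169.72.238"},
	}
	fmt.Println("change detected:", nodeListChanged(before, after)) // prints "change detected: false"
}
```

Only a change in a node's name or address (or a node disappearing from the API entirely) alters this list, which matches the behavior reported above.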

Diagnostic Information

Observations (if any)

@vincentmli vincentmli added bug untriaged no JIRA created labels Dec 15, 2022
@vincentmli
Contributor Author

Note: this happens because getNodes does not check the Ready state or health status of Kubernetes nodes:

// Get a list of Node addresses
func (appMgr *Manager) getNodes(
        obj interface{},
) ([]Node, error) {
        nodes, ok := obj.([]v1.Node)
        if false == ok {
                return nil,
                        fmt.Errorf("poll update unexpected type, interface is not []v1.Node")
        }

        watchedNodes := []Node{}

        var addrType v1.NodeAddressType
        if appMgr.UseNodeInternal() {
                addrType = v1.NodeInternalIP
        } else {
                addrType = v1.NodeExternalIP
        }

        // Append list of nodes to watchedNodes
        for _, node := range nodes {
                nodeAddrs := node.Status.Addresses
                for _, addr := range nodeAddrs {
                        if addr.Type == addrType {
                                n := Node{
                                        Name: node.ObjectMeta.Name,
                                        Addr: addr.Address,
                                }
                                watchedNodes = append(watchedNodes, n)
                        }
                }
        }

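One possible direction, sketched in Go under stated assumptions: skip any node whose Ready condition is not "True" before collecting its address. KubeNode, NodeAddress and NodeCondition are simplified, hypothetical stand-ins for v1.Node and its status fields so the example runs without the Kubernetes API packages; isNodeReady and filterWatchedNodes are illustrative names, not the CIS code or the change merged in #2731. The "Unknown" status given to cilium-worker is illustrative of what a node with a stopped kubelet typically reports.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the k8s.io/api/core/v1 types
// used by getNodes. String values follow the Kubernetes API.
type NodeAddress struct {
	Type    string // e.g. "InternalIP", "ExternalIP"
	Address string
}

type NodeCondition struct {
	Type   string // e.g. "Ready"
	Status string // "True", "False" or "Unknown"
}

type KubeNode struct {
	Name       string
	Addresses  []NodeAddress
	Conditions []NodeCondition
}

// Node is the watched-node record kept by CIS (Name and Addr only).
type Node struct {
	Name string
	Addr string
}

// isNodeReady reports whether the node's Ready condition is "True".
func isNodeReady(n KubeNode) bool {
	for _, c := range n.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false // no Ready condition reported: treat as not ready
}

// filterWatchedNodes is a hypothetical variant of the getNodes loop that
// drops nodes which are not Ready before collecting the addrType address.
func filterWatchedNodes(nodes []KubeNode, addrType string) []Node {
	watched := []Node{}
	for _, node := range nodes {
		if !isNodeReady(node) {
			continue
		}
		for _, addr := range node.Addresses {
			if addr.Type == addrType {
				watched = append(watched, Node{Name: node.Name, Addr: addr.Address})
			}
		}
	}
	return watched
}

func main() {
	nodes := []KubeNode{
		{
			Name:       "cilium-dev",
			Addresses:  []NodeAddress{{Type: "InternalIP", Address: "10.169.72.239"}},
			Conditions: []NodeCondition{{Type: "Ready", Status: "True"}},
		},
		{
			Name:       "cilium-worker",
			Addresses:  []NodeAddress{{Type: "InternalIP", Address: "10.169.72.238"}},
			Conditions: []NodeCondition{{Type: "Ready", Status: "Unknown"}},
		},
	}
	fmt.Println(filterWatchedNodes(nodes, "InternalIP")) // prints "[{cilium-dev 10.169.72.239}]"
}
```

Because the NotReady worker drops out of the returned slice, the list presumably fed to ProcessNodeUpdate as newNodes would no longer equal appMgr.oldNodes, so the existing DeepEqual comparison would see a node state change. Treating a node with no Ready condition as not ready is a conservative choice; a real fix may decide differently.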

@trinaths
Contributor

Created [CONTCNTR-3696] for internal tracking.

@trinaths trinaths added JIRA and removed untriaged no JIRA created labels Dec 16, 2022