Is there an upper limit or size for calico BGP route mapper? #9610

Open
asskss opened this issue Dec 18, 2024 · 5 comments

Comments

@asskss

asskss commented Dec 18, 2024

k8s version 1.22.17
calico version 3.22.4
system ubuntu20.04 5.4.0-81-generic

Our cluster has 152 nodes, all of which belong to the first BGP group:

rr-gw-1     10.128.122.1     calico/peer == "default-1"     65004
rr-122-26   10.128.122.26    calico/peer == "default-1"     65004
rr-122-31   10.128.122.31    calico/peer == "default-1"     65004

When we add more nodes to this BGP group, an error occurs. The calico-node log shows no errors, but the readiness probe fails.
POD

kube-system              calico-node-gncfn                                               0/1     Running             0                   28h 

POD events

Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  79s (x1489 over 3h40m)  kubelet  (combined from similar events): Readiness probe failed: 2024-12-18 07:19:36.112 [INFO][698878] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.128.122.1,10.128.122.26,10.128.122.31

birdcl -s /var/run/calico/bird.ctl show protocols

name     proto    table    state  since       info
static1  Static   master   up     2024-12-17  
kernel1  Kernel   master   up     2024-12-17  
device1  Device   master   up     2024-12-17  
direct1  Direct   master   up     2024-12-17  
Node_10_128_122_1 BGP      master   start  2024-12-17  Connect
Node_10_128_122_26 BGP      master   start  2024-12-17  Connect   
Node_10_128_122_31 BGP      master   start  2024-12-17  Connect

At this point, the BGP connections of the other 152 nodes are normal.

calicoctl node status

IPv4 BGP status
+---------------+---------------+-------+------------+-------------+
| PEER ADDRESS  |   PEER TYPE   | STATE |   SINCE    |    INFO     |
+---------------+---------------+-------+------------+-------------+
| 10.128.122.26 | node specific | up    | 2023-05-12 | Established |
| 10.128.122.1  | node specific | start | 2023-05-12 | Connect     |
| 10.128.122.31 | node specific | up    | 2023-06-12 | Established |
+---------------+---------------+-------+------------+-------------+

When we added this node to the second BGP group instead, the calico-node health check passed and the BGP connections were normal.

rr-gw-1      10.128.122.1      calico/peer == "default-2"     65004
rr-122-182   10.128.122.182    calico/peer == "default-2"     65004
rr-122-181   10.128.122.181    calico/peer == "default-2"     65004

POD

kube-system              calico-node-gncfn                                               1/1     Running             0                   28h 

birdcl -s /var/run/calico/bird.ctl show protocols

bird> show protocols
name     proto    table    state  since       info
static1  Static   master   up     06:02:27    
kernel1  Kernel   master   up     06:02:27    
device1  Device   master   up     06:02:27    
direct1  Direct   master   up     06:02:27    
Node_10_128_122_1 BGP      master   start  06:02:27    Connect       
Node_10_128_122_181 BGP      master   up     06:37:14    Established   
Node_10_128_122_182 BGP      master   up     06:37:14    Established 
@asskss
Author

asskss commented Dec 19, 2024

  Warning  Unhealthy  118s (x2 over 119s)  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
  Warning  Unhealthy  110s                 kubelet            Readiness probe failed: 2024-12-19 02:01:02.612 [INFO][422] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.128.122.1,10.128.122.26,10.128.122.31
  Warning  Unhealthy  100s  kubelet  Readiness probe failed: 2024-12-19 02:01:12.637 [INFO][493] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.128.122.1,10.128.122.26,10.128.122.31
  Warning  Unhealthy  90s  kubelet  Readiness probe failed: 2024-12-19 02:01:22.622 [INFO][552] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.128.122.1,10.128.122.26,10.128.122.31
  Warning  Unhealthy  80s  kubelet  Readiness probe failed: 2024-12-19 02:01:32.577 [INFO][604] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.128.122.1,10.128.122.26,10.128.122.31
  Warning  Unhealthy  70s  kubelet  Readiness probe failed: 2024-12-19 02:01:42.567 [INFO][662] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.128.122.1,10.128.122.26,10.128.122.31
  Warning  Unhealthy  60s  kubelet  Readiness probe failed: 2024-12-19 02:01:52.597 [INFO][722] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.128.122.1,10.128.122.26,10.128.122.31
  Warning  Unhealthy  50s  kubelet  Readiness probe failed: 2024-12-19 02:02:02.584 [INFO][789] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.128.122.1,10.128.122.26,10.128.122.31
  Warning  Unhealthy  41s  kubelet  Readiness probe failed: 2024-12-19 02:02:11.144 [INFO][852] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.128.122.1,10.128.122.26,10.128.122.31

@asskss
Author

asskss commented Dec 23, 2024

Output of show protocols on the RR node:

Node_10_128_122_175 BGP      master   up     2024-12-13  Established   
Node_10_128_122_178 BGP      master   up     2024-12-19  Established   
Node_10_128_122_176 BGP      master   down   2024-12-20  Error: Kernel MD5 auth failed
Node_10_128_122_177 BGP      master   down   2024-12-20  Error: Kernel MD5 auth failed
Node_10_128_122_27 BGP      master   down   2024-12-20  Error: Kernel MD5 auth failed

@caseydavenport
Member

> Node_10_128_122_176 BGP master down 2024-12-20 Error: Kernel MD5 auth failed

Are you using a BGP password on these connections? This seems to suggest that some sort of authentication is failing on these peerings.

@asskss
Author

asskss commented Jan 6, 2025

> Node_10_128_122_176 BGP master down 2024-12-20 Error: Kernel MD5 auth failed
>
> Are you using a BGP password on these connections? This seems to suggest that some sort of authentication is failing on these peerings.

  1. The BGP password is configured using a secret.
  2. Both sets of RRs currently use the same key.
  3. The error occurs when these nodes join the first set of RRs. The other nodes in that set have no problems; only the newly added ones do. Is there really an upper limit?
  4. The error does not occur when the nodes join the second set of RRs.

@caseydavenport
Member

> Is there really an upper limit?

There is no hard-coded limit, no. There are always scale limitations in software applications, though, and the limit will depend on your particular environment: how many resources you have allocated to your nodes, etc.

From the logs, though, this appears to be an MD5 issue rather than a scale issue.
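
To see at a glance which peerings on a route reflector are failing and why, the show protocols output can be grouped by its info column. The following is only a rough sketch, not part of Calico: the function names parse_bgp_protocols and group_failures are made up for illustration, the sample rows are copied from the RR output earlier in this thread, and it assumes the "since" column is a single token, as it is in that output.

```python
from collections import defaultdict

# Sample rows copied from the `show protocols` output posted for the RR node.
SAMPLE = """\
Node_10_128_122_175 BGP      master   up     2024-12-13  Established
Node_10_128_122_178 BGP      master   up     2024-12-19  Established
Node_10_128_122_176 BGP      master   down   2024-12-20  Error: Kernel MD5 auth failed
Node_10_128_122_177 BGP      master   down   2024-12-20  Error: Kernel MD5 auth failed
Node_10_128_122_27 BGP      master   down   2024-12-20  Error: Kernel MD5 auth failed
"""


def parse_bgp_protocols(text):
    """Return (peer_ip, state, info) for every BGP row in the given text.

    Hypothetical helper: assumes columns name, proto, table, state, since, info,
    with `since` being one whitespace-free token.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5 or fields[1] != "BGP":
            continue  # skip headers and non-BGP protocols (static1, kernel1, ...)
        name, state = fields[0], fields[3]
        info = fields[5].strip() if len(fields) > 5 else ""
        # Peer names in this thread look like Node_10_128_122_176; recover the IP.
        peer = name.split("_", 1)[1].replace("_", ".") if "_" in name else name
        rows.append((peer, state, info))
    return rows


def group_failures(rows):
    """Group peers whose info column is not 'Established' by that info string."""
    failures = defaultdict(list)
    for peer, state, info in rows:
        if info != "Established":
            failures[info or state].append(peer)
    return dict(failures)


if __name__ == "__main__":
    for reason, peers in group_failures(parse_bgp_protocols(SAMPLE)).items():
        print(f"{reason}: {', '.join(peers)}")
```

Run against the full output from an RR, this would show whether every failing peer reports the same MD5 error or whether some are stuck in a different state such as Connect.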
