[BUG] v4usingIPs of a Subnet created on an external network exceeds the subnet range #4747

Open
kldancer opened this issue Nov 19, 2024 · 18 comments
Labels: bug, no-issue-activity, subnet

Comments

@kldancer

Kube-OVN Version

v1.12.22

Kubernetes Version

v1.27.6

Operation-system/Kernel Version

5.10.0

Description

The v4usingIPs field of a Subnet created on an external (macvlan) network exceeds the range of the subnet.

Steps To Reproduce

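  1. This subnet was created on a macvlan external network.
     [screenshot]
  2. Looking at the code, for a subnet on an external network the v4usingIPs field is the sum of the counts of the related ips and eips resources. The IPs recorded in these two CRDs overlap, which leads to this situation.
     [screenshot]
  3. However, this does not match the meaning of the field as described on the official website. I would like to know whether the field is meant to have a different meaning here, or whether this is a bug.
     [screenshot]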
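
The double counting described in step 2 can be illustrated with a small sketch. This is not the Kube-OVN implementation: the function names, the sample address lists, and the CIDR `181.16.3.0/24` are all assumptions chosen for illustration (the real CIDR of the subnet is not stated in this issue). It only shows why adding two resource counts can exceed the subnet size when the same address appears in both, and how counting distinct addresses avoids that.

```python
import ipaddress

# Hypothetical data: addresses recorded by IP CRs and by iptables-eip CRs
# in the same external subnet. Two addresses appear in both lists.
cidr = ipaddress.ip_network("181.16.3.0/24")  # assumed CIDR, illustration only
ip_crs = ["181.16.3.148", "181.16.3.47", "181.16.3.160"]
eip_crs = ["181.16.3.160", "181.16.3.47", "181.16.3.222"]


def count_by_sum(ips, eips):
    """Add the two resource counts; overlapping addresses are counted twice."""
    return len(ips) + len(eips)


def count_distinct(ips, eips, network):
    """Count each address once, and only if it lies inside the subnet CIDR."""
    addresses = {ipaddress.ip_address(a) for a in ips + eips}
    return sum(1 for a in addresses if a in network)


print(count_by_sum(ip_crs, eip_crs))          # 6
print(count_distinct(ip_crs, eip_crs, cidr))  # 4
```

With enough overlapping records, the summed value can grow past the number of addresses in the CIDR, while the distinct count cannot.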

Current Behavior

The Subnet's v4usingIPs field exceeds the range of its current CIDR.

Expected Behavior

The Subnet's v4usingIPs field should correctly show the number of IPv4 addresses currently in use in the subnet.

@kldancer added the bug label Nov 19, 2024
@dosubot added the subnet label Nov 19, 2024

dosubot bot commented Nov 19, 2024

I found some similar issues that might be relevant to the problem you're experiencing:

  1. subnet status v4usingIPrange not stay the same with v4usingIPs - This issue is closed and was last updated on January 04, 2024. It was related to discrepancies in IP statistics and was referenced by several pull requests aimed at addressing IP counting issues.

  2. Incorrect IP statistics in subnet status after subnet expansion - This issue is also closed and was resolved by a pull request that aimed to avoid concurrent subnet status updates, which could lead to incorrect IP statistics.

These issues suggest that there have been previous problems with how IP statistics are calculated or reported, particularly in relation to overlapping IPs and external networks. It might be worth reviewing the changes made in the related pull requests to see if they address the problem you're encountering.


@zbb88888
Collaborator

@kldancer Do the IP CRs and the iptables-eip CRs conflict with each other?

@kldancer
Author

kldancer commented Nov 20, 2024

> @kldancer Do the IP CRs and the iptables-eip CRs conflict with each other?

Yes, there is quite a lot of overlap.
[screenshot]

@zbb88888
Collaborator

@kldancer Could you filter the kube-ovn-controller log with grep -C 10 181.16.3.160 and post the matching entries?

@kldancer
Author

kldancer commented Nov 20, 2024

> @kldancer Could you filter the kube-ovn-controller log with grep -C 10 181.16.3.160 and post the matching entries?

The log is as follows:
`[root@mgtnode198 ~]# cat /var/log/kube-ovn/kube-ovn-controller.log | grep -C 10 181.16.3.160
I1120 09:38:57.338910 7 ipam.go:72] allocating static ip 110.64.60.223 from subnet ovn-default
I1120 09:38:57.338931 7 ipam.go:102] allocate v4 110.64.60.223, mac 00:00:00:6C:5A:87 for vm-74f00bab/i-25pvofwe from subnet ovn-default
I1120 09:38:57.338936 7 ipam.go:72] allocating static ip 102.99.3.1 from subnet subnet-lml6fvb0
I1120 09:38:57.338951 7 ipam.go:102] allocate v4 102.99.3.1, mac 00:00:00:57:84:86 for vm-74f00bab/i-bssjo9g2 from subnet subnet-lml6fvb0
I1120 09:38:57.338957 7 ipam.go:72] allocating static ip 102.99.92.1 from subnet subnet-lu577hiw
I1120 09:38:57.338971 7 ipam.go:102] allocate v4 102.99.92.1, mac 00:00:00:74:31:5D for vm-74f00bab/i-umlkc9ay from subnet subnet-lu577hiw
I1120 09:38:57.338976 7 ipam.go:72] allocating static ip 181.16.3.148 from subnet net-f30e7yge
I1120 09:38:57.338997 7 ipam.go:102] allocate v4 181.16.3.148, mac for kube-system/vpc-nat-gw-eip-jvytp870-0 from subnet net-f30e7yge
I1120 09:38:57.339003 7 ipam.go:72] allocating static ip 181.16.3.47 from subnet net-f30e7yge
I1120 09:38:57.339023 7 ipam.go:102] allocate v4 181.16.3.47, mac for kube-system/vpc-nat-gw-eip-lan8jmql-0 from subnet net-f30e7yge
I1120 09:38:57.339029 7 ipam.go:72] allocating static ip 181.16.3.160 from subnet net-f30e7yge
I1120 09:38:57.339053 7 ipam.go:102] allocate v4 181.16.3.160, mac for kube-system/vpc-nat-gw-eip-dmrrowhk-0 from subnet net-f30e7yge
I1120 09:38:57.339058 7 ipam.go:72] allocating static ip 181.16.3.152 from subnet net-f30e7yge
I1120 09:38:57.339077 7 ipam.go:102] allocate v4 181.16.3.152, mac for kube-system/vpc-nat-gw-eip-o80r3ngu-0 from subnet net-f30e7yge
I1120 09:38:57.339082 7 ipam.go:72] allocating static ip 102.99.86.1 from subnet subnet-jenrvoc6
I1120 09:38:57.339098 7 ipam.go:102] allocate v4 102.99.86.1, mac 00:00:00:78:E2:BC for vm-74f00bab/i-ei5xv05g from subnet subnet-jenrvoc6
I1120 09:38:57.339103 7 ipam.go:72] allocating static ip 102.99.44.1 from subnet subnet-owh47yfn
I1120 09:38:57.339117 7 ipam.go:102] allocate v4 102.99.44.1, mac 00:00:00:44:E8:5E for vm-74f00bab/i-2wuq8n7b from subnet subnet-owh47yfn
I1120 09:38:57.339122 7 ipam.go:72] allocating static ip 110.64.95.97 from subnet ovn-default
I1120 09:38:57.339144 7 ipam.go:102] allocate v4 110.64.95.97, mac 00:00:00:0A:C6:46 for vm-74f00bab/i-wi0a6u8k from subnet ovn-default
I1120 09:38:57.339150 7 ipam.go:72] allocating static ip 181.16.3.139 from subnet net-f30e7yge
I1120 09:38:57.339173 7 ipam.go:102] allocate v4 181.16.3.139, mac for kube-system/vpc-nat-gw-eip-j2q0vhi2-0 from subnet net-f30e7yge

--

E1120 09:38:57.712149 7 ipam.go:89] failed to allocate static ip 181.16.3.56 for eip-nwghjrqw
E1120 09:38:57.712157 7 init.go:404] failed to init ipam from iptables eip cr eip-nwghjrqw: AddressConflict
I1120 09:38:57.712162 7 ipam.go:72] allocating static ip 181.16.3.112 from subnet net-f30e7yge
E1120 09:38:57.712171 7 subnet.go:392] ip 181.16.3.112 has been allocated to [kube-system/vpc-nat-gw-eip-96dnn05x-0]
E1120 09:38:57.712186 7 ipam.go:89] failed to allocate static ip 181.16.3.112 for eip-oyk182kv
E1120 09:38:57.712195 7 init.go:404] failed to init ipam from iptables eip cr eip-oyk182kv: AddressConflict
I1120 09:38:57.712200 7 ipam.go:72] allocating static ip 181.16.3.96 from subnet net-f30e7yge
E1120 09:38:57.712209 7 subnet.go:392] ip 181.16.3.96 has been allocated to [kube-system/vpc-nat-gw-eip-obny521i-0]
E1120 09:38:57.712218 7 ipam.go:89] failed to allocate static ip 181.16.3.96 for eip-xpabxdcr
E1120 09:38:57.712227 7 init.go:404] failed to init ipam from iptables eip cr eip-xpabxdcr: AddressConflict
I1120 09:38:57.712232 7 ipam.go:72] allocating static ip 181.16.3.160 from subnet net-f30e7yge
E1120 09:38:57.712240 7 subnet.go:392] ip 181.16.3.160 has been allocated to [kube-system/vpc-nat-gw-eip-dmrrowhk-0]
E1120 09:38:57.712258 7 ipam.go:89] failed to allocate static ip 181.16.3.160 for eip-1p63d8pj
E1120 09:38:57.712265 7 init.go:404] failed to init ipam from iptables eip cr eip-1p63d8pj: AddressConflict
I1120 09:38:57.712269 7 ipam.go:72] allocating static ip 181.16.3.222 from subnet net-f30e7yge
I1120 09:38:57.712297 7 ipam.go:102] allocate v4 181.16.3.222, mac 6a:71:ac:61:3e:7e for eip-7n84aptf from subnet net-f30e7yge
I1120 09:38:57.712302 7 ipam.go:72] allocating static ip 181.16.3.226 from subnet net-f30e7yge
I1120 09:38:57.712325 7 ipam.go:102] allocate v4 181.16.3.226, mac 62:b2:7f:6d:95:b7 for eip-izwehynt from subnet net-f30e7yge
I1120 09:38:57.712330 7 ipam.go:72] allocating static ip 181.16.3.48 from subnet net-f30e7yge
E1120 09:38:57.712340 7 subnet.go:392] ip 181.16.3.48 has been allocated to [kube-system/vpc-nat-gw-eip-cs0433ap-0]
E1120 09:38:57.712348 7 ipam.go:89] failed to allocate static ip 181.16.3.48 for eip-prr336kc
E1120 09:38:57.712359 7 init.go:404] failed to init ipam from iptables eip cr eip-prr336kc: AddressConflict
I1120 09:38:57.712364 7 ipam.go:72] allocating static ip 181.16.3.142 from subnet net-f30e7yge`
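
To make the conflicts in a log excerpt like the one above easier to read, the two error messages can be paired per IP. The following is a minimal sketch, not a Kube-OVN tool: the regular expressions are written against the message wording quoted in this thread and are assumptions for any other version, and `find_conflicts` is a hypothetical helper name.

```python
import re

# Patterns written against the log lines quoted in this thread; other
# Kube-OVN versions may word these messages differently.
HOLDER_RE = re.compile(r"ip (\S+) has been allocated to \[([^\]]+)\]")
FAILED_RE = re.compile(r"failed to allocate static ip (\S+) for (\S+)")


def find_conflicts(lines):
    """Return {ip: (current_holder, rejected_requester)} for conflicting IPs.

    current_holder is None when the "has been allocated to" line for that IP
    is not part of the excerpt (for example, cut off by grep context).
    """
    holders, conflicts = {}, {}
    for line in lines:
        m = HOLDER_RE.search(line)
        if m:
            holders[m.group(1)] = m.group(2)
            continue
        m = FAILED_RE.search(line)
        if m:
            ip, requester = m.groups()
            conflicts[ip] = (holders.get(ip), requester)
    return conflicts


sample = [
    "E1120 09:38:57.712240 7 subnet.go:392] ip 181.16.3.160 has been allocated to [kube-system/vpc-nat-gw-eip-dmrrowhk-0]",
    "E1120 09:38:57.712258 7 ipam.go:89] failed to allocate static ip 181.16.3.160 for eip-1p63d8pj",
]
print(find_conflicts(sample))
```

For the two sample lines, the result maps 181.16.3.160 to the pod that holds it in IPAM and the EIP whose allocation was rejected.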

@zbb88888
Collaborator

This looks fine to me: this allocation failed.

[screenshot]

@kldancer
Author

eip-1p63d8pj

In practice, however, this IP was successfully assigned to the target NAT gateway (vpc-nat-gw-eip-1p63d8pj-0).
It is vpc-nat-gw-eip-dmrrowhk-0 that cannot be found.
[screenshot]

@zbb88888
Collaborator

[root@mgtnode198 ~]# cat /var/log/kube-ovn/kube-ovn-controller.log | grep -C 10 181.16.3.160

I1120 09:38:57.339029 7 ipam.go:72] allocating static ip 181.16.3.160 from subnet net-f30e7yge
I1120 09:38:57.339053 7 ipam.go:102] allocate v4 181.16.3.160, mac for kube-system/vpc-nat-gw-eip-dmrrowhk-0 from subnet net-f30e7yge
--
I1120 09:38:57.712232 7 ipam.go:72] allocating static ip 181.16.3.160 from subnet net-f30e7yge
E1120 09:38:57.712240 7 subnet.go:392] ip 181.16.3.160 has been allocated to [kube-system/vpc-nat-gw-eip-dmrrowhk-0]
E1120 09:38:57.712258 7 ipam.go:89] failed to allocate static ip 181.16.3.160 for eip-1p63d8pj
E1120 09:38:57.712265 7 init.go:404] failed to init ipam from iptables eip cr eip-1p63d8pj: AddressConflict


Please share the YAML of your IP and EIP resources; I would like to check their creation timestamps.

@zbb88888
Collaborator

Was the EIP created first and the IP afterwards?

@kldancer
Author

> Was the EIP created first and the IP afterwards?

It looks like the IP was created first.
[screenshot]

@zbb88888
Collaborator

Could you find the logs from Nov 19, when the iptables-eip was created?

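@kldancer
Author

> Could you find the logs from Nov 19, when the iptables-eip was created?

Sorry, I can no longer find any logs from before that point. I only have today's:
kube-ovn-controller.log
What concerns me more is the ips resource: no pod in the environment uses this IP any more, so why does the resource still exist? Normally, when a pod is deleted and its IP is released, the corresponding ips resource should be deleted as well.

[screenshot]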

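@kldancer
Author

kldancer commented Nov 20, 2024

> Could you find the logs from Nov 19, when the iptables-eip was created?

I tried this in another environment: after deleting the nat-gw, the ips resource for the primary IP of its net1 interface was deleted too, which is the expected behavior.
[screenshot]
So the problem in the environment above was most likely caused by some operation that left the ips resource behind when the pod was deleted, producing stale data. Unfortunately there are no logs.

Is there any mechanism that automatically cleans up expired ips resources?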
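
As a way to look for leftover records by hand, the check can be sketched as a comparison between the owners named in the IP records and the pods that still exist. This is only an illustration under stated assumptions, not a description of any built-in Kube-OVN cleanup: `find_stale_ip_crs` is a hypothetical helper, the `spec.namespace` and `spec.podName` field names are assumed, the record names `ip-a` and `ip-b` are made up, and the input is plain dictionaries rather than live cluster objects.

```python
def find_stale_ip_crs(ip_crs, live_pods):
    """Return names of IP records whose owning pod no longer exists.

    ip_crs:    iterable of dicts shaped like an IP resource; only
               metadata.name, spec.namespace and spec.podName are read
               (assumed field names).
    live_pods: set of (namespace, pod_name) tuples for pods that still exist.
    """
    stale = []
    for cr in ip_crs:
        spec = cr.get("spec", {})
        owner = (spec.get("namespace"), spec.get("podName"))
        if owner not in live_pods:
            stale.append(cr["metadata"]["name"])
    return stale


# Hypothetical input: two IP records, only one of which has a live pod.
ip_crs = [
    {"metadata": {"name": "ip-a"},
     "spec": {"namespace": "kube-system", "podName": "vpc-nat-gw-eip-dmrrowhk-0"}},
    {"metadata": {"name": "ip-b"},
     "spec": {"namespace": "kube-system", "podName": "vpc-nat-gw-eip-1p63d8pj-0"}},
]
live_pods = {("kube-system", "vpc-nat-gw-eip-1p63d8pj-0")}

print(find_stale_ip_crs(ip_crs, live_pods))  # ['ip-a']
```

Records that are not owned by a pod at all would need to be excluded before treating the result as a deletion list.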

@zbb88888
Collaborator

Has the environment been redeployed?

@zbb88888
Collaborator

@dolibali Please follow up on this bug and see whether it can be reproduced.

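@zbb88888 self-assigned this Nov 21, 2024
@kldancer
Author

The problem has reproduced. The trigger was repeatedly creating and deleting nat-gw instances, which eventually left behind a pile of stale ips records for the nat-gw EIPs. Below is a log excerpt; search for the keyword eip-0aineocq-0.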

[root@mgtnode197 ~]# cat /var/log/kube-ovn/kube-ovn-controller.log  | grep eip-0aineocq-0 -C 5
I1122 14:40:04.779550       7 network_policy.go:289] UpdateNp Ingress, allows is [110.64.0.62 10.96.35.206 10.110.218.19 10.98.110.140 110.64.0.170 10.104.54.131], excepts is [], log false, protocol IPv4
I1122 14:40:04.779928       7 network_policy.go:289] UpdateNp Ingress, allows is [110.64.0.62 10.96.35.206 10.110.218.19 10.98.110.140], excepts is [], log false, protocol IPv4
I1122 14:40:09.271186       7 gc.go:380] gc logical switch port vpc-nat-gw-eip-0aineocq-0.kube-system
I1122 14:40:09.271446       7 ovn-nb-logical_switch_port.go:685] delete logical switch port vpc-nat-gw-eip-0aineocq-0.kube-system with id 726479c3-fe18-4991-a573-61740b433695 from logical switch subnet-ue9ralxi
I1122 14:40:09.273994       7 gc.go:385] gc ip vpc-nat-gw-eip-0aineocq-0.kube-system
I1122 14:40:09.275845       7 gc.go:395] gc ip vpc-nat-gw-eip-0aineocq-0.kube-system
I1122 14:40:09.279851       7 subnet.go:496] release v4 102.99.20.253 mac be:9f:05:8d:24:57 from subnet subnet-ue9ralxi for kube-system/vpc-nat-gw-eip-0aineocq-0, add ip to released list
I1122 14:40:10.233816       7 ippool.go:205] handle delete ippool vpc-nat-gw-subnet-ue9ralxi
I1122 14:40:10.239698       7 subnet.go:350] format subnet subnet-ue9ralxi, changed false
I1122 14:40:10.246670       7 vpc.go:117] handle delete vpc vpc-s1s0rmz6
I1122 14:40:10.246682       7 vpc_lb.go:50] delete vpc lb deployment for vpc-vpc-s1s0rmz6-lb
I1122 14:40:10.247360       7 subnet.go:999] delete u2o interconnection policy route for subnet subnet-ue9ralxi
--
E1122 14:40:16.058895       7 pod.go:487] subnet.kubeovn.io "subnet-ue9ralxi" not found
E1122 14:40:16.058902       7 pod.go:269] failed to get newPod nets subnet.kubeovn.io "subnet-ue9ralxi" not found
E1122 14:40:16.065431       7 pod.go:1345] failed to get subnet subnet.kubeovn.io "subnet-ue9ralxi" not found
E1122 14:40:16.065444       7 pod.go:487] subnet.kubeovn.io "subnet-ue9ralxi" not found
E1122 14:40:16.065450       7 pod.go:269] failed to get newPod nets subnet.kubeovn.io "subnet-ue9ralxi" not found
I1122 14:40:16.068362       7 pod.go:249] enqueue delete pod kube-system/vpc-nat-gw-eip-0aineocq-0
I1122 14:40:16.068373       7 network_policy.go:144] handle add/update network policy argocd/argocd-repo-server-network-policy
I1122 14:40:16.068384       7 pod.go:922] handle delete pod kube-system/vpc-nat-gw-eip-0aineocq-0
I1122 14:40:16.068416       7 network_policy.go:144] handle add/update network policy argocd/argocd-dex-server-network-policy
I1122 14:40:16.068412       7 network_policy.go:144] handle add/update network policy argocd/argocd-application-controller-network-policy
I1122 14:40:16.068715       7 network_policy.go:213] UpdateNp, releated subnet protocols [IPv4]
I1122 14:40:16.068781       7 network_policy.go:213] UpdateNp, releated subnet protocols [IPv4]
I1122 14:40:16.068843       7 network_policy.go:213] UpdateNp, releated subnet protocols [IPv4]
I1122 14:40:16.069949       7 network_policy.go:289] UpdateNp Ingress, allows is [110.64.0.62 10.96.35.206 10.110.218.19 10.98.110.140], excepts is [], log false, protocol IPv4
I1122 14:40:16.070017       7 network_policy.go:289] UpdateNp Ingress, allows is [110.64.0.62 10.110.218.19 10.98.110.140 10.96.35.206 110.64.0.170 10.104.54.131], excepts is [], log false, protocol IPv4
E1122 14:40:16.071541       7 pod.go:1345] failed to get subnet subnet.kubeovn.io "subnet-ue9ralxi" not found
E1122 14:40:16.071562       7 pod.go:487] subnet.kubeovn.io "subnet-ue9ralxi" not found
E1122 14:40:16.071573       7 pod.go:959] failed to get pod nets subnet.kubeovn.io "subnet-ue9ralxi" not found
I1122 14:40:16.072310       7 pod.go:1026] release all ip address for deleting pod kube-system/vpc-nat-gw-eip-0aineocq-0
I1122 14:40:16.072382       7 subnet.go:496] release v4 181.16.3.92 mac  from subnet net-kquq4ao2 for kube-system/vpc-nat-gw-eip-0aineocq-0, add ip to released list
I1122 14:40:16.072409       7 pod.go:439] take 4 ms to handle delete pod kube-system/vpc-nat-gw-eip-0aineocq-0
I1122 14:40:27.522966       7 pod.go:249] enqueue delete pod vm-74f00bab/virt-launcher-i-c8nsakoo-4gqtd
I1122 14:40:27.522989       7 network_policy.go:144] handle add/update network policy vm-74f00bab/network-policy-tag
--
E1122 14:40:43.936459       7 pod.go:487] subnet.kubeovn.io "subnet-0keg5526" not found
E1122 14:40:43.936470       7 pod.go:959] failed to get pod nets subnet.kubeovn.io "subnet-0keg5526" not found
I1122 14:40:43.937205       7 pod.go:1026] release all ip address for deleting pod kube-system/vpc-nat-gw-eip-3wsdxa2b-0
I1122 14:40:43.937285       7 subnet.go:496] release v4 181.16.3.90 mac  from subnet net-kquq4ao2 for kube-system/vpc-nat-gw-eip-3wsdxa2b-0, add ip to released list
I1122 14:40:43.937301       7 pod.go:439] take 4 ms to handle delete pod kube-system/vpc-nat-gw-eip-3wsdxa2b-0
I1122 14:40:44.057730       7 pod.go:922] handle delete pod kube-system/vpc-nat-gw-eip-0aineocq-0
E1122 14:40:44.062110       7 pod.go:1345] failed to get subnet subnet.kubeovn.io "subnet-ue9ralxi" not found
E1122 14:40:44.062131       7 pod.go:487] subnet.kubeovn.io "subnet-ue9ralxi" not found
E1122 14:40:44.062143       7 pod.go:959] failed to get pod nets subnet.kubeovn.io "subnet-ue9ralxi" not found
I1122 14:40:44.062834       7 pod.go:1026] release all ip address for deleting pod kube-system/vpc-nat-gw-eip-0aineocq-0
I1122 14:40:44.062879       7 pod.go:439] take 5 ms to handle delete pod kube-system/vpc-nat-gw-eip-0aineocq-0
I1122 14:40:44.613200       7 pod.go:347] enqueue update pod vm-74f00bab/virt-launcher-i-wzi29pk9-4d2nr
I1122 14:40:44.613240       7 pod.go:519] handle add/update pod vm-74f00bab/virt-launcher-i-wzi29pk9-4d2nr

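I do not understand why the EIP's ips resource was not cleaned up after this log line was emitted: "release v4 181.16.3.92 mac from subnet net-kquq4ao2 for kube-system/vpc-nat-gw-eip-0aineocq-0, add ip to released list".

The end result is that the ips resource for the NAT gateway's default network was cleaned up, but the ips resource for its net1 interface was not.
[screenshot]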
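
The symptom described here (an address logged as released, yet still recorded by an ips resource) can be checked mechanically against a log excerpt. The sketch below is an assumption-laden illustration, not part of Kube-OVN: the regular expression follows the "release v4" wording quoted in this thread, `released_but_still_recorded` is a hypothetical helper, and `recorded_addresses` stands in for the set of addresses still present in ips resources, which would have to be collected separately.

```python
import re

# Matches the "release v4" message as quoted in this thread; the MAC may be
# empty, as it is for the net1 address in the excerpt.
RELEASE_RE = re.compile(
    r"release v4 (\S+) mac\s*(\S*) from subnet (\S+) for ([^,\s]+)"
)


def released_but_still_recorded(log_lines, recorded_addresses):
    """Return (pod, ip, subnet) for released IPs that are still recorded."""
    leftovers = []
    for line in log_lines:
        m = RELEASE_RE.search(line)
        if not m:
            continue
        ip, _mac, subnet, pod = m.groups()
        if ip in recorded_addresses:
            leftovers.append((pod, ip, subnet))
    return leftovers


log_lines = [
    "I1122 14:40:09.279851       7 subnet.go:496] release v4 102.99.20.253 mac be:9f:05:8d:24:57 from subnet subnet-ue9ralxi for kube-system/vpc-nat-gw-eip-0aineocq-0, add ip to released list",
    "I1122 14:40:16.072382       7 subnet.go:496] release v4 181.16.3.92 mac  from subnet net-kquq4ao2 for kube-system/vpc-nat-gw-eip-0aineocq-0, add ip to released list",
]
recorded_addresses = {"181.16.3.92"}  # hypothetical: addresses still in ips resources

print(released_but_still_recorded(log_lines, recorded_addresses))
```

For the two sample lines, only the net1 address 181.16.3.92 is reported, which matches the leftover described in this comment.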

@zbb88888
Collaborator

OK, we will keep following up on this.

Contributor

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.
