ENI allocation failure causes IP allocation and GC to stop working #100

Closed
mengskysama opened this issue May 30, 2020 · 3 comments

Comments

@mengskysama
Contributor

Steps to reproduce

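  • Create a cluster with only a small number of available IPs
  • Create more Pods at once than there are IPs
  • Scale the Pod replica count down to 1
  • Scale the Pod replica count back up to a reasonable number and watch the IP allocation state
  • At this point Pods in the cluster can no longer be assigned IPs, and GC is never triggered
  • Restarting terway does not restore normal operation either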

Some logs

I added some extra log statements to track down the problem

time="2020-05-29T14:59:52Z" level=debug msg="waiting popResult: 1"
time="2020-05-29T14:59:52Z" level=warning msg="Assign private ip address failed: Aliyun API Error: RequestId: AD60191D-4184-45FC-9709-2BA52FA85538 Status Code: 403 Code: InvalidVSwitchId.IpNotEnough Message: The specified VSwitch \"vsw-xxx\" has not enough IpAddress., retrying"
time="2020-05-29T14:59:52Z" level=debug msg="allocated ips for eni: eni = &{ID:eni-xxx Name:eth4 Address:{IP:10.110.12.15 Mask:ffffff80} MAC:00:16:3e:00:2b:9b Gateway:10.110.12.125 DeviceNumber:82 MaxIPs:20 VSwitch:vsw-xxx}, ips = [], err = error assign address for eniID: eni-xxx, Aliyun API Error: RequestId: AD60191D-4184-45FC-9709-2BA52FA85538 Status Code: 403 Code: InvalidVSwitchId.IpNotEnough Message: The specified VSwitch \"vsw-xxx\" has not enough IpAddress.: Assign private ip address failed: Aliyun API Error: RequestId: AD60191D-4184-45FC-9709-2BA52FA85538 Status Code: 403 Code: InvalidVSwitchId.IpNotEnough Message: The specified VSwitch \"vsw-xxx\" has not enough IpAddress."
time="2020-05-29T14:59:52Z" level=error msg="error allocate ips for eni: error assign address for eniID: eni-xxx, Aliyun API Error: RequestId: AD60191D-4184-45FC-9709-2BA52FA85538 Status Code: 403 Code: InvalidVSwitchId.IpNotEnough Message: The specified VSwitch \"vsw-xxx\" has not enough IpAddress.: Assign private ip address failed: Aliyun API Error: RequestId: AD60191D-4184-45FC-9709-2BA52FA85538 Status Code: 403 Code: InvalidVSwitchId.IpNotEnough Message: The specified VSwitch \"vsw-xxx\" has not enough IpAddress."
time="2020-05-29T14:59:52Z" level=info msg="eni's associated vswitch vsw-xxx has no available IP, set eni ipAllocInhibitExpireAt = 2020-05-29 15:09:52"
time="2020-05-29T14:59:52Z" level=debug msg="waiting popResult: done"

Because ENI creation failed and maxIPBacklog was exceeded, every Alloc call after the 11th is stuck in an abnormal state

time="2020-05-29T14:59:56Z" level=debug msg="simpleObjectPool wait tokenCh or ctx.Done"
time="2020-05-29T14:59:56Z" level=debug msg="simpleObjectPool p.factory.Create(1) begin"
time="2020-05-29T14:59:56Z" level=debug msg=submit
time="2020-05-29T14:59:56Z" level=info msg="adjusted vswitch slice: [], original eni slice: [0xc000046080 0xc000a02480 0xc0010de780 0xc000046380 0xc000a02080 0xc0009f9580]"
...
time="2020-05-29T14:59:56Z" level=debug msg="Create submit begin waiting:1 count:1"
time="2020-05-29T14:59:56Z" level=debug msg="Create submit done initENIIPCount:0  f.eniMaxIP:20  waiting:1"
time="2020-05-29T14:59:56Z" level=debug msg="waiting popResult: 1"
time="2020-05-29T15:00:48Z" level=debug msg="do resource gc on node"
time="2020-05-29T15:00:48Z" level=debug msg="GC: try lock ..."

Cause

After ENI creation fails, allocateWorker is never started

go eni.allocateWorker(f.ipResultChan)

As a result, the channel receive in popResult is starved: nothing ever sends on the channel, so it blocks forever

func (f *eniIPFactory) popResult() (ip *types.ENIIP, err error) {
	result := <-f.ipResultChan
	if result.ENIIP == nil || result.err != nil {
@BSWANG
Member

BSWANG commented Jun 1, 2020

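The main problem is still that the vswitch is full, so new ENIs cannot be created and allocated. When that happens, kubelet keeps retrying the CNI call, which holds the lock for Pod operations and prevents GC from running.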
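
The lock contention described here can be sketched roughly as follows. This is an illustration under assumptions, not terway's implementation: `node`, `podLock`, `handleCNI` and `tryGC` are hypothetical names, and `TryLock` (Go 1.18+) is used only so the demonstration terminates, whereas a blocking `Lock` in the GC path would simply wait indefinitely.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical stand-in for the daemon state guarded by one lock.
type node struct {
	podLock sync.Mutex
}

// handleCNI mimics a CNI request that holds podLock for the whole time it
// waits on an allocation. allocDone is closed when the allocation finishes.
func (n *node) handleCNI(held chan<- struct{}, allocDone <-chan struct{}) {
	n.podLock.Lock()
	defer n.podLock.Unlock()
	close(held) // signal that the lock is now held
	<-allocDone // a stuck allocation keeps the lock held
}

// tryGC attempts one GC pass and reports whether it was able to run.
func (n *node) tryGC() bool {
	if !n.podLock.TryLock() {
		return false
	}
	defer n.podLock.Unlock()
	return true
}

func main() {
	n := &node{}
	held := make(chan struct{})
	allocDone := make(chan struct{})
	finished := make(chan struct{})

	go func() {
		n.handleCNI(held, allocDone)
		close(finished)
	}()

	<-held
	fmt.Println("GC ran while allocation is stuck:", n.tryGC())

	close(allocDone) // the allocation finally completes
	<-finished
	fmt.Println("GC ran after allocation finished:", n.tryGC())
}
```

As long as every retried CNI request re-acquires the lock and then blocks on allocation, the GC path never gets a turn.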

When the problem occurs, you can get terway's call stacks with the following command

curl --unix-socket /var/run/eni/eni_debug.socket http://x/debug/pprof/goroutine?debug=2

@BSWANG
Member

BSWANG commented Jun 2, 2020

#102 #103

@BSWANG
Member

BSWANG commented Jun 2, 2020

Fixed by #102 #103
terway version: v1.0.10.173-g0c65df8-aliyun

@BSWANG BSWANG closed this as completed Jun 2, 2020