
ci-ingress-gce-e2e-scale is failing #438

Closed
MrHohn opened this issue Aug 16, 2018 · 12 comments
MrHohn (Member) commented Aug 16, 2018

The Ingress scale job (https://k8s-testgrid.appspot.com/sig-network-ingress-gce-e2e#ingress-gce-e2e-scale) has been failing since 08-13, most likely due to resource leakage. Sample failure:

[sig-network] Loadbalancing: L7 Scalability GCE [Slow] [Serial] [Feature:IngressScale] Creating and updating ingresses should happen promptly with small/medium/large amount of ingresses 2h41m
go run hack/e2e.go -v --test --test_args='--ginkgo.focus=\[sig\-network\]\sLoadbalancing\:\sL7\sScalability\sGCE\s\[Slow\]\s\[Serial\]\s\[Feature\:IngressScale\]\sCreating\sand\supdating\singresses\sshould\shappen\spromptly\swith\ssmall\/medium\/large\samount\sof\singresses$'

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/network/ingress_scale.go:57
Aug 16 02:13:01.499: Unexpected error while running ingress scale test: [Ingress failed to acquire an IP address within 1h20m0s Ingress failed to acquire an IP address within 1h20m0s Ingress failed to acquire an IP address within 1h20m0s Ingress failed to acquire an IP address within 1h20m0s Ingress failed to acquire an IP address within 1h20m0s Ingress failed to acquire an IP address within 1h20m0s]
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/network/ingress_scale.go:59
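
For context, the failure above is the e2e framework timing out while polling each Ingress for an assigned IP. A minimal sketch of that kind of wait loop, assuming a hypothetical ingressIP helper and a 10s poll interval (this is not the actual code in test/e2e/network/ingress_scale.go):

// Minimal sketch of an "acquire an IP within a timeout" wait loop.
package ingresswait

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// ingressIP is a hypothetical helper that returns the Ingress' assigned IP from
// status.loadBalancer.ingress, or "" if no IP has been provisioned yet.
func ingressIP(namespace, name string) (string, error) {
	// ... query the Ingress status here ...
	return "", nil
}

// waitForIngressIP polls until the Ingress has an IP or the timeout expires; on
// timeout it produces the "failed to acquire an IP address within 1h20m0s"
// style error seen above.
func waitForIngressIP(namespace, name string, timeout time.Duration) (string, error) {
	var ip string
	err := wait.PollImmediate(10*time.Second, timeout, func() (bool, error) {
		addr, err := ingressIP(namespace, name)
		if err != nil {
			return false, err
		}
		ip = addr
		return ip != "", nil
	})
	if err != nil {
		return "", fmt.Errorf("Ingress failed to acquire an IP address within %v", timeout)
	}
	return ip, nil
}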

/assign

rramkumar1 (Contributor) commented:

Most likely the same issue as https://k8s-testgrid.appspot.com/sig-network-ingress-gce-e2e#ingress-gce-e2e

I suspect some GC issue was introduced.

MrHohn (Member, author) commented Aug 16, 2018

> Most likely due to resource leakage.

I said that too quickly: the latest run hit a quota issue, but in some previous runs (e.g. https://storage.googleapis.com/kubernetes-jenkins/logs/ci-ingress-gce-e2e-scale/742/artifacts/e2e-742-886f8-master/glbc.log) I saw nil pointer exceptions:

I0814 17:50:03.663632       1 l7s.go:66] Creating l7 e2e-tests-ingress-scale-fhhpn-ing-scale-5--2d57c5cb5515f4cb
E0814 17:50:03.767275       1 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/usr/local/go/src/runtime/panic.go:63
/usr/local/go/src/runtime/signal_unix.go:388
/go/src/k8s.io/ingress-gce/pkg/loadbalancers/url_maps.go:66
/go/src/k8s.io/ingress-gce/pkg/loadbalancers/l7.go:122
/go/src/k8s.io/ingress-gce/pkg/loadbalancers/l7s.go:88
/go/src/k8s.io/ingress-gce/pkg/controller/controller.go:370
/go/src/k8s.io/ingress-gce/pkg/controller/controller.go:297
/go/src/k8s.io/ingress-gce/pkg/controller/controller.go:115
/go/src/k8s.io/ingress-gce/pkg/utils/taskqueue.go:90
/go/src/k8s.io/ingress-gce/pkg/utils/taskqueue.go:58
/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/go/src/k8s.io/ingress-gce/pkg/utils/taskqueue.go:58
/usr/local/go/src/runtime/asm_amd64.s:2361
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1693311]

goroutine 84 [running]:
k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x19a5260, 0x29b0020)
	/usr/local/go/src/runtime/panic.go:502 +0x229
k8s.io/ingress-gce/pkg/loadbalancers.(*L7).ensureComputeURLMap(0xc420cf4fc0, 0xc42062d1d0, 0x10)
	/go/src/k8s.io/ingress-gce/pkg/loadbalancers/url_maps.go:66 +0x231
k8s.io/ingress-gce/pkg/loadbalancers.(*L7).edgeHop(0xc420cf4fc0, 0x1e58238, 0xc42041a9b0)
	/go/src/k8s.io/ingress-gce/pkg/loadbalancers/l7.go:122 +0x2f
k8s.io/ingress-gce/pkg/loadbalancers.(*L7s).Sync(0xc42058b470, 0xc42043d500, 0x0, 0x0)
	/go/src/k8s.io/ingress-gce/pkg/loadbalancers/l7s.go:88 +0x153
k8s.io/ingress-gce/pkg/controller.(*LoadBalancerController).ensureIngress(0xc420593110, 0xc420a4fce0, 0xc4203ff0c0, 0x3, 0x4, 0x1, 0x0)
	/go/src/k8s.io/ingress-gce/pkg/controller/controller.go:370 +0x77e
k8s.io/ingress-gce/pkg/controller.(*LoadBalancerController).sync(0xc420593110, 0xc420405080, 0x29, 0xc4205cc201, 0xc42057e6d0)
	/go/src/k8s.io/ingress-gce/pkg/controller/controller.go:297 +0x3bb
k8s.io/ingress-gce/pkg/controller.(*LoadBalancerController).(k8s.io/ingress-gce/pkg/controller.sync)-fm(0xc420405080, 0x29, 0xf, 0xc4207e3dd8)
	/go/src/k8s.io/ingress-gce/pkg/controller/controller.go:115 +0x3e
k8s.io/ingress-gce/pkg/utils.(*PeriodicTaskQueue).worker(0xc4206ca640)
	/go/src/k8s.io/ingress-gce/pkg/utils/taskqueue.go:90 +0x17a
k8s.io/ingress-gce/pkg/utils.(*PeriodicTaskQueue).(k8s.io/ingress-gce/pkg/utils.worker)-fm()
	/go/src/k8s.io/ingress-gce/pkg/utils/taskqueue.go:58 +0x2a
k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc4202b67a8)
	/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc4207e3fa8, 0x3b9aca00, 0x0, 0x1, 0xc420d33740)
	/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc4202b67a8, 0x3b9aca00, 0xc420d33740)
	/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
k8s.io/ingress-gce/pkg/utils.(*PeriodicTaskQueue).Run(0xc4206ca640, 0x3b9aca00, 0xc420d33740)
	/go/src/k8s.io/ingress-gce/pkg/utils/taskqueue.go:58 +0x55
created by k8s.io/ingress-gce/pkg/controller.(*LoadBalancerController).Run
	/go/src/k8s.io/ingress-gce/pkg/controller/controller.go:224 +0x86

MrHohn (Member, author) commented Aug 16, 2018

That nil pointer issue seems to have been fixed by #434 already.
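
For reference, the panic above is a nil pointer dereference inside ensureComputeURLMap. Below is a minimal sketch of the defensive nil-check pattern that avoids this class of crash; the urlMapGetter interface and isNotFound helper are assumptions for illustration, and this is not the actual change in #434:

package urlmapsketch

import (
	"fmt"

	compute "google.golang.org/api/compute/v1"
	"google.golang.org/api/googleapi"
)

// urlMapGetter is a hypothetical slice of the GCE client, defined here only to
// keep the sketch self-contained.
type urlMapGetter interface {
	GetURLMap(name string) (*compute.UrlMap, error)
	CreateURLMap(m *compute.UrlMap) error
}

// isNotFound reports whether err is a GCE 404.
func isNotFound(err error) bool {
	apiErr, ok := err.(*googleapi.Error)
	return ok && apiErr.Code == 404
}

// ensureURLMapSketch shows the defensive pattern: never dereference the object
// returned by Get until both the error and a possible nil result are checked.
func ensureURLMapSketch(cloud urlMapGetter, name string) (*compute.UrlMap, error) {
	existing, err := cloud.GetURLMap(name)
	if err != nil && !isNotFound(err) {
		return nil, err
	}
	if existing == nil {
		// Not found (or a nil result): create a new URL map instead of
		// dereferencing a nil pointer.
		m := &compute.UrlMap{Name: name}
		if err := cloud.CreateURLMap(m); err != nil {
			return nil, fmt.Errorf("creating URL map %q: %v", name, err)
		}
		return m, nil
	}
	return existing, nil
}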

MrHohn (Member, author) commented Aug 16, 2018

From the kubelet log below, it seems like GLBC kept failing its liveness probe, so kubelet was constantly killing it:

I0814 00:06:30.876030    1525 http.go:99] Probe failed for http://10.40.0.2:8086/healthz with request headers map[User-Agent:[kube-probe/1.12+]], response body: ingress: err: googleapi: Error 404: The resource 'projects/k8s-ingress-boskos-20/global/backendServices/foo' was not found, notFound
neg-controller: OK
I0814 00:06:30.876091    1525 prober.go:111] Liveness probe for "l7-lb-controller-v1.2.2-e2e-2694-58614-master_kube-system(833af3e7141c0c4697f5553859fa4556):l7-lb-controller" failed (failure): HTTP probe failed with statuscode: 500
I0814 00:06:30.879772    1525 server.go:458] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"l7-lb-controller-v1.2.2-e2e-2694-58614-master", UID:"833af3e7141c0c4697f5553859fa4556", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{l7-lb-controller}"}): type: 'Warning' reason: 'Unhealthy' Liveness probe failed: HTTP probe failed with statuscode: 500

Probably related to the neg-controller? @freehan
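
For context on why a missing backend service surfaces as a liveness failure: the controller's /healthz appears to aggregate the health of its sub-components (ingress, neg-controller) and returns 500 if any of them reports an error. A minimal sketch of that aggregation pattern, assuming a hypothetical healthChecker interface rather than the actual GLBC implementation:

package healthzsketch

import (
	"fmt"
	"net/http"
)

// healthChecker is a hypothetical interface; each sub-controller (ingress,
// neg-controller, ...) reports its own health.
type healthChecker interface {
	Name() string
	Healthy() error
}

// healthzHandler aggregates the sub-controllers' health into one endpoint.
func healthzHandler(checkers []healthChecker) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		failed := false
		body := ""
		for _, c := range checkers {
			if err := c.Healthy(); err != nil {
				failed = true
				body += fmt.Sprintf("%s: err: %v\n", c.Name(), err)
			} else {
				body += fmt.Sprintf("%s: OK\n", c.Name())
			}
		}
		if failed {
			// A single unhealthy component fails the whole probe, which is what
			// the kubelet log above shows (statuscode: 500).
			w.WriteHeader(http.StatusInternalServerError)
		}
		fmt.Fprint(w, body)
	}
}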

MrHohn (Member, author) commented Aug 17, 2018

#441 should fix the liveness probe issue.

rramkumar1 (Contributor) commented:

The latest ingress-gce-e2e run passed, and I suspect the next scale test run will pass as well. It seems this was all caused by the liveness probe issue.

rramkumar1 (Contributor) commented:

@MrHohn Looks like the latest scale test failures are caused by quota issues.

MrHohn (Member, author) commented Aug 17, 2018

Manually cleaned up the leaked resources. Let's see if the next run goes green.

rramkumar1 (Contributor) commented:

Tests are back green.
/close

MrHohn (Member, author) commented Sep 6, 2018

The scale test has been failing again since 08/30 (https://k8s-testgrid.appspot.com/sig-network-ingress-gce-e2e#ingress-gce-e2e&width=20).

GLBC kept panicking during the test (https://storage.googleapis.com/kubernetes-jenkins/logs/ci-ingress-gce-e2e-scale/811/artifacts/e2e-811-886f8-master/glbc.log):

I0901 04:33:33.346271       1 reflector.go:240] Listing and watching *v1.Endpoints from k8s.io/ingress-gce/pkg/context/context.go:181
E0901 04:33:33.417632       1 runtime.go:66] Observed a panic: &runtime.TypeAssertionError{interfaceString:"interface {}", concreteString:"bool", assertedString:"*compute.BackendService", missingMethod:""} (interface conversion: interface {} is bool, not *compute.BackendService)
/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/usr/local/go/src/runtime/iface.go:252
/go/src/k8s.io/ingress-gce/pkg/backends/backends.go:59
/go/src/k8s.io/ingress-gce/pkg/storage/pools.go:107
/go/src/k8s.io/ingress-gce/pkg/storage/pools.go:153
/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
panic: interface conversion: interface {} is bool, not *compute.BackendService [recovered]
	panic: interface conversion: interface {} is bool, not *compute.BackendService

goroutine 68 [running]:
k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x19ba740, 0xc420b14200)
	/usr/local/go/src/runtime/panic.go:502 +0x229
k8s.io/ingress-gce/pkg/backends.NewPool.func1(0x1821360, 0x296e7c1, 0x1, 0x1, 0x0, 0x0)
	/go/src/k8s.io/ingress-gce/pkg/backends/backends.go:59 +0x148
k8s.io/ingress-gce/pkg/storage.(*CloudListingPool).ReplenishPool(0xc42054f580)
	/go/src/k8s.io/ingress-gce/pkg/storage/pools.go:107 +0x19f
k8s.io/ingress-gce/pkg/storage.(*CloudListingPool).ReplenishPool-fm()
	/go/src/k8s.io/ingress-gce/pkg/storage/pools.go:153 +0x2a
k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc420302ec0)
	/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc420302ec0, 0x6fc23ac00, 0x0, 0x1, 0xc42062d1a0)
	/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc420302ec0, 0x6fc23ac00, 0xc42062d1a0)
	/go/src/k8s.io/ingress-gce/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by k8s.io/ingress-gce/pkg/storage.NewCloudListingPool
	/go/src/k8s.io/ingress-gce/pkg/storage/pools.go:153 +0x273
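
The panic above is an unchecked type assertion in the backend pool's key function (pkg/backends/backends.go:59): the pool handed back a bool where a *compute.BackendService was expected. A minimal sketch of the comma-ok guard that avoids this class of crash; backendServiceName is a hypothetical helper, not the actual code:

package backendsketch

import (
	"fmt"

	compute "google.golang.org/api/compute/v1"
)

// backendServiceName shows the comma-ok form of the assertion that panicked
// above: if the pool ever stores something other than a *compute.BackendService
// (here, a bool), return an error instead of crashing the whole controller.
func backendServiceName(obj interface{}) (string, error) {
	be, ok := obj.(*compute.BackendService)
	if !ok {
		return "", fmt.Errorf("expected *compute.BackendService, got %T", obj)
	}
	return be.Name, nil
}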

MrHohn (Member, author) commented Sep 19, 2018

Manually deleted the leftover resources again. Let's see if the next run turns green.

MrHohn (Member, author) commented Sep 19, 2018

Test turned green :)

MrHohn closed this as completed Sep 19, 2018