I have several frontends that use kuberesolver to discover backends and talk to them. I deployed updated backends, and one of the frontends never picked up the new backends; it just started erroring with:
2017/11/19 13:02:10 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.60.2.10:9095: getsockopt: connection refused"; Reconnecting to {10.60.2.10:9095 <nil>}
On the other frontends, at around the same time, I get:
2017/11/19 13:02:10 kuberesolver: 10.60.2.10:9095 DELETED from querier
2017/11/19 13:02:10 Failed to dial 10.60.2.10:9095: context canceled; please retry.
2017/11/19 13:02:27 kuberesolver: 10.60.2.11:9095 ADDED to querier
2017/11/19 13:02:37 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.60.1.8:9095: getsockopt: connection refused"; Reconnecting to {10.60.1.8:9095 <nil>}
2017/11/19 13:02:37 kuberesolver: 10.60.1.8:9095 DELETED from querier
2017/11/19 13:02:37 Failed to dial 10.60.1.8:9095: context canceled; please retry.
2017/11/19 13:02:53 kuberesolver: 10.60.1.18:9095 ADDED to querier
2017/11/19 13:03:03 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.60.0.8:9095: getsockopt: connection refused"; Reconnecting to {10.60.0.8:9095 <nil>}
2017/11/19 13:03:03 kuberesolver: 10.60.0.8:9095 DELETED from querier
Looking at the goroutine dump for the frontend with the failures, I see that one of the watch goroutines has been sitting there for a very long time (maybe it stopped receiving events from the API server?):
goroutine 30 [select, 2397 minutes]:
github.com/sercand/kuberesolver.(*kubeResolver).watch(0xc4201c3f20, 0xc4201c3f0d, 0x7, 0xc4201c4c60, 0xc4201c4c00, 0x0, 0x0)
vendor/github.com/sercand/kuberesolver/resolver.go:73 +0x56b
github.com/sercand/kuberesolver.(*kubeResolver).Resolve.func1()
vendor/github.com/sercand/kuberesolver/resolver.go:38 +0x52
github.com/sercand/kuberesolver.until.func1(0xc4201c1cb0)
vendor/github.com/sercand/kuberesolver/util.go:20 +0x43
github.com/sercand/kuberesolver.until(0xc4201c1cb0, 0x3b9aca00, 0xc4201c4c60)
vendor/github.com/sercand/kuberesolver/util.go:21 +0x73
created by github.com/sercand/kuberesolver.(*kubeResolver).Resolve
vendor/github.com/sercand/kuberesolver/resolver.go:42 +0x1ac
The only other similar stack trace is:
goroutine 50 [select, 28 minutes]:
github.com/sercand/kuberesolver.(*kubeResolver).watch(0xc420226740, 0xc42022672d, 0xb, 0xc4201c4f60, 0xc4201c4f00, 0x0, 0x0)
vendor/github.com/sercand/kuberesolver/resolver.go:73 +0x56b
github.com/sercand/kuberesolver.(*kubeResolver).Resolve.func1()
vendor/github.com/sercand/kuberesolver/resolver.go:38 +0x52
github.com/sercand/kuberesolver.until.func1(0xc420230060)
vendor/github.com/sercand/kuberesolver/util.go:20 +0x43
github.com/sercand/kuberesolver.until(0xc420230060, 0x3b9aca00, 0xc4201c4f60)
vendor/github.com/sercand/kuberesolver/util.go:21 +0x73
created by github.com/sercand/kuberesolver.(*kubeResolver).Resolve
vendor/github.com/sercand/kuberesolver/resolver.go:42 +0x1ac
That corresponds nicely with the two backend services I resolve through kuberesolver.
Perhaps there should be a timeout on this watch, so it can recover from intermittent failures like this? I believe that is how the Kubernetes Go client behaves.
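For illustration, here is a rough sketch of what a client-side watch timeout could look like; the function name, URL layout, and timeout handling are assumptions for the example, not kuberesolver's actual code:

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// watchOnce opens a single watch request against the API server and returns
// when the stream ends, errors, or the timeout fires. The caller re-invokes
// it in a loop, so a watch that silently stops delivering events is torn
// down and re-established after at most `timeout`.
// NOTE: the URL shape and parameters here are illustrative only.
func watchOnce(client *http.Client, apiServer, namespace, service string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	url := fmt.Sprintf("%s/api/v1/watch/namespaces/%s/endpoints/%s", apiServer, namespace, service)
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return err
	}
	req = req.WithContext(ctx)

	resp, err := client.Do(req)
	if err != nil {
		// Includes the context deadline error when no events arrive in time.
		return err
	}
	defer resp.Body.Close()

	// ... decode watch events from resp.Body until it is closed or the
	// context expires, then return so the caller restarts the watch.
	return nil
}
```

Looping on something like watchOnce would bound how long a dead-but-open watch can go unnoticed, similar in spirit to the periodic re-list/re-watch behaviour I understand the Kubernetes Go client to have.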