kuberesolver didn't update endpoints when service changed #4

@tomwilkie

I have some frontends that use kuberesolver to discover backends and talk to them. After I deployed updated backends, one of the frontends never picked up the new endpoints and just started erroring with:

2017/11/19 13:02:10 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.60.2.10:9095: getsockopt: connection refused"; Reconnecting to {10.60.2.10:9095 <nil>}

On the other frontends, at around the same time, I get:

2017/11/19 13:02:10 kuberesolver: 10.60.2.10:9095 DELETED from querier
2017/11/19 13:02:10 Failed to dial 10.60.2.10:9095: context canceled; please retry.
2017/11/19 13:02:27 kuberesolver: 10.60.2.11:9095 ADDED to querier
2017/11/19 13:02:37 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.60.1.8:9095: getsockopt: connection refused"; Reconnecting to {10.60.1.8:9095 <nil>}
2017/11/19 13:02:37 kuberesolver: 10.60.1.8:9095 DELETED from querier
2017/11/19 13:02:37 Failed to dial 10.60.1.8:9095: context canceled; please retry.
2017/11/19 13:02:53 kuberesolver: 10.60.1.18:9095 ADDED to querier
2017/11/19 13:03:03 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 10.60.0.8:9095: getsockopt: connection refused"; Reconnecting to {10.60.0.8:9095 <nil>}
2017/11/19 13:03:03 kuberesolver: 10.60.0.8:9095 DELETED from querier

Looking at the goroutine dump for the failing frontend, I see that one of the watch goroutines has been blocked in select for 2397 minutes (maybe it stopped getting events from the api-server?):

goroutine 30 [select, 2397 minutes]:
github.com/sercand/kuberesolver.(*kubeResolver).watch(0xc4201c3f20, 0xc4201c3f0d, 0x7, 0xc4201c4c60, 0xc4201c4c00, 0x0, 0x0)
	vendor/github.com/sercand/kuberesolver/resolver.go:73 +0x56b
github.com/sercand/kuberesolver.(*kubeResolver).Resolve.func1()
	vendor/github.com/sercand/kuberesolver/resolver.go:38 +0x52
github.com/sercand/kuberesolver.until.func1(0xc4201c1cb0)
	vendor/github.com/sercand/kuberesolver/util.go:20 +0x43
github.com/sercand/kuberesolver.until(0xc4201c1cb0, 0x3b9aca00, 0xc4201c4c60)
	vendor/github.com/sercand/kuberesolver/util.go:21 +0x73
created by github.com/sercand/kuberesolver.(*kubeResolver).Resolve
	vendor/github.com/sercand/kuberesolver/resolver.go:42 +0x1ac

The only other similar stack trace:

goroutine 50 [select, 28 minutes]:
github.com/sercand/kuberesolver.(*kubeResolver).watch(0xc420226740, 0xc42022672d, 0xb, 0xc4201c4f60, 0xc4201c4f00, 0x0, 0x0)
	vendor/github.com/sercand/kuberesolver/resolver.go:73 +0x56b
github.com/sercand/kuberesolver.(*kubeResolver).Resolve.func1()
	vendor/github.com/sercand/kuberesolver/resolver.go:38 +0x52
github.com/sercand/kuberesolver.until.func1(0xc420230060)
	vendor/github.com/sercand/kuberesolver/util.go:20 +0x43
github.com/sercand/kuberesolver.until(0xc420230060, 0x3b9aca00, 0xc4201c4f60)
	vendor/github.com/sercand/kuberesolver/util.go:21 +0x73
created by github.com/sercand/kuberesolver.(*kubeResolver).Resolve
	vendor/github.com/sercand/kuberesolver/resolver.go:42 +0x1ac

These two goroutines correspond nicely to the two kuberesolver-d backend services I have.

Perhaps the watch should have a timeout, so it can recover from intermittent failures like this? I think this is how the Kubernetes Go client behaves.
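
To make the suggestion concrete, here is a minimal sketch of a watch loop with an idle timeout. It is not kuberesolver's actual code: the `event` type, `errWatchIdle`, and `watchWithTimeout` are hypothetical names, and a plain channel stands in for the decoded api-server stream. The idea is only that a silent stream should eventually return an error so the caller can re-establish the watch instead of blocking in select indefinitely.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// event is a hypothetical stand-in for a decoded endpoints watch event.
type event struct {
	Type string // e.g. "ADDED" or "DELETED"
	Addr string
}

// errWatchIdle reports that the stream stayed silent for too long.
var errWatchIdle = errors.New("watch idle timeout")

// watchWithTimeout forwards events to handle until the stream closes, stop is
// closed, or no event arrives within idle. Returning an error lets the caller
// tear the stream down and open a fresh watch instead of blocking forever.
func watchWithTimeout(events <-chan event, stop <-chan struct{}, idle time.Duration, handle func(event)) error {
	timer := time.NewTimer(idle)
	defer timer.Stop()
	for {
		select {
		case <-stop:
			return nil
		case ev, ok := <-events:
			if !ok {
				return errors.New("watch closed")
			}
			handle(ev)
			// Restart the idle timer after every event, draining a
			// pending tick first so a stale timeout cannot fire.
			if !timer.Stop() {
				select {
				case <-timer.C:
				default:
				}
			}
			timer.Reset(idle)
		case <-timer.C:
			return errWatchIdle
		}
	}
}

func main() {
	// One event arrives, then the stream goes quiet (assumed scenario).
	events := make(chan event, 1)
	events <- event{Type: "ADDED", Addr: "10.60.2.11:9095"}
	err := watchWithTimeout(events, make(chan struct{}), 50*time.Millisecond, func(ev event) {
		fmt.Println(ev.Type, ev.Addr)
	})
	fmt.Println("watch ended:", err)
}
```

In this sketch the caller would treat any non-nil return as a signal to reconnect, which fits a retry wrapper like the `until` helper visible in the stack traces. The idle duration would need to be comfortably longer than the normal gap between endpoint changes, or the watch would be restarted needlessly. As a complement, the Kubernetes API accepts a `timeoutSeconds` parameter on watch requests, which bounds the stream on the server side.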
