confd fails to find etcd using SRV records if there are more than 3 SRV records #285

Closed
mariusgrigaitis opened this issue May 8, 2015 · 4 comments

Comments

@mariusgrigaitis
Contributor

confd -srv-domain="company.services"
2015-05-08T15:22:40+02:00 cluster-node1 confd[21522]: INFO SRV domain set to company.services
2015-05-08T15:22:40+02:00 cluster-node1 confd[21522]: FATAL Cannot get nodes from SRV records lookup _etcd._tcp.company.services on 10.128.1.82:53: cannot unmarshal DNS message
dig srv _etcd._tcp.company.services
;; Truncated, retrying in TCP mode.

; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> srv _etcd._tcp.company.services
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41774
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 4

;; QUESTION SECTION:
;_etcd._tcp.company.services.   IN  SRV

;; ANSWER SECTION:
_etcd._tcp.company.services. 19 IN  SRV 10 25 2379 0eb1f60388dc71cc5b8060048e9472b7._etcd._tcp.company.services.
_etcd._tcp.company.services. 19 IN  SRV 10 25 2379 91040f0bac880b58cbd43b7313e9aa6e._etcd._tcp.company.services.
_etcd._tcp.company.services. 19 IN  SRV 10 25 2379 577f3685642fbcc5214cdd88bbb318eb._etcd._tcp.company.services.
_etcd._tcp.company.services. 19 IN  SRV 10 25 2379 f938bd262b732df5e49cd6ffb8a7e5a3._etcd._tcp.company.services.

;; ADDITIONAL SECTION:
0eb1f60388dc71cc5b8060048e9472b7._etcd._tcp.company.services. 20 IN A 10.0.0.1
91040f0bac880b58cbd43b7313e9aa6e._etcd._tcp.company.services. 19 IN A 10.0.0.2
577f3685642fbcc5214cdd88bbb318eb._etcd._tcp.company.services. 19 IN A 10.0.0.3
f938bd262b732df5e49cd6ffb8a7e5a3._etcd._tcp.company.services. 20 IN A 10.0.0.4

Notice the ";; Truncated, retrying in TCP mode." line at the top of the dig output.

While debugging, we traced the error to Go's net package.

https://golang.org/src/net/dnsclient_unix.go (line 49)

@miekg

miekg commented May 14, 2015

Truncated?! With only 3 SRV records in the message? What's the initial size used to get the answer, 512 B? And yes, it's the client's responsibility to retry over TCP when the TC bit is set on a message.
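To make the TC-bit point concrete, here is a minimal sketch of how a client could detect a truncated UDP reply. It is not the code in Go's net package or in confd; the function name `isTruncated` and the sample header bytes are assumptions for illustration. The only fact it relies on is the header layout from RFC 1035: a 12-byte header whose 16-bit flags word carries TC at mask 0x0200.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// tcBit is the TC (truncation) flag inside the 16-bit flags word
// of a DNS header (RFC 1035, section 4.1.1).
const tcBit = 0x0200

// isTruncated reports whether a raw DNS message has the TC bit set.
// Hypothetical helper, written for this example only.
func isTruncated(msg []byte) (bool, error) {
	if len(msg) < 12 {
		return false, errors.New("dns message shorter than the 12-byte header")
	}
	// Bytes 0-1 hold the query ID, bytes 2-3 hold the flags.
	flags := binary.BigEndian.Uint16(msg[2:4])
	return flags&tcBit != 0, nil
}

func main() {
	// Made-up header of a UDP reply with flags qr aa tc rd ra (0x8780).
	udpReply := []byte{0x12, 0x34, 0x87, 0x80, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}

	truncated, err := isTruncated(udpReply)
	if err != nil {
		fmt.Println("bad message:", err)
		return
	}
	if truncated {
		fmt.Println("TC bit set: repeat the query over TCP")
	} else {
		fmt.Println("complete answer received over UDP")
	}
}
```

A resolver that ignores this bit and tries to parse the cut-off payload anyway can end up with a parse failure instead of a TCP retry.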

@miekg

miekg commented May 14, 2015

Although I'm not sure if the DNS stub (i.e. libc) on the local machine actually does this... and SkyDNS should try harder to make the answer fit...

@elsonrodriguez
Contributor

I took a shot at reproducing this but didn't have much luck. I tried with Go 1.6, using both the cgo and the pure Go resolvers.

Environment

Golang

$ go version
go version go1.6 darwin/amd64

DNS SRV Records

dig SRV _etcd-client._tcp.confd.io

; <<>> DiG 9.8.3-P1 <<>> SRV _etcd-client._tcp.confd.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55503
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;_etcd-client._tcp.confd.io.    IN  SRV

;; ANSWER SECTION:
_etcd-client._tcp.confd.io. 53  IN  SRV 0 0 2379 etcd0.confd.io.
_etcd-client._tcp.confd.io. 53  IN  SRV 0 0 12379 etcd1.confd.io.
_etcd-client._tcp.confd.io. 53  IN  SRV 0 0 22379 etcd2.confd.io.
_etcd-client._tcp.confd.io. 53  IN  SRV 0 0 32379 etcd3.confd.io.
_etcd-client._tcp.confd.io. 53  IN  SRV 0 0 42379 etcd4.confd.io.

All etcd nodes resolve to localhost (127.0.0.1)

dig etcd0.confd.io

; <<>> DiG 9.8.3-P1 <<>> etcd0.confd.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31591
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;etcd0.confd.io.            IN  A

;; ANSWER SECTION:
etcd0.confd.io.     300 IN  A   127.0.0.1

;; Query time: 410 msec
;; SERVER: 10.0.1.1#53(10.0.1.1)
;; WHEN: Mon Feb 22 16:25:57 2016
;; MSG SIZE  rcvd: 48

etcd cluster

The following Procfile was used to provision a 5 node etcd cluster on my local machine:
https://github.com/kelseyhightower/confd/blob/master/contrib/etcd/Procfile

With netgo

The following confd run used the pure Go DNS resolver (netgo).

GODEBUG=netdns=go ./build
./bin/confd -srv-record _etcd-client._tcp.confd.io
2016-02-22T16:17:04-08:00 ireul.local ./bin/confd[97698]: INFO SRV record set to _etcd-client._tcp.confd.io
2016-02-22T16:17:04-08:00 ireul.local ./bin/confd[97698]: INFO Backend set to etcd
2016-02-22T16:17:04-08:00 ireul.local ./bin/confd[97698]: INFO Starting confd
2016-02-22T16:17:04-08:00 ireul.local ./bin/confd[97698]: INFO Backend nodes set to http://etcd0.confd.io:2379, http://etcd1.confd.io:12379, http://etcd2.confd.io:22379, http://etcd3.confd.io:32379, http://etcd4.confd.io:42379
2016-02-22T16:17:04-08:00 ireul.local ./bin/confd[97698]: INFO Target config /tmp/myconfig.conf out of sync
2016-02-22T16:17:04-08:00 ireul.local ./bin/confd[97698]: INFO Target config /tmp/myconfig.conf has been updated

Notice that all 5 etcd backends were discovered via the DNS SRV record.

With cgo

The following confd run used the C resolver on OS X.

GODEBUG=netdns=cgo ./build
./bin/confd -srv-record _etcd-client._tcp.confd.io
2016-02-22T16:20:39-08:00 ireul.local ./bin/confd[97899]: INFO SRV record set to _etcd-client._tcp.confd.io
2016-02-22T16:20:39-08:00 ireul.local ./bin/confd[97899]: INFO Backend set to etcd
2016-02-22T16:20:39-08:00 ireul.local ./bin/confd[97899]: INFO Starting confd
2016-02-22T16:20:39-08:00 ireul.local ./bin/confd[97899]: INFO Backend nodes set to http://etcd0.confd.io:2379, http://etcd1.confd.io:12379, http://etcd2.confd.io:22379, http://etcd3.confd.io:32379, http://etcd4.confd.io:42379
2016-02-22T16:20:40-08:00 ireul.local ./bin/confd[97899]: INFO Target config /tmp/myconfig.conf out of sync
2016-02-22T16:20:40-08:00 ireul.local ./bin/confd[97899]: INFO Target config /tmp/myconfig.conf has been updated

Notice that all 5 etcd backends were discovered via the DNS SRV record.

Any chance you can try again with the latest Go and the latest confd? This might not be an issue any more.

@kelseyhightower
Owner

@elsonrodriguez Thanks for jumping in on this one. I've verified your results and it all looks good to me. I'm closing this one out; feel free to reopen if you are still having issues.


4 participants