This repository has been archived by the owner on Oct 23, 2024. It is now read-only.

possible malformed response with large responses #326

Closed
spacejam opened this issue Oct 14, 2015 · 8 comments · Fixed by #330

@spacejam
Contributor

[root@ip-10-0-1-234 ~]# tcpdump -vvv -s 0 -l -n port 53
20:17:51.513349 IP (tos 0x0, ttl 64, id 56497, offset 0, flags [DF], proto UDP (17), length 74)
    10.0.1.234.37336 > 10.0.4.245.domain: [bad udp cksum 0x1b26 -> 0x84d3!] 34960+ SRV? _etcd-server._tcp.etcd.mesos. (46)
20:17:51.514077 IP (tos 0x0, ttl 64, id 25692, offset 0, flags [DF], proto UDP (17), length 740)
    10.0.4.245.domain > 10.0.1.234.37336: [udp sum ok] 34960*| q: SRV? _etcd-server._tcp.etcd.mesos. 9/0/9 _etcd-server._tcp.etcd.mesos. [1m] SRV etcd-server-19237-s0.etcd.mesos.:1025 0 0, _etcd-server._tcp.etcd.mesos. [1m] SRV etcd-server-43841-s3.etcd.mesos.:1026 0 0, _etcd-server._tcp.etcd.mesos. [1m] SRV etcd-server-43841-s3.etcd.mesos.:1025 0 0, _etcd-server._tcp.etcd.mesos. [1m] SRV etcd-server-55660-s1.etcd.mesos.:1025 0 0, _etcd-server._tcp.etcd.mesos. [1m] SRV etcd-server-19237-s0.etcd.mesos.:1027 0 0, _etcd-server._tcp.etcd.mesos. [1m] SRV etcd-server-55660-s1.etcd.mesos.:1027 0 0, _etcd-server._tcp.etcd.mesos. [1m] SRV etcd-server-19237-s0.etcd.mesos.:1026 0 0, _etcd-server._tcp.etcd.mesos. [1m] SRV etcd-server-55660-s1.etcd.mesos.:1026 0 0, _etcd-server._tcp.etcd.mesos. [1m] SRV etcd-server-43841-s3.etcd.mesos.:1027 0 0 ar: etcd-server-19237-s0.etcd.mesos. [1m] A 10.0.1.236, etcd-server-19237-s0.etcd.mesos. [1m] A 10.0.1.236, etcd-server-19237-s0.etcd.mesos. [1m] A 10.0.1.236, etcd-server-55660-s1.etcd.mesos. [1m] A 10.0.1.235, etcd-server-55660-s1.etcd.mesos. [1m] A 10.0.1.235, etcd-server-55660-s1.etcd.mesos. [1m] A 10.0.1.235, etcd-server-43841-s3.etcd.mesos. [1m] A 10.0.1.234, etcd-server-43841-s3.etcd.mesos. [1m] A 10.0.1.234, etcd-server-43841-s3.etcd.mesos. [1m] A 10.0.1.234 (712)

In particular, note the length of 740 bytes. This manifests as SRV record lookup failures in Go. dig works, however, so it's not clear whether we are mishandling large responses or Go's SRV lookup code is buggy. Go is able to resolve small responses, e.g. when I only query for _leader._tcp.mesos. I'm working around this in etcd-mesos by providing DiscoveryInfo that limits the size of the response.
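
The 740 that tcpdump prints is the IPv4 total length. Assuming a 20-byte IPv4 header with no options plus the fixed 8-byte UDP header, the DNS message itself is 740 - 20 - 8 = 712 bytes, which matches the (712) at the end of the response line (and 74 - 28 = 46 matches the query's (46)). That is well over the 512-byte limit RFC 1035 sets for DNS over UDP without EDNS0. A minimal Go sketch of the arithmetic; the helper names are hypothetical:

```go
package main

import "fmt"

const (
	ipv4HeaderLen = 20  // IPv4 header without options (assumption)
	udpHeaderLen  = 8   // UDP header is always 8 bytes
	maxUDPDNSLen  = 512 // RFC 1035 limit for DNS over UDP without EDNS0
)

// dnsPayloadLen derives the DNS message size from the IPv4 total length
// that tcpdump reports as "length N".
func dnsPayloadLen(ipTotalLen int) int {
	return ipTotalLen - ipv4HeaderLen - udpHeaderLen
}

// fitsInClassicUDP reports whether a DNS message of n bytes fits in a
// plain (non-EDNS0) UDP response.
func fitsInClassicUDP(n int) bool {
	return n <= maxUDPDNSLen
}

func main() {
	for _, ipLen := range []int{74, 740} { // query and response from the capture
		n := dnsPayloadLen(ipLen)
		fmt.Printf("IP length %d -> DNS payload %d bytes, fits in 512: %v\n",
			ipLen, n, fitsInClassicUDP(n))
	}
}
```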

@s-urbaniak

  1. What version of the project are you using?
    fd4c5fc
  2. What operating system and processor architecture are you using?
    Mac OS X, amd64
  3. What did you do?
    See below
  4. What did you expect to see?
    We expect to see a valid response from net.LookupSRV
  5. What did you see instead?
    lookup _manyports._tcp.marathon.mesos on 127.0.0.1:53: cannot unmarshal DNS message

ZooKeeper:

$ zkServer start-foreground

Mesos master:

$ /usr/local/sbin/mesos-master --registry=in_memory --zk='zk://127.0.0.1:2181/mesos' --ip=127.0.0.1

Mesos slave:

$ /usr/local/sbin/mesos-slave --master=127.0.0.1:5050

mesos-dns config.json:

{
  "zk": "zk://localhost:2181/mesos",
  "refreshSeconds": 2,
  "ttl": 60,
  "domain": "mesos",
  "port": 53,
  "resolvers": ["8.8.8.8"],
  "timeout": 5, 
  "httpon": true,
  "dnson": true,
  "httpport": 8123,
  "externalon": true,
  "listener": "127.0.0.1",
  "SOAMname": "root.ns1.mesos",
  "SOARname": "ns1.mesos",
  "SOARefresh": 60,
  "SOARetry":   600,
  "SOAExpire":  86400,
  "SOAMinttl": 60
}

Configure Mesos-DNS as the primary DNS server:

$ cat /etc/resolv.conf
nameserver 127.0.0.1

Start 10 tasks, each with 6 port mappings, using this Marathon app definition:

{
  "id": "manyports",
  "cmd": "sleep 500",
  "cpus": 0.2,
  "mem": 128,
  "instances": 10,
  "ports": [
    10500,
    10501,
    10502,
    10503,
    10504,
    10505
  ]
}
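
Why this app triggers the problem can be estimated without running it. The first tcpdump capture shows one SRV answer per task per port (3 tasks with 3 ports each gave 9 SRV answers), so assuming the same layout, 10 instances with 6 ports should yield 60 SRV answers. Even in the best case, where both the owner name and the SRV target compress to 2-byte pointers, each SRV record costs 2 (name) + 2 (type) + 2 (class) + 4 (TTL) + 2 (RDLENGTH) + 6 (priority, weight, port) + 2 (target) = 20 bytes. A Go sketch of that lower bound (names are illustrative; the question section and additional A records are ignored, so the real message is larger):

```go
package main

import "fmt"

const (
	dnsHeaderLen = 12  // fixed DNS header size
	udpLimit     = 512 // RFC 1035 limit without EDNS0
	// Smallest possible SRV resource record when the owner name and the
	// target are both 2-byte compression pointers:
	// name(2) + type(2) + class(2) + ttl(4) + rdlength(2) +
	// priority(2) + weight(2) + port(2) + target(2)
	minSRVRecordLen = 2 + 2 + 2 + 4 + 2 + 2 + 2 + 2 + 2
)

// minSRVResponseLen returns a lower bound on the size of a response that
// carries one SRV answer per (task, port) pair.
func minSRVResponseLen(tasks, portsPerTask int) int {
	return dnsHeaderLen + tasks*portsPerTask*minSRVRecordLen
}

func main() {
	n := minSRVResponseLen(10, 6) // the manyports app: 10 instances, 6 ports
	fmt.Printf("at least %d bytes (UDP limit %d)\n", n, udpLimit)
}
```

Even this optimistic bound is more than twice the UDP limit, so no amount of compression alone can make the full answer fit.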

When we run the following program:

package main

import (
        "fmt"
        "net"
)

func main() {
        _, addrs, err := net.LookupSRV("manyports", "tcp", "marathon.mesos")
        fmt.Printf("addrs are %q\n", addrs)
        fmt.Printf("err is %v\n", err)
}

the error message shown above (cannot unmarshal DNS message) is returned.

@s-urbaniak

xref hashicorp/consul#854

@s-urbaniak

xref #237

@s-urbaniak

/cc @discordianfish

@s-urbaniak

Compression was already introduced in b204a79, so that option has already been used.
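
For context on what compression buys: RFC 1035 §4.1.4 lets a name, or a suffix of one, that already appears earlier in the message be replaced by a 2-byte pointer (two high bits set, then a 14-bit offset). The following is a simplified Go sketch, not the Mesos-DNS or miekg/dns implementation; appendName and its seen map are hypothetical, and it only reuses suffixes it has written itself:

```go
package main

import (
	"fmt"
	"strings"
)

// appendName appends name to msg in DNS wire format (RFC 1035 §3.1),
// replacing the longest suffix already recorded in seen with a 2-byte
// compression pointer (RFC 1035 §4.1.4). Labels are assumed to be valid
// (1-63 bytes each) and matching is case-sensitive for simplicity.
func appendName(msg []byte, name string, seen map[string]int) []byte {
	trimmed := strings.TrimSuffix(name, ".")
	if trimmed == "" {
		return append(msg, 0) // the root name is a single zero byte
	}
	labels := strings.Split(trimmed, ".")
	for i := range labels {
		suffix := strings.Join(labels[i:], ".")
		if off, ok := seen[suffix]; ok {
			// Pointer: top two bits set, remaining 14 bits hold the offset.
			return append(msg, byte(0xC0|off>>8), byte(off))
		}
		if len(msg) < 0x4000 { // only offsets that fit in 14 bits are usable
			seen[suffix] = len(msg)
		}
		msg = append(msg, byte(len(labels[i])))
		msg = append(msg, labels[i]...)
	}
	return append(msg, 0) // zero-length root label ends an uncompressed name
}

func main() {
	seen := map[string]int{}
	var msg []byte
	for _, name := range []string{
		"_etcd-server._tcp.etcd.mesos.",
		"etcd-server-19237-s0.etcd.mesos.",
		"_etcd-server._tcp.etcd.mesos.",
	} {
		before := len(msg)
		msg = appendName(msg, name, seen)
		fmt.Printf("%-34s %2d bytes\n", name, len(msg)-before)
	}
}
```

For the three names in main this emits 30, 23 and 2 bytes: every repeated owner name costs only a pointer, but each record's fixed fields still remain, which is why compression cannot keep a large SRV set under 512 bytes.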

@discordianfish
Contributor

In general, an answer can be larger than 512 bytes. In that case, either EDNS is used (not supported by Go / netgo) or the client falls back to TCP for resolution (which Go should do). On the server side, we just need to make sure that we set the truncated (TC) bit if the answer doesn't fit into 512 bytes.
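
In wire format the truncated flag is the TC bit: bit 9 of the 16-bit flags word at bytes 2-3 of the 12-byte header (RFC 1035 §4.1.1). In the tcpdump capture at the top of this issue, the "|" in the response ID "34960*|" is tcpdump's marker for that bit. A small stdlib-only Go sketch of setting it on the server side and checking it on the client side; setTruncated and isTruncated are hypothetical helper names:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	headerLen = 12     // fixed DNS header size
	flagTC    = 1 << 9 // TC bit in the 16-bit flags word (RFC 1035 §4.1.1)
)

// setTruncated marks a wire-format DNS message as truncated.
func setTruncated(msg []byte) error {
	if len(msg) < headerLen {
		return errors.New("message shorter than DNS header")
	}
	flags := binary.BigEndian.Uint16(msg[2:4])
	binary.BigEndian.PutUint16(msg[2:4], flags|flagTC)
	return nil
}

// isTruncated reports whether the TC bit is set; a client that sees it is
// expected to repeat the query over TCP.
func isTruncated(msg []byte) bool {
	return len(msg) >= headerLen && binary.BigEndian.Uint16(msg[2:4])&flagTC != 0
}

func main() {
	msg := make([]byte, headerLen)
	binary.BigEndian.PutUint16(msg[0:2], 34960) // ID from the capture above
	fmt.Println("truncated before:", isTruncated(msg))
	if err := setTruncated(msg); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("truncated after:", isTruncated(msg))
}
```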

@discordianfish discordianfish self-assigned this Oct 15, 2015
@discordianfish
Contributor

Just verified: we set the truncated flag, but it seems we also need to make sure that what we send back is a complete message. I've created a branch to fix the issue, but I haven't tested it yet: https://github.com/mesosphere/mesos-dns/compare/fish/fix-truncate?expand=1
This is roughly based on https://github.com/skynetservices/skydns/blob/4c00898b5ed8769af24f8dde381386f3969c2275/server/server.go#L175
