This issue has been migrated from #9774.
Originally discovered on #synapse:matrix.org by @LTangaF
On Joel's server, the following DNS query times out:
root@5d0681f56cda:/# dig _matrix._tcp.matrix.lion.fm SRV
; <<>> DiG 9.11.5-P4-5.1+deb10u3-Debian <<>> _matrix._tcp.matrix.lion.fm SRV
;; global options: +cmd
;; connection timed out; no servers could be reached
A query for a valid SRV record, by contrast, doesn't time out:
root@5d0681f56cda:/# dig _matrix._tcp.jboi.nl SRV
; <<>> DiG 9.11.5-P4-5.1+deb10u3-Debian <<>> _matrix._tcp.jboi.nl SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 560
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;_matrix._tcp.jboi.nl. IN SRV
;; ANSWER SECTION:
_matrix._tcp.jboi.nl. 120 IN SRV 0 0 443 matrix.jboi.nl.
;; Query time: 40 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Thu Apr 08 20:50:06 UTC 2021
;; MSG SIZE rcvd: 83
This is already odd, but synapse currently doesn't specify a timeout when looking up SRV records.
The offending snippet is this:
When the underlying DNS query times out, this never completes, and it causes a federation transmission loop to "time out" the whole request, putting it on catchup.
twisted has the following interface for lookupService:
def lookupService(name: str, timeout: Sequence[int]) -> "Deferred":
    """
    Perform an SRV record lookup.

    @param name: DNS name to resolve.
    @param timeout: Number of seconds after which to reissue the query.
        When the last timeout expires, the query is considered failed.

    @return: A L{Deferred} which fires with a three-tuple of lists of
        L{twisted.names.dns.RRHeader} instances. The first element of the
        tuple gives answers. The second element of the tuple gives
        authorities. The third element of the tuple gives additional
        information. The L{Deferred} may instead fail with one of the
        exceptions defined in L{twisted.names.error} or with
        C{NotImplementedError}.
    """
The optional parameter timeout defines that timeout. However, synapse isn't giving it any, so the lookup never times out, or at least the timeout in effect isn't strict enough.
I propose adding a 15 second timeout by passing timeout=(15,) in the SrvResolver.resolve_service snippet.
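As a rough illustration of the proposal, the sketch below wraps the lookup so that an explicit timeout is always passed. It assumes only that the DNS client object exposes lookupService(name, timeout) as in the interface quoted above. The names lookup_srv and SRV_LOOKUP_TIMEOUT are hypothetical and not part of synapse or twisted.

```python
from typing import Any, Sequence

# Hypothetical constant: a single attempt that gives up after 15 seconds,
# matching the timeout=(15,) value proposed in this issue.
SRV_LOOKUP_TIMEOUT: Sequence[int] = (15,)


def lookup_srv(
    dns_client: Any,
    service_name: str,
    timeout: Sequence[int] = SRV_LOOKUP_TIMEOUT,
) -> Any:
    """Issue an SRV lookup with an explicit timeout schedule.

    dns_client is assumed to expose lookupService(name, timeout), as in the
    interface quoted in this issue. Whatever that call returns (a Deferred
    when using twisted) is passed straight back to the caller.
    """
    return dns_client.lookupService(service_name, timeout=timeout)
```

With twisted installed, the dns_client argument would presumably be whatever object SrvResolver already uses for lookups; the only change is that the timeout is no longer left to the resolver's default.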
Edit: The default resolver defines timeouts of (1, 3, 11, 45); however, these add up, so it tries to resolve DNS for exactly 60 seconds before giving up. It then has a "timeout race" with the previously established HTTP agent timeout (also 60 seconds), so the DNS query never promptly "times out" before the overlying "HTTP request timeout" can.
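The arithmetic behind that race can be checked with a small sketch. It assumes, as described in the edit above, that each value is the wait for one attempt and that the waits are cumulative. retry_schedule is a hypothetical helper, and the 60 second HTTP timeout is the figure quoted in this issue.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def retry_schedule(timeouts: Sequence[int]) -> Tuple[List[int], int]:
    """Return (attempt start times, give-up time) in seconds.

    Assumes each timeout is the wait for one attempt and that a new attempt
    starts as soon as the previous one expires.
    """
    ends = list(accumulate(timeouts))
    starts = [0] + ends[:-1]
    return starts, ends[-1]


DEFAULT_TIMEOUTS = (1, 3, 11, 45)  # resolver default quoted in this issue
PROPOSED_TIMEOUTS = (15,)          # value proposed in this issue
HTTP_TIMEOUT = 60                  # HTTP agent timeout quoted in this issue

for label, timeouts in (("default", DEFAULT_TIMEOUTS), ("proposed", PROPOSED_TIMEOUTS)):
    starts, give_up = retry_schedule(timeouts)
    headroom = HTTP_TIMEOUT - give_up
    print(f"{label}: attempts start at {starts}, give up at {give_up}s, headroom {headroom}s")
```

Under these assumptions the default schedule gives up at 60 seconds, leaving no headroom before the HTTP timeout, while the proposed single 15 second attempt leaves 45 seconds for the DNS failure to surface first.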