Add support for STUN/TURN server configuration #140

lachesis · 2023-11-03T17:37:31Z

I have a client that wants to use Willow for ASR in meetings but often operates behind restrictive firewalls and/or with broken IPv6 configurations. @richardklafter believes that using a TURN server might improve connection reliability and performance in this case.

Testing:
I have deployed an eturnal server, then set up firewall rules that drop all UDP traffic to my test instance of willow-inference-server. With TURN enabled, connections are able to proceed (relatively quickly). Without it, they fail.

One note: if you specify a STUN server but the STUN server does not work, connections will fail even if the TURN server works. This appears to be an aioice limitation, though maybe it has now been resolved? This warrants a bit more investigation.

kristiankielhofner · 2023-11-09T13:17:18Z

I'd like to better understand the underlying issue.

Generally speaking, TURN servers are used as a last resort when peer-to-peer ICE candidates are unable to confirm bi-directional flow due to restrictive firewalls on both sides. They're a way to include an offer with a real public IP that should be wide open and reachable from anywhere. However, in the end the firewall still needs to allow some kind of outbound connection. If it doesn't, you're SOL regardless of what you're doing from the server/TURN side.

With WIS being on one end of the negotiation we should always be able to ensure we are at least reachable. What should happen (theoretically) is we provide an ICE candidate with a public IP and listening port the client can make an outbound connection to through the firewall/NAT/etc.

Generally speaking TURN servers were born out of desperation for the peer-to-peer scenario where one or more sides sit behind a very restrictive firewall. They're generally frowned upon because it's not a good idea to have an additional component in a latency, jitter, and loss sensitive audio path. It's also another component to manage, fail, scale, etc.

I added some debugging in ts-client to show the offer from WIS:

v=0
o=- 3908523921 3908523921 IN IP4 0.0.0.0
s=-
t=0 0
a=group:BUNDLE 0 1
a=msid-semantic:WMS *
m=audio 10043 UDP/TLS/RTP/SAVPF 111 0 8
c=IN IP4 172.21.0.3
a=recvonly
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid
a=mid:0
a=msid:7e964f7a-437f-437b-bfba-315490abb291 741e1e19-3da6-4d73-b418-5b2f72c2a65b
a=rtcp:9 IN IP4 0.0.0.0
a=rtcp-mux
a=ssrc:3821658038 cname:bce384b2-fe3a-478b-9705-9903ea60e1b8
a=rtpmap:111 opus/48000/2
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=candidate:461573564b34ea161d3793846aacf055 1 udp 2130706431 172.21.0.3 10043 typ host
a=candidate:6c724dc241cfd940d474bf86bf8a21db 1 udp 1694498815 12.11.215.207 10043 typ srflx raddr 172.21.0.3 rport 10043
a=end-of-candidates
a=ice-ufrag:6iIk
a=ice-pwd:o1HnRilAUPHvpE5TIda3bG
a=fingerprint:sha-256 73:A1:E7:B5:3E:DE:8E:25:B9:39:E6:B4:A6:9F:73:A0:5A:CE:EE:F1:09:F7:6C:EA:C1:AE:C4:21:A4:52:45:53
a=setup:active
m=application 10043 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 172.21.0.3
a=mid:1
a=sctp-port:5000
a=max-message-size:65536
a=candidate:461573564b34ea161d3793846aacf055 1 udp 2130706431 172.21.0.3 10043 typ host
a=candidate:6c724dc241cfd940d474bf86bf8a21db 1 udp 1694498815 12.11.215.207 10043 typ srflx raddr 172.21.0.3 rport 10043
a=end-of-candidates
a=ice-ufrag:6iIk
a=ice-pwd:o1HnRilAUPHvpE5TIda3bG
a=fingerprint:sha-256 73:A1:E7:B5:3E:DE:8E:25:B9:39:E6:B4:A6:9F:73:A0:5A:CE:EE:F1:09:F7:6C:EA:C1:AE:C4:21:A4:52:45:53
a=setup:active

Looking at these candidates, even though there is a WIS public IP candidate, all of the docker 172.x candidates (and SDP lines) probably shouldn't be there. It would be interesting for you to try running WIS outside of docker (where these candidates wouldn't be included) to see if that addresses the issue. If it does, we can tweak this on the WIS side to not include these candidates and do what a lot of VoIP stuff does: add a configuration option to set the real public IP behind any NAT implementation (including docker). Alternatively, WIS could use STUN to figure this out but that's a much heavier lift and adds the requirement that the configured STUN server is reachable which IMO isn't great.

What really sticks out here is the c= line. You should be able to confirm that when using a TURN server it's set to the public IP of the TURN server, in addition to the TURN server being included as an ICE candidate. This is one of the things we would want to tweak in the case of WIS always being reachable at a known (configured or discovered) IP.
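To make that check less manual, here is a minimal sketch that pulls the c= addresses and the ICE candidate types out of an SDP blob. The function name `summarize_sdp` is made up for illustration and is not part of WIS or ts-client. With a working TURN setup you would expect to see a candidate of type `relay` in the output.

```python
def summarize_sdp(sdp: str) -> dict:
    """Collect c= addresses and (type, address) pairs for ICE candidates."""
    connections, candidates = [], []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            connections.append(line.split()[-1])
        elif line.startswith("a=candidate:"):
            # a=candidate:<foundation> <component> <transport> <priority>
            #             <address> <port> typ <type> ...
            fields = line.split()
            address = fields[4]
            cand_type = fields[fields.index("typ") + 1]
            candidates.append((cand_type, address))
    return {"connections": connections, "candidates": candidates}
```

Run against the offer above, this should report 172.21.0.3 as the connection address and one `host` plus one `srflx` candidate per media section.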

lachesis · 2023-11-09T17:16:11Z

The primary goal is to support clients operating from behind a very restrictive firewall that blocks all outbound traffic except for that on TCP port 443. How might this be otherwise accomplished?

lachesis · 2023-11-09T18:09:43Z

Alternatively, WIS could use STUN to figure this out but that's a much heavier lift and adds the requirement that the configured STUN server is reachable which IMO isn't great.

For what it's worth, WIS already uses the Google STUN server to determine its public IP. This is apparent if you run a tcpdump like:
sudo tcpdump -i any -n -A -w a.pcap port 19302 or port 3478
(see attached) a.pcap.gz

Or if you look at the aiortc source code:
https://github.com/aiortc/aiortc/blob/cdba00d07a3c53659c2a8abfdfa1ecfab4cecebd/src/aiortc/rtcicetransport.py#L209

This is why I did not pass an empty list for ICE servers if none was configured.

Additionally, if you do configure a STUN server (with this patch) and it is unreachable by the WIS server, WIS will fail to make any WebRTC connections.
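For illustration, the "don't pass an empty list" behavior can be sketched roughly like this. The environment variable names and the `build_ice_servers` helper are hypothetical and are not the names used in this patch. The returned dicts follow the usual WebRTC `iceServers` shape; with aiortc, each would presumably be turned into an `RTCIceServer` and passed via `RTCConfiguration(iceServers=...)`.

```python
import os
from typing import Optional

# Hypothetical setting names, chosen only for this sketch.
ENV_STUN = "WIS_STUN_URL"
ENV_TURN = "WIS_TURN_URL"
ENV_TURN_USER = "WIS_TURN_USERNAME"
ENV_TURN_PASS = "WIS_TURN_PASSWORD"


def build_ice_servers(env: Optional[dict] = None) -> Optional[list]:
    """Return iceServers-style dicts, or None if nothing is configured."""
    env = os.environ if env is None else env
    servers = []

    stun = env.get(ENV_STUN, "").strip()
    if stun:
        servers.append({"urls": stun})

    turn = env.get(ENV_TURN, "").strip()
    if turn:
        entry = {"urls": turn}
        user = env.get(ENV_TURN_USER, "")
        if user:
            entry["username"] = user
            entry["credential"] = env.get(ENV_TURN_PASS, "")
        servers.append(entry)

    # None (rather than []) leaves the library free to apply its own default,
    # which per the discussion above is the Google STUN server.
    return servers or None
```

Leaving the STUN setting unset while configuring only TURN would also sidestep the case where an unreachable STUN server breaks every connection.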

kristiankielhofner · 2023-11-09T20:27:32Z

Alternatively, WIS could use STUN to figure this out but that's a much heavier lift and adds the requirement that the configured STUN server is reachable which IMO isn't great.

For what it's worth, WIS already uses the Google STUN server to determine its public IP. This is apparent if you run a tcpdump like: sudo tcpdump -i any -n -A -w a.pcap port 19302 or port 3478 (see attached) a.pcap.gz

Or if you look at the aiortc source code: https://github.com/aiortc/aiortc/blob/cdba00d07a3c53659c2a8abfdfa1ecfab4cecebd/src/aiortc/rtcicetransport.py#L209

Well that answers my question about how WIS knew its real public IP in the first place (which I've never really thought too much about before).

The issue is that it gets added as an ICE candidate, which is kind of ok, but there are probably some other changes that could be made to reduce the number of ICE candidates. Given the docker use case it makes little sense to include the RFC 1918 address of the docker network at all. Because that's the actual address of the network interface it's also used for the c= line, which isn't ideal. Unless a client is (somehow) on the docker network this candidate will always fail anyway, and it slows down ICE because it needs to be evaluated.
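A rough sketch of what dropping those candidates could look like, done here as plain post-processing of the SDP text. `drop_private_candidates` is a hypothetical helper, not something aiortc or WIS provides, and a real change might hook in at candidate gathering instead.

```python
import ipaddress


def drop_private_candidates(sdp: str) -> str:
    """Remove ICE candidate lines whose address is private."""
    kept = []
    for line in sdp.splitlines():
        if line.startswith("a=candidate:"):
            address = line.split()[4]
            try:
                # is_private covers RFC 1918 and also loopback, link-local, etc.
                if ipaddress.ip_address(address).is_private:
                    continue
            except ValueError:
                pass  # not an IP literal (e.g. an mDNS name), keep the line
        kept.append(line)
    # SDP lines are CRLF-terminated
    return "\r\n".join(kept) + "\r\n"
```

Note this only removes candidate lines. The c= line and the `raddr` field on the srflx candidate would still carry the docker address.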

For most of our use cases this is pretty suboptimal even though we only do ICE once at initial load. It's another point of failure (and remote call), and for the vast majority of these WIS use cases the IP is likely static and known in advance. We could/should provide an option for a known, configured IP. If enabled, WIS would skip STUN altogether and make all aiortc messaging act as though the configured IP is the address of the local (docker) interface.
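As a sketch of the configured-IP idea, the snippet below rewrites the c= line and the host candidates in an SDP blob so they advertise a known public address. `announce_public_ip` and its parameters are hypothetical. It assumes IPv4 and that the published port is the same on both sides of the NAT (as with a one-to-one docker port mapping).

```python
def announce_public_ip(sdp: str, local_ip: str, public_ip: str) -> str:
    """Replace the local interface address with a configured public one."""
    out = []
    for line in sdp.splitlines():
        fields = line.split()
        if line.startswith("c=") and fields[-1] == local_ip:
            line = "c=IN IP4 " + public_ip
        elif line.startswith("a=candidate:") and "typ" in fields:
            is_host = fields[fields.index("typ") + 1] == "host"
            if is_host and fields[4] == local_ip:
                fields[4] = public_ip
                line = " ".join(fields)
        out.append(line)
    # SDP lines are CRLF-terminated
    return "\r\n".join(out) + "\r\n"
```

With the offer above, `announce_public_ip(sdp, "172.21.0.3", "12.11.215.207")` would turn the docker host candidate into one the client can actually reach.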

If not static/configured we already have STUN today. Another option could be to just do STUN once at WIS startup, store the learned IP from the STUN response, and use that with the approach from the IP configuration option.
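The "STUN once at startup" option could look roughly like the following, using only the standard library. This is a minimal sketch of a STUN Binding request and XOR-MAPPED-ADDRESS parsing for IPv4, not how aiortc/aioice does it internally. The function names are made up, and retries and error handling are left out.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request() -> bytes:
    # 20-byte header: type, attribute length (0), magic cookie, transaction ID
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + os.urandom(12)


def parse_xor_mapped_address(response: bytes):
    """Return (ip, port) from a Binding success response, or None."""
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        # value: reserved byte, family (0x01 = IPv4), x-port, x-address
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are 4-byte aligned
    return None


def discover_public_address(host: str, port: int = 3478, timeout: float = 2.0):
    """Ask one STUN server for our public address; raises on timeout."""
    request = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, port))
        response, _ = sock.recvfrom(2048)
    if response[8:20] != request[8:20]:
        return None  # transaction ID mismatch, not our response
    return parse_xor_mapped_address(response)
```

The learned IP could then be cached at startup and fed into the same path as a statically configured one.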

This is why I did not pass an empty list for ICE servers if none was configured.

Additionally, if you do configure a STUN server (with this patch) and it is unreachable by the WIS server, WIS will fail to make any WebRTC connections.

Same points as above.

Generally speaking this is all good stuff but it's a very specific use case and I want to try to minimize the required changes in WIS as much as possible.

gfodor · 2024-01-06T17:55:09Z

Ultimately SFUs like Janus and Mediasoup allow you to specify a broadcasted external URL for cases where you are behind a NAT and you want to inject IPs into the candidates. This seems strictly necessary if you want to cover those kinds of cases where the server is available for a direct connection but the candidates that will be generated by the usual methods won't be correct. For example, on AWS, STUN will not actually discover the right IP address.

In practice, many companies simply run a TURN server on 80 and 443 for the reasons mentioned. I'm not sure if the latency cost is a real concern - the "happy path" is to just run coturn alongside the webrtc server on the same machine and then just specify the TURN server as the only iceServer. It would be interesting to know more about how much latency and overhead this incurs but in practice I haven't seen it matter, and now your system is operating in a way where the world's firewalls are all assured to be configured to let traffic pass through and so on.
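As a sketch of that TURN-only setup, the helper below builds a single `iceServers` entry that points at TURN over TCP on port 80 and TURN over TLS on port 443. `turn_only_ice_servers` is a hypothetical name, the host and credentials are placeholders, and whether a given WebRTC stack accepts `turns:` or TCP transport is something to verify rather than assume.

```python
def turn_only_ice_servers(host: str, username: str, credential: str) -> list:
    """Build an iceServers list containing only TURN on ports 80 and 443."""
    urls = [
        f"turn:{host}:80?transport=tcp",    # plain TURN over TCP
        f"turns:{host}:443?transport=tcp",  # TURN over TLS
    ]
    return [{"urls": urls, "username": username, "credential": credential}]
```

For example, `turn_only_ice_servers("turn.example.org", "user", "secret")` yields one entry with both URLs and no STUN server, so there is no STUN lookup that can fail.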

Add support for STUN/TURN server configuration

9170551

lachesis requested a review from kristiankielhofner November 3, 2023 17:37
