Lower sender information max byte size to safely comply with smaller MTUs #391

dchristle · 2024-11-05T07:18:02Z

Uploads from OpenWebRX to PSKReporter have their payload sizes constrained by a check on the size of the Sender Information FlowSet being < 1200 bytes. The intent of this check is to keep the total packet size below the effective MTU over the Internet, so that the packets aren't fragmented. In real traffic, the earlier part of the packet can be 250 bytes or more, making the total size ~1450 bytes, which is above the MTU of real networks.

This change refactors the constraint check to be against the total padded packet size, rather than on only the Sender Information FlowSet. The max size is also lowered to 1190 bytes, which should fix the fragmentation observed in real traffic analysis of OpenWebRX uploads on some mobile network providers.

Calculate the size constraint based on the total padded message size. With a 1190 byte limit, fragmentation seen on some mobile networks should be eliminated.
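As a rough illustration of the check described above, here is a minimal sketch, not the actual OpenWebRX code. The names `split_spots`, `header_len` and `flowset_header_len` are hypothetical, and the 4-byte padding and 4-byte FlowSet header are assumptions about the record layout. The point is only that the limit is applied to the whole padded message rather than to the Sender Information FlowSet alone.

```python
from typing import List

MAX_PACKET_BYTES = 1190  # limit proposed in this PR, applied to the whole UDP payload


def pad4(length: int) -> int:
    """Round a length up to the next multiple of 4 (assumed FlowSet padding)."""
    return (length + 3) // 4 * 4


def split_spots(header_len: int, spot_records: List[bytes],
                flowset_header_len: int = 4) -> List[List[bytes]]:
    """Group encoded spot records so each message's total padded size stays within the limit.

    header_len is everything that precedes the sender FlowSet (message header,
    templates, receiver information). A single record that is too large on its
    own still gets a batch to itself, since it cannot be split here.
    """
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    current_len = 0
    for record in spot_records:
        # Total message size if this record were appended to the current batch.
        total = header_len + pad4(flowset_header_len + current_len + len(record))
        if current and total > MAX_PACKET_BYTES:
            batches.append(current)
            current, current_len = [], 0
        current.append(record)
        current_len += len(record)
    if current:
        batches.append(current)
    return batches


def message_size(header_len: int, batch: List[bytes], flowset_header_len: int = 4) -> int:
    """Total padded size of one message built from a batch."""
    return header_len + pad4(flowset_header_len + sum(len(r) for r in batch))
```

With 250 bytes of preceding data and 60 records of 40 bytes each, this yields batches of 23, 23 and 14 records, each message at or below 1190 bytes.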

jketterl · 2024-11-05T08:04:48Z

What is a real network for you? For me, a real network has an MTU of 1500...

But either way, I did look into this when I built this... The problem is that I couldn't figure out a way to get the actual MTU, and I don't think constraining this to random values is a good idea in the first place. Lowering the limit might make this work for you, but it will cause additional traffic for others, so I don't consider this a good solution either...

dchristle · 2024-11-05T20:54:27Z

I should have made the motivation for avoiding fragmentation clearer in my description; sorry about that. The way I wrote it makes it sound like this is an esoteric spec issue or a theoretical nice-to-have, without much practical impact.

When UDP packets are fragmented, they often don't get delivered completely - meaning PSKReporter receives malformed data that must be discarded. This results in 100% data loss for affected users. Based on packet captures at PSKReporter, there are at least two European mobile networks where OpenWebRX uploads are getting fragmented down to just 1,276 bytes, so this isn't a theoretical concern. These stations upload quite a lot of spots, too, so it's unfortunate.

With the current code's 1200-byte sender info limit, OpenWebRX generates 1,450-1,478 byte packets (or larger). We're seeing consistent fragmentation across the 1278-1490 byte range (coming from more sources than just OpenWebRX, to be clear). Any of these fragmented uploading stations are experiencing significant data loss -- usually 100%.

> What is a real network for you? For me, a real network has an MTU of 1500...

1500 bytes is common for local networks and many ISPs, so it's a great reference point, but the effective MTU for Internet traffic is often lower. Mobile providers (4G/5G) typically recommend MTUs of ~1420-1450 bytes, and VPN/tunnel configurations can reduce this further (~1,324 bytes). There are other adverse conditions that can push it even lower. But more directly, we can see the ranges of real-world values in the packet captures.
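To make the arithmetic concrete, here is a small sketch of how much UDP payload fits into a single unfragmented packet for a given MTU. It assumes plain IPv4 without options (20-byte header) or IPv6 without extension headers (40 bytes), plus the 8-byte UDP header. Treating the sizes quoted in this thread as path MTUs is itself an assumption; the function names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
IPV6_HEADER = 40  # bytes, assuming no extension headers
UDP_HEADER = 8


def max_udp_payload(mtu: int, ipv6: bool = False) -> int:
    """Largest UDP payload that fits in one packet without fragmentation."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - UDP_HEADER


def fits(payload_len: int, mtu: int, ipv6: bool = False) -> bool:
    """True if a UDP payload of payload_len bytes fits within the given MTU."""
    return payload_len <= max_udp_payload(mtu, ipv6)


print(max_udp_payload(1500))  # 1472
print(max_udp_payload(1276))  # 1248
print(fits(1450, 1420))       # False: a 1450-byte payload needs an MTU of at least 1478
print(fits(1190, 1276))       # True
```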

> The problem is that I couldn't figure out a way to get the actual MTU, and I don't think constraining this to random values is a good idea in the first place.

Yes, after reading the various specs/conditions that affect MTU, I can relate to this challenge. There is an automated way called "path MTU discovery", but I think that's overkill for our application. Instead, the conservative 1190-byte limit is derived from analysis of actual PSKReporter traffic showing consistent fragmentation issues between 1278-1490 bytes from various uploader stations. There are even a few down at 1,210 bytes. The 1,190-byte limit computed over the whole UDP payload is a simple way to solve real problems users are experiencing today. I'm also working with maintainers of other spot uploading software to mitigate this same issue; it's a pesky detail of UDP networking that isn't top-of-mind for developers, but has real effects on users.

> Lowering the limit might make this work for you, but it will cause additional traffic for others...

Even for high volume stations, where the overhead would be maximal, this change would add only about 1 kilobyte of overhead per 5-minute reporting interval (~3.5 bytes/second), in exchange for preventing complete data loss for affected users. I think this is a worthwhile trade-off for improving reliability across diverse network conditions, particularly for amateur radio, which is a global audience & also tends to sometimes operate from remote/harsh locations. The value of the trade-off is especially tangible since we have actual evidence of stations losing 100% of their data from fragmentation.

jketterl · 2024-12-11T17:00:12Z

Well, I've been thinking about this for a bit now, and I gotta say, I don't like this approach. Let me elaborate a bit.

First of all, I'd like to point out that I believe that this problem is in no way caused by OpenWebRX. As such, I find it kind of difficult to implement any changes in OpenWebRX in the first place. Problems should be fixed at their root. I believe the UDP fragmentation is a feature that is exactly the thing that should prevent this from happening in the first place, and if this feature is broken for certain providers, this should be fixed with those providers, and not in the software that is using their infrastructure.

That being said, I do also understand the difficulty and how various ways of encapsulating network traffic have complicated the MTU "landscape". I have no idea what exactly is used in the mobile sector, but I know that the wired world is just as bad (PPPoE, DSLite, various VPN protocols, just to give a few examples).

I do see that OpenWebRX could improve the situation by sending UDP packets that do not need fragmentation in the first place. The problem is that in order to do that correctly and efficiently, it needs to know the actual MTU of the link in question. I do not believe that making assumptions based on current traffic is a good solution in general, and as such I don't think that limiting the packet size as implemented in this PR is the right solution.

There are two ideas which I'd say would be worth considering:

- Try to detect the MTU using the already mentioned path MTU discovery. This may be a bit harder to implement but certainly would alleviate such issues in the future.
- Add a configuration option so that users can provide the applicable MTU for their location and respective network infrastructure. This would certainly be easier to implement, but won't resolve all the affected cases due to potential lack of awareness, and there is also the chance that future (unnoticed) network changes will require changes to the configuration.
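The two ideas could be combined roughly as follows. This is a hedged sketch, not a proposed implementation: `probe_path_mtu` and `payload_limit` are hypothetical names, the socket option numbers are the Linux values from `<linux/in.h>` (used as fallbacks where Python's `socket` module does not expose them), and the probe is skipped on other platforms. The value returned by `IP_MTU` is only the kernel's current estimate for the route; it is refined only after ICMP "fragmentation needed" messages have been received, so it may still be too high on the first send.

```python
import socket
import sys
from typing import Optional

# Linux-specific option numbers; assumed values taken from <linux/in.h>.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)
IP_MTU = getattr(socket, "IP_MTU", 14)

IPV4_UDP_OVERHEAD = 28  # 20-byte IPv4 header (no options) + 8-byte UDP header


def probe_path_mtu(host: str, port: int) -> Optional[int]:
    """Ask the Linux kernel for its current path MTU estimate towards host.

    Returns None on other platforms or if the lookup fails.
    """
    if not sys.platform.startswith("linux"):
        return None
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            # Set the "don't fragment" behaviour so the kernel tracks the path MTU.
            sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
            sock.connect((host, port))  # UDP connect sends nothing, it only fixes the route
            return sock.getsockopt(socket.IPPROTO_IP, IP_MTU)
    except OSError:
        return None


def payload_limit(configured_mtu: Optional[int] = None,
                  probed_mtu: Optional[int] = None,
                  fallback: int = 1190) -> int:
    """Pick the UDP payload limit: user configuration first, then the probe, then a fixed fallback."""
    mtu = configured_mtu if configured_mtu is not None else probed_mtu
    if mtu is None:
        return fallback
    return mtu - IPV4_UDP_OVERHEAD
```

A caller could then use something like `payload_limit(configured_mtu=user_setting, probed_mtu=probe_path_mtu(host, port))`, where `user_setting` stands for whatever configuration key would be introduced.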

Lastly, I would also like to suggest switching to a different protocol. TCP clearly has an advantage in this scenario, since MSS clamping works transparently and, being part of the network stack, does not need to be taken into account at the application level.

What do you think?

refactor: calculate pskreporter message size constraint on total bytes

c2ce73a

Calculate the size constraint based on the total padded message size. With a 1190 byte limit, fragmentation seen on some mobile networks should be eliminated.

dchristle mentioned this pull request Nov 5, 2024

Set scope fields to 1 in receiver template descriptors. #390

Merged
