Lower sender information max byte size to safely comply with smaller MTUs #391

Open
wants to merge 1 commit into
base: develop

dchristle

Uploads from OpenWebRX to PSKReporter have their payload sizes constrained by a check on the size of the Sender Information FlowSet being < 1200 bytes. The intent of this check is to keep the total packet size below the effective MTU over the Internet, so that the packets aren't fragmented. In real traffic, the earlier part of the packet can be 250 bytes or more, making the total size ~1450 bytes, which is above the MTU of real networks.

This change refactors the constraint check to be against the total padded packet size, rather than on only the Sender Information FlowSet. The max size is also lowered to 1190 bytes, which should fix the fragmentation observed in real traffic analysis of OpenWebRX uploads on some mobile network providers.
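A minimal sketch of what a whole-packet size check could look like. This is not the PR's actual diff: the function names, the 4-byte FlowSet header, and the 4-byte padding boundary are assumptions made for illustration, and `prefix_len` stands for everything that precedes the Sender Information FlowSet in the packet.

```python
MAX_PACKET_BYTES = 1190    # limit proposed in this PR, applied to the whole UDP payload
FLOWSET_HEADER_BYTES = 4   # assumption: 2-byte set ID + 2-byte length field
PADDING_MULTIPLE = 4       # assumption: FlowSets are padded to 4-byte boundaries


def padded(length):
    """Round length up to the next multiple of PADDING_MULTIPLE."""
    return -(-length // PADDING_MULTIPLE) * PADDING_MULTIPLE


def packet_size(prefix_len, records):
    """Total padded packet size for a prefix plus one FlowSet of records."""
    body = FLOWSET_HEADER_BYTES + sum(len(r) for r in records)
    return prefix_len + padded(body)


def batch_records(prefix_len, records, limit=MAX_PACKET_BYTES):
    """Group encoded records (bytes) so that each whole packet stays within limit.

    A single record that is too large on its own still gets its own batch;
    this sketch does not try to split records.
    """
    batches = []
    current = []
    for record in records:
        # Start a new packet if adding this record would exceed the limit.
        if current and packet_size(prefix_len, current + [record]) > limit:
            batches.append(current)
            current = []
        current.append(record)
    if current:
        batches.append(current)
    return batches
```

With a hypothetical 250-byte prefix and 30-byte records, `batch_records(250, records)` closes each packet at 31 records, because the 32nd would push the padded total past 1190 bytes.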

Calculate the size constraint based on the total padded message size.
With a 1190 byte limit, fragmentation seen on some mobile networks
should be eliminated.
Owner

jketterl commented Nov 5, 2024

What is a real network for you? For me, a real network has an MTU of 1500...

But either way, I did look into this when I built this... The problem is that I couldn't figure out a way to get the actual MTU, and I don't think constraining this to random values is a good idea in the first place. Lowering the limit might make this work for you, but it will cause additional traffic for others, so I don't consider this a good solution either...

Author

dchristle commented Nov 5, 2024

I should have made the motivation for avoiding fragmentation clearer in my description; sorry about that. The way I wrote it makes it sound like this is an esoteric spec issue or a theoretical nice-to-have, without much practical impact.

When UDP packets are fragmented, they often don't get delivered completely - meaning PSKReporter receives malformed data that must be discarded. This results in 100% data loss for affected users. Based on packet captures at PSKReporter, there are at least two European mobile networks where OpenWebRX uploads are getting fragmented down to just 1,276 bytes, so this isn't a theoretical concern. These stations upload quite a lot of spots, too, so it's unfortunate.

With the current code's 1200-byte sender info limit, OpenWebRX generates 1,450-1,478 byte packets (or larger). We're seeing consistent fragmentation across the 1278-1490 byte range (coming from more sources than just OpenWebRX, to be clear). Any of these fragmented uploading stations are experiencing significant data loss -- usually 100%.

> What is a real network for you? For me, a real network has an MTU of 1500...

1500 bytes is common for local networks and many ISPs, so it's a great reference point, but the effective MTU for Internet traffic is often lower. Mobile providers (4G/5G) typically recommend MTUs of ~1420-1450 bytes, and VPN/tunnel configurations can reduce this further (~1,324 bytes). There are other adverse conditions that can push it even lower. But more directly, we can see the ranges of real-world values in the packet captures.
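As a rough illustration of the arithmetic, the sketch below subtracts the minimum IPv4 header (20 bytes) and the UDP header (8 bytes) from a few of the MTU values mentioned above. The function name is hypothetical, and treating the quoted packet sizes as UDP payload sizes is an assumption.

```python
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options
IPV6_HEADER_BYTES = 40  # fixed IPv6 header
UDP_HEADER_BYTES = 8


def max_udp_payload(mtu, ipv6=False):
    """Largest UDP payload that fits in one unfragmented packet for this MTU."""
    ip_header = IPV6_HEADER_BYTES if ipv6 else IPV4_HEADER_BYTES
    return mtu - ip_header - UDP_HEADER_BYTES


# MTU values taken from the discussion: Ethernet, a mobile network, a VPN/tunnel.
for mtu in (1500, 1420, 1324):
    payload = max_udp_payload(mtu)
    print(f"MTU {mtu}: max payload {payload}, "
          f"1450 fits: {1450 <= payload}, 1190 fits: {1190 <= payload}")
```

Under these assumptions, a 1,450-byte payload fits only the 1500-byte MTU, while a 1,190-byte payload fits all three.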

> The problem is that I couldn't figure out a way to get the actual MTU, and I don't think constraining this to random values is a good idea in the first place.

Yes, after reading the various specs/conditions that affect MTU, I can relate to this challenge. There is an automated way called "path MTU discovery", but I think that's overkill for our application. Instead, the conservative 1190-byte limit is derived from analysis of actual PSKReporter traffic showing consistent fragmentation between 1,278 and 1,490 bytes from various uploader stations. There are even a few down at 1,210 bytes. The 1,190-byte limit computed over the whole UDP payload is a simple way to solve real problems users are experiencing today. I'm also working with maintainers of other spot uploading software to mitigate this same issue; it's a pesky detail of UDP networking that isn't top-of-mind for developers, but has real effects on users.

> Lowering the limit might make this work for you, but it will cause additional traffic for others...

Even for high volume stations, where the overhead would be maximal, this change would add only about 1 kilobyte of overhead per 5-minute reporting interval (~3.5 bytes/second), in exchange for preventing complete data loss for affected users. I think this is a worthwhile trade-off for improving reliability across diverse network conditions, particularly for amateur radio, which is a global audience & also tends to sometimes operate from remote/harsh locations. The value of the trade-off is especially tangible since we have actual evidence of stations losing 100% of their data from fragmentation.

Owner

jketterl commented Dec 11, 2024

Well, I've been thinking about this for a bit now, and I gotta say, I don't like this approach. Let me elaborate a bit.

First of all, I'd like to point out that I believe this problem is in no way caused by OpenWebRX. As such, I find it difficult to justify implementing any changes in OpenWebRX in the first place. Problems should be fixed at their root. I believe UDP fragmentation is exactly the feature that should prevent this from happening, and if that feature is broken for certain providers, it should be fixed with those providers, not in the software that is using their infrastructure.

That being said, I do also understand the difficulty and how various ways of encapsulating network traffic have complicated the MTU "landscape". I have no idea what exactly is used in the mobile sector, but I know that the wired world is just as bad (PPPoE, DSLite, various VPN protocols, just to give a few examples).

I do see that OpenWebRX could improve the situation by sending UDP packets that do not need fragmentation in the first place. The problem is that in order to do that correctly and efficiently, it needs to know the actual MTU of the link in question. I do not believe that making assumptions based on current traffic is a good solution in general, and as such I don't think that limiting the packet size as implemented in this PR is the right solution.

There are two ideas which I'd say would be worth considering:

  • Try to detect the MTU using the already mentioned path MTU discovery. This may be a bit harder to implement but certainly would alleviate such issues in the future.
  • Add a configuration option so that users can provide the applicable MTU for their location and respective network infrastructure. This would certainly be easier to implement, but won't resolve all the affected cases due to potential lack of awareness, and there is also the chance that future (unnoticed) network changes will require changes to the configuration.
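The two ideas above could be combined roughly as follows. This is only a sketch under several assumptions: it is Linux-only and IPv4-only, the function and parameter names are hypothetical, and the numeric fallbacks are the Linux `<linux/in.h>` values for socket options that Python's `socket` module may not export. Note that the kernel's estimate starts out as the MTU of the local route and only shrinks once the kernel has learned of a smaller hop, so it is not guaranteed to reflect the real path MTU on the first send.

```python
import socket
import sys

# Linux constants; the numeric fallbacks are assumptions valid for Linux only.
IP_MTU = getattr(socket, "IP_MTU", 14)
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)

IPV4_UDP_OVERHEAD = 28  # 20-byte minimum IPv4 header + 8-byte UDP header


def kernel_path_mtu(host, port):
    """Return the kernel's current path MTU estimate toward host, or None."""
    if not sys.platform.startswith("linux"):
        return None
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            # Ask the kernel to set "don't fragment" and track the path MTU.
            sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
            # IP_MTU can only be read on a connected socket.
            sock.connect((host, port))
            return sock.getsockopt(socket.IPPROTO_IP, IP_MTU)
    except OSError:
        return None


def payload_limit(configured_mtu=None, host=None, port=None, default=1190):
    """Pick a UDP payload limit: user configuration, then kernel estimate, then default."""
    if configured_mtu is not None:
        return configured_mtu - IPV4_UDP_OVERHEAD
    if host is not None and port is not None:
        mtu = kernel_path_mtu(host, port)
        if mtu is not None:
            return mtu - IPV4_UDP_OVERHEAD
    return default
```

In this sketch an explicitly configured MTU takes precedence over the kernel's estimate, on the assumption that a user who sets the option knows their link better than the first probe does; the 1190-byte default is the value proposed in this PR.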

Lastly, I would also like to suggest switching to a different protocol. TCP clearly has an advantage in this scenario: MSS clamping works transparently, and since it is part of the network stack, it does not need to be taken into account at the application level.

What do you think?
