Half-Request: Better Compatibility with OS X #14

Open
ghost opened this issue Apr 17, 2016 · 6 comments
Labels
under construction The issue is being worked on.

Comments

@ghost

ghost commented Apr 17, 2016

When a torrent file is created on OS X, its file names are UTF-8 encoded, but in Normalization Form Canonical Decomposition (NFD) rather than Normalization Form Canonical Composition (NFC). Most clients on other operating systems, which will of course end up loading torrent files made on OS X, expect NFC. Meanwhile, OS X torrent clients can, for the most part, handle either NFC- or NFD-normalized filenames. They ensure the final filenames on disk are NFD-normalized, which is what the HFS+ filesystem expects.
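
To make the difference concrete, here is a minimal sketch using Python's standard `unicodedata` module (the choice of Python and the sample character are only for illustration). It shows that the same visible character, "ü", yields different UTF-8 byte sequences under NFC and NFD, which is why a byte-for-byte filename comparison fails.

```python
import unicodedata

composed = "\u00fc"       # "ü" as a single code point (NFC form)
decomposed = "u\u0308"    # "u" followed by a combining diaeresis (NFD form)

# Converting between the two forms is lossless for this character.
assert unicodedata.normalize("NFD", composed) == decomposed
assert unicodedata.normalize("NFC", decomposed) == composed

# Both are valid UTF-8, but the encoded bytes differ.
nfc_bytes = composed.encode("utf-8")    # b'\xc3\xbc'
nfd_bytes = decomposed.encode("utf-8")  # b'u\xcc\x88'
```

Both strings render identically, so the mismatch is invisible when you look at the file name.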

The ideal, therefore, is for all torrent files to be generated as UTF-8/NFC, which is the implicit standard for the BitTorrent protocol. The specification itself only says: "All strings in a .torrent file that contains text must be UTF-8 encoded."

At the moment, mktorrent (like every other torrent client I've tested) does not distinguish between UTF-8/NFC and UTF-8/NFD, and therefore does not convert UTF-8/NFD text strings to UTF-8/NFC.
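
As a rough sketch of the conversion being asked for (this is not mktorrent's code, which is written in C; the function name `nfc_path_components` is hypothetical), each path component could be NFC-normalized before it is written into the torrent's path list:

```python
import unicodedata
from typing import List

def nfc_path_components(relative_path: str) -> List[str]:
    """Split a relative path on "/" and NFC-normalize each component.

    Hypothetical helper for illustration only; empty components are dropped.
    """
    return [
        unicodedata.normalize("NFC", part)
        for part in relative_path.split("/")
        if part
    ]

# An NFD path, as it might be read from disk on OS X.
components = nfc_path_components("Mu\u0308ller/u\u0308ber.flac")
```

The normalized components would then be UTF-8 encoded and bencoded as usual; only the strings stored in the torrent change, not the files on disk.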

My question is, would there be a general interest in an option to format torrent files generated on OS X with the NFC unicode equivalent encoding for the torrent-creating computer's NFD-normalized filenames? If so, would forking the project be the best way to contribute? (New to GitHub, sorry!) Or is there already a super-secret option for something like that?

@denkristoffer

denkristoffer commented Nov 27, 2016

The above user deleted their account so I assume the offer to add this is off the table, but it would definitely be a welcome addition as this has been giving me problems lately!

pobrn added the "information needed" label (Work on the issue cannot advance due to insufficient information.) Jan 10, 2021
@pobrn
Owner

pobrn commented Jan 10, 2021

@denkristoffer if this issue is still relevant, could you please elaborate on the nature of the problems it's causing?

@gennaios

gennaios commented Apr 5, 2021

As the torrent is created with a file name encoded in NFD, when attempting to seed on another operating system, the file name is different and as such the torrent client thinks the file does not exist. I often create torrents on macOS and seed on Linux. Currently, I have to first ensure there are no accents, diacritics, or non-Latin characters in the file name before creating.
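
The failure described above can be reproduced in a few lines. This is only a sketch of the mismatch and of a normalization-tolerant lookup; `find_matching_name` is a hypothetical helper, not something any particular torrent client is known to implement.

```python
import unicodedata

def find_matching_name(torrent_name, names_on_disk):
    """Return the on-disk name equal to torrent_name after NFC normalization, or None."""
    wanted = unicodedata.normalize("NFC", torrent_name)
    for candidate in names_on_disk:
        if unicodedata.normalize("NFC", candidate) == wanted:
            return candidate
    return None

torrent_name = "u\u0308ber.flac"     # NFD, as stored in the .torrent
names_on_disk = ["\u00fcber.flac"]   # NFC, as typically found on Linux

# An exact comparison fails, so the client reports the file as missing.
exact_match = torrent_name in names_on_disk

# Comparing after normalizing both sides finds the file.
tolerant_match = find_matching_name(torrent_name, names_on_disk)
```

A client-side lookup like this would only paper over the problem; writing NFC into the torrent at creation time avoids it for every client.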

@pobrn
Owner

pobrn commented Apr 21, 2021

@gennaios thanks for the explanation.

@pobrn pobrn added this to the mktorrent 2.0-rc0 milestone Apr 21, 2021
pobrn added the "under construction" label (The issue is being worked on.) and removed the "information needed" label (Work on the issue cannot advance due to insufficient information.) Apr 21, 2021
@gennaios

Any updates as to when this might be addressed? Someone I mentioned it to said that, when file names contain accents, the torrent file created on macOS is unusable on any system, even on macOS itself.

@taylorthurlow

I just wanted to confirm that I'm also encountering this issue, and that it is definitely a property of mktorrent on macOS.

APFS (unlike HFS+) seems happy to let you write Unicode filenames with NFC-normalized characters, and they will stay that way, but mktorrent seems to read them and generate its bencoded data structure with the path strings re-normalized back to NFD. This is how we get into the scenario that @gennaios mentioned, where it's possible to generate a torrent on macOS, load it into a torrent client on that same system, and have it fail verification. Avoiding this would require torrent clients to auto-normalize back to NFC, which I can at least say that Deluge on Linux is not doing.

taylorthurlow added a commit to taylorthurlow/redacted_better that referenced this issue Feb 6, 2022
Some unicode characters can be represented in "NFC" or "NFD" format. The
C and D stand for composed and decomposed. Composed means characters
like the umlaut-ed "u" are a single code point, whereas decomposed means
a standard "u" followed by an umlaut "diaeresis" character which
combines to be VIEWED as a single character, when it is actually two
characters.

In the HFS+ filesystem era of macOS, all file names were stored in NFD
form which, while valid, is not what Linux-based systems typically
use: NFC. As of APFS, either form is accepted and the filesystem will
not modify/normalize the written paths.

However, in the case of mktorrent (and probably because mktorrent uses
libraries built-in to macOS), files read from the APFS disk (which are
written in NFC) are actually normalized back to NFD when mktorrent
generates the encoded data for the torrent.

This means that a torrent built with mktorrent on macOS will fail to
verify when loaded in a torrent client on a linux machine, when a path
within that torrent contains a string that is different between NFC and
NFD normalization forms.

This is not something I can fix here (despite this commit making sure we
write data in NFC format). There is a GitHub issue open on the mktorrent
repo: pobrn/mktorrent#14

For now I need to make sure that it is clear that we need to generate
the torrents on a Linux box until we can confirm that mktorrent makes
the right normalization decisions.

4 participants