
how do we add the encoding = UTF-8 option? #59

Open
Suitear opened this issue May 14, 2021 · 2 comments

Comments

@Suitear

Suitear commented May 14, 2021

After looking through the output of the mktorrent --help command, I can't find it.
So I'm wondering whether anyone can help me out.
Many thanks.

@FranciscoPombal

FranciscoPombal commented May 14, 2021

That's a non-standard field AFAIK. It's useless anyway - the correct thing to do nowadays is (and has been for a long time now) to simply assume UTF-8.

Using any other encoding for text is just wrong and should be considered a bug/misfeature in any software that uses or expects it by default.

Similarly, if someone creates a torrent whose filenames/title/comment are not UTF-8 encoded, that's a problem on their side that they should fix.
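If you want to check whether a torrent has that problem, a minimal Python sketch could look like the following. It assumes the .torrent has already been bdecoded into a dict whose keys and string values are bytes (the decoding step is not shown), and find_non_utf8_fields is a hypothetical helper name, not part of mktorrent or any library.

```python
def find_non_utf8_fields(torrent: dict) -> list[str]:
    """Return labels of human-readable fields that are not valid UTF-8.

    Assumes `torrent` is an already-bdecoded .torrent file: a dict whose
    keys and string values are `bytes`.
    """
    bad: list[str] = []

    def check(label: str, value: bytes) -> None:
        try:
            value.decode("utf-8")  # strict decode: raises on invalid bytes
        except UnicodeDecodeError:
            bad.append(label)

    # Optional top-level free-text fields.
    for key in (b"comment", b"created by"):
        if key in torrent:
            check(key.decode("ascii"), torrent[key])

    info = torrent[b"info"]
    check("info.name", info[b"name"])

    # Multi-file v1 torrents store each file's path as a list of components.
    for i, entry in enumerate(info.get(b"files", [])):
        for j, component in enumerate(entry[b"path"]):
            check(f"info.files[{i}].path[{j}]", component)
    return bad
```

An empty result means every checked field decodes cleanly as UTF-8; anything else points at the fields the torrent's creator should fix.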

@FranciscoPombal

FranciscoPombal commented May 14, 2021

To further strengthen my argument, here is a relevant quote from the spec, https://www.bittorrent.org/beps/bep_0052.html (emphasis mine):

BEP authors are encouraged to use ASCII-compatible strings for dictionary keys and UTF-8 for human-readable data.
(...)
All strings in a .torrent file defined by this BEP that contain human-readable text are UTF-8 encoded.
(...)
file tree

A tree of dictionaries where dictionary keys represent UTF-8 encoded path elements. Entries with zero-length keys describe the properties of the composed path at that point. 'UTF-8 encoded' in this context only means that if the native encoding is known at creation time it must be converted to UTF-8. Keys may contain invalid UTF-8 sequences or characters and names that are reserved on specific filesystems. Implementations must be prepared to sanitize them. On most platforms path components exactly matching '.' and '..' must be sanitized since they could lead to directory traversal attacks and conflicting path descriptions. On platforms that require valid UTF-8 path components this sanitizing step must happen after normalizing overlong UTF-8 encodings.
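The quoted text says implementations must sanitize path elements, but it leaves the exact policy open. A minimal Python sketch, assuming that invalid bytes are replaced with U+FFFD and that "", ".", "..", path separators and NUL are replaced with "_" (both choices are illustrative, not taken from the spec). sanitize_path_element and sanitize_path are hypothetical helper names.

```python
def sanitize_path_element(raw: bytes) -> str:
    """Turn one raw path element from a .torrent into a safe file name."""
    # Lenient decode: invalid bytes, including overlong encodings such as
    # b"\xc0\xae" (an overlong "."), become U+FFFD instead of raising.
    name = raw.decode("utf-8", errors="replace")
    # A single element must not smuggle in separators or NUL characters.
    for ch in ("/", "\\", "\x00"):
        name = name.replace(ch, "_")
    # "." and ".." enable directory traversal; an empty element would
    # collapse into its parent. The "_" replacement is an assumption.
    if name in ("", ".", ".."):
        name = "_"
    return name


def sanitize_path(elements: list[bytes]) -> str:
    """Join sanitized elements of a path list with '/'."""
    if not elements:
        # BEP 3: a zero-length path list is an error case.
        raise ValueError("empty path list")
    return "/".join(sanitize_path_element(e) for e in elements)
```

Because Python's UTF-8 decoder rejects overlong sequences, decoding before the "."/".." check means an overlong spelling of ".." cannot slip past it, which matches the ordering the spec asks for.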

And also from the legacy v1 spec, https://www.bittorrent.org/beps/bep_0003.html (again, emphasis mine):

All strings in a .torrent file that contains text must be UTF-8 encoded.
(...)
The name key maps to a UTF-8 encoded string which is the suggested name to save the file (or directory) as. It is purely advisory.
(...)
path - A list of UTF-8 encoded strings corresponding to subdirectory names, the last of which is the actual file name (a zero length list is an error case).
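On the creation side, these v1 rules amount to encoding every text string as UTF-8 before it is bencoded, so no extra "encoding" key is needed. A minimal sketch of a bencoder that does this, assuming Python str values are text (always written as UTF-8) and bytes values are passed through untouched; bencode here is an illustrative helper, not mktorrent's code.

```python
def bencode(value) -> bytes:
    """Minimal bencoder that always writes str values as UTF-8 (BEP 3)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but bencode has no booleans.
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")  # text is always UTF-8
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = []
        for k, v in value.items():
            key = k.encode("utf-8") if isinstance(k, str) else k
            items.append((key, v))
        items.sort(key=lambda kv: kv[0])  # keys sorted as raw byte strings
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")
```

For example, bencode({"name": "Café"}) produces b"d4:name5:Caf\xc3\xa9e": the length prefix counts UTF-8 bytes, not characters.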
