[Spec] Specify if UTF-8 encoding should have BOM #42

Closed
Tuupertunut opened this issue Dec 26, 2023 · 6 comments · Fixed by #45
Labels
Approved (approved by majority), Specification (affects specification, spec.md)

Comments

@Tuupertunut

Suggestion

Specify in the Ultrastar format specification whether files should use UTF-8 BOM or UTF-8 without BOM.

Use case

The Ultrastar format specification specifies that the encoding should be UTF-8, but does not distinguish between UTF-8 with BOM and UTF-8 without BOM. There are currently tools in the Ultrastar ecosystem that accept or produce only one of these.

Examples:

  • Ultrastar Play 0.9.0 song editor can read both formats, but always saves the file as UTF-8 BOM.
  • Performous Composer 2.0 can only open UTF-8 without BOM.

Therefore if you save a song in Ultrastar Play song editor and try to open it in Performous Composer, it will fail to open, even though both are UTF-8.
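The incompatibility comes down to three bytes at the start of the file. The following Python sketch is only an illustration; it does not reproduce how Ultrastar Play or Performous Composer actually parse files, and `first_line_is_tag` is a hypothetical naive reader. It shows that a BOM-prefixed file is still valid UTF-8, but a reader that decodes it as plain UTF-8 and expects the file to begin with `#` sees U+FEFF first:

```python
import codecs

# A UTF-8 byte order mark is U+FEFF encoded as UTF-8: the three bytes EF BB BF.
text = "#TITLE:Example\n#ARTIST:Someone\n"

without_bom = text.encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom  # same bytes as text.encode("utf-8-sig")

# Both are valid UTF-8; they differ only in the first three bytes.
assert with_bom[:3] == b"\xef\xbb\xbf"
# Decoding with plain "utf-8" keeps the BOM as a visible U+FEFF character.
assert with_bom.decode("utf-8") == "\ufeff" + text


def first_line_is_tag(data: bytes) -> bool:
    """Hypothetical naive reader: plain UTF-8 decode, expects the file to start with '#'."""
    return data.decode("utf-8").startswith("#")


print(first_line_is_tag(without_bom))  # True
print(first_line_is_tag(with_bom))     # False: the first character is U+FEFF
```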

Extra info/examples/attachments

No response

@Baklap4
Collaborator

Baklap4 commented Dec 26, 2023

As far as I can tell from previous discussions, I think we agreed on UTF-8 without BOM.

@bohning
Collaborator

bohning commented Dec 26, 2023

A BOM only really makes sense for UTF-16 and UTF-32. As far as I know, it was only introduced by Vocaluxe to detect UTF-8 and distinguish it from CP1252 (as an alternative to the Vocaluxe-specific #ENCODING tag). The Vocaluxe reasoning is: if there is no #ENCODING tag, the file is UTF-8 if it has a BOM, otherwise it is CP1252. But there are definitely better ways to implement encoding detection, and if the standard specifies UTF-8, detecting encodings should not even be necessary anymore.

I strongly suggest that some Vocaluxe developers change the logic to default to UTF-8 (and use other encodings via the #ENCODING tag, as long as encodings other than UTF-8 are not yet deprecated).
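To make the difference between the two strategies concrete, here is a Python sketch of the detection logic as described in this comment. It is not Vocaluxe's actual code. The helper `read_encoding_tag`, the `#ENCODING:<name>` line format, and the assumption that the header ends at the first line not starting with `#` are illustrative assumptions:

```python
import codecs
from typing import Optional


def read_encoding_tag(data: bytes) -> Optional[str]:
    """Return the value of a '#ENCODING:<name>' header tag, or None (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Assumes header tags are plain ASCII; latin-1 decoding never fails,
    # whatever the real encoding of the rest of the file is.
    for line in data.decode("latin-1").splitlines():
        if not line.startswith("#"):
            break  # assumption: the header ends at the first line not starting with '#'
        key, _, value = line[1:].partition(":")
        if key.strip().upper() == "ENCODING":
            return value.strip()
    return None


def detect_vocaluxe_style(data: bytes) -> str:
    """Heuristic as described: tag first, else a BOM means UTF-8, else CP1252."""
    tag = read_encoding_tag(data)
    if tag is not None:
        return tag
    return "utf-8" if data.startswith(codecs.BOM_UTF8) else "cp1252"


def detect_proposed(data: bytes) -> str:
    """Suggested change: default to UTF-8; only an explicit #ENCODING tag overrides it."""
    tag = read_encoding_tag(data)
    return tag if tag is not None else "utf-8"


# A file as the spec wants it: UTF-8, no BOM, no #ENCODING tag.
song = "#TITLE:Café\n#ARTIST:Someone\n".encode("utf-8")
print(detect_vocaluxe_style(song))  # cp1252 (the UTF-8 bytes would be misread)
print(detect_proposed(song))        # utf-8
```

With the proposed default, a BOM is no longer needed as an encoding signal, which is why a spec that mandates UTF-8 can also forbid it.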

@basisbit
Member

I also vote for (and strongly suggest) UTF-8 without BOM. There is no need for it.

@Baklap4
Collaborator

Baklap4 commented Dec 26, 2023

@marwin89 can you make a pr to add this :)?

@marwin89 linked a pull request Dec 27, 2023 that will close this issue
@marwin89 added the Approved (approved by majority) label Dec 27, 2023
@marwin89 self-assigned this Dec 27, 2023
@marwin89
Collaborator

Here is the pull request. Please approve and merge 👋

FYI: there is a corresponding issue for implementing support for UTF-8 (without BOM) in the Vocaluxe repository.

I'm closing this issue. Thanks @Tuupertunut for the discussion and for refining the spec.

@codello
Contributor

codello commented May 6, 2024

The current version of the formal specification has slightly more relaxed phrasing with respect to the BOM, acknowledging that applications may ignore a BOM if one is present. Is this phrasing in line with the outcome of this discussion?

format/spec.md

Lines 64 to 67 in 23bf930

Songs are plain text files
The UTF-8 encoding MUST be used.
Implementations MUST NOT add a byte order mark to the beginning of a file.
In the interests of interoperability, implementations MAY ignore the presence of a byte order mark rather than treating it as an error.
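As a rough illustration of how an implementation might follow these lines, here is a Python sketch. The names `read_song` and `write_song` and the `ignore_bom` switch are hypothetical and not taken from any Ultrastar tool; the point is only that writing uses a BOM-free UTF-8 codec, while reading can either reject or skip a leading BOM:

```python
import codecs
import tempfile
from pathlib import Path


def read_song(path: Path, ignore_bom: bool = True) -> str:
    """Read a song file as UTF-8, optionally tolerating a leading BOM."""
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        if not ignore_bom:
            # Strict mode: treat the BOM as an error.
            raise ValueError(f"{path}: file starts with a UTF-8 byte order mark")
        # Lenient mode (the MAY clause): skip the BOM and carry on.
        data = data[len(codecs.BOM_UTF8):]
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError instead of guessing.
    return data.decode("utf-8")


def write_song(path: Path, text: str) -> None:
    """Write a song file as UTF-8 without a byte order mark (the MUST NOT clause)."""
    # The "utf-8" codec never emits a BOM (unlike "utf-8-sig"); also drop a leading
    # U+FEFF the text may carry if some other code decoded a BOM file as plain UTF-8.
    path.write_bytes(text.lstrip("\ufeff").encode("utf-8"))


# Round trip: a file saved with a BOM is read leniently and written back without one.
with tempfile.TemporaryDirectory() as tmp:
    song_path = Path(tmp) / "song.txt"
    song_path.write_bytes(codecs.BOM_UTF8 + "#TITLE:Café\n".encode("utf-8"))
    text = read_song(song_path)
    write_song(song_path, text)
    written = song_path.read_bytes()

print(written[:3] == codecs.BOM_UTF8)  # False
```

Python's built-in `utf-8-sig` codec performs the same skip on decode (`data.decode("utf-8-sig")`), so the lenient branch could be written that way too; the explicit check is kept here so the strict and lenient paths are both visible.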

@marwin89 added the Specification (affects specification, spec.md) label Feb 2, 2025
@marwin89 removed their assignment Feb 2, 2025
Projects
Status: Implemented in spec.md

6 participants