UTF-16 Encoding Issue #86
Comments
This looks relevant: Lines 162 to 164 in 34286c4
Also, oddly, in your code,
This is the little-endian BOM, not the big-endian one. But your program seems to write a big-endian BOM (and big-endian text).
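For reference, the two UTF-16 byte order marks differ only in the order of their two bytes: FF FE is little-endian and FE FF is big-endian. A minimal sketch of telling them apart follows; `bomByteOrder` is a hypothetical helper written for illustration and is not part of the library.

```go
package main

import "fmt"

// bomByteOrder reports which UTF-16 byte order mark, if any, starts b.
// Hypothetical helper for illustration; not part of the library.
func bomByteOrder(b []byte) string {
	if len(b) < 2 {
		return "none"
	}
	switch {
	case b[0] == 0xFF && b[1] == 0xFE:
		return "little-endian"
	case b[0] == 0xFE && b[1] == 0xFF:
		return "big-endian"
	}
	return "none"
}

func main() {
	fmt.Println(bomByteOrder([]byte{0xFF, 0xFE, 0x48, 0x00}))
	fmt.Println(bomByteOrder([]byte{0xFE, 0xFF, 0x00, 0x48}))
}
```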
Alright I think I got it: Lines 24 to 33 in 34286c4
When I call
func (bw *bufWriter) EncodeAndWriteText(src string, to Encoding) {
	if bw.err != nil {
		return
	}
	bw.err = encodeWriteText(bw, src, to)
}
// encodeWriteText encodes src from UTF-8 to "to" encoding and writes to bw.
func encodeWriteText(bw *bufWriter, src string, to Encoding) error {
	if to.Equals(EncodingUTF8) {
		bw.WriteString(src)
		return nil
	}
	toXEncoding := resolveXEncoding(nil, to)
	encoded, err := toXEncoding.NewEncoder().String(src)
	if err != nil {
		return err
	}
	bw.WriteString(encoded)
	// Here we go! 💣
	if to.Equals(EncodingUTF16) && !bytes.HasSuffix([]byte(encoded), []byte{0}) {
		bw.WriteByte(0)
	}
	return nil
}
So at this point, before After
func (tf TextFrame) WriteTo(w io.Writer) (int64, error) {
	return useBufWriter(w, func(bw *bufWriter) {
		bw.WriteByte(tf.Encoding.Key)
		bw.EncodeAndWriteText(tf.Text, tf.Encoding) // <- this added a single 0
		// https://github.com/bogem/id3v2/pull/52
		// https://github.com/bogem/id3v2/pull/33
		bw.Write(tf.Encoding.TerminationBytes) // <- now we have two more
	})
}
There you go, three null bytes at the end. So there's a bug with UTF-16, and that's consistent with the tools I'm using to read it.
Thanks for a great library! I appreciate your hard work on it.
Like in my last issue (which is a distinct encoding issue), I'm running into more trouble with encoding.
Note that I'm calling SetVersion prior to SetDefaultEncoding because of #85
When I run exiftool -v3 -l myfile.mp3, it indicates this for TIT2:
I see that the first 0x01 indicates that we're dealing with UCS-2 with a BOM. Let's disregard that. Then the BOM says it's big-endian (fine). I edited out the 0x01 tag in a text editor, then pasted it into an interactive Python session:
Kinda hard to get to the standard at the moment (it's been like this for a while):
Here's a copy: https://web.archive.org/web/20190207033339/https://id3.org/id3v2.3.0#ID3v2_frame_overview
At least in 2.3.0, it does say that you're supposed to have a null terminator (so, 00 00 for UTF-16/UCS-2). I don't think that you can have a UCS-2 string with an odd number of bytes. So it seems like you're appending an additional null byte, and the length is no longer a multiple of 2, which makes it not legit UCS-2 or UTF-16.
I'm pasting a screenshot rather than copying the text of the output of id3edit because I think the colors are cool:
So that thinks it's bad too. If I just remove the one extra null byte, Python seems happy with it (and other tools):
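The same check can be sketched in Go. This is a minimal illustration under stated assumptions, not how any of the tools above are implemented: `decodeUTF16WithBOM` is a hypothetical helper that rejects a payload with an odd number of bytes, reads the BOM, and stops at the first 00 00 terminator. The input is the text part of a frame for the string "Hi", with the 0x01 encoding byte already removed.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16WithBOM decodes BOM-prefixed UTF-16 bytes, stopping at the
// first 00 00 terminator. Hypothetical helper for illustration only.
func decodeUTF16WithBOM(b []byte) (string, error) {
	if len(b)%2 != 0 {
		return "", errors.New("odd number of bytes: not valid UTF-16")
	}
	if len(b) < 2 {
		return "", errors.New("missing byte order mark")
	}
	var bigEndian bool
	switch {
	case b[0] == 0xFE && b[1] == 0xFF:
		bigEndian = true
	case b[0] == 0xFF && b[1] == 0xFE:
		bigEndian = false
	default:
		return "", errors.New("missing byte order mark")
	}
	b = b[2:]
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i < len(b); i += 2 {
		var u uint16
		if bigEndian {
			u = uint16(b[i])<<8 | uint16(b[i+1])
		} else {
			u = uint16(b[i]) | uint16(b[i+1])<<8
		}
		if u == 0 {
			break // reached the null terminator
		}
		units = append(units, u)
	}
	return string(utf16.Decode(units)), nil
}

func main() {
	// BOM, "Hi" in big-endian UTF-16, then three zero bytes.
	bad := []byte{0xFE, 0xFF, 0x00, 0x48, 0x00, 0x69, 0x00, 0x00, 0x00}
	_, err := decodeUTF16WithBOM(bad)
	fmt.Println("with extra byte:", err)

	text, err := decodeUTF16WithBOM(bad[:len(bad)-1])
	fmt.Println("without extra byte:", text, err)
}
```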
I think this has something to do with other PRs that were meant to fix things for other encodings maybe in 2.4...