csv/encoding: UTF-8 Byte Order Mark (BOM) messes up quote handling #33887
Comments
Roughly speaking, this is a test case for encoding/csv/reader_test.go:
{
Name: "UTF-8WithByteOrderMark",
Input: "\xef\xbb\xbf\"BOM!\"",
Output: [][]string{{"BOM!"}},
},
|
Per RFC4180, CSV, which by far predates UTF and everything that comes with it, is composed of a subset of OCTETS. |
I agree that RFC4180 does not specify it. But also because Excel seems to do this on export
parse(data, {
bom: true
}) |
The encoding/csv package implements RFC 4180, as the package docs state. A BOM is meaningless for CSV, and it's straightforward to use a reader that skips a BOM (e.g., a search on godoc.org turned up https://godoc.org/github.com/spkg/bom, though I don't know anything about that package). There is no need for CSV to know anything about BOMs. |
KK. thanks for considering. At least this will serve as a reference to future google searches 😃 |
For future (beginner Golang) Googlers like myself, you can use |
The Stack Overflow solutions all require reading the entire CSV into memory. This works for small test strings, but may be prohibitive in production code. A solution is to use a streaming (io.Reader) filter, such as https://github.com/dimchansky/utfbom, roughly like this:
f, _ := os.Open("/tmp/dat.csv")
sr, enc := utfbom.Skip(f)
fmt.Printf("Detected encoding: %s\n", enc)
myCSV := csv.NewReader(sr) |
@MaerF0x0 that's a much better solution, thank you so much! 😆 |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
1.12.9 is the newest version that brew installs; lmk if I need to build 1.13.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
Attempted to use encoding/csv#NewReader on a file starting with the UTF-8 BOM 0xEF,0xBB,0xBF ( https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8 ).
Example go playground
https://play.golang.org/p/op6f2xI5h0X
What did you expect to see?
Leading BOM is ignored.
What did you see instead?
Failure to parse CSV