Can in2csv add a byte order mark (BOM) so that when opening csv in Excel it correctly formats unicode text? #1267

Open
river-ride opened this issue Oct 25, 2024 · 3 comments

river-ride commented Oct 25, 2024

There is a short write-up here [https://hilton.org.uk/blog/csv-excel] that describes the issue: when you double-click a .csv file to open it, Excel doesn't recognise that it is UTF-8 encoded unless the file starts with a Byte Order Mark.

This can be fixed by simply prepending the UTF-8 BOM when writing the csv:
echo -ne "\xEF\xBB\xBF" | cat - data.csv > data-with-BOM.csv
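
For anyone who would rather do this from Python than from the shell, here is a minimal sketch of the same idea. The function name `add_bom` is hypothetical and is not part of csvkit; it uses only the standard library, and it leaves the data alone if a BOM is already present so the mark is never doubled.

```python
import codecs
from pathlib import Path


def add_bom(src, dst):
    """Copy src to dst, prepending a UTF-8 BOM unless one is already there.

    Hypothetical helper for illustration; not a csvkit function.
    """
    data = Path(src).read_bytes()
    # codecs.BOM_UTF8 is the three bytes EF BB BF.
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    Path(dst).write_bytes(data)
```

Calling `add_bom("data.csv", "data-with-BOM.csv")` should produce the same file as the shell command above. The whole file is read into memory, so this is only a sketch for files of modest size.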

It would be great if in2csv could include this in its csv output by default, if possible.

Thanks

jpmckinney (Member) commented Oct 25, 2024

Adding a BOM to all output will break a lot of CSV applications, which do not expect an extra 3 bytes.
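
As a rough illustration of one way this can go wrong (an assumption about a typical consumer, not a claim about any specific application): a reader that decodes the file as plain UTF-8 keeps the BOM as the character U+FEFF at the start of the first header name, so a lookup by column name can fail. The sample data and the helper name `first_row` are made up for this sketch.

```python
import csv
import io

# Made-up sample: UTF-8 BOM, then a header row and one data row.
raw = b"\xef\xbb\xbfname,city\nZo\xc3\xab,Z\xc3\xbcrich\n"


def first_row(data, encoding):
    """Decode the bytes with the given encoding and return the first CSV row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text, newline="")))


# Plain UTF-8 keeps the BOM, so the first field is "\ufeffname", not "name".
naive = first_row(raw, "utf-8")

# The "utf-8-sig" codec strips a leading BOM, so the first field is "name".
aware = first_row(raw, "utf-8-sig")

print(naive[0] == "name")
print(aware[0] == "name")
```

A consumer that already reads with `utf-8-sig` is unaffected, which is why the impact depends on the application.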

We could add an option to csvformat (the tool that controls output format – all other tools have a consistent output format), but it will not be much different than that command.

river-ride (Author) commented Nov 15, 2024

OK - understood. Thanks for responding. It would be nice as a format option so that another process doesn't have to be run to make them clickable.

One other thing, if I may: we have a column in Excel that holds a percentage as a string, and in2csv unfortunately (though probably sensibly) strips out the % sign. It needs to be output as a string because it gets passed on to a data visualisation app (we have a separate column for the decimal percentage). Is there any way to get in2csv to respect the string formatting of the xlsx?

jpmckinney (Member) commented

Please open a new issue for your second question. I cannot replicate it with an XLSX file that has "%" as one of the column names. Please attach a file that reproduces the problem to that new issue.
