Byte order marks in source files #408

Closed
jonorossi opened this issue Jun 6, 2018 · 12 comments

@jonorossi
Member

Continuing from the discussion that started in #406.

My view is that BOMs can be annoying and are completely pointless for UTF-8, but they have rarely caused problems in recent years. After a quick Google search I just discovered that the Unicode 5 standard actually discourages their use with UTF-8 files; I'm glad that is their position because it makes sense.

However, I know older versions of Visual Studio insert BOMs and add them on save to files without one. I didn't test Visual Studio, but the following website makes it sound like VS still inserts them:

The free Fix File Encoding extension prevents Visual Studio 2017/2015/2013/2012 from adding BOM to UTF-8 files.
(https://vlasovstudio.com/fix-file-encoding/)

If VS doesn't insert them anymore, we could set the charset EditorConfig property and normalise all files to be without one. However, it doesn't appear that is the case, which means that if someone uses any other text editor to work on the source code you won't get BOMs, so we'll always have a mix of files with and without.
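
To see how mixed a working copy currently is, a scan along the following lines could help. This is only a minimal sketch: the function names and the `*.cs` default pattern are assumptions made for illustration, not something the project ships.

```python
import codecs
from pathlib import Path
from typing import List, Tuple


def has_utf8_bom(path: Path) -> bool:
    """Return True if the file starts with the UTF-8 byte order mark (EF BB BF)."""
    with path.open("rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8


def split_by_bom(root: Path, pattern: str = "*.cs") -> Tuple[List[Path], List[Path]]:
    """Partition files under root that match pattern into (with BOM, without BOM).

    The "*.cs" default is an assumption; pass another glob for other file types.
    """
    with_bom: List[Path] = []
    without_bom: List[Path] = []
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            (with_bom if has_utf8_bom(path) else without_bom).append(path)
    return with_bom, without_bom
```

Calling `split_by_bom(Path("."))` from the repository root returns the two lists, which makes it easy to judge how large a normalisation commit would be.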

@Fir3pho3nixx @stakx if you'd like to look into/confirm any of that feel free, but I think we should just slowly walk away from this beehive and close this issue 😉.

@ghost

ghost commented Jun 7, 2018

Hey, I don't mind, honestly. Glad I was right to begin with, but there are more important things to be getting on with. I just feel bad because I accidentally trolled @davidshen84's PR. Sorry :/

@stakx
Member

stakx commented Jun 7, 2018

I agree with both of you.

@jonorossi, you've already mentioned charset = utf-8 for .editorconfig, but perhaps even that can wait.

@Fir3pho3nixx, don't feel bad, something good has come out of this. We now know our position regarding encoding—we all favor the ideal of UTF-8 files without BOM—so if this topic ever crops up again, we'll already know where we stand.

@jonorossi
Member Author

you've already mentioned charset = utf-8 for .editorconfig, but perhaps even that can wait.

@stakx Yep, as I mentioned above if we were going to set this we'd have to be confident Visual Studio no longer adds them, otherwise we'd have every other editor doing as we asked with VS adding them back.

I'm closing this, exactly as @stakx said: we've got a position now, and if this comes up again we know we need to confirm what recent VS versions do before making any changes.

@ghost

ghost commented Jun 7, 2018

@stakx Yep, as I mentioned above if we were going to set this we'd have to be confident Visual Studio no longer adds them

Not picking a fight here but that is fucking bonkers. What about vs-code, atom, sublime et al.

@ghost

ghost commented Jun 7, 2018

Pre-commit hook in git would be legendary.

@ghost

ghost commented Jun 7, 2018

Complete aside:

Just to show there is a human side to me (@stakx): the fights I have had in the office over my career about Git installations with Windows guys (checkout as-is, commit as-is) have probably wasted enough man-hours to supply all the programmers around the world with a single cup of coffee in one day.

Today, I found out AGAIN that CRLF screws up bash files on Linux. Encoding is the least of our worries!

I wish everyone would checkout as-is and commit unix.
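
For context, the usual failure is that a script saved with Windows-style CR LF endings has a shebang line ending in a carriage return, so Linux looks for an interpreter whose name includes that stray character. A minimal sketch of detecting and normalising such files follows; the helper names are made up for this example.

```python
def has_crlf(data: bytes) -> bool:
    """Return True if the content contains any CR LF line ending."""
    return b"\r\n" in data


def to_lf(data: bytes) -> bytes:
    """Convert CR LF line endings to LF, leaving existing LF endings untouched."""
    return data.replace(b"\r\n", b"\n")


# A bash script as it might look after a checkout that converted to CR LF.
script = b"#!/bin/bash\r\necho hello\r\n"
if has_crlf(script):
    script = to_lf(script)
```

Working on bytes rather than decoded text keeps the sketch independent of the file's encoding.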

@stakx
Member

stakx commented Jun 7, 2018

@Fir3pho3nixx - I can sympathize. It's true that these little issues sometimes needlessly try our patience when we collaborate across different platforms.

"Be liberal in what you receive, be conservative in what you send" is a great guideline for successful interoperation, and I'm under the impression that tools nowadays are generally getting better at implementing it and using sane defaults.

Things might get even easier once Microsoft flushes all their CR LFs down the toilet and adopts UNIX-style line endings (... and, who knows, perhaps even a good command line processor). Let's be patient for a few more years. 😉

@jonorossi
Member Author

Not picking a fight here but that is fucking bonkers. What about vs-code, atom, sublime et al.

@Fir3pho3nixx sorry, no idea what you are referring to. You'll find most sane editors will keep the encoding, with or without a BOM, as the file is on disk, whereas Visual Studio would (or still does?) add one no matter how the file is on disk. All I was pointing out is that if you instruct well-behaved editors (via EditorConfig) to always save without BOMs, and Visual Studio does indeed still write them, you'll be in a worse position, with the first line constantly showing up in diffs as it flips back and forth.

Today, I found out AGAIN that CRLF screws up bash files on Linux. Encoding is the least of our worries!
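
The reason only the first line flips is that a UTF-8 BOM is three bytes (EF BB BF) prepended to the file, while everything after it stays byte-identical. A small illustration using Python's standard `utf-8` and `utf-8-sig` codecs, with a sample line chosen arbitrarily:

```python
import codecs

text = "using System;\nnamespace Demo { }\n"

without_bom = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")  # "utf-8-sig" prepends EF BB BF when encoding

# The two files differ only by the three leading bytes.
assert with_bom == codecs.BOM_UTF8 + without_bom

# Line by line, only the first line is different, which is what a diff shows.
lines_a = without_bom.splitlines()
lines_b = with_bom.splitlines()
changed = [i for i, (a, b) in enumerate(zip(lines_a, lines_b)) if a != b]
assert changed == [0]

# Decoding with "utf-8-sig" accepts both forms and yields the same text.
assert with_bom.decode("utf-8-sig") == without_bom.decode("utf-8-sig") == text
```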

I wish everyone would checkout as-is and commit unix.

Aren't we all glad we don't have to deal with core.autocrlf anymore? I remember the days of using Git and having to deal with that nightmare, especially with .NET being Windows-focused and the repository having CRLFs. I also remember that in earlier versions of Visual Studio you couldn't set checkout as-is, because solution files would fail to load if they didn't have CRLFs, as they aren't XML. There seems to be a common theme here, that Visual Studio has never played well with others.

@ghost

ghost commented Jun 11, 2018

@Fir3pho3nixx sorry no idea what you are referring to.

Let's parley.

You'll find most sane editors will keep the encoding with or without a BOM as the file is on disk, whereas Visual Studio would/still does (?).

Why is Visual Studio a "go to" point for you? A sane editor for you is not the same as for the next guy. VS will now save it in the same way it found it. It still moans about LF-only endings on Windows.

All I was pointing out is that if you instruct well behaved editors (via EditorConfig) to always save without BOMs and Visual Studio does indeed still write them you'll be in a worse position of the first line constantly showing up in diffs being flipped back and forth

I am thinking pre-commit hooks using git or really forcing things down to the client using .gitattributes.

Aren't we all glad we don't have to deal with core.autocrlf anymore

Yes sir!

@ghost

ghost commented Jun 11, 2018

There seems to be a common theme here, that Visual Studio has never played well with others.

I use dotnet core on Linux. True.

@jonorossi
Member Author

VS will now save it in the same way it found it.

If that's the case we can get rid of BOMs and set EditorConfig's charset.

It still moans about LF-only endings on Windows.

Thought so; that's the reason we use text=auto.

I am thinking pre-commit hooks using git or really forcing things down to the client using .gitattributes.

Is there an option for encoding in gitattributes? I'm not aware of one.

@ghost

ghost commented Jun 11, 2018

You could write your own text converter in bash using sed to remove the BOM. That is Linux though.
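
As a cross-platform alternative to bash and sed, the same conversion could be sketched in Python. This is an illustration only: `strip_utf8_bom` is a made-up helper name, and it rewrites the file in place.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from the file in place.

    Returns True if a BOM was found and removed, False if the file was left alone.
    """
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Drop only the three BOM bytes; the rest of the content is written back unchanged.
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Running it a second time on the same file is a no-op, so it is safe to apply across a whole tree.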

Guys over here are saying pre-commit hooks are the way: https://stackoverflow.com/questions/15780867/is-it-possible-to-make-git-ignore-utf-8-bom-when-commit
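
A pre-commit hook along those lines might look roughly like the sketch below. It is an assumption-laden illustration rather than the approach from the linked answer: the function names are invented, it inspects the staged content via `git show :<path>`, and it rejects the commit instead of rewriting files. It also does not distinguish text from binary files.

```python
import codecs
import subprocess
import sys
from typing import Callable, Iterable, List


def find_offenders(paths: Iterable[str], read_blob: Callable[[str], bytes]) -> List[str]:
    """Return the paths whose content, as supplied by read_blob, starts with a UTF-8 BOM."""
    return [p for p in paths if read_blob(p).startswith(codecs.BOM_UTF8)]


def staged_paths() -> List[str]:
    """List files that are added, copied, or modified in the index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"],
        check=True, capture_output=True,
    ).stdout
    return [p.decode("utf-8", "surrogateescape") for p in out.split(b"\0") if p]


def read_staged_blob(path: str) -> bytes:
    """Read the staged version of a file, not the working-tree copy."""
    return subprocess.run(
        ["git", "show", ":" + path], check=True, capture_output=True
    ).stdout


def main() -> int:
    """Return 1 (which makes Git abort the commit) if any staged file has a BOM."""
    offenders = find_offenders(staged_paths(), read_staged_blob)
    for path in offenders:
        print("UTF-8 BOM found in staged file: " + path, file=sys.stderr)
    return 1 if offenders else 0
```

To try it, the file would be saved as an executable `.git/hooks/pre-commit` with a Python shebang line at the top and `sys.exit(main())` as its last line. Hooks are not shared through the repository itself, so every contributor would have to install it locally.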
