-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve #405. Unicode please!!! #406
Conversation
Hi David Thanks for this. Do you think we should run something like this over the entire repository? I tested it and it does not break any tests on my side. class Program
{
static void Main(string[] args)
{
var inputDirectory = @"c:\code\windsor"; // <- change to your own checkout path
WriteUTF8WithoutBOM(inputDirectory, "*.cs");
WriteUTF8WithoutBOM(inputDirectory, "*.xml");
WriteUTF8WithoutBOM(inputDirectory, "*.config");
WriteUTF8WithoutBOM(inputDirectory, "*.csproj");
WriteUTF8WithoutBOM(inputDirectory, "*.props");
Console.ReadLine();
}
private static void WriteUTF8WithoutBOM(string inputDirectory, string searchPattern)
{
foreach (var file in Directory.GetFiles(inputDirectory, searchPattern, SearchOption.AllDirectories))
{
if (file.Contains(".git\\")) continue;
if (file.Contains(".vs\\")) continue;
Console.WriteLine(file);
Encoding utf8WithoutBom = new UTF8Encoding(false);
var fileContents = File.ReadAllText(file, utf8WithoutBom);
var streamWriter = new StreamWriter(file, false, utf8WithoutBom);
streamWriter.Write(fileContents);
streamWriter.Dispose();
}
}
} Also don't forget to add a CHANGELOG.md entry. |
We don't have to remove the dot if the file is encoded properly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should maybe fix all the files. There are some files with UTF8 BOMS in them too.
Please see: #406 (comment)
@Fir3pho3nixx , good idea. I have updated the PR. |
@davidshen84 - Don't forget to make a note in the CHANGELOG.md |
@Fir3pho3nixx why did you want to remove BOMs? They are valid as part of the Unicode standard and not hurting anyone; Visual Studio is actually the one saving them there, not sure if the new versions of VS still adds them to new files. I don't see a reason to mention this in the changelog, unless the change affects users of the library. If I recall looking at the debugger views in Windsor the dots might be those fancy angle brackets, I recall seeing them somewhere in Windsor debugger views (added in 2011: d7ff1da). I see no reason not to just make this function output exactly what you'd type as a C# developer, no need for fancy characters. |
If you need a (sort of) good excuse to normalize BOMs across the whole codebase, you could just update the year to 2018 in the copyright (if that's in fact something that needs to be done at all)... and do the BOM adjustment as an aside. ;-) |
Oh dear this was never about BOM’s. Sorry if that is what you thought. The code I supplied actually converts ANSI to UTF8 but happens to remove BOM’s. I thought this was a good approach as BOM’s infer things about character streams. @jonorossi This actually a defect which prevents people with non standard encoding from building the solution. So I thought it was a fix of some description. Updating the change log is neither here not there. As long as it builds! Sweet. @stakx I am not sure I follow. License headers and UTF encoding of source files that allow this solution to be built in Asia are very different. The real problem is we have some ANSI duff ruining the party. |
@Fir3pho3nixx, please disregard my remark, I was half joking. (But to explain what I meant, I was also under the impression that this was about BOMs mainly, and since the diffs already affect the first line of each code file, which happens to contain the copyright line, you could've killed two birds with one stone but put the focus on only one of those. Sneaky, and not altogether serious.) |
@stakx - Awww man, if only the BOM's were distributed across all code files. Then we could update the headers! :) |
CHANGELOG.md
Outdated
@@ -5,6 +5,7 @@ | |||
Enhancements: | |||
- Added XML documentation to BeginScope and RequireScope lifetime extensions (@jonorossi) | |||
- Upgraded build to use NUnit Adapters (@fir3pho3nixx, #243) | |||
- Remove BOM from all source code files (@davidshen84, #405) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should say:
Fixed encoding problem where builds fail under GB2312 character encoding system
Yeah~the discussions carried me away, and I forgot why I opened this bug in
the first place :)
…On Tue, Jun 5, 2018 at 8:07 AM Gavin van der Merwe ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In CHANGELOG.md
<#406 (comment)>:
> @@ -5,6 +5,7 @@
Enhancements:
- Added XML documentation to BeginScope and RequireScope lifetime extensions ***@***.***)
- Upgraded build to use NUnit Adapters ***@***.***, #243)
+- Remove BOM from all source code files ***@***.***, #405)
Maybe we should say:
Fixed encoding problems prevented build under GB2312 character encoding
system
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#406 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEAzQk48WdvI8hf5aY0JhUygPDI-dKoks5t5cvUgaJpZM4UXk4N>
.
|
Including this request in the PR referenced above. |
Guys, should we put the BOM'S back? @davidshen84 sorry if the answer is yes. I will submit a PR to your fork if this is the case. Are we happy to unify on UTF-8 with BOM'S? |
Interesting to note we have +/-624 files with BOM's and 1508 without BOM's. |
What endianness are you referring to? UTF-8 does not have any endianness, it's a byte stream. Endianness would only be applicable with UTF-16 or UTF-32 (AFAIK). |
Quite telling, right? |
It infers endian nature for character streams. |
I am still not sure I understand what you're referring to. Unlike with UTF-16 (where the BOM can be the byte sequence |
Yes and with UTF-8 it is 0xEF,0xBB,0xBF. |
As to the question of whether to reverse the BOM change or not: I was just passing by, so I'll leave the decision up to you folks. (My purely personal opinion is that the the IT industry has got to a point where tools can finally be expected to handle UTF-8 properly. We should no longer have to adjust to broken tooling, so removing the BOM for that reason seems like the wrong incentive. The broken tooling should be made UTF-ready instead. On a second note, the presence of a UTF-8 BOM can help tools recognize UTF-8 encoding, but today most good tooling probably assumes that encoding by default, anyway. So whether the BOM is present or not shouldn't make much difference, as long as the characters in the file are actually encoded as UTF-8. I agree that from an "aesthetic" viewpoint, it feels nice to have uniformity, but it's probably not worth spending too much time on the matter. It's fine either way.) |
Cool. This PR just grew to a change set over 624 files to one of
Pick one. |
@davidshen84 - Can you please merge my PR into your master branch? |
Reverse the BOM
@Fir3pho3nixx, merged. I think we should have opened another issue to discuss this BOM problem. This issue was only about a few files that were in ANSI encoding. |
True. I just felt encoding was a valid way to bring this up.
1 from what I could tell. |
Sorry for all the stress people! |
@Fir3pho3nixx - Please accept my apologies if any of my above statements upset you... that certainly wasn't my intention at all! While I don't see any immediate need for it, I can still appreciate why you'd like to harmonize BOMs across the codebase. I'll stay out of this. Again, sorry if I upset you. |
Never. You can tell me off anytime! :)
No. You are always welcome. |
Oops, yer this thread appears to have gone a bit off track. Looking at your commit that matters (62e2461), @stakx has made this fix in #407 along with others to I've logged #408 to continue the BOM discussion. |
Sorry sorry sorry |
I converted that peculiar dot into utf-8 encoding, and the solution can build on my Windows 10 64 bit system which has GB2312 as the default character encoding system.
Also, if possible, can we remove that dot?