Ensure consistent manifest encoding #1863
Comments
Related to #1666 |
@megamorf this is a great suggestion! |
@denelon Before we get to some sort of auto-fix, can we fix the incorrectly encoded files manually for now? If so, can we open one PR that fixes the encoding for all files, or would you still prefer one package per PR? P.S. The us-ascii encoded files aren't an issue, as us-ascii is a subset of utf-8. |
@JamieMagee I'm not a member of the winget team but in my opinion a single PR would be better. |
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (MiKTeX 2.9.7440 64-bit) Sorry, but pdflatex did not succeed. The log file hopefully contains the information to get MiKTeX going again: C:\Users\sai\AppData\Local\MiKTeX\2.9\miktex\log\pdflatex.log Sorry, but "MiKTeX Compiler Driver" did not succeed. The log file hopefully contains the information to get MiKTeX going again: C:\Users\sai\AppData\Local\MiKTeX\2.9\miktex\log\texify.log |
I closed #2120 since it's a dump. Besides, I hope Tools/YamlCreate.ps1 can be changed to output UTF-8... |
This is unnecessary because Github already does this automatically. The automatic conversion just hasn't yet been configured for this repository using the existing gitattributes file here: Add the below line to .gitattributes and Github will automatically convert/normalize files for every checkout and commit without requiring a bot to checkout and recommit the files:
|
You're right. That's both a simple and an elegant solution. 👍 Note: |
@dmex I'm not sure that is a complete solution. My reading of Also this line from the documentation makes it sound like we should just re-encode the files.
|
I know. I just don't care to differentiate since having to explain Git vs Github is a waste of time.
The files are already encoded internally using UTF-8 and that won't ever change. This just enforces consistent checkin and checkout encoding for all files on all platforms. UTF8 encoding supports 4-bytes per character identical to UTF16 and it's common these days to store encoded UTF16 inside UTF8.
Yes. Files not already UTF16 will be encoded as UTF16 on checkout.
Yes? If the file was committed before the adding the
After this is done you won't ever have to convert or normalize file encodings when checking in or checking out files. |
This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. |
Instead, I recommend:
Outstanding questions:
|
@riverar do you see any reason not to do both? It might reduce the number of undesirable encodings being submitted with PRs. Thanks for the information on the distinction between client-side and service-side validation. I'll talk with the team about this. |
@denelon If you're referring to the use of On further thought, I think you should require that a byte order mark (BOM) be used regardless of the encoding you land on. This will greatly simplify your automated checks at PR time. If the manifest does not start with the appropriate BOM, reject it immediately. |
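A minimal sketch of what such a PR-time BOM check could look like, written in Python purely for illustration. The names `detect_bom` and `has_required_bom` are hypothetical, and requiring a UTF-8 BOM by default is an assumption, not something the thread has settled on.

```python
import codecs

# Known byte order marks. UTF-32 LE is listed before UTF-16 LE because
# its BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def has_required_bom(data: bytes, required: str = "utf-8") -> bool:
    """True only if the manifest bytes start with the BOM for `required`.

    The default of "utf-8" is an assumption made for this sketch.
    """
    return detect_bom(data) == required
```

A check like this only needs the first four bytes of each file, so it could reject a manifest before any YAML parsing happens.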
Description of the new feature/enhancement
Right now the encoding of the manifests is quite heterogeneous:
I'd welcome a consistent encoding of utf-8 for all manifests.
Some users even managed to submit binary encoded files (I assume utf-16). Here's one recent example: https://github.com/microsoft/winget-pkgs/pull/1653/files that @KevinLaMS came across.
Proposed technical implementation details (optional)
Have a bot automatically convert file encodings to utf-8 as a post-commit hook, or have a scheduled Azure DevOps pipeline fix wrongly encoded files at regular intervals.
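As a rough sketch of the conversion step such a bot or pipeline might run, the Python function below re-encodes a manifest's bytes as UTF-8. The function name `to_utf8`, the choice to write the result without a BOM, and the iso-8859-1 fallback are all assumptions made for illustration. UTF-16 input is only recognised when it carries a BOM; BOM-less UTF-16 is not handled here.

```python
import codecs


def to_utf8(data: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return `data` re-encoded as UTF-8 without a BOM (illustrative only)."""
    # A UTF-8 BOM is simply stripped; the rest is already UTF-8.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]

    # UTF-16 is detected by its BOM, then transcoded.
    for bom, encoding in (
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ):
        if data.startswith(bom):
            return data[len(bom):].decode(encoding).encode("utf-8")

    # Valid UTF-8 (which includes us-ascii) passes through unchanged.
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # Assumed fallback: treat anything else as a single-byte encoding.
        return data.decode(fallback).encode("utf-8")
```

Because every byte sequence decodes as iso-8859-1, the fallback never fails, but it can silently produce the wrong characters if the file was really in some other legacy encoding. A real pipeline would probably want a human to review those cases.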
File lists
text/plain; charset=us-ascii
text/plain; charset=iso-8859-1
text/plain; charset=utf-16le
text/plain; charset=utf8
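For anyone wanting to reproduce a breakdown like the one above, here is a hedged Python sketch that buckets file contents into the same four charset labels. The helper names `classify` and `tally` are hypothetical, and the heuristic is much cruder than a real detection tool: it ignores UTF-32 and treats anything that is not valid UTF-8 as iso-8859-1.

```python
import codecs
from collections import Counter


def classify(data: bytes) -> str:
    """Guess a charset label for raw manifest bytes (rough heuristic)."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16be"
    try:
        data.decode("ascii")
        return "us-ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # A guess: any byte sequence decodes as Latin-1, so this is a catch-all.
        return "iso-8859-1"


def tally(blobs) -> Counter:
    """Count how many of the given byte strings fall under each label."""
    return Counter(classify(blob) for blob in blobs)
```

Feeding `tally` the bytes of every manifest in a checkout would give a per-encoding count that can be tracked as files are fixed.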