"iso-8859-1" (latin1) not "windows1252" #5
Also, you can open
And can you remove ?
Awesome! In fact, I created this repo based on these changes, but I had to step back to the original implementation to be able to fix all the tests in here. I do plan to update the Latin1 and ISO-8859-1 indexes, and I'm sure this will help. Thanks.
I do see it when running the project locally, but I'm not very interested in that right now. I will create some pages in the future.
Yeah, definitely. Also, I thought I had already published a version where this was done. Here's a script that checks for the polyfill in the latest version: https://codepen.io/odahcam/pen/abvepmQ?editors=1010 Edit: fixed. It took me a little while to answer; I happen to be a little busy right now, but I'll continue this work soon.
You can see the changes, and the comment, in this commit: username1565@5eb0906. After all this, I think we can open Also, I added some more tests there,
No problem. I don't think we need to rush. As I said here: #1 (comment), of course you can fix it and add to it, but only in your free time. P.S.: I added your changes to my fork, fixed "CRLF", drafted a new release, and then published the npm package.
1. Remove "new TextEncoding.TextEncoder()" and "new TextEncoding.TextDecoder()" from the browser tests, and keep just the plain "new TextEncoder()" and "new TextDecoder()".
2. Define "TextEncoder" and "TextDecoder" only if they are undefined.
3. To override the native "TextEncoder"/"TextDecoder", use:
<script>
window.TextEncoder = undefined;
window.TextDecoder = undefined;
//or
//window.TextEncoder = window.TextDecoder = null;
</script>
4. "new TextEncoding.TextEncoder()" and "new TextEncoding.TextDecoder()" still work when "TextEncoder"/"TextDecoder" is already defined.
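Steps 2 and 3 together amount to a "define only if missing" guard. The sketch below is a hypothetical illustration, not the polyfill's actual code: "installTextCodecs" and the "Fallback*" classes are made-up names, and plain objects stand in for "window" so it runs outside a browser. The "== null" check is chosen so that the step-3 override (setting the globals to undefined or null) also lets the fallback in.

```javascript
// Hypothetical helper: install fallback codecs only where the global is missing.
// "== null" matches both undefined and null, so the step-3 override works too.
function installTextCodecs(globalObj, fallback) {
  if (globalObj.TextEncoder == null) globalObj.TextEncoder = fallback.TextEncoder;
  if (globalObj.TextDecoder == null) globalObj.TextDecoder = fallback.TextDecoder;
  return globalObj;
}

// Stand-ins for the polyfill classes (assumed names, illustration only).
class FallbackTextEncoder {}
class FallbackTextDecoder {}
const fallback = { TextEncoder: FallbackTextEncoder, TextDecoder: FallbackTextDecoder };

// An environment without native codecs gets the fallback.
const bareEnv = installTextCodecs({}, fallback);

// An environment with native codecs keeps them.
const nativeEnv = installTextCodecs(
  { TextEncoder: 'native encoder', TextDecoder: 'native decoder' },
  fallback
);

// An environment overridden as in step 3 gets the fallback again.
const overriddenEnv = installTextCodecs({ TextEncoder: undefined, TextDecoder: null }, fallback);

console.log(bareEnv.TextEncoder === FallbackTextEncoder);       // true
console.log(nativeEnv.TextDecoder);                              // native decoder
console.log(overriddenEnv.TextDecoder === FallbackTextDecoder); // true
```

In a browser, the same call would receive "window" as "globalObj".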
FYI: I'm a bit away right now; I intend to come back next semester.
I'm trying to better understand the key differences between Windows-1252 and ISO-8859-1. I ran into this answer, which is pretty straightforward and interesting: https://stackoverflow.com/a/31800761/4367683 Also, I found this very elegant table, which compares the character differences between both encodings: https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html I'd like to leave it here for documentation purposes. Is there something else you'd like to add?
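For reference, the two encodings agree on bytes 0x00-0x7F and 0xA0-0xFF. They differ only in 0x80-0x9F: ISO-8859-1 maps those bytes to the C1 control code points of the same value, while windows-1252 assigns printable characters (euro sign, curly quotes, dashes, etc.) to 27 of them and leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined. The sketch below is a library-free illustration of that table; the function and constant names are hypothetical, and, like the WHATWG windows-1252 index, it maps the five undefined bytes to the same code point as ISO-8859-1.

```javascript
// Byte -> code point for the 27 bytes where windows-1252 differs from ISO-8859-1.
const WINDOWS_1252_HIGH = {
  0x80: 0x20AC, 0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E, 0x85: 0x2026,
  0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02C6, 0x89: 0x2030, 0x8A: 0x0160,
  0x8B: 0x2039, 0x8C: 0x0152, 0x8E: 0x017D, 0x91: 0x2018, 0x92: 0x2019,
  0x93: 0x201C, 0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
  0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161, 0x9B: 0x203A, 0x9C: 0x0153,
  0x9E: 0x017E, 0x9F: 0x0178,
};

// ISO-8859-1: every byte is the code point of the same value.
function decodeLatin1(bytes) {
  let out = '';
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

// windows-1252: use the table above, otherwise fall back to the latin-1 value.
function decodeWindows1252(bytes) {
  let out = '';
  for (const b of bytes) {
    const cp = WINDOWS_1252_HIGH[b];
    out += String.fromCharCode(cp === undefined ? b : cp);
  }
  return out;
}

// Decode all 256 bytes both ways and list the positions that differ (in hex).
const allBytes = Uint8Array.from({ length: 256 }, (_, i) => i);
const latin1 = decodeLatin1(allBytes);
const cp1252 = decodeWindows1252(allBytes);
const diff = [];
for (let i = 0; i < 256; i++) {
  if (latin1[i] !== cp1252[i]) diff.push(i.toString(16));
}
console.log(diff.length); // 27
```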
Both of those encodings are extended ASCII. On your picture, I see
Also, you can compare the differences this way:
<script src="https://unpkg.com/@username1565/[email protected]/umd/encoding-indexes.js"></script>
<script src="https://unpkg.com/@username1565/[email protected]/umd/encoding.js"></script>
<script>
// generate latin-1 string
var s = ''; //string with all 256 latin-1 characters
for(var i = 0; i<256; i++){s+= String.fromCharCode(i);}
console.log('s: \n'+s);
// get the bytes by encoding this string
var latin1_bytes = new TextEncoding.TextEncoder('iso-8859-1', { NONSTANDARD_allowLegacyEncoding: true }).encode(s); //try to encode this
console.log('latin1_bytes', latin1_bytes); //show this
// generate consecutive bytes as Uint8Array
var allBytes = new Uint8Array(256); //all consecutive bytes, 0..255
for(var i = 0; i<256; i++){allBytes[i] = i;}
console.log('allBytes', allBytes);
// Decode this as latin-1 encoded string
var latin1 = new TextEncoding.TextDecoder('iso-8859-1', { NONSTANDARD_allowLegacyEncoding: true }).decode(allBytes); //try to decode this as latin-1 string
console.log('latin1: ', latin1, '(latin1 === s)', (latin1 === s)); //show the string and compare it with previous string. //true
// decode bytes as windows-1252 chars
var windows1252 = new TextEncoding.TextDecoder('windows-1252', { NONSTANDARD_allowLegacyEncoding: true }).decode(allBytes); //decode the bytes as a windows-1252 encoded string
console.log('\n\n'+ 'windows1252: ', windows1252); //show the string
// get the bytes back by encoding this string
var bytes = new TextEncoding.TextEncoder('windows-1252', { NONSTANDARD_allowLegacyEncoding: true }).encode(windows1252); //try to encode this to bytes
console.log('bytes', bytes, '(windows1252 === decoded): ', (windows1252 === new TextEncoding.TextDecoder('windows-1252').decode(bytes))); //show the bytes, decode them back and compare with the string
//compare strings, encoded as iso-8859-1 (latin-1) and windows-1252, and write diff
var diff = []; //empty array for the differences
for(var i=0; i<allBytes.length; i++){ //for each byte
if(latin1[i] !== windows1252[i]){ //if symbol is not equal
diff.push({ 'i': i, 'latin-1 char': latin1[i], 'windows1252 char': windows1252[i]}); //write charcode, latin1-char and cp1252-char, as one JSON-object, as item of array.
}
}
console.log("diff: ", JSON.stringify(diff, null, 1)); //show array with differences, as formatted-indented JSON.
</script>
As a result, there are 27 differing characters, the same ones as in your image. Here are their char codes and the chars in both encodings:
Also, as you can see, some chars of
Anyway, both encodings are reversible: bytes decoded to a string and encoded back give the same bytes. This makes it possible to work with byte arrays as strings,
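To illustrate the "byte arrays as strings" idea without any library, here is a minimal sketch for the ISO-8859-1 case, where each byte maps to the UTF-16 code unit of the same value. The two function names are hypothetical, and the reverse direction rejects characters above 0xFF, since those cannot come from a latin-1 string.

```javascript
// Bytes -> string: each byte becomes the code unit of the same value (ISO-8859-1).
function bytesToLatin1String(bytes) {
  let s = '';
  for (const b of bytes) s += String.fromCharCode(b);
  return s;
}

// String -> bytes: the inverse; anything above 0xFF is not a latin-1 character.
function latin1StringToBytes(s) {
  const bytes = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code > 0xFF) throw new RangeError(`not a latin-1 character at index ${i}`);
    bytes[i] = code;
  }
  return bytes;
}

// Round trip all 256 byte values.
const original = Uint8Array.from({ length: 256 }, (_, i) => i);
const asString = bytesToLatin1String(original);
const roundTripped = latin1StringToBytes(asString);
console.log(asString.length, roundTripped.every((b, i) => b === original[i])); // 256 true
```

The same round trip holds for windows-1252 only when the decoder and encoder share one table that covers all 256 bytes, as the comparison script above checks with its "(windows1252 === decoded)" test.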
Look here: c1193bd, the changes and tests for latin-1 encoding. Here, https://github.com/username1565/text-encoding/blob/dc7b6481e47e731d3ddae0fb0f4cffe876b1efa9/src/encoding/encodings.ts#L313, latin-1 (and its synonyms) is switched to windows-1252.
Test:
UPD:
It seems I already fixed this in these commits:
username1565@61d721d
username1565@d438e53
username1565@f2d46a7
username1565@82e412c
username1565@728f622
username1565@30e36c0
username1565@dbfbc88
You can see the full differences by comparing across forks: master...username1565:master
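The reason latin-1 "and synonyms" end up as windows-1252 is the WHATWG Encoding Standard: its label table lists "latin1", "iso-8859-1", "ascii" and several other labels as aliases of windows-1252, so in browsers new TextDecoder('latin1').encoding reports "windows-1252". The sketch below shows that label lookup for the windows-1252 labels only. It is an illustration with hypothetical names, not this repo's encodings.ts, and a real lookup table covers many more encodings.

```javascript
// The WHATWG labels that all resolve to "windows-1252".
const WINDOWS_1252_LABELS = [
  'ansi_x3.4-1968', 'ascii', 'cp1252', 'cp819', 'csisolatin1', 'ibm819',
  'iso-8859-1', 'iso-ir-100', 'iso8859-1', 'iso88591', 'iso_8859-1',
  'iso_8859-1:1987', 'l1', 'latin1', 'us-ascii', 'windows-1252', 'x-cp1252',
];

const LABEL_TO_ENCODING = new Map(WINDOWS_1252_LABELS.map((label) => [label, 'windows-1252']));

// Spec-style lookup: strip leading/trailing ASCII whitespace, ASCII-lowercase,
// then look the label up. Returns null for labels not in this subset.
function getEncodingName(label) {
  const key = String(label)
    .replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, '')
    .replace(/[A-Z]/g, (c) => c.toLowerCase());
  return LABEL_TO_ENCODING.get(key) ?? null;
}

console.log(getEncodingName(' Latin1 '));   // windows-1252
console.log(getEncodingName('ISO-8859-1')); // windows-1252
console.log(getEncodingName('koi8-r'));     // null (not in this subset)
```

A fork that wants a true ISO-8859-1 decoder therefore has to give those labels their own entry instead of following this aliasing.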