Failing to detect ISO-8859 encodings #8
Comments
Same problem here.
Unfortunately I don't have plans to improve the detection quality. Could you share the data you get poor results with? I can take a look. Hopefully there will be some way to work around the issue. Thanks.
See the attached file, which is encoded in Windows-1252 but detected as GB18030. Thanks for helping!
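A Windows-1252 file being reported as GB18030 is less surprising than it looks, because short runs of Western European bytes can also form valid GB18030 sequences. The following is a minimal Python sketch of that ambiguity, not a reproduction of the attached file and not a description of how compact_enc_det works internally. The sample string and the helper name `decodes_as` are made up for illustration.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: French text saved as Windows-1252.
# Each accented letter (0xE9) is followed by an ASCII letter, so every
# high byte can also be read as the lead byte of a two-byte GB18030 pair.
sample = "étés".encode("cp1252")  # b'\xe9t\xe9s'

candidates = ["ascii", "cp1252", "iso8859-1", "iso8859-15", "gb18030"]
valid = [enc for enc in candidates if decodes_as(sample, enc)]
print(valid)
```

Byte validity alone rules out only ASCII here, so a detector has to fall back on statistical hints to choose between the remaining candidates, and that choice can go wrong on short or unusual input.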
I'm attaching test files and the results I get. As you can see, I'm satisfied with the Unicode detections (and mostly with those for Asian encodings), but really disappointed by the ISO-8859 results, particularly for Western European languages (ISO-8859-1 and ISO-8859-15, for instance, which are really widespread, see:
Just on the off-chance... do you have any idea how this might be tackled, in case somebody else wants to take a crack at it?
I get good results detecting Unicode encodings and Asian codepages, but really poor results with files in common European languages saved in the ISO-8859 family. These encodings are very widespread, and this problem makes compact_enc_det unusable for me.
For these files, the encoding is always detected as ASCII (and reliable is set to true).
ISO-8859-6 for Arabic is OK.
Am I the only one?
Thanks for letting me know, so I can check if there is a problem or just look for an alternative.
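Before reporting an ASCII result as wrong, it can help to confirm that the input really contains bytes outside the ASCII range. Below is a minimal Python sketch of such a check. The helper name `non_ascii_offsets` and the sample text are hypothetical and are not part of compact_enc_det.

```python
def non_ascii_offsets(data: bytes, limit: int = 10) -> list[int]:
    """Return the offsets of the first `limit` bytes that are >= 0x80."""
    offsets = []
    for index, byte in enumerate(data):
        if byte >= 0x80:
            offsets.append(index)
            if len(offsets) == limit:
                break
    return offsets


# Hypothetical sample: German/French text saved as ISO-8859-1.
sample = "Füße und Café".encode("iso8859-1")

offsets = non_ascii_offsets(sample)
print(offsets)                 # positions of the high bytes
print(sample.isascii())        # False when any byte is >= 0x80
```

If the list is non-empty, the data cannot be pure ASCII, so an ASCII verdict marked as reliable points to a detection problem rather than to the input.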