Specify FLAG UTF-8 when converting to UTF-8, if there was no explicit FLAG option #25
Yeah, good idea. I do remember thinking about this, but it never came up. Perhaps a sed expression in crawl.sh could do the trick. PR welcome!
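The check the title describes is: add `FLAG UTF-8` only when the affix file has no explicit `FLAG` line. That could be a sed one-liner in crawl.sh; the sketch below shows the same logic in Python for clarity. `add_flag_utf8` is a hypothetical helper, not part of this repository. It assumes the affix text has already been converted to UTF-8, and it places the directive right after the `SET` line when one exists (an assumption about placement, not a Hunspell requirement stated here).

```python
import re

# An existing FLAG directive at the start of a line, e.g. "FLAG long" or "FLAG UTF-8".
FLAG_RE = re.compile(r"^FLAG[ \t]+\S+", re.MULTILINE)
# The SET (encoding) line, including its trailing newline if present.
SET_RE = re.compile(r"^SET[ \t]+\S+[^\n]*\n?", re.MULTILINE)


def add_flag_utf8(aff_text: str) -> str:
    """Hypothetical helper: add 'FLAG UTF-8' unless a FLAG directive already exists."""
    if FLAG_RE.search(aff_text):
        return aff_text  # explicit FLAG option: leave the file untouched
    m = SET_RE.search(aff_text)
    if m:
        # Insert directly after the SET line, adding a newline if SET was the last line.
        prefix = aff_text[: m.end()]
        if not prefix.endswith("\n"):
            prefix += "\n"
        return prefix + "FLAG UTF-8\n" + aff_text[m.end():]
    # No SET line: put the directive at the top of the file.
    return "FLAG UTF-8\n" + aff_text


print(add_flag_utf8("SET UTF-8\nTRY abc\n"), end="")
```

Files that already declare `FLAG long`, `FLAG num` or `FLAG UTF-8` pass through unchanged, which matches the "if there was no explicit FLAG option" condition in the title.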
Yes, some combination of bash and Unix text-processing utilities should help. Neither is my strong suit, so I wouldn't hold my breath for a PR from me in the near future :)
Shouldn’t this issue be about setting an
No, it's not enough. At the moment of submission
That sounds more complex than I thought... But then, isn't this a bug in the Portuguese dictionary? It should either use ASCII flags, or
Well, it used to be that way. Now
Hmm, that still seems like an issue for them, though. Shouldn't it be fixed upstream rather than patched here?
The issue should be addressed where the dictionaries are converted to UTF-8. My understanding was that this happens here, at least partly. If I'm mistaken, then this is indeed the wrong repo :)
Hunspell reads the affix file byte by byte and decodes UTF-8 only on demand. If it is not instructed to do so for flags, it doesn't. So non-ASCII characters like "ý" are treated as several characters, and, due to another bug, Hunspell silently takes just the first of them and ignores the rest. As a result, words can end up with unexpected flags.
Example: `pt` contains `FORBIDDENWORD ý`, and the perfectly valid word `trabalhar/akYMjLÀÚ` is treated as having this flag and is thus considered misspelled.
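To make the collision concrete, here is a small Python illustration of how the flags look when read one byte at a time versus one UTF-8 character at a time. It only emulates the behavior described above (byte-wise flags, with just the first byte of the `FORBIDDENWORD` value kept); it is not Hunspell's code. In UTF-8, ý is `C3 BD`, À is `C3 80` and Ú is `C3 9A`, so all three share the lead byte `C3`.

```python
# Values from the pt example: FORBIDDENWORD ý, and the flags of trabalhar/akYMjLÀÚ.
forbidden = "ý"
word_flags = "akYMjLÀÚ"


def byte_flags(s: str) -> list[int]:
    """Flags as a byte-wise (8-bit) parser sees them: one flag per UTF-8 byte."""
    return list(s.encode("utf-8"))


def char_flags(s: str) -> list[str]:
    """Flags as seen with FLAG UTF-8: one flag per Unicode character."""
    return list(s)


# Emulate the truncation bug: only the first byte of the FORBIDDENWORD value survives.
forbidden_flag = byte_flags(forbidden)[0]

print(hex(forbidden_flag))                       # 0xc3
print(forbidden_flag in byte_flags(word_flags))  # True: À and Ú also start with 0xC3
print(forbidden in char_flags(word_flags))       # False: decoded as UTF-8, no collision
```

With `FLAG UTF-8` in effect, ý, À and Ú are three distinct flags, so `trabalhar` would no longer inherit the forbidden-word flag.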