UTF-8 characters getting problems #657
Comments
I added some more debug info. textPlain downloaded with 3.1.0:
The same email downloaded with any newer version, e.g. 4.5.1:
This attachment contains the raw email from the IMAP server (copied from the server via SCP). It was sent by MS Outlook and received by Postfix; the IMAP server is Dovecot:
I have fixed this issue by changing:
to:
in Mailbox.php:1490. Not sure if this is the right solution, but it works for me...
Just to explain, it looks like
returns 'default' in $elements->charset when no charset is specified, so this change handles that case using the other elements...
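To illustrate the case described here: assuming the function meant is PHP's imap_mime_header_decode() (the name a later comment in this thread gives), it returns a list of objects with charset and text properties, and charset is the literal string 'default' for segments that declare no charset. The sketch below shows one way to handle that, using hand-built objects instead of the imap extension. The helper joinDecodedElements and its fallback parameter are hypothetical and not part of php-imap; ext-mbstring is assumed.

```php
<?php

/**
 * Join decoded header elements into one UTF-8 string.
 * Hypothetical helper, not part of php-imap.
 *
 * @param object[] $elements objects with ->charset and ->text
 */
function joinDecodedElements(array $elements, string $fallbackCharset = 'UTF-8'): string
{
    $out = '';
    foreach ($elements as $element) {
        $charset = strtolower(trim((string) $element->charset));
        // 'default' means no charset was declared, so use the caller's fallback.
        if ($charset === '' || $charset === 'default') {
            $charset = $fallbackCharset;
        }
        $out .= mb_convert_encoding((string) $element->text, 'UTF-8', $charset);
    }
    return $out;
}

// Hand-built stand-ins for the decoder's output (assumed shape).
$elements = [
    (object) ['charset' => 'default', 'text' => "Gr\xFC\xDFe "],    // raw ISO-8859-1 bytes
    (object) ['charset' => 'ISO-8859-1', 'text' => "M\xFCller"],
];
$result = joinDecodedElements($elements, 'ISO-8859-1');
```

Here the fallback would come from the charset declared on the mail part itself, which is the information a 'default' element does not carry.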
I can confirm this bug. Email parts with e.g. iso-8859-1 (Windows-1252) encoding are not utf-8, but iso-8859-1. It looks like there was a related merge and revert on Nov 16, 2020. More info in this discussion: #525 (comment). I hope this helps.
I tested it with quite a lot of emails. Most iso-8859-1 emails are fixed with the above patch. However, I still found some iso-8859-1 emails that get 'default' as charset from imap_mime_header_decode in decodeMimeStr. Maybe it is better to fix it this way in DataPartInfo.php:111? At least this works for me with all tested emails.
 protected function convertEncodingAfterFetch(): string
 {
     if (isset($this->charset) && !empty(\trim($this->charset))) {
         $this->data = $this->mail->decodeMimeStr(
             (string) $this->data // Data to convert
         );
+        $this->data = $this->mail->convertToUtf8(
+            $this->data,
+            $this->charset
+        );
+        $this->charset = 'utf-8';
     }
     return (null === $this->data) ? '' : $this->data;
 }
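The idea behind the patch above can be tried in isolation with a small sketch. PartSketch is a hypothetical stand-in for a fetched mail part, not the library's DataPartInfo, and mb_convert_encoding() stands in for the library's convertToUtf8(), whose implementation is not shown in this thread. It assumes PHP 8.0+ and ext-mbstring.

```php
<?php

// Hypothetical stand-in for a fetched mail part; not the library's DataPartInfo.
final class PartSketch
{
    public function __construct(
        public ?string $data,
        public ?string $charset
    ) {
    }

    public function convertEncodingAfterFetch(): string
    {
        if ($this->data !== null && $this->charset !== null && trim($this->charset) !== '') {
            // Stand-in for convertToUtf8(): convert from the declared charset.
            $this->data = mb_convert_encoding($this->data, 'UTF-8', $this->charset);
            // Record the new encoding so a repeated call treats the data
            // as UTF-8 and leaves it unchanged.
            $this->charset = 'utf-8';
        }
        return $this->data ?? '';
    }
}

$part = new PartSketch("caf\xE9", 'iso-8859-1'); // "café" as ISO-8859-1 bytes
$first = $part->convertEncodingAfterFetch();
$second = $part->convertEncodingAfterFetch();
```

Updating the charset after conversion is presumably what keeps the data from being re-encoded if the method runs more than once on the same part.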
I can confirm that this fixes the encoding issues for me. I think this library desperately needs unit tests or similar that load a set of test emails and then check whether they were decoded correctly. Over the last few versions, encoding issues that had already been fixed reappeared multiple times.
Thanks for your code improvements, testing and feedback. I've created a merge request with your changes, but I need to check whether we can implement it like this, and whether we can or need to remove something else that no longer makes sense with this change. Feel free to comment and provide feedback on the above linked merge request. :) Yes, I agree, we need some unit tests, but I'm unfortunately not that familiar with writing unit tests, so I'm not really sure how we can properly test this. Feel free to create a merge request with some, or let me know how we can define a good unit test for this, and I'll add it to the above merge request. Edit: We already have some decoding tests: https://github.com/barbushin/php-imap/blob/master/tests/unit/MailboxTest.php#L696-L731 Maybe they are just not working as expected, or we need to test this in some other way? Or do we need more test cases? 🤔
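One possible shape for such a test is a table of fixtures, each pairing raw bytes and a declared charset with the expected UTF-8 text. The sketch below uses plain PHP rather than the project's test framework; decodeBody is a hypothetical stand-in for the code under test, and the inline byte strings stand in for real sample emails loaded from files. It assumes ext-mbstring.

```php
<?php

// Each fixture: raw bytes, declared charset, expected UTF-8 text.
// Inline byte strings stand in for real sample emails loaded from files.
$fixtures = [
    'iso-8859-1'   => ["na\xEFve", 'ISO-8859-1', 'naïve'],
    'windows-1252' => ["\x80 5", 'Windows-1252', '€ 5'],
    'utf-8'        => ["\xC3\xA4", 'UTF-8', 'ä'],
];

// Hypothetical function under test; in the library this would be the
// real fetch-and-decode path instead of a direct conversion.
function decodeBody(string $raw, string $charset): string
{
    return mb_convert_encoding($raw, 'UTF-8', $charset);
}

// Collect the names of fixtures whose decoded text does not match.
$failures = [];
foreach ($fixtures as $name => [$raw, $charset, $expected]) {
    $actual = decodeBody($raw, $charset);
    if ($actual !== $expected || !mb_check_encoding($actual, 'UTF-8')) {
        $failures[] = $name;
    }
}
```

Keeping the fixtures as data makes it cheap to add the exact email that triggered a regression, so a fixed encoding bug stays covered in later versions.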
I've just released a new version, 4.5.3, with the above fix and some unit tests.
Thank you for this library. I have been using it for years, up to version 3.1.0, but now I wanted to upgrade to 4.5 and got problems with UTF-8 characters.
In emails with UTF-8 characters, these characters are unreadable.
The code used:
I tested versions newer than 3.1.0, but without success; only versions up to 3.1.0 work.
Thank you.