Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTF-8 characters getting problems #657

Closed
JiriSch opened this issue Jan 26, 2022 · 8 comments · Fixed by #666
Closed

UTF-8 characters getting problems #657

JiriSch opened this issue Jan 26, 2022 · 8 comments · Fixed by #666
Labels
needs investigation This will be tested / debugged or checked out.

Comments

@JiriSch
Copy link

JiriSch commented Jan 26, 2022

Thank you for this library, I am using it for years, I was using up to version 3.1.0 but now I wanted to upgrade to 4.5 ang get problems with UTF-8 characters.

  • PHP IMAP version: tested anything newver than 3.1.0
  • PHP Version: 7.4
  • Type of execution: web server

Email with UTF-8 characters - these characters are unreadable.

The used code:

	$mailbox = new PhpImap\Mailbox('{xxxxxx:143/imap4/notls/novalidate-cert}INBOX', 'xxxxx', 'xxxx');
	var_dump($mailbox->getServerEncoding()); //---GETTING UTF-8---
	$mailsIds = $mailbox->searchMailbox('ALL');
	if(!$mailsIds) return 0;
	foreach ($mailsIds as $key=>$mailsId)
	{
		$mail = $mailbox->getMail($mailsId);
		//var_dump($mail);
		
		var_dump($mail->textHtml);
	}

I tried to test any version newer than 3.1.0 but without success, only till 3.1.0 works.

Thank you.

@JiriSch JiriSch added the needs investigation This will be tested / debugged or checked out. label Jan 26, 2022
@JiriSch
Copy link
Author

JiriSch commented Jan 26, 2022

I add some more debug info, textPlain downloaded over 3.1.0:

With interpunction: ěščřžýáíé Without: escrzyaie

The same email downloaded over any version newer, for ex 4.5.1:

With interpunction: �������� Without: escrzyaie

In this attachment is the raw email from IMAP server (downloaded from IMAP server by SCP), sended by MS Outlook, recieved by Postfix, IMAP use DOVECOT:
EmailTest.txt

@JiriSch
Copy link
Author

JiriSch commented Jan 27, 2022

I have fixed this issue by changing:

foreach ($elements as $element) {
$newString .= $this->convertToUtf8($element->text, $element->charset);
}

to:

foreach ($elements as $element) {
if ($element->charset!='default') $this->decodeMimeStrDefaultCharset = $element->charset;
$newString .= $this->convertToUtf8($element->text, $element->charset);
}

in Mailbox.php:1490

Not sure if this is the right solution, but it works for me...

@JiriSch
Copy link
Author

JiriSch commented Jan 27, 2022

Just to explain, it looks like

$elements = \imap_mime_header_decode($string);

returns in $elements->charset default in case it's not, so this handle this from another elements...

@christianasche
Copy link
Contributor

christianasche commented Feb 9, 2022

I can confirm this bug.

Email parts with i.e. iso-8859-1 (Windows-1252) encoding are not utf-8, but iso-8859-1.
The suggested patch works for me.

I looks like there has been a related merge and revert on Nov 16, 2020:
#551 #558

More Info in this discussion: #525 (comment)

I hope this helps.

@christianasche
Copy link
Contributor

christianasche commented Feb 11, 2022

I tested it with quite a lot emails. Most iso-8859-1 emails are fixed with the above patch. However, I still found some iso-8859-1 emails, that get 'default' as charset from imap_mime_header_decode in decodeMimeStr.

Maybe it is better to fix it this way in DataPartInfo.php:111? At least this works for me with all tested emails.

    protected function convertEncodingAfterFetch(): string
    {
        if (isset($this->charset) && !empty(\trim($this->charset))) {
            $this->data = $this->mail->decodeMimeStr(
                (string) $this->data // Data to convert
            );

+            $this->data = $this->mail->convertToUtf8( 
+                $this->data, 
+                $this->charset 
+            );
+            $this->charset = 'utf-8';
        }

        return (null === $this->data) ? '' : $this->data;
    }

@hrst
Copy link

hrst commented Mar 5, 2022

I tested it with quite a lot emails. Most iso-8859-1 emails are fixed with the above patch. However, I still found some iso-8859-1 emails, that get 'default' as charset from imap_mime_header_decode in decodeMimeStr.

Maybe it is better to fix it this way in DataPartInfo.php:111? At least this works for me with all tested emails.

I can confirm, this fixes encoding issues for me.

I think this library desperately needs unit tests or similar that load a set of test emails and then checks whether they encoded correctly. During the last versions, encoding issues that were already fixed reappeared multiple times.

@Sebbo94BY
Copy link
Collaborator

Sebbo94BY commented Mar 7, 2022

Thanks for your code improvements, testings and feedback.

I've created a merge request with your changes, but I need to check, if we can implement it like this and if we instead can / need to remove something else as it doesn't make any sense anymore then.

Feel free to comment and provide feedback on the above linked merge request. :)

Yes, I agree, we need some unit tests, but I'm unfortunately not that familiar with writing unit tests, so I'm not really sure, how we can properly test this. Feel free to create a merge request with some or let me know, how we can define such a good unit test, then I'll add it to the above merge request.

Edit: We already have some decoding tests: https://github.com/barbushin/php-imap/blob/master/tests/unit/MailboxTest.php#L696-L731

They are may just not working as expected or we need to test this somehow else? Or we need more test cases? 🤔

Sebbo94BY pushed a commit that referenced this issue Mar 12, 2022
Sebbo94BY pushed a commit that referenced this issue Mar 12, 2022
Sebbo94BY added a commit that referenced this issue Mar 12, 2022
@Sebbo94BY
Copy link
Collaborator

I've just released a new version 4.5.3 with the above fix and some unit tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs investigation This will be tested / debugged or checked out.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants