MimePart.to_string() can produce different output depending on MimePart.was_changed() #194

mhahnenberg · 2018-05-14T19:37:41Z

RFC1341 states:
"The Quoted-Printable encoding REQUIRES that encoded lines be no more than 76 characters long."

If we pass a long 7bit text part with simple '\n' in it, it will be encoded as quoted-printable which leads to invalid output. This is due to two things:

has_long_lines() uses str.splitlines(), which splits universal newlines (which include '\n'), thus reducing the perceived length of the lines (even though they're not considered line breaks in the final quoted-printable CTE). Instead I believe it should use str.split('\r\n').
Even after fixing this, we still pass quoted-printable as the preferred encoding inside _choose_text_encoding even though the line length is too long. I think we should either prefer base64 for text parts with long lines (simpler implementation, but base64 incurs a higher overhead) or convert the long lines to a quoted-printable compatible form using "soft" line breaks.

Relevant code is here: https://github.com/mailgun/flanker/blob/master/flanker/mime/message/part.py#L657

The text was updated successfully, but these errors were encountered:

mhahnenberg · 2018-05-14T19:42:10Z

I can make these changes if all of this makes sense, I just wanted to discuss it first and make sure I'm understanding everything correctly.

horkhe · 2018-05-14T19:46:58Z

All is reasonable. I would prefer the soft line breaks option. @b0d0nne11 what do you think?

mhahnenberg · 2018-05-14T22:14:08Z

After some more investigation it appears quopri.encodestring does the correct thing with line lengths > 76, so that removes the need to handle the 2nd bullet. I assumed that it didn't bother with line length-related things because the documentation for the quopri module didn't explicitly mention it.

That leaves just the first bullet, which is a much easier change 😄

mhahnenberg · 2018-05-15T12:20:38Z

Okay, so I did even more investigation and I think the original problem I described doesn't exist. I think the actual issue is that when we first scan a MIME string the resulting MimePart(s) return False for was_changed(), so when we call message.to_string() we just return the string as it was scanned in without re-encoding it. This avoids all the logic of detecting which CTE should be used. The following test demonstrates this:

def text_long_line_quoted_printable():
    # TEXT_LONG_LINES is a string containing a MIME message with lines 
    # longer than allowed for quoted-printable.
    message = scan(TEXT_LONG_LINES)

    # Without the following lines this test will fail.
    # message._container._body_changed = True
    # assert message.was_changed(ignore_prepends=True)
    print(message.to_string())
    eq_('quoted-printable', message.content_encoding.value)
    eq_('ascii', message.content_type.get_charset())

Is this the desired behavior?

horkhe · 2018-05-15T13:38:55Z

@mhahnenberg the idea is that if we do not need to make changes to the mime the we return it as is. This is intentional. Originally we did encode message in to_string even if it was not changed, but some customers complained (do not remember why though) and we introduced this is_changed behavior.

@anton-efimenko may be you remember what why customers complained?

mhahnenberg · 2018-05-15T16:19:51Z

You're going to laugh at me, and I'm really sorry for continually changing what this issue is about, but this has been a very tricky bug for me to figure out 😞 However, I think I might have it now (hopefully).

flanker hardcodes line endings as CRLF by default. To encode messages as quoted-printable flanker uses quopri which tries to automatically detect which line ending to use for soft line breaks based on its input (see https://bugs.python.org/issue20121). quopri must generate soft line breaks for long lines to obey the 76 character limit imposed by the quoted-printable standard.

By default quopri assumes LF unless it runs into a CRLF first. flanker encodes each part's body in isolation, so if a part's body contains only LF line endings (or no line endings at all) quopri will assume that it should use =\n for soft line breaks when in reality we always want =\r\n since CRLF is hardcoded in flanker and it is unwise to mix line endings.

I think a quick fix for this would be for flanker to trick quopri by prepending a CRLF to the part's body and then removing it after the encoding has been performed.

mhahnenberg changed the title ~~Handling of long lines in text parts can violate RFC1341~~ Round-trip parsing doesn't produce the same output May 15, 2018

mhahnenberg changed the title ~~Round-trip parsing doesn't produce the same output~~ MimePart.to_string() can produce different output depending on MimePart.was_changed() May 15, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MimePart.to_string() can produce different output depending on MimePart.was_changed() #194

MimePart.to_string() can produce different output depending on MimePart.was_changed() #194

mhahnenberg commented May 14, 2018 •

edited

Loading

mhahnenberg commented May 14, 2018

horkhe commented May 14, 2018

mhahnenberg commented May 14, 2018 •

edited

Loading

mhahnenberg commented May 15, 2018 •

edited

Loading

horkhe commented May 15, 2018

mhahnenberg commented May 15, 2018

MimePart.to_string() can produce different output depending on MimePart.was_changed() #194

MimePart.to_string() can produce different output depending on MimePart.was_changed() #194

Comments

mhahnenberg commented May 14, 2018 • edited Loading

mhahnenberg commented May 14, 2018

horkhe commented May 14, 2018

mhahnenberg commented May 14, 2018 • edited Loading

mhahnenberg commented May 15, 2018 • edited Loading

horkhe commented May 15, 2018

mhahnenberg commented May 15, 2018

mhahnenberg commented May 14, 2018 •

edited

Loading

mhahnenberg commented May 14, 2018 •

edited

Loading

mhahnenberg commented May 15, 2018 •

edited

Loading