Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip header area? #59

Open
msdemlei opened this issue Feb 9, 2023 · 3 comments
Open

Skip header area? #59

msdemlei opened this issue Feb 9, 2023 · 3 comments

Comments

@msdemlei
Copy link

msdemlei commented Feb 9, 2023

RFC 1341 says about multipart payloads:

This [the multipart content-type] indicates that the entity consists of several parts, each itself with a structure that is syntactically identical to an RFC 822 message, except that the header area might be completely empty...

I read that as: "a multipart payload may have a header area". As a matter of fact, when you construct a multipart payload using email.mime.multipart.MIMEMultipart and then calling as_string(), you get:

MIME-Version: 1.0
Content-Type: multipart/form-data; boundary="========== bounda r y 930"

--========== bounda r y 930\nMIME-Version: 1.0
Content-Type: application/octet-stream
...

The multipart parser from the old cgi module would correctly grok that when I uploaded it (provided I arranged for the content-type to be in the HTTP header). python-multipart (tried 0.0.5) does not and fails when it sees the M of the MIME-Version.

I appreciate the "as browsers send it" part of multipart's rationale; however, being somewhat friendly to machine-generated multipart messages would, I think, be a friendly gesture, and it might prevent breakage as people move from cgi's FieldStorage (used, e.g., in twisted) to python-multipart.

What I think should be done: in MultipartParser's _internal_write, we should blindly skip everything until the MIME header area is consumed, i.e., until we have found a CRLFCRLF sequence. Would you consider such a PR?

@defnull
Copy link
Contributor

defnull commented Oct 11, 2024

RFC 7578 would be the relevant standard for multipart/form-data and that states "Each part MUST contain a Content-Disposition header field" in section 4.2.

@msdemlei
Copy link
Author

msdemlei commented Oct 11, 2024 via email

@defnull
Copy link
Contributor

defnull commented Oct 11, 2024

Ummm... but this is not about header areas of the parts, so I'm not sure what you are arguing for or against.

Ahh, I misunderstood. Stuff in front of the first boundary is fine, it's strange but allowed and should also be accepted by most parsers. But in your example there are more issues. The boundary must end in \r\n not just \n and a MIME-Version: 1.0 header is not allowed within parts.

I'm just a guest here and do not speak for the maintainers, but cgi.FieldStorage is very old and deprecated for very good reasons. New parsers should not inherit the flaws of a 29 years old library just to be compatible with broken clients. Accepting broken input may even be a security risk in some cases. Cleaning up this mess is long overdue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants