Skip header area? #59

msdemlei · 2023-02-09T12:22:46Z

RFC 1341 says about multipart payloads:

This [the multipart content-type] indicates that the entity consists of several parts, each itself with a structure that is syntactically identical to an RFC 822 message, except that the header area might be completely empty...

I read that as: "a multipart payload may have a header area". As a matter of fact, when you construct a multipart payload using email.mime.multipart.MIMEMultipart and then call as_string(), you get:

MIME-Version: 1.0
Content-Type: multipart/form-data; boundary="========== bounda r y 930"

--========== bounda r y 930\nMIME-Version: 1.0
Content-Type: application/octet-stream
...
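
For reference, output of this kind can be reproduced with a sketch like the one below. The boundary string, the field name, and the payload bytes are made up for illustration, and the exact header order may differ between Python versions.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

# Hypothetical boundary and field name, chosen only for this illustration.
msg = MIMEMultipart("form-data", boundary="example-boundary")
part = MIMEApplication(b"some bytes")  # defaults to application/octet-stream
part.add_header("Content-Disposition", "form-data", name="upload")
msg.attach(part)

text = msg.as_string()

# Everything before the first blank line is the top-level header area;
# the first boundary line only comes after it.
header_area, _, body = text.partition("\n\n")
print(header_area)
print(body.splitlines()[0])
```

Note that as_string() uses bare \n line endings by default and that the attached part carries its own MIME-Version header as well.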

The multipart parser from the old cgi module would correctly grok that when I uploaded it (provided I arranged for the content-type to be in the HTTP header). python-multipart (tried 0.0.5) does not and fails when it sees the M of the MIME-Version.

I appreciate the "as browsers send it" part of multipart's rationale; however, being somewhat lenient towards machine-generated multipart messages would, I think, be a friendly gesture, and it might prevent breakage as people move from cgi's FieldStorage (used, e.g., in twisted) to python-multipart.

What I think should be done: in MultipartParser's _internal_write, we should blindly skip everything until the MIME header area is consumed, i.e., until we have found a CRLFCRLF sequence. Would you consider such a PR?
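
To make the proposal concrete, here is a rough sketch of the idea as a standalone function. It is not python-multipart code: skip_header_area is a hypothetical helper, and the guard that leaves payloads alone when they already start with the boundary is my own assumption, added so that the headers of the first part are not eaten by mistake.

```python
def skip_header_area(payload: bytes, boundary: bytes) -> bytes:
    """Drop a leading MIME header area, if there is one (hypothetical helper)."""
    delimiter = b"--" + boundary
    if payload.startswith(delimiter):
        # The payload starts with the first boundary, as browsers send it.
        return payload
    end = payload.find(b"\r\n\r\n")
    if end == -1:
        # No CRLFCRLF found: leave the payload alone and let the parser complain.
        return payload
    # Skip the header area together with the blank line that terminates it.
    return payload[end + 4:]


example = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/form-data; boundary="b"\r\n'
    b"\r\n"
    b"--b\r\n"
    b'Content-Disposition: form-data; name="x"\r\n'
    b"\r\n"
    b"1\r\n"
    b"--b--\r\n"
)
cleaned = skip_header_area(example, b"b")
```

A real patch would have to do this incrementally inside the parser's state machine, since the CRLFCRLF sequence may be split across two write() calls.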

defnull · 2024-10-11T07:13:55Z

RFC 7578 would be the relevant standard for multipart/form-data and that states "Each part MUST contain a Content-Disposition header field" in section 4.2.

msdemlei · 2024-10-11T09:19:41Z

On Fri, Oct 11, 2024 at 12:14:16AM -0700, Marcel Hellkamp wrote:
> RFC 7578 would be the relevant standard for `multipart/form-data` and that states: "Each part MUST contain a Content-Disposition header field" in section 4.2.

Ummm... but this is not about header areas of the parts, so I'm not sure what you are arguing for or against.

After all these months, however, I only half-remember my rationale. What happens is that MIMEMultipart.as_string() includes header fields in the payload, and that uploading such a thing to the old cgi module used to work. Now that I bother to look, this is because they have this code:

    # Ensure that we consume the file until we've hit our inner boundary
    while (first_line.strip() != (b"--" + self.innerboundary) and first_line):
        first_line = self.fp.readline()
        self.bytes_read += len(first_line)

multipart does not and will therefore fail if there is leading junk.

True, dumping MIMEMultipart.as_string's result into the POST payload is an error, but apparently an error you could get away with in several contexts. You can argue that cgi is over-lenient here, and re-thinking the matter, I think I would agree. Still, not breaking things that used to work with cgi would seem to be of a certain value.

Does anyone have strong feelings as to whether or not to do cgi's skip-to-boundary thing?
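
For anyone who wants to experiment, the skip-to-boundary behaviour can be sketched outside of cgi roughly as follows. This is only an illustration operating on a complete bytes payload; drop_preamble is a hypothetical name and not part of cgi or python-multipart.

```python
def drop_preamble(payload: bytes, boundary: bytes) -> bytes:
    """Return the payload starting at the first boundary line (hypothetical helper)."""
    delimiter = b"--" + boundary
    offset = 0
    for line in payload.splitlines(keepends=True):
        # Like cgi, compare the stripped line, so both \n and \r\n endings match.
        if line.strip() == delimiter:
            return payload[offset:]
        offset += len(line)
    # No boundary line found: return the input unchanged so the parser can reject it.
    return payload


junk_first = (
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/form-data; boundary="b"\n'
    b"\n"
    b"--b\r\n"
    b'Content-Disposition: form-data; name="x"\r\n'
    b"\r\n"
    b"1\r\n"
    b"--b--\r\n"
)
trimmed = drop_preamble(junk_first, b"b")
```

Unlike waiting for a CRLFCRLF sequence, this does not care how the leading junk is terminated; it only looks for the first line that consists of the boundary delimiter.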

defnull · 2024-10-11T10:36:59Z

> Ummm... but this is not about header areas of the parts, so I'm not sure what you are arguing for or against.

Ahh, I misunderstood. Stuff in front of the first boundary is fine; it's strange but allowed and should also be accepted by most parsers. But in your example there are more issues: the boundary line must end in \r\n, not just \n, and a MIME-Version: 1.0 header is not allowed within parts.

I'm just a guest here and do not speak for the maintainers, but cgi.FieldStorage is very old and deprecated for very good reasons. New parsers should not inherit the flaws of a 29-year-old library just to be compatible with broken clients. Accepting broken input may even be a security risk in some cases. Cleaning up this mess is long overdue.
