Skip header area? #59
On Fri, Oct 11, 2024 at 12:14:16AM -0700, Marcel Hellkamp wrote:
> RFC 7578 would be the relevant standard for `multipart/form-data`
> and that states: "Each part MUST contain a Content-Disposition
> header field" in section 4.2.
Ummm... but this is not about header areas of the parts, so I'm not
sure what you are arguing for or against.
After all these months, however, I only half-remember my rationale.
What happens is that MIMEMultipart.as_string() includes header
fields in the payload, and that uploading such a thing to the old
cgi module used to work.
Now that I bother to look, this is because they have this code:
    # Ensure that we consume the file until we've hit our inner boundary
    while (first_line.strip() != (b"--" + self.innerboundary) and
            first_line):
        first_line = self.fp.readline()
        self.bytes_read += len(first_line)
multipart does not and will therefore fail if there is leading junk.
True, dumping MIMEMultipart.as_string's result into the POST payload
is an error, but apparently an error you could get away with in
several contexts.
You can argue that cgi is over-lenient here, and re-thinking the
matter, I think I would agree. Still, not breaking things that used
to work with cgi would seem to be of a certain value.
Does anyone have strong feelings as to whether or not to do cgi's
skip-to-boundary thing?
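For discussion, here is a minimal standalone sketch of what such a skip-to-boundary step could look like. The function name `skip_to_boundary` and the sample payload are hypothetical; this is neither cgi's nor multipart's actual code, only the same idea applied to a file-like object.

```python
import io


def skip_to_boundary(fp, boundary: bytes) -> int:
    """Consume lines from fp up to and including the first delimiter line.

    Returns the number of bytes consumed. Stops at EOF if no delimiter
    line is found (hypothetical helper, for illustration only).
    """
    delimiter = b"--" + boundary
    consumed = 0
    while True:
        line = fp.readline()
        consumed += len(line)
        # An empty read means EOF; otherwise compare the stripped line.
        if not line or line.strip() == delimiter:
            return consumed


# Payload with leading junk (a stray MIME header area) before the first part.
payload = (
    b"MIME-Version: 1.0\r\n"
    b"\r\n"
    b"--XXXX\r\n"
    b'Content-Disposition: form-data; name="a"\r\n'
    b"\r\n"
    b"1\r\n"
    b"--XXXX--\r\n"
)

fp = io.BytesIO(payload)
skipped = skip_to_boundary(fp, b"XXXX")
rest = fp.read()  # starts at the first part's header fields
```

After the call, the stream is positioned right behind the first delimiter line, so whatever preceded it is ignored.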
Ahh, I misunderstood. Stuff in front of the first boundary is fine; it's strange but allowed, and most parsers should also accept it. But in your example there are more issues. The boundary must end in [...]
I'm just a guest here and do not speak for the maintainers, but [...]
RFC 1341 says about multipart payloads:
I read that as: "a multipart payload may have a header area". As a matter of fact, when you construct a multipart payload using email.mime.multipart.MIMEMultipart and then call as_string() on it, you get:
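A minimal sketch for reproducing this with the standard library. The fixed boundary `XXXX`, the field name, and the part content are illustrative assumptions, not taken from the original report.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a multipart/form-data message with one text part.
msg = MIMEMultipart("form-data", boundary="XXXX")
part = MIMEText("hello", "plain")
part.add_header("Content-Disposition", "form-data", name="field")
msg.attach(part)

payload = msg.as_string()

# Everything before the first blank line is the message's own header area;
# the first boundary only appears after it.
header_area, _, body = payload.partition("\n\n")
print(header_area)
```

The serialized string begins with the message's own header fields (including `MIME-Version: 1.0`) rather than with `--XXXX`, which is the leading material a strict parser trips over.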
The multipart parser from the old cgi module would correctly grok that when I uploaded it (provided I arranged for the content-type to be in the HTTP header). python-multipart (tried 0.0.5) does not and fails when it sees the M of the MIME-Version.
I appreciate the "as browsers send it" part of multipart's rationale; however, tolerating machine-generated multipart messages would, I think, be a friendly gesture, and it might prevent breakage as people move from cgi's FieldStorage (used, e.g., in twisted) to python-multipart.
What I think should be done: in MultipartParser's _internal_write, we should blindly skip everything until the MIME header area is consumed, i.e., until we have found a CRLFCRLF sequence. Would you consider such a PR?
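To make the proposal concrete, here is a rough standalone sketch of the skipping logic, written outside the library. The class name `HeaderAreaSkipper` and its `feed` method are hypothetical and do not reflect python-multipart's internals; the sketch also assumes CRLF line endings and that a header area is actually present.

```python
class HeaderAreaSkipper:
    """Drop everything up to and including the first CRLFCRLF.

    Works across chunk boundaries by remembering the last few bytes seen
    (hypothetical helper, for illustration only).
    """

    TERMINATOR = b"\r\n\r\n"

    def __init__(self):
        self._done = False
        self._tail = b""  # trailing bytes kept in case the terminator is split

    def feed(self, chunk: bytes) -> bytes:
        """Return the part of chunk that lies after the header area."""
        if self._done:
            return chunk
        window = self._tail + chunk
        idx = window.find(self.TERMINATOR)
        if idx == -1:
            # Keep just enough bytes to detect a terminator split over chunks.
            self._tail = window[-(len(self.TERMINATOR) - 1):]
            return b""
        self._done = True
        return window[idx + len(self.TERMINATOR):]


skipper = HeaderAreaSkipper()
chunks = [b"MIME-Version: 1.0\r", b"\n\r", b"\n--XXXX\r\nbody"]
forwarded = b"".join(skipper.feed(c) for c in chunks)
```

One caveat with skipping blindly: a payload that starts directly with a boundary would lose its first part's header fields, so a real change would probably need to check for a leading `--` first.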