Skip to content

Conversation

@adriangb
Copy link
Contributor

@adriangb adriangb commented Nov 12, 2021

Multipart requests allow HTTP headers to be set on individual form items.
This pull request allows multipart file uploads to specify those additional headers on a per-file basis.

Similar pull request against the Starlette project: Kludex/starlette#1311

Edit by @tomchristie: Updated description

if "Content-Type" in headers:
raise ValueError(
"Content-Type cannot be included in multipart headers"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we don't need to do this check.

It's odd behaviour for the developer to set the content_type and then override it with the actual value provided in the custom headers. But it's not broken.

My preference would be that we don't do the explicit check here. In the case of conflicts I'd probably have header values take precedence.

I'm not absolute on this one, but slight preference.

Copy link
Contributor Author

@adriangb adriangb Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I thought it'd be a good idea to check what requests does here. It looks like it silently ignores the header in the header. That is:

requests.post("http://example.com", files=[("test", ("test_filename", b"data", "text/plain", {"Content-Type": "text/csv"}))])

Gets sent as text/plain.

Digging into why this is the case, it seems like it's just an implementation detail in urllib3. It happens here.

I'm not sure what the right thing to do here is, but if you feel like it's best to go with no error and making header values take precedence, I'm happy to implement that.

Another alternative would be to have the 3rd parameter be either a string representing the content type or a headers dict. We can't really make the 3rd parameter always be a headers dict because that would be a breaking change for httpx.
This would eliminate the edge case, but deviates from requests' API. It seems pretty reasonable that if I'm specifying headers I'm doing advanced stuff and so specifying the content type in the headers directly would not be an issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the right thing to do here is, but if you feel like it's best to go with no error and making header values take precedence, I'm happy to implement that.

I reckon let's do that, yeah.

Another alternative would be to have the 3rd parameter be either a string representing the content type or a headers dict. We can't really make the 3rd parameter always be a headers dict because that would be a breaking change for httpx.

I actually quite like that yes, neat idea. The big-tuples API is... not helpful really. But let's probably just go with the path of least resistance here. Perhaps one day we'll want an httpx 2.0, where we gradually start deprecating the various big-tuples bits of API in favour of a neater style.

Copy link
Contributor Author

@adriangb adriangb Jan 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reckon let's do that, yeah.

👍 donzo

Another alternative would be to have the 3rd parameter be either a string representing the content type or a headers dict. We can't really make the 3rd parameter always be a headers dict because that would be a breaking change for httpx.

I actually quite like that yes, neat idea. The big-tuples API is... not helpful really. But let's probably just go with the path of least resistance here. Perhaps one day we'll want an httpx 2.0, where we gradually start deprecating the various big-tuples bits of API in favour of a neater style.

Agreed! I added a comment in the code explaining the reasoning behind the big tuple API (inherited from requests) and how we might want to change it in the future.

filename, fileobj = value # type: ignore
else:
# corresponds to (filename, fileobj, content_type, headers)
headers = {k.title(): v for k, v in headers.items()}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should .title() case here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... I see the comparison case. Huh. Fiddly.

if content_type is not None and "Content-Type" not in headers:
# note that unlike requests, we ignore the content_type
# provided in the 3rd tuple element if it is also included in the headers
# requests does the opposite
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay maybe we should instead do it the other way. If the 4-tuple is used, just ignore the content_type variable. That'd be okay enough, matches requests more closely, and we can forget about fiddly case-based header checking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requests does the opposite: it ignores the header in the 4th tuple element. so we'll still need the case-based header checking if we want to do exactly what requests does. either way, we need to know if the content type header exists in the 4th element tuple so we can either ignore the 3rd element or overwrite it with the 3rd element.

Copy link
Contributor

@lovelydinosaur lovelydinosaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this pull request except that I would rather we don't force-change the casing on the headers. That introduces a hidden little bit of behaviour surprise that I'd rather avoid.

Obvs we do still want to do a case-insensitive comparison for the Content-Type case tho.

if content_type is None:
content_type = guess_content_type(filename)

if content_type is not None and "Content-Type" not in headers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps...

has_content_type_header = any(["content-type" in key.lower() for key in headers])
if content_type is not None and not has_content_type_header:
    ...

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adapted it to any("content-type" in key.lower() for key in headers) (so it'll stop early).
Also removed the {header.title() ...} line.

@lovelydinosaur lovelydinosaur changed the title feat: allow specification of additional headers in multipart/form-data requests Allow custom headers in multipart/form-data requests Jan 13, 2022
@lovelydinosaur
Copy link
Contributor

Lovely stuff. 👍

@lovelydinosaur lovelydinosaur merged commit 0f1ff50 into encode:master Jan 13, 2022
@adriangb adriangb deleted the multipart-advanced branch January 13, 2022 09:16
@lovelydinosaur lovelydinosaur mentioned this pull request Jan 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants