Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python] Skip utf8 decoding for specific Content-Types in py3 #206

Closed
g-bon opened this issue Jun 2, 2018 · 7 comments
Closed

[Python] Skip utf8 decoding for specific Content-Types in py3 #206

g-bon opened this issue Jun 2, 2018 · 7 comments

Comments

@g-bon
Copy link
Contributor

g-bon commented Jun 2, 2018

In rest.py if PY3 is used and _preload_content is True, the response content is always decoded as utf8. This doesn't make sense for non-text files, e.g. an api that returns a file.

if six.PY3:
    r.data = r.data.decode('utf8')

I welcome any suggestion on how to tackle this but the two solution that come to mind are:

  • Check on the content_type and make a whitelist/blacklist of content types that should be decoded to utf8.

A simple example would be something like

if six.PY3 and headers.get('Content-Type') not in ['application/octet-stream']:
    r.data = r.data.decode('utf8')

Where obviously the list of Content-Types to "skip" would be a meaningful one and not only octet-stream.

  • Another, harder approach would be to infer the content type from the actual data but feels fragile and overkill to me. Positive side would be that it would likely do the right thing even if the wrong or no content-type is specified in the response headers.

Happy to work on this when I have some confirmation the problem is here / feedback on the right approach.

@gierschv
Copy link
Contributor

Same issue here (FlatIO/api-client-python#1), I believe a whitelist of mime-types could be helpful here.

@g-bon
Copy link
Contributor Author

g-bon commented Sep 11, 2018

@gierschv thanks, nice to have a second opinion on this. Are you already working to fix this upstream for your project (here)?

I wanted to point out that my suggestion needs a change, headers.get('Content-Type') would get the request headers.

Whitelist should be checked against r.urllib3_response.getheader('Content-Type') instead.

I think I can tackle this but I'll need to check how this is solved for other languages in the openapi-generator for consistency.

@gierschv
Copy link
Contributor

gierschv commented Sep 11, 2018

No, I didn't have time to work on this, I found this issue by looking into another bug.

Another idea for the whitelist: we could have application/octet-stream whitelisted by default and add a vendor extension in the spec to add additional mime types (e.g. x-disable-decoding-content-types: [...]). In my use case, I have at least MIDI/MP3/WAV content types that can be returned by some endpoints.

@rienafairefr
Copy link
Contributor

I've pushed a PR that I think addresses this issue, #2134

@spacether
Copy link
Contributor

spacether commented Mar 1, 2019

Why not put off the decoding step until we are in the api_client and then decode if needed?
Below is my suggestion on handling response data in Python2 and Python3.
This would make it so the spec controls whether or not we stay in bytes in python3, and the response data "charset" in the Content-Type header controls what encoding we use to decode. Charset defines the response encoding per: https://www.w3.org/International/articles/http-charset/index

Python3
bytes response, spec=binary -> stay in bytes
bytes response, spec=str -> str -> needs charset/encoding for decoding to str
bytes response, spec=file -> stay in bytes + write
bytes response, spec=Model -> convert to string using charset/encoding then use json.loads with charset/encoding, use headers + default to utf-8 (this is needed for python 3.3+, its json.loads can't accept binary data)
bytes response with _preload_content=False return bytes because that is what we started with

Python2
str response, spec=binary -> stay in str (in python2 bytes is an alias for str in python >=2.6)
str response, spec=str -> str stay in str
str response, spec=file -> stay in str + write
str response, spec=Model -> json needs charset/encoding for json.loads, use headers +default to utf-8
str response, with _preload_content=False return str because that is what we started with

Optionally, in python2 we could also use the python2 backport of the python3 bytes class from the future library: https://python-future.org/what_else.html#bytes though I think that returning instances of that class may confuse people. My preference would be to use the backport class, as it unifies how the type behaves in python2 and python3.
What do you think? @wing328 @taxpon @frol @mbohlool @cbornet @kenjones-cisco @tomplus @Jyhess

@spacether
Copy link
Contributor

spacether commented Apr 26, 2020

This issue has been resolved by #5679
Thank you for your hard work @CrshOverride!

@g-bon
Copy link
Contributor Author

g-bon commented Apr 26, 2020

Thanks @CrshOverride

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants