[Python] Skip utf8 decoding for specific Content-Types in py3 #206

g-bon · 2018-06-02T21:40:34Z

In rest.py if PY3 is used and _preload_content is True, the response content is always decoded as utf8. This doesn't make sense for non-text files, e.g. an api that returns a file.

if six.PY3:
    r.data = r.data.decode('utf8')

openapi-generator/modules/openapi-generator/src/main/resources/python/rest.mustache

Line 213 in b594262

r.data = r.data.decode('utf8')

I welcome any suggestion on how to tackle this but the two solution that come to mind are:

Check on the content_type and make a whitelist/blacklist of content types that should be decoded to utf8.

A simple example would be something like

if six.PY3 and headers.get('Content-Type') not in ['application/octet-stream']:
    r.data = r.data.decode('utf8')

Where obviously the list of Content-Types to "skip" would be a meaningful one and not only octet-stream.

Another, harder approach would be to infer the content type from the actual data but feels fragile and overkill to me. Positive side would be that it would likely do the right thing even if the wrong or no content-type is specified in the response headers.

Happy to work on this when I have some confirmation the problem is here / feedback on the right approach.

The text was updated successfully, but these errors were encountered:

gierschv · 2018-09-10T22:54:25Z

Same issue here (FlatIO/api-client-python#1), I believe a whitelist of mime-types could be helpful here.

g-bon · 2018-09-11T07:22:27Z

@gierschv thanks, nice to have a second opinion on this. Are you already working to fix this upstream for your project (here)?

I wanted to point out that my suggestion needs a change, headers.get('Content-Type') would get the request headers.

Whitelist should be checked against r.urllib3_response.getheader('Content-Type') instead.

I think I can tackle this but I'll need to check how this is solved for other languages in the openapi-generator for consistency.

gierschv · 2018-09-11T07:32:08Z

No, I didn't have time to work on this, I found this issue by looking into another bug.

Another idea for the whitelist: we could have application/octet-stream whitelisted by default and add a vendor extension in the spec to add additional mime types (e.g. x-disable-decoding-content-types: [...]). In my use case, I have at least MIDI/MP3/WAV content types that can be returned by some endpoints.

rienafairefr · 2019-02-13T08:07:24Z

I've pushed a PR that I think addresses this issue, #2134

spacether · 2019-03-01T18:45:35Z

Why not put off the decoding step until we are in the api_client and then decode if needed?
Below is my suggestion on handling response data in Python2 and Python3.
This would make it so the spec controls whether or not we stay in bytes in python3, and the response data "charset" in the Content-Type header controls what encoding we use to decode. Charset defines the response encoding per: https://www.w3.org/International/articles/http-charset/index

Python3
bytes response, spec=binary -> stay in bytes
bytes response, spec=str -> str -> needs charset/encoding for decoding to str
bytes response, spec=file -> stay in bytes + write
bytes response, spec=Model -> convert to string using charset/encoding then use json.loads with charset/encoding, use headers + default to utf-8 (this is needed for python 3.3+, its json.loads can't accept binary data)
bytes response with _preload_content=False return bytes because that is what we started with

Python2
str response, spec=binary -> stay in str (in python2 bytes is an alias for str in python >=2.6)
str response, spec=str -> str stay in str
str response, spec=file -> stay in str + write
str response, spec=Model -> json needs charset/encoding for json.loads, use headers +default to utf-8
str response, with _preload_content=False return str because that is what we started with

Optionally, in python2 we could also use the python2 backport of the python3 bytes class from the future library: https://python-future.org/what_else.html#bytes though I think that returning instances of that class may confuse people. My preference would be to use the backport class, as it unifies how the type behaves in python2 and python3.
What do you think? @wing328 @taxpon @frol @mbohlool @cbornet @kenjones-cisco @tomplus @Jyhess

spacether · 2020-04-26T20:35:05Z

This issue has been resolved by #5679
Thank you for your hard work @CrshOverride!

g-bon · 2020-04-26T22:40:37Z

Thanks @CrshOverride

jimschubert added the Client: Python label Jun 10, 2018

rienafairefr mentioned this issue Feb 12, 2019

[core][Python] rest client: handle multiple response types #2134

Closed

4 tasks

vmorris mentioned this issue Dec 19, 2019

[BUG] [PYTHON] Python client decodes file to string and fails to write bytes #4847

Closed

6 tasks

CrshOverride mentioned this issue Mar 23, 2020

[Python] Fix Python UTF-8 Encoding Issue #5679

Merged

5 tasks

spacether closed this as completed Apr 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Python] Skip utf8 decoding for specific Content-Types in py3 #206

[Python] Skip utf8 decoding for specific Content-Types in py3 #206

g-bon commented Jun 2, 2018 •

edited

Loading

gierschv commented Sep 10, 2018

g-bon commented Sep 11, 2018 •

edited

Loading

gierschv commented Sep 11, 2018 •

edited

Loading

rienafairefr commented Feb 13, 2019

spacether commented Mar 1, 2019 •

edited

Loading

spacether commented Apr 26, 2020 •

edited

Loading

g-bon commented Apr 26, 2020

[Python] Skip utf8 decoding for specific Content-Types in py3 #206

[Python] Skip utf8 decoding for specific Content-Types in py3 #206

Comments

g-bon commented Jun 2, 2018 • edited Loading

gierschv commented Sep 10, 2018

g-bon commented Sep 11, 2018 • edited Loading

gierschv commented Sep 11, 2018 • edited Loading

rienafairefr commented Feb 13, 2019

spacether commented Mar 1, 2019 • edited Loading

spacether commented Apr 26, 2020 • edited Loading

g-bon commented Apr 26, 2020

g-bon commented Jun 2, 2018 •

edited

Loading

g-bon commented Sep 11, 2018 •

edited

Loading

gierschv commented Sep 11, 2018 •

edited

Loading

spacether commented Mar 1, 2019 •

edited

Loading

spacether commented Apr 26, 2020 •

edited

Loading