Update BINARY_SUPPORT to use Content-Encoding to identify if data is binary #971

Quidge · 2021-04-28T21:46:48Z

Description

This PR re-implements the same functional change that was shown in a PR from the pre-fork Miserlou/zappa repository.

The change in that PR didn't make it into the Zappa 52 release, from what I can tell. My company has been using that PR's code (in combination with the flask_compress library) for a couple of weeks without issue, so I'm opening this PR, which copies that code.

Additional tests are added for coverage. I'm not confident that the test organization I used is correct or idiomatic for this codebase; please critique it and I'll change it if needed!

GitHub Issues

#908

Checklist I found in the contributors documentation:

Before you submit this PR, please make sure that you meet these criteria:

Did you read the contributing guide?
- Yes.
If this is a non-trivial commit, did you open a ticket for discussion?
- No; a pre-existing ticket already exists (#908).
Did you put the URL for that ticket in a comment in the code?
- Yes.
If you made a new function, did you write a good docstring for it?
- NA.
Did you avoid putting "_" in front of your new function for no reason?
- NA.
Did you write a test for your new code?
- Yes.
Did the Travis build pass?
- Yes.
Did you improve (or at least not significantly reduce) the amount of code test coverage?
- Yes.
Did you make sure this code actually works on Lambda, as well as locally?
- Yes. My company has been using these functional changes in a fairly standard Lambda environment for a couple of weeks.
Did you test this code with all of Python 3.6, Python 3.7 and Python 3.8?
- I did not test personally, but the CI job that ran on this PR completed successfully after running the suite on Python 3.6, 3.7, and 3.8.
Does this commit ONLY relate to the issue at hand and have your linter shit all over the code?
- The changes introduced in the commits are not outside the scope of the issue.

paulnicolet · 2021-05-09T16:29:13Z

Awesome, thanks!

Is there anything preventing this from being merged?

Quidge · 2021-05-11T14:32:36Z

@paulnicolet There's nothing blocking that I'm aware of. @jneves could this be merged?

colinhoernig · 2021-05-25T20:58:46Z

Hi @jneves -- we're using this code from @Quidge in a forked version of Zappa because of a need to enable compression, and it's working very well for us. If this PR could be merged in, that would allow us to move off of our fork and back onto the official Zappa project which we're really looking forward to so that we don't have to maintain our own fork. 🙂

Quidge · 2021-06-03T04:50:59Z

@jneves I rebased this PR onto the (great!) newer black changes to the codebase.

travnels · 2021-06-09T20:54:46Z

@jneves is this getting any movement? Anything I can do to help get this merged?

travnels · 2021-07-09T17:14:57Z

@jneves we've been using this code on a fork (https://github.com/tackle-io/Zappa) for over 3 months and are eager to get it merged. This enhancement has helped with large payloads that exceed the Lambda maximum response size (6 MB), since it allows us to compress the response inside the Lambda. Let me know if there's anything I can do to help get this approved and merged. Thanks!
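To illustrate why compressing inside the Lambda helps with the 6 MB limit, here is a minimal sketch assuming an API Gateway proxy-style response dict. The helper name gzip_lambda_response is hypothetical and this is not how Zappa builds its response internally. Note that base64 encoding inflates the compressed body by roughly a third, so compression only pays off when the payload compresses well (repetitive JSON usually does).

```python
import base64
import gzip
import json


def gzip_lambda_response(payload: dict) -> dict:
    """Hypothetical helper: gzip a JSON payload and wrap it in an
    API Gateway proxy-style response (assumed shape, not Zappa's code)."""
    raw = json.dumps(payload).encode("utf-8")
    compressed = gzip.compress(raw)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json", "Content-Encoding": "gzip"},
        # Compressed bytes are binary, so they must travel base64-encoded,
        # which adds roughly 33% on top of the compressed size
        "body": base64.b64encode(compressed).decode("ascii"),
        "isBase64Encoded": True,
    }
```

The client (browser, requests, etc.) sees Content-Encoding: gzip and transparently decompresses the body once API Gateway has decoded the base64.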

colinhoernig · 2021-09-14T15:28:55Z

@jneves It's been several months -- we'd really love to get this merged in so we can move off our own Zappa fork.

tests/utils.py

zappa/handler.py

tests/test_handler.py

No functional change to the Zappa codebase is introduced by this commit. This area of the application was untested. Tests introduced to ensure new behavior discussed in zappa#908 does not cause a regression.

@monkut

When using whitenoise for caching, which provides compression, binary types may include the "text/" and "application/json" mimetypes:

- response.mimetype.startswith("text/")
- response.mimetype == "application/json"

Assuming that Content-Encoding will be set (as whitenoise apparently does), this allows compression to be applied by the application for "text/" and "application/json" responses.

About Content-Encoding: developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding

---

The above commit message and the functional code change to zappa/handler.py were written by GH user @monkut and submitted as a PR in Miserlou/Zappa#2170. That PR was not merged before the fork from Miserlou/zappa to Zappa/zappa. This commit copies the code from that PR, adds a comment line referencing the new migrated issue, and additionally adds tests to confirm the behavior.

Co-authored-by: monkut <[email protected]>
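The rule described in this commit message can be approximated roughly as follows. This is a standalone sketch assuming a plain bytes body, a mimetype string and a headers dict; build_body, is_text_mimetype and EXTRA_TEXT_MIMETYPES are hypothetical names, not Zappa's actual handler code (the real change lives in zappa/handler.py and operates on the response object).

```python
import base64

# Assumption: illustrative set of non-"text/" mimetypes treated as text;
# Zappa's real defaults may differ.
EXTRA_TEXT_MIMETYPES = {"application/json"}


def is_text_mimetype(mimetype: str) -> bool:
    return mimetype.startswith("text/") or mimetype in EXTRA_TEXT_MIMETYPES


def build_body(data: bytes, mimetype: str, headers: dict, binary_support: bool = True) -> dict:
    """Hypothetical helper: with binary support on, base64-encode the body if a
    Content-Encoding is set (e.g. gzip added by flask_compress or whitenoise)
    or if the mimetype is not a text type; otherwise pass it through as text."""
    # HTTP header names are case-insensitive, so normalise them before lookup
    lowered = {name.lower(): value for name, value in headers.items()}
    is_encoded = bool(lowered.get("content-encoding"))
    if binary_support and (is_encoded or not is_text_mimetype(mimetype)):
        return {"body": base64.b64encode(data).decode("ascii"), "isBase64Encoded": True}
    return {"body": data.decode("utf-8"), "isBase64Encoded": False}
```

With this rule, a gzipped application/json body is base64-encoded because of its Content-Encoding header, while an uncompressed text/csv body (the flask.send_file() case discussed later) takes the text branch.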

Quidge · 2021-11-29T00:44:50Z

@javulticat would you please re-review when you get a chance?

Behavior I started thinking about when fiddling with this: what do you think should happen when a CSV file is sent back to the client, say via flask.send_file()?

In a vanilla Flask application (no additional after_request hooks tacked on), flask.send_file() will not add a Content-Encoding header to the response, but the mimetype will start with text/ if the mimetype is passed to the function manually or if the filename ends in .csv (flask.send_file() tries to infer the mimetype from the filename).

The end result is that a file sent through flask.send_file('some/file/path.csv') will not be base64-encoded, due to the conditional logic being added in this PR.

I haven't seen any issues raised around this in the last year or so, but it seems wrong. The CSV file would be binary, so I think we'd want to base64-encode it too. Thoughts?

coveralls · 2021-12-06T23:50:03Z

Coverage increased (+0.08%) to 73.606% when pulling eb2b50f on tackle-io:quidge/908_2 into b5b80cf on zappa:master.

Quidge · 2021-12-07T17:11:15Z

:| I'm trying to 'resolve' a pending unresolved conversation because it appears to be a merge blocker:

I addressed this with an update to the test files then rebased my changes to keep the PR to two commits (that may have been a mistake -- I was trying to keep the commits atomic).

Now I can no longer reference the conversation to mark it as resolved.

Not sure what to do here?

javulticat · 2021-12-08T18:48:31Z

@Quidge conversation resolved - thanks for improving the tests!

I'm not sure I fully understand the question being asked around CSVs. Could you try restating?

One point that may or may not help answer the question is that a CSV is a plain text file. So I'd imagine the handling of CSVs (text/csv) should probably be the same as the handling of any other plain text file (text/plain).

zeevt · 2021-12-08T19:00:31Z

CSV (and plain text) can be in any character encoding; it is not guaranteed to be ASCII or UTF-8.
If just .decode() works, great.
If there is a header like Content-Type: text/csv; charset=windows-1252 (there usually isn't) then it can be decoded to str without relying on default charset.
But if neither of those is true, then you have bytes that should be base64 encoded.
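The three cases above can be sketched as a single decision. This is a minimal sketch, not code from this PR: encode_text_body is a hypothetical helper, UTF-8 is assumed as the default codec, and the charset is read from the Content-Type header with the standard library's email.message.Message parser.

```python
import base64
from email.message import Message


def encode_text_body(data: bytes, content_type: str) -> dict:
    """Hypothetical helper following the three cases described above."""
    # 1. If the body decodes with the default codec (assumed UTF-8), use it
    try:
        return {"body": data.decode("utf-8"), "isBase64Encoded": False}
    except UnicodeDecodeError:
        pass
    # 2. If Content-Type declares a charset, decode with that instead
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased name, or None if absent
    if charset:
        try:
            return {"body": data.decode(charset), "isBase64Encoded": False}
        except (LookupError, UnicodeDecodeError):
            pass  # unknown codec name or wrong charset declared
    # 3. Otherwise we just have bytes: base64-encode them
    return {"body": base64.b64encode(data).decode("ascii"), "isBase64Encoded": True}
```

One caveat, raised in the follow-up comment: in the charset branch the text is handed back as a str, so the client will most likely receive it re-encoded rather than as the original bytes.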

javulticat · 2021-12-08T21:02:45Z

@zeevt that's my understanding as well, so I believe it would be fair to say that this is a potential concern for any text file, correct? I was confused because @Quidge's message implied this was a concern specific to CSVs (which I definitely may have just misinterpreted - sorry!)

If so, would a solution be to sniff the encoding of plain text and use that to decode it using the proper encoding (and, if that fails, fall back on using base64-encoded bytes)? Something like:

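import base64

import chardet

# If plain text: sniff the encoding and try to decode with it
try:
    encoding = chardet.detect(response.data)["encoding"]  # may be None
    body = response.data.decode(encoding)
    is_base64 = False
except (TypeError, LookupError, UnicodeDecodeError):
    # Sniffing failed or guessed wrong: base64-encode the raw bytes
    body = base64.b64encode(response.data).decode("ascii")
    is_base64 = True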

?

zeevt · 2021-12-08T21:21:01Z

@javulticat You understood me correctly, yes. I have two concerns with this, now that I thought about it a tiny bit more:

Maybe someone is deliberately trying to return an HTTP response with "content-type: text/foo" and a body in some other encoding, and the proposed code would decode it and re-encode it as UTF-8 if chardet guesses correctly. This would be a bug: they should be able to ensure the exact bytes they intended are received by the HTTP client, without being forced to use the application/octet-stream content type.
How fast is chardet? I would hesitate running it on every response without measuring first.

I now think it's only safe to decode bytes to str if it actually decodes as UTF-8.
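A minimal sketch of the UTF-8-only rule, assuming the same body/isBase64Encoded response fields; utf8_or_base64 and bytes_seen_by_client are hypothetical helpers, and the second one assumes a text body reaches the client encoded as UTF-8. It illustrates that both branches hand the client exactly the bytes the application produced, whereas decoding with a sniffed charset does not.

```python
import base64


def utf8_or_base64(data: bytes) -> dict:
    """Hypothetical helper: pass the body through as text only if it is
    valid UTF-8; otherwise base64-encode the exact bytes."""
    try:
        return {"body": data.decode("utf-8"), "isBase64Encoded": False}
    except UnicodeDecodeError:
        return {"body": base64.b64encode(data).decode("ascii"), "isBase64Encoded": True}


def bytes_seen_by_client(resp: dict) -> bytes:
    """Assumption: a text body is sent to the client encoded as UTF-8."""
    if resp["isBase64Encoded"]:
        return base64.b64decode(resp["body"])
    return resp["body"].encode("utf-8")


# A windows-1252 body is not valid UTF-8, so it goes out as base64 unchanged
original = "café".encode("windows-1252")  # b"caf\xe9"
resp = utf8_or_base64(original)
assert resp["isBase64Encoded"]
assert bytes_seen_by_client(resp) == original

# Decoding with the sniffed charset and sending a str would change the bytes
reencoded = original.decode("windows-1252").encode("utf-8")  # b"caf\xc3\xa9"
assert reencoded != original
```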

monkut · 2022-08-12T09:36:17Z

Migrated to:
#1155

closing

…ata is binary (#1155)

* 🔧 migrate #971 to latest master
* 🎨 run black/isort
* ♻️ refactor to allow for other binary ignore types based on mimetype (currently the openapi schema can't be passed as text)
* 🎨 run black/fix flake8
* 🔧 add EXCEPTION_HANDLER setting
* 🐛 fix zappa_returndict["body"] assignment
* 📝 add temp debug info
* 🔥 delete unnecessary print statements
* ♻️ Update comments and minor refactor for clarity
* ♻️ refactor for ease of testing and clarity
* 🎨 fix flake8
* ✨ add `additional_text_mimetypes` setting
* ✅ add testcases for additional_text_mimetypes handling
* 🔧 Expand default text mimetypes mentioned in #1023
* ♻️ define "DEFAULT_TEXT_MIMETYPES" and move to utilities.py
* 🎨 run black/isort
* 🎨 run black/isort
* 🎨 remove unnecessary comment (black now reformats code)
* 🎨 change commented lines to docstring for test app

… if data is binary (zappa#1155) * 🔧 migrate zappa#971 to lastest master * 🎨 run black/isort * ♻️ refactor to allow for other binary ignore types based on mimetype. (currently openapi schema can't be passed as text. * 🎨 run black/fix flake8 * 🔧 add EXCEPTION_HANDLER setting * 🐛 fix zappa_returndict["body"] assignment * 📝 add temp debug info * 🔥 delete unnecessary print statements * ♻️ Update comments and minor refactor for clarity * ♻️ refactor for ease of testing and clarity * 🎨 fix flake8 * ✨ add `additional_text_mimetypes` setting ✅ add testcases for additional_text_mimetypes handling * 🔧 Expand default text mimetypes mentioned in zappa#1023 ♻️ define "DEFAULT_TEXT_MIMETYPES" and move to utilities.py * 🎨 run black/isort * 🎨 run black/isort * 🎨 remove unnecesasry comment (black now reformats code) 🎨 change commented lines to docstring for test app

Quidge force-pushed the quidge/908_2 branch from a6c8443 to 9e99ee7 Compare April 29, 2021 14:24

Quidge changed the title from "Write tests confirming Binary Support response behavior" to "Update BINARY_SUPPORT to use Content-Encoding to identify if data is binary" May 5, 2021

Quidge force-pushed the quidge/908_2 branch from 9e99ee7 to d0cbd16 Compare June 2, 2021 22:52

Quidge mentioned this pull request Jul 16, 2021

Becoming a member of the new Zappa organisation #948

Closed

javulticat reviewed Nov 12, 2021

tests/utils.py (outdated)

zappa/handler.py (outdated)

zappa/handler.py

tests/test_handler.py (outdated)

Quidge force-pushed the quidge/908_2 branch 3 times, most recently from d073683 to 3501ea2 Compare November 29, 2021 00:25

Quidge and others added 2 commits November 28, 2021 19:31

Write tests confirming Binary Support response behavior

0125954

Quidge force-pushed the quidge/908_2 branch from 3501ea2 to eb2b50f Compare November 29, 2021 00:32

Quidge mentioned this pull request Dec 8, 2021

Add Compressed Text Support #1092

Closed

monkut added a commit that referenced this pull request Jul 28, 2022

🔧 migrate #971 to latest master

f1881b6

monkut mentioned this pull request Jul 28, 2022

(#908) Update BINARY_SUPPORT to use Content-Encoding to identify if data is binary #1155

Merged

monkut closed this Aug 12, 2022
