Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix URL parameter encoding #102

Merged

Conversation

waderobson
Copy link
Contributor

I ran into this when using swaggerui to test my lambda. The swaggerui was urlencode params prior to sending them. In our exisisting setup, this was getting handled, I could provide encoded or not and it worked.

setup.cfg Outdated Show resolved Hide resolved
src/apig_wsgi.py Outdated
@@ -55,7 +55,8 @@ def get_environ(event, binary_support):
else:
body = body.encode("utf-8")
params = event.get("queryStringParameters") or {}

# decoding first to prevent double encoding
params = {k: unquote(v) for k, v in params.items()}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're only unquoting values - did you verify the issue doesn't exist in keys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that you would hopefully never have an application that has keys like that :). Probably not a safe assumption though.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A URL like example.com/?+123 should have a key with an empty string as the value

@waderobson
Copy link
Contributor Author

I started to test more encoding and looks like there's also an issue with spaces.
Since spaces can be encoded as (+) or (%20) there's no way to be sure that someone is sending a ( ) or a (+).
Just blindly decoding everything could lead to problems for someone. Not sure how I should proceed, might just want to close this for now and reopen if I find time to come up with a reasonable solution.

Interested to hear your thoughts on how this could be handled? I'll have a look at how others are handling this (maybe they aren't)

@adamchainz
Copy link
Owner

I'll have a look at how others are handling this (maybe they aren't)

They definitely are in other WSGI servers, like wsgiref or gunicorn. If you refer to the WSGI specification that should also give a clue.

@waderobson
Copy link
Contributor Author

After a bit of investigation, I think using urlencode might be too heavy-handed, really you just need to join everything and let WSGI figure it out.
This change is working for me, I can send:

/?stopped=2018-06-13T19:07:44.716000Z

or

/?stopped=2018-06-13T19%3A07%3A44.716000Z

with no issues. This is testing my specific use case.
No idea if this will work or present issues for others. If that's a concern maybe we could enable a "raw" encoding mode?

@adamchainz adamchainz changed the title Decode before encode Fix URL parameter encoding Jan 13, 2020
@adamchainz adamchainz merged commit 42d5510 into adamchainz:master Jan 13, 2020
adamchainz added a commit that referenced this pull request Jan 13, 2020
adamchainz added a commit that referenced this pull request Jan 13, 2020
@adamchainz
Copy link
Owner

I've done some testing with other wsgi servers, specifically Django's runserver and gunicorn. This corresponds to what I've seen there.

Merged, added history note and split tests in #102, and release in version 2.4.1: https://pypi.org/project/apig-wsgi/2.4.1/

Enjoy!

@falloutcoder
Copy link

I started to test more encoding and looks like there's also an issue with spaces.

Yup, this update actually breaks Flask (werkzeug) app as it replaces + with spaces here.

>>> from werkzeug import urls

# If encoded QS is passed, it works fine
>>> urls.url_decode(b"foo=42&bar=SvkdZgvqjm0M%2B%2BvA%3D%3D")
MultiDict([('foo', '42'), ('bar', 'SvkdZgvqjm0M++vA==')])

# causes issues if decoded QS is passed containing `+`
>>> urls.url_decode(b"foo=42&bar=r+y4jfeSvkdZgvqj+m0M++vA==")
MultiDict([('foo', '42'), ('bar', 'r y4jfeSvkdZgvqj m0M  vA==')])

@waderobson
Copy link
Contributor Author

I'm using Flask and I can assure you it is not broken. The problem is the ambiguity around what a + is.
Your better off using %2B for + if you want a literal +.
The problem I was having and that is now addressed with this change was re-encoding %
For example, your encoded QS would be encoded again.

In [2]: from werkzeug.urls import url_decode

In [3]: url_decode(b'foo=SvkdZgvqjm0M%252B%252BvA%253D%253D')
Out[3]: MultiDict([('foo', 'SvkdZgvqjm0M%2B%2BvA%3D%3D')])

I much prefer having the option to send encoded

@waderobson waderobson deleted the fix-double-urlencoding branch January 27, 2020 21:16
@falloutcoder
Copy link

Just sharing my scenerio here to give more details of the issue. In query string i am passing pagination token (automatically generated, can contain literal +). Following is the flow which results into issue:

  1. Browser send the encoded query string (+ is encoded to %20)
  2. API gateway receives the request and automatically decodes the query string (%20 is decoded back to +). I think this behavior is default, cannot disable automatic query string decoding.
  3. API Gatway forwards the decoded request to Lamda
  4. Lambda uses apig-wsgi which is now (after this update) passing query string params as is to WSGI app
  5. WSGI apps (checked for werkzeug and django WSGI) always decodes the query string params with unquote plus which replaces + with spaces. Which makes the pagination token invalid as literal + is replaced with spaces.

This was not happening before this update because in step 4, query string params were encoded (+ was being sent as %20) before sending to wsgi app where it would decode with unquote plus properly.

Should we add a flag to enable/disable query string encoding in make_lambda_handler to handle different needs? I can push up a PR if that sounds right approach. Looking forward to your inputs. Thanks!!

@adamchainz
Copy link
Owner

I am not sure steps 1 and 2 are true. Using an example Django app I have seen the browser always sends + as a +, never encoding it. Therefore it seems API Gateway isn't touching that.

In fact in my demo app, + is always decoded by Django's request.GET to a ' ', no matter how I rejig things

Can you share a demo app using whatever framework you're using that replicates the behaviour you want @falloutcoder ?

@adamchainz
Copy link
Owner

I added an example Django app with Ansible deployment in #110. It confirms to me that current behaviour is correct.

Here's the app locally using Django's runserver reporting what it received for various encodings of space:

Screenshot 2020-02-17 at 07 27 51

And on API Gateway:

Screenshot 2020-02-17 at 07 27 17

An interesting learning is that API Gateway, unlike most webservers, returns a 400 for badly formatted parameters like a double %:

Screenshot 2020-02-17 at 07 34 55

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants