[3.8] bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595) #25726

Merged: 5 commits into python:3.8, May 5, 2021

Conversation

miss-islington
Contributor

@miss-islington miss-islington commented Apr 29, 2021

  • issue43882 - urllib.parse should sanitize urls containing ASCII newline and tabs.

Co-authored-by: Gregory P. Smith [email protected]
Co-authored-by: Serhiy Storchaka [email protected]
(cherry picked from commit 76cd81d)

Co-authored-by: Senthil Kumaran [email protected]

https://bugs.python.org/issue43882

@miss-islington
Contributor Author

@orsenthil: Status check is done, and it's a failure ❌ .

@@ -443,6 +451,7 @@ def urlsplit(url, scheme='', allow_fragments=True):
    if '?' in url:
        url, query = url.split('?', 1)
    _checknetloc(netloc)
    url = _remove_unsafe_bytes_from_url(url)
Member

I think we need to do this up front right after _coerce_args(url, ...) (and should do the same in 3.9 and 3.10).

By this point in 3.8 we've potentially allowed characters to slip through into query and fragment of http: URLs.

At a minimum, if we aren't going to do this right after _coerce_args, before looking in the cache, it needs to be done right before any splitting happens in this branch of the code.
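
Editorial sketch of the ordering concern raised above. It is not the code from this PR: `strip_unsafe` and `naive_split` are hypothetical stand-ins, and the toy splitter only mimics the `#` then `?` splitting order seen in the diff. Under those assumptions, it shows why sanitizing after the split leaves tabs and newlines in the query and fragment:

```python
UNSAFE_CHARS = ("\t", "\r", "\n")  # hypothetical name for the characters to drop


def strip_unsafe(url: str) -> str:
    """Remove ASCII tab and newline characters from a string."""
    for ch in UNSAFE_CHARS:
        url = url.replace(ch, "")
    return url


def naive_split(url: str):
    """Toy splitter (assumption): returns (rest, query, fragment)."""
    query = fragment = ""
    if "#" in url:
        url, fragment = url.split("#", 1)
    if "?" in url:
        url, query = url.split("?", 1)
    return url, query, fragment


raw = "//www.python.org/pa\tth?que\nry=1#frag\rment"

# Sanitizing late: only the remainder is cleaned, query/fragment keep the bytes.
rest, query, fragment = naive_split(raw)
late = (strip_unsafe(rest), query, fragment)

# Sanitizing up front: every component comes out clean.
early = naive_split(strip_unsafe(raw))

print(late)
print(early)
```

Stripping the whole input once, before any splitting or cache lookup, means no later branch has to remember to clean its own piece.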

Member

The test case should be updated to include an invalid character in each of the five portions of the URL.

Member

allowed characters to slip through into query and fragment

I had a doubt initially: I thought (from practice) that newlines and tabs in the query/fragment are percent-encoded rather than removed up front.

We will have to find a spec that unambiguously states what should be done.

Member

We've already got that unambiguous spec: https://url.spec.whatwg.org/#concept-basic-url-parser, step 3. Strip these three characters no matter what, before the parsing state machine starts.
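
Editorial sketch of the rule cited above, assuming the three characters are ASCII tab (U+0009), line feed (U+000A) and carriage return (U+000D). The helper name `remove_ascii_tab_or_newline` is hypothetical and is not the function added by this PR. The input is stripped explicitly before calling `urlsplit`, so the outcome does not depend on whether the interpreter already carries the fix:

```python
from urllib.parse import urlsplit

# Translation table mapping the three code points to None (i.e. delete them).
ASCII_TAB_OR_NEWLINE = {0x09: None, 0x0A: None, 0x0D: None}


def remove_ascii_tab_or_newline(text: str) -> str:
    """Drop tab, LF and CR wherever they occur, before any parsing."""
    return text.translate(ASCII_TAB_OR_NEWLINE)


raw = "https://www.python.org/java\nscript:\talert('msg\r\n')/?query\n=\tsomething#frag\nment"
parts = urlsplit(remove_ascii_tab_or_newline(raw))

print(parts.path)      # path with the embedded tab/newlines removed
print(parts.query)     # query with the embedded tab/newlines removed
print(parts.fragment)  # fragment with the embedded newline removed
```

Note that the characters are removed, not percent-encoded, which is the behavior the spec link settles.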

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

@ambv
Contributor

ambv commented May 3, 2021

Note: this will miss 3.8.10 but as a security fix will be included in 3.8.11 later in the year.

@orsenthil orsenthil requested a review from gpshead May 3, 2021 13:08
self.assertEqual(p.geturl(), "x-new-scheme://www.python.org/javascript:alert('msg')/?query=something#fragment")

# Remove ASCII tabs and newlines from input as bytes, any scheme.
url = b"x-new-scheme\t://www.python.org/java\nscript:\talert('msg\r\n')/?query\n=\tsomething#frag\nment"
Member

Updated test cases to verify removal in all parts of the URL.
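
Editorial sketch of what "all parts of the URL" means for the bytes input quoted above. It is not the PR's test code: the bytes are stripped explicitly with `bytes.translate` (an assumption made so the example behaves the same on any interpreter), and then each of the five components is checked:

```python
from urllib.parse import urlsplit

raw = (b"x-new-scheme\t://www.python.org/java\nscript:\talert('msg\r\n')/"
       b"?query\n=\tsomething#frag\nment")

# Delete tab, CR and LF from the bytes input before parsing.
cleaned = raw.translate(None, delete=b"\t\r\n")
parts = urlsplit(cleaned)  # bytes in, SplitResultBytes out

# Report, per component, whether any unsafe byte survived.
for name in ("scheme", "netloc", "path", "query", "fragment"):
    value = getattr(parts, name)
    has_unsafe = any(byte in value for byte in b"\t\r\n")
    print(name, value, "unsafe bytes present:", has_unsafe)
```

The input carries at least one tab or newline in the scheme, path, query and fragment, so a check that only inspects one component would miss the others.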

@orsenthil
Member

I have made the requested changes; please review again.

@miss-islington
Contributor Author

Sorry, I can't merge this PR. Reason: You're not authorized to push to this branch. Visit https://docs.github.com/articles/about-protected-branches/ for more information..

@orsenthil
Member

This can be merged now, @ambv . Thank you.

@ambv ambv merged commit 515a7bc into python:3.8 May 5, 2021
@bedevere-bot

@ambv: Please replace # with GH- in the commit message next time. Thanks!

@miss-islington
Contributor Author

Thanks @miss-islington for the PR, and @ambv for merging it 🌮🎉.. I'm working now to backport this PR to: 3.6, 3.7.
🐍🍒⛏🤖

@miss-islington miss-islington deleted the backport-76cd81d-3.8 branch May 5, 2021 17:25
@ambv
Contributor

ambv commented May 5, 2021

Thanks! ✨ 🍰 ✨

miss-islington added a commit to miss-islington/cpython that referenced this pull request May 5, 2021
…newline and tabs. (pythonGH-25595) (pythonGH-25726)

Co-authored-by: Gregory P. Smith <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
(cherry picked from commit 76cd81d)
Co-authored-by: Senthil Kumaran <[email protected]>
Co-authored-by: Senthil Kumaran <[email protected]>
(cherry picked from commit 515a7bc)

Co-authored-by: Miss Islington (bot) <[email protected]>
@bedevere-bot

GH-25923 is a backport of this pull request to the 3.7 branch.

miss-islington added a commit to miss-islington/cpython that referenced this pull request May 5, 2021
…newline and tabs. (pythonGH-25595) (pythonGH-25726)

Co-authored-by: Gregory P. Smith <[email protected]>
Co-authored-by: Serhiy Storchaka <[email protected]>
(cherry picked from commit 76cd81d)
Co-authored-by: Senthil Kumaran <[email protected]>
Co-authored-by: Senthil Kumaran <[email protected]>
(cherry picked from commit 515a7bc)

Co-authored-by: Miss Islington (bot) <[email protected]>
@bedevere-bot

GH-25924 is a backport of this pull request to the 3.6 branch.

Labels
type-bug An unexpected behavior, bug, or error type-security A security issue
6 participants