[3.8] bpo-43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. (GH-25595) #25726
…e and tabs. (pythonGH-25595) * issue43882 - urllib.parse should sanitize urls containing ASCII newline and tabs. Co-authored-by: Gregory P. Smith <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]> (cherry picked from commit 76cd81d) Co-authored-by: Senthil Kumaran <[email protected]>
@orsenthil: Status check is done, and it's a failure ❌.
Lib/urllib/parse.py
@@ -443,6 +451,7 @@ def urlsplit(url, scheme='', allow_fragments=True):
if '?' in url:
url, query = url.split('?', 1)
_checknetloc(netloc)
url = _remove_unsafe_bytes_from_url(url)
I think we need to do this up front, right after _coerce_args(url, ...)
(and should do the same in 3.9 and 3.10).
By this point in 3.8 we have potentially allowed characters to slip through into the query and fragment of http: URLs.
At a minimum, if we don't do this right after _coerce_args, before looking in the cache, it needs to be done right before any splitting happens in this branch of the code.
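To illustrate the ordering concern, here is a minimal sketch using toy splitters rather than the real urlsplit. The names strip_unsafe, toy_split_late, and toy_split_early are hypothetical and exist only for this example; the set of unsafe characters is assumed to be tab, carriage return, and line feed.

```python
# Assumption: the characters under discussion are tab, CR, and LF.
UNSAFE = "\t\r\n"


def strip_unsafe(text):
    """Remove every unsafe character from text (hypothetical helper)."""
    for ch in UNSAFE:
        text = text.replace(ch, "")
    return text


def toy_split_late(url):
    """Toy splitter that sanitizes only after query and fragment are cut off."""
    url, _, fragment = url.partition("#")
    url, _, query = url.partition("?")
    url = strip_unsafe(url)  # too late: query and fragment were never cleaned
    return url, query, fragment


def toy_split_early(url):
    """Toy splitter that sanitizes up front, before any splitting."""
    url = strip_unsafe(url)
    url, _, fragment = url.partition("#")
    url, _, query = url.partition("?")
    return url, query, fragment


raw = "http://example.com/pa\nth?que\try=1#fr\rag"
late = toy_split_late(raw)
early = toy_split_early(raw)
print(late)   # query and fragment still carry \t and \r
print(early)  # every component is clean
```

Sanitizing before the first split is the only placement in this sketch that covers every component with a single call.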
The test case should be updated to include an invalid character in each of the five portions of the URL.
allowed characters to slip through into query and fragment
I had a doubt initially: I thought (from practice) that newlines and tabs in the query or fragment are percent-encoded rather than removed up front.
We will have to find a spec that unambiguously states what should be done.
We've already got that unambiguous spec: https://url.spec.whatwg.org/#concept-basic-url-parser step 3. Strip these three characters (ASCII tab, line feed, and carriage return) no matter what, before the parsing state machine starts.
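A minimal sketch of that rule, applied before parsing, might look like the following. The helper name strip_tabs_and_newlines is hypothetical (it is not the helper in this PR), and the sketch strips the characters itself instead of relying on whatever the installed urlsplit already does.

```python
from urllib.parse import urlsplit

# ASCII tab, carriage return, and line feed.
_UNSAFE_CHARS = "\t\r\n"


def strip_tabs_and_newlines(url):
    """Drop tab, CR, and LF from a str or bytes URL (hypothetical helper)."""
    if isinstance(url, bytes):
        # bytes.translate(None, delete) removes every byte listed in delete.
        return url.translate(None, _UNSAFE_CHARS.encode("ascii"))
    # str.translate with a mapping to None deletes those code points.
    return url.translate({ord(c): None for c in _UNSAFE_CHARS})


dirty = "x-new-scheme\t://www.python.org/java\nscript:\talert('msg\r\n')/?query\n=\tsomething#frag\nment"
parts = urlsplit(strip_tabs_and_newlines(dirty))
print(parts.geturl())

# The same input as bytes, cleaned and parsed the same way.
bytes_parts = urlsplit(strip_tabs_and_newlines(dirty.encode("ascii")))
print(bytes_parts.geturl())
```

Because the stripping happens before urlsplit sees the input, scheme, netloc, path, query, and fragment are all cleaned by one call.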
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". And if you don't make the requested changes, you will be put in the comfy chair!
Note: this will miss 3.8.10 but, as a security fix, will be included in 3.8.11 later in the year.
Lib/test/test_urlparse.py
self.assertEqual(p.geturl(), "x-new-scheme://www.python.org/javascript:alert('msg')/?query=something#fragment")

# Remove ASCII tabs and newlines from input as bytes, any scheme.
url = b"x-new-scheme\t://www.python.org/java\nscript:\talert('msg\r\n')/?query\n=\tsomething#frag\nment"
Updated test cases to verify removal in all parts of the URL.
I have made the requested changes; please review again.
Sorry, I can't merge this PR. Reason:
This can be merged now, @ambv. Thank you.
@ambv: Please replace
Thanks @miss-islington for the PR, and @ambv for merging it 🌮🎉. I'm working now to backport this PR to: 3.6, 3.7.
Thanks! ✨ 🍰 ✨
…newline and tabs. (pythonGH-25595) (pythonGH-25726) Co-authored-by: Gregory P. Smith <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]> (cherry picked from commit 76cd81d) Co-authored-by: Senthil Kumaran <[email protected]> Co-authored-by: Senthil Kumaran <[email protected]> (cherry picked from commit 515a7bc) Co-authored-by: Miss Islington (bot) <[email protected]>
GH-25923 is a backport of this pull request to the 3.7 branch.
…newline and tabs. (pythonGH-25595) (pythonGH-25726) Co-authored-by: Gregory P. Smith <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]> (cherry picked from commit 76cd81d) Co-authored-by: Senthil Kumaran <[email protected]> Co-authored-by: Senthil Kumaran <[email protected]> (cherry picked from commit 515a7bc) Co-authored-by: Miss Islington (bot) <[email protected]>
GH-25924 is a backport of this pull request to the 3.6 branch.
Co-authored-by: Gregory P. Smith [email protected]
Co-authored-by: Serhiy Storchaka [email protected]
(cherry picked from commit 76cd81d)
Co-authored-by: Senthil Kumaran [email protected]
https://bugs.python.org/issue43882