-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-39503: CVE-2020-8492: Fix AbstractBasicAuthHandler #18284
Conversation
This fix looks good to me! |
Lib/urllib/request.py
Outdated
@@ -937,7 +937,7 @@ class AbstractBasicAuthHandler: | |||
|
|||
# allow for double- and single-quoted realm values | |||
# (single quotes are a violation of the RFC, but appear in the wild) | |||
rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+' | |||
rx = re.compile('(?:[^,]*,)*[ \t]*([^ \t]+)[ \t]+' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(?:.*,)*
is equivalent to (?:.*,)?
.
But since this regular expresion is only used with search()
. (?:.*,)*[ \t]*
can be removed at all.
I'll analyze whether it is correct or there is an error in the regular expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I am cannot say that I completely understand the code, but to give it some sense we can either
- Replace
rx.search()
withrx.match()
and replace(?:.*,)*
with(?:.*,)?
.
or
- Keep
rx.search()
and replace(?:.*,)*
with(?:^|,)
.
Do not keep (?:[^,]*,)*
. It is a waster of resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humm, options 1 and 2 are not equivalent if the field value contains more than one challenge. Option 2 is closer to the current behavior. But correct support of more than one challenge need rewriting the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current patch seems to give an O(n^3) time to evaluate - much better than O(2^n), but still very slow - with 2000 commas it takes about a minute to evalute. With 65000 it takes much, much longer. Testing using the code from here gave the following (commas, seconds) values:
[(100, 0.124), (250, 0.261), (500, 0.923), (750, 2.85), (1000, 6.433), (1250, 12.608), (1500, 21.576), (2000, 50.751)]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wrong, the first option is equivalent to the current behavior (returns the last realm).
Would be nice to add a test. |
Just a heads up, CVE-2020-8492 has been created. I'm not sure how Python CVEs are generally tracked, but it may be useful to include the information on the bug tracker issue 👍 |
Not only
We can either simplify the regex to prevent the "catastrophic backtracking" or even remove the prefix. UPDATE: Oops, my example was wrong, I fixed it :-) |
Does this also fix https://bugs.python.org/issue38826 ? |
It seems to me that the >>> header = 'basic realm="1", x, other realm="2"'
>>> re.search("(?:.*,)*[ \t]*([^ \t]+)[ \t]+", header).group(1)
'other'
>>> re.search("[ \t]*([^ \t]+)[ \t]+", header).group(1)
'basic'
>>> I don't see a way to fix this by just changing the regex while preserving the previous behavior. Then again, corner cases of the previous behavior might be wrong. |
I rebased my PR and added more tests. |
@serhiy-storchaka: I don't understand if you consider that the fix is wrong or that the fix is not enough (it remains possible to create a denial of service)? |
@vstinner your fix helps, but we can do better. It has cubic complexity, my suggestion has quadratic complexity. It is possible to implement an algorithm with linear complexity, but not with such small changes. |
I also added some tests and implemented a simpler complexity regex - see master...davidfraser:urllib_basic_auth_regex |
It's worth seeing how the results of this regex are actually used Note my comment in f79379c: Note that the original regex was roughly O(2**n) |
WWW-Authenticate is badly specified. The RFC doesn't specify if a single HTTP header can contain multiple challenges. I found these resources:
A variant is to have multiple WWW-Authenticate: one challenge per WWW-Authenticate header. By the way, AbstractBasicAuthHandler code contains this interesting comment:
Current behavior:
For example, |
Sorry, I misunderstood this proposition. In fact, I proposed something similar except that I missed the "start of the string" (regex I decided to write a way more complex change to not only fix the vulnerability, but also fix the parser since it didn't look possible to fix the regex without changing the behavior. Currently, the code uses the last realm if there are multiple challenges per header. I fixed this behavior to use the realm of the first Basic challenge. I also modified the code to support multiple headers, except of only parsing the first one. |
Ok, the PR is now ready for a new round of reviews. I fixed the vulnerability but I also changed the code to parse all WWW-Authenticate HTTP Headers and accept multiple challenges per header. |
The AbstractBasicAuthHandler class of the urllib.request module uses an inefficient regular expression which can be exploited by an attacker to cause a denial of service. Fix the regex to prevent the catastrophic backtracking. Vulnerability reported by Ben Caller and Matt Schwager. AbstractBasicAuthHandler of urllib.request now parses all WWW-Authenticate HTTP headers and accepts multiple challenges per header: use the realm of the first Basic challenge. Co-Authored-By: Serhiy Storchaka <[email protected]>
Oh, Ben Caller reported the issue at 2019-11-17: https://bugs.python.org/issue38826 |
cc @orsenthil |
Thanks, @vstinner - This was a good detection and change. I am reviewing the patch further. |
basic = f'Basic realm="{realm}"' | ||
basic2 = f'Basic realm="{realm2}"' | ||
other_no_realm = 'Otherscheme xxx' | ||
digest = (f'Digest realm="{realm2}", ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the motivation of adding a digest
test case here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to write a more realistic test than "Otherscheme xxx". The Digest challenge uses "realm" and the test ensures that it's skipped. It uses use multiple fields separated by commas, some use quotes: test that the parser handles that properly.
I picked the example from Wikipedia :-) https://fr.wikipedia.org/wiki/Authentification_HTTP#Demande_d'identification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perfect. sounds good to me. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @vstinner
Thanks for the patch and the change. The entire addition is meaningful and looks good to me.
- The change of regex only the security issue keeps the patch simple.
I only had a question in the tests, but it is not a critical question. I thought, I am missing something by adding of "digest" test case in line 1469, when the multiple headers challenge is covered again in the same test case at line 1499.
LGTM.
The AbstractBasicAuthHandler class of the urllib.request module uses an inefficient regular expression which can be exploited by an attacker to cause a denial of service. Fix the regex to prevent the catastrophic backtracking. Vulnerability reported by Ben Caller and Matt Schwager. AbstractBasicAuthHandler of urllib.request now parses all WWW-Authenticate HTTP headers and accepts multiple challenges per header: use the realm of the first Basic challenge. Co-Authored-By: Serhiy Storchaka <[email protected]> (cherry picked from commit 0b297d4) Co-authored-by: Victor Stinner <[email protected]>
GH-19292 is a backport of this pull request to the 3.7 branch. |
|
Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8. |
Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.7. |
The AbstractBasicAuthHandler class of the urllib.request module uses an inefficient regular expression which can be exploited by an attacker to cause a denial of service. Fix the regex to prevent the catastrophic backtracking. Vulnerability reported by Ben Caller and Matt Schwager. AbstractBasicAuthHandler of urllib.request now parses all WWW-Authenticate HTTP headers and accepts multiple challenges per header: use the realm of the first Basic challenge. Co-Authored-By: Serhiy Storchaka <[email protected]> (cherry picked from commit 0b297d4) Co-authored-by: Victor Stinner <[email protected]>
GH-19296 is a backport of this pull request to the 3.8 branch. |
The AbstractBasicAuthHandler class of the urllib.request module uses an inefficient regular expression which can be exploited by an attacker to cause a denial of service. Fix the regex to prevent the catastrophic backtracking. Vulnerability reported by Ben Caller and Matt Schwager. AbstractBasicAuthHandler of urllib.request now parses all WWW-Authenticate HTTP headers and accepts multiple challenges per header: use the realm of the first Basic challenge. Co-Authored-By: Serhiy Storchaka <[email protected]> (cherry picked from commit 0b297d4) Co-authored-by: Victor Stinner <[email protected]>
GH-19297 is a backport of this pull request to the 3.7 branch. |
…-19296) The AbstractBasicAuthHandler class of the urllib.request module uses an inefficient regular expression which can be exploited by an attacker to cause a denial of service. Fix the regex to prevent the catastrophic backtracking. Vulnerability reported by Ben Caller and Matt Schwager. AbstractBasicAuthHandler of urllib.request now parses all WWW-Authenticate HTTP headers and accepts multiple challenges per header: use the realm of the first Basic challenge. Co-Authored-By: Serhiy Storchaka <[email protected]> Co-authored-by: Victor Stinner <[email protected]> (cherry picked from commit 0b297d4)
…-19297) The AbstractBasicAuthHandler class of the urllib.request module uses an inefficient regular expression which can be exploited by an attacker to cause a denial of service. Fix the regex to prevent the catastrophic backtracking. Vulnerability reported by Ben Caller and Matt Schwager. AbstractBasicAuthHandler of urllib.request now parses all WWW-Authenticate HTTP headers and accepts multiple challenges per header: use the realm of the first Basic challenge. Co-Authored-By: Serhiy Storchaka <[email protected]> Co-authored-by: Victor Stinner <[email protected]> (cherry picked from commit 0b297d4)
…-19304) The AbstractBasicAuthHandler class of the urllib.request module uses an inefficient regular expression which can be exploited by an attacker to cause a denial of service. Fix the regex to prevent the catastrophic backtracking. Vulnerability reported by Ben Caller and Matt Schwager. AbstractBasicAuthHandler of urllib.request now parses all WWW-Authenticate HTTP headers and accepts multiple challenges per header: use the realm of the first Basic challenge. Co-Authored-By: Serhiy Storchaka <[email protected]> (cherry picked from commit 0b297d4)
…9305) The AbstractBasicAuthHandler class of the urllib.request module uses an inefficient regular expression which can be exploited by an attacker to cause a denial of service. Fix the regex to prevent the catastrophic backtracking. Vulnerability reported by Ben Caller and Matt Schwager. AbstractBasicAuthHandler of urllib.request now parses all WWW-Authenticate HTTP headers and accepts multiple challenges per header: use the realm of the first Basic challenge.
The AbstractBasicAuthHandler class of the urllib.request module uses
an inefficient regular expression which can be exploited by an
attacker to cause a denial of service. Fix the regex to prevent the
catastrophic backtracking.
Vulnerability reported by Matt Schwager.
https://bugs.python.org/issue39503