Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes bug in rfc2616 #3.6.1 implementation. #236

Merged
merged 2 commits into from
Sep 18, 2016
Merged

Fixes bug in rfc2616 #3.6.1 implementation. #236

merged 2 commits into from
Sep 18, 2016

Conversation

stephenharris
Copy link
Contributor

A chunk size must be followed by either optional chunk extension(s) (which begin with a semicolon) or \r\n. The current implementation just allows anything from the chunk size to the \r\n.

This was causing a bug in my Events plugin: a Google iCal feed was sometimes mistaken for being chunked (Google claims it is in the header: it isn't).

The starting line is

BEGIN:VCALENDAR

which is mistakenly interpreted as a chunk header with size BE. If, it just so happens the rest of the feed is interpreted as chunked you end up with a mutilated iCal feed.

Of course, Google shouldn't claim it is chunked, but the current implementation fails to detect that it is not a valid encoding.

A chunk size (hexidecimal) must be followed by either an optional chunk
extension (which begins with a semicolon) or \r\n.

@link https://tools.ietf.org/html/rfc2616#section-3.6.1
@stephenharris
Copy link
Contributor Author

I've updated the test to provide an example of a false-positive that were were previously missing.

I've also added further test (data) for correct decoding of chunks with chunk extensions.

@codecov-io
Copy link

codecov-io commented Aug 26, 2016

Current coverage is 92.22% (diff: 100%)

Merging #236 into master will not change coverage

@@             master       #236   diff @@
==========================================
  Files            21         21          
  Lines          1762       1762          
  Methods         156        156          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits           1625       1625          
  Misses          137        137          
  Partials          0          0          

Powered by Codecov. Last update fb5b517...4041e0a

@@ -39,7 +43,7 @@ public function testChunked($body, $expected){
*/
public function testNotActuallyChunked() {
$transport = new MockTransport();
$transport->body = 'Hello! This is a non-chunked response!';
$transport->body = "Believe me\r\nthis looks chunked, but it isn't.";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be split into a separate test please? Ideally, it should look more like a chunked response (i.e. 2Anotchunked\r\n...) so it's obvious what it's testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a data provider be suitable? The test testNotActuallyChunked() hasn't really changed, just the example's we're feeding it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, works for me :)

@rmccue
Copy link
Collaborator

rmccue commented Aug 30, 2016

Great catch :) One small thing to fix in the tests, but otherwise looks good.

@@ -15,6 +15,10 @@ public static function chunkedProvider() {
"02\r\nab\r\n04\r\nra\nc\r\n06\r\nadabra\r\n0c\r\n\nall we got\n",
"abra\ncadabra\nall we got\n"
),
array(
"02;foo=bar;hello=world\r\nab\r\n04;foo=baz\r\nra\nc\r\n06;justfoo\r\nadabra\r\n0c\r\n\nall we got\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is OWS (optional whitespace) allowed here too? Should add a test for that if so. Ditto if params can take quoted values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC2616 states:

chunk-ext-name = token
chunk-ext-val = token | quoted-string

Token cannot contain any white space, or any of the following: (}|<>@,;:\"/[]?={}. Quoted string can contain white space.

What we have at the moment is still a relatively low-bar for what is interpreted of correctly encoded. I'm wary about making it too strict as a misunderstanding here, when it's too strict, is more likely to lead to issues in production.

Happy to add the above restrictions with some additional tests

…?=. Chunk extension values can, provided they are quoted.

Ref #236
@@ -749,15 +749,17 @@ public static function parse_multiple(&$response, $request) {
* @return string Decoded body
*/
protected static function decode_chunked($data) {
if (!preg_match('/^([0-9a-f]+)(?:;[^\r\n]*)*\r\n/i', trim($data))) {
if (!preg_match('/^([0-9a-f]+)(?:;(?:[\w-]*)(?:=(?:(?:[\w-]*)*|"(?:[^\r\n])*"))?)*\r\n/i', trim($data))) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that [\w-]* isn't too strict for a token. I would be tempted to err on the side of caution and stick with the original patch and have a slightly lower bar for what constitutes a valid encoding.

@rmccue rmccue merged commit ea359d3 into WordPress:master Sep 18, 2016
@rmccue rmccue modified the milestones: 1.7, 1.8 Sep 18, 2016
@rmccue
Copy link
Collaborator

rmccue commented Sep 18, 2016

Turns out the tests failing here were just a transient issue on a HTTPS connection; restarted the failing build and everything passed nicely. Thanks!

@westi
Copy link
Member

westi commented Sep 23, 2016

a Google iCal feed was sometimes mistaken for being chunked (Google claims it is in the header: it isn't).

Google isn't lying and this change doesn't really fix the issue it just fixes a symptom.

Depending on the underlying transport you may get the chunks already decoded which means chunked decoding can't be keyed directly off the header.

By default curl decodes chunks internally and so Requests shouldn't be trying to decode chunks again because it will potentially damage the transferred content.

You can see this with the curl cli by using the --raw option to disable the decoding when looking at the Google iCal feeds.

@rmccue
Copy link
Collaborator

rmccue commented Sep 23, 2016

@westi Yup, I figured this was the case, but the PR fixes it regardless.

@westi
Copy link
Member

westi commented Sep 23, 2016

Except... if the data being transferred "looks" like chunked encoded data it will still mangle it... and we know we don't need to decode so running a bunch of Regex is wasteful

@rmccue
Copy link
Collaborator

rmccue commented Sep 23, 2016

I don't disagree, so happy to accept a PR :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants