Skip to content

Conversation

@narfbg
Copy link
Contributor

@narfbg narfbg commented Jul 12, 2014

Quoting the issue description:

Do they affect us? There are $string[$index] stuff going on in this code, so maybe it could?

Yes, you are affected, given that mbstring.func_overload is enabled in php.ini and a multibyte chacater set is used. If that condition is present, then strlen(), substr() (and a few other string functions) are effectively replaced by their mb_*() equivalents, using the mbstring character set.

This patch fixes the issue.

Note: There's no point in checking for substr(...) === FALSE, as FALSE is only returned if you pass parameters that are invalid (such as an array for the input string), but I recall you giving me the opposite advice, so I've left that out until you convince yourself. :)

@defuse
Copy link
Owner

defuse commented Jul 13, 2014

Thanks. I've added mbstring.func_overload to my mental list of things that will break crypto in PHP.

My intent is for this library to process strings as though they are byte arrays. If I understand this patch correctly, then if a $str is 5 UTF-16 characters, mb_strlen($str, '8bit') will return 5*2=10, and self::substr($str, 0, 3) would return a string that's 3 bytes: the two bytes for the first character, and half of the second character.

Is that right?

Are multibyte strings different "types" in PHP? Or are they just strings of bytes, and how those bytes are interpreted depends on which functions you pass them to?

I also use the array syntax, such as in the line $pad = ord($plaintext[strlen($plaintext) - 1]);. What does that do when it's a multibyte string (or mbstring.func_overload is enabled).

(Don't feel obligated to actually answer those questions, I should probably do the research myself).

@defuse
Copy link
Owner

defuse commented Jul 13, 2014

Just so that it's documented here, my motivation for checking substr(...) === FALSE is that if I make some modifications that start passing bad arguments to it, I want it to fail immediately and obviously before I even think about committing the new code, rather than relying on the unit tests to catch the incorrect behavior that results (they might not if an empty string was the expected result).

So I'm trading off extra lines of code for the "if-i-fuck-up-in-a-future-change-it-will-be-obvious-right-away" property.

@narfbg
Copy link
Contributor Author

narfbg commented Jul 13, 2014

My intent is for this library to process strings as though they are byte arrays. If I understand this patch correctly, then if a $str is 5 UTF-16 characters, mb_strlen($str, '8bit') will return 5*2=10, and self::substr($str, 0, 3) would return a string that's 3 bytes: the two bytes for the first character, and half of the second character.

Is that right?

Precisely. :)
Although, for UTF-16 a single character can consist of more than 2 bytes, but that's irrelevant for our discussion.

Are multibyte strings different "types" in PHP? Or are they just strings of bytes, and how those bytes are interpreted depends on which functions you pass them to?

They are just streams of bytes that are looked at just as 8-bit sequences, regardless of whether that produces a printable character or not. strlen() and substr() were thus originally byte-safe.

With mbstring, I guess it's also fairly simple - if N 8-bit sequences form a valid character under the currently used charset, they will together be counted as a single character.

This is super-cool for dealing with regular strings now that UTF-8 is everywhere, but as it has often happened with PHP, somebody thought that it would be a good idea to optionally override the byte-safe functions with their mbstring equivalents, and here we are ... :)

I also use the array syntax, such as in the line $pad = ord($plaintext[strlen($plaintext) - 1]);.
What does that do when it's a multibyte string (or mbstring.func_overload is enabled).

mbstring.func_overload doesn't have effect on the array syntax, but it's the strlen() inside it that would produce a wrong index.

@defuse
Copy link
Owner

defuse commented Jul 15, 2014

I'm considering just checking if mbstring.func_overload is enabled, and if so, refusing to run. Because, if it is, then the crypto library will be secure, but the user's handling of the key and ciphertext probably won't be.

@defuse
Copy link
Owner

defuse commented Jul 15, 2014

First I'm going to pull this into a branch and develop tests for both settings of mbstring.func_overload.

@defuse
Copy link
Owner

defuse commented Jul 15, 2014

Okay, I pulled this into the multibyte branch.

@defuse
Copy link
Owner

defuse commented Jul 15, 2014

Ugh, so now strlen can return false. -.-

@defuse
Copy link
Owner

defuse commented Jul 15, 2014

@narfbg I made some changes to your changes, could you review them quickly? c803f32...multibyte

@defuse
Copy link
Owner

defuse commented Jul 15, 2014

This is a lot of added complexity just for an edge case that's arguably unreasonable and impossible to secure. We're giving all users of this library more risk in order to give an extremely small subset of the users slightly less risk under an insecure configuration. The best option is therefore to make it not work for that small subset (warning them that their configuration is insecure), and keep the complexity out for users who do have a normal configuration.

@narfbg
Copy link
Contributor Author

narfbg commented Jul 15, 2014

You haven't changed the logic - it's fine in that regard, but man .. you are making your life harder with these changes. :) The PHP closing tag in particular can cause nothing but trouble - if you accidentally leave some character behind it (space included IIRC), it's regular output.

I don't think it's unreasonable to work around that edge case though. With this patch, you're really checking the byte length of the key, so as far as your library is concerned, the only dangerous outside factor is gone. It's a tricky issue as you've noticed, but still manageable.

However, if you do decide to exclude mbstring.func_overload users from your user base, make sure to check that the mbstring extension is indeed loaded. That INI setting could be just sitting there with mbstring support disabled at the same time.

@defuse
Copy link
Owner

defuse commented Jul 16, 2014

@narfbg Thanks for looking!

I created #24 to decide on the closing tag. I will probably end up removing it.

Yeah, so the argument against excluding those users is that it's hard to reliably exclude them. Do we exclude users who set mbstring.func_overload to any non-zero value, or just one that causes the string functions to be overloaded? Excluding the right users is also complex.

So I'm still undecided.

@defuse
Copy link
Owner

defuse commented Jul 16, 2014

Okay, I decided to support mbstring.func_overload. I'll merge the branch after I've made sure the testing is adequate.

@narfbg
Copy link
Contributor Author

narfbg commented Jul 16, 2014

A few assertions using a multibyte string should be sufficient, you can lift them from my library if you wish. :)

@defuse defuse merged commit c803f32 into defuse:master Jul 29, 2014
@defuse
Copy link
Owner

defuse commented Jul 29, 2014

@narfbg Thanks for your contributions!

@narfbg
Copy link
Contributor Author

narfbg commented Jul 29, 2014

My pleasure, I'm simply giving back.

@larcho
Copy link

larcho commented Aug 24, 2015

Is it possible that this isn't working in PHP running on Windows? I'm getting an "mb_strlen() failed" exception.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants