Fix issue #13 (multibyte strings) #22
Thanks. I've added My intent is for this library to process strings as though they are byte arrays. If I understand this patch correctly, then if a Is that right? Are multibyte strings different "types" in PHP? Or are they just strings of bytes, and how those bytes are interpreted depends on which functions you pass them to? I also use the array syntax, such as in the line (Don't feel obligated to actually answer those questions, I should probably do the research myself). |
|
Just so that it's documented here, my motivation for checking So I'm trading off extra lines of code for the "if-i-fuck-up-in-a-future-change-it-will-be-obvious-right-away" property. |
Precisely. :)
They are just streams of bytes, treated as 8-bit sequences regardless of whether that produces a printable character or not. With mbstring, I guess it's also fairly simple - if N 8-bit sequences form a valid character under the currently used charset, they are counted together as a single character. This is super-cool for dealing with regular strings now that UTF-8 is everywhere, but as has often happened with PHP, somebody thought that it would be a good idea to optionally override the byte-safe functions with their mbstring equivalents, and here we are ... :)
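To make the byte-versus-character distinction concrete, here is a minimal sketch. It assumes the mbstring extension is loaded; the sample string and variable names are illustrative and not taken from the patch. The '8bit' encoding tells mbstring to count raw bytes, while 'UTF-8' makes it count characters.

```php
<?php
// "é" is two bytes in UTF-8 (0xC3 0xA9). Escapes are used so the
// encoding of this source file does not matter.
$s = "caf\xC3\xA9";

// Count 8-bit units (raw bytes).
$bytes = mb_strlen($s, '8bit');

// Count characters under the UTF-8 charset.
$chars = mb_strlen($s, 'UTF-8');

echo $bytes, "\n"; // 5
echo $chars, "\n"; // 4
```

With function overloading enabled and a multibyte charset configured, a plain strlen() call would report the second number rather than the first, which is what matters when checking key lengths.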
|
|
I'm considering just checking if |
|
First I'm going to pull this into a branch and develop tests for both settings of |
|
Okay, I pulled this into the multibyte branch. |
|
Ugh, so now |
|
@narfbg I made some changes to your changes, could you review them quickly? c803f32...multibyte |
|
This is a lot of added complexity just for an edge case that's arguably unreasonable and impossible to secure. We're giving all users of this library more risk in order to give an extremely small subset of the users slightly less risk under an insecure configuration. The best option is therefore to make it not work for that small subset (warning them that their configuration is insecure), and keep the complexity out for users who do have a normal configuration. |
|
You haven't changed the logic - it's fine in that regard, but man .. you are making your life harder with these changes. :) The PHP closing tag in particular can cause nothing but trouble - if you accidentally leave some character behind it (space included IIRC), it's regular output. I don't think it's unreasonable to work around that edge case though. With this patch, you're really checking the byte length of the key, so as far as your library is concerned, the only dangerous outside factor is gone. It's a tricky issue as you've noticed, but still manageable. However, if you do decide to exclude |
|
@narfbg Thanks for looking! I created #24 to decide on the closing tag. I will probably end up removing it. Yeah, so the argument against excluding those users is that it's hard to reliably exclude them. Do we exclude users who set So I'm still undecided. |
|
Okay, I decided to support |
|
A few assertions using a multibyte string should be sufficient, you can lift them from my library if you wish. :) |
|
@narfbg Thanks for your contributions! |
|
My pleasure, I'm simply giving back. |
|
Is it possible that this isn't working in PHP running on Windows? I'm getting an "mb_strlen() failed" exception. |
Quoting the issue description:
Yes, you are affected, given that mbstring.func_overload is enabled in php.ini and a multibyte character set is used. If that condition is present, then strlen(), substr() (and a few other string functions) are effectively replaced by their mb_*() equivalents, using the mbstring character set.
This patch fixes the issue.
Note: There's no point in checking for substr(...) === FALSE, as FALSE is only returned if you pass parameters that are invalid (such as an array for the input string), but I recall you giving me the opposite advice, so I've left that out until you convince yourself. :)
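A rough sketch of the kind of byte-safe wrappers this discussion is about, not the actual code from the patch. The names byte_strlen() and byte_substr() are hypothetical, and the sketch assumes PHP 7.1 or later for the nullable type hint. The idea is to go through the mb_*() functions with the '8bit' encoding whenever mbstring is available, so the result is a byte count whether or not strlen() and substr() have been overloaded.

```php
<?php
// Hypothetical helper: length of $str in bytes, regardless of
// mbstring.func_overload.
function byte_strlen(string $str): int
{
    if (function_exists('mb_strlen')) {
        // '8bit' makes mbstring treat every byte as one unit.
        return mb_strlen($str, '8bit');
    }
    // Without mbstring there is no overloading, so strlen() is byte-safe.
    return strlen($str);
}

// Hypothetical helper: byte-wise substring.
function byte_substr(string $str, int $start, ?int $length = null): string
{
    if (function_exists('mb_substr')) {
        // A null length means "to the end of the string".
        return mb_substr($str, $start, $length, '8bit');
    }
    return $length === null
        ? substr($str, $start)
        : substr($str, $start, $length);
}

// Four bytes that form two UTF-8 characters.
$key = "\xC3\xA9\xC3\xA9";
echo byte_strlen($key), "\n";                // 4
echo bin2hex(byte_substr($key, 1, 2)), "\n"; // a9c3
```

Note that byte_substr() can cut through the middle of a multibyte character, as the last line shows. That is the intended behavior when strings are treated as byte arrays.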