Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for HHVM #62

Open
h4cc opened this issue Mar 15, 2014 · 7 comments
Open

Support for HHVM #62

h4cc opened this issue Mar 15, 2014 · 7 comments

Comments

@h4cc
Copy link
Contributor

h4cc commented Mar 15, 2014

In case somebody else tries to add support for HHVM, this is the current state:

There is currently a problem with some test with unicode caracters:
https://travis-ci.org/h4cc/sql-formatter/jobs/20499499
Cause is the htmlentities function, which does not encode unicode characters with corresponding entities:
facebook/hhvm#1875

A solution could be replacing htmlentities with htmlspecialchars.
But even there is a small bug, because using ENT_IGNORE will simple skip all non ASCII chars:
facebook/hhvm#2071

Example:

<?php
$s = 'èéæó';
var_dump(
    $s,
    htmlspecialchars($s, ENT_COMPAT, 'UTF-8'),
    htmlspecialchars($s, ENT_COMPAT | ENT_IGNORE, 'UTF-8'),
    htmlspecialchars($s, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8')
);

Output:

PHP
string(10) "èéæó"
string(10) "èéæó"
string(10) "èéæó"
string(10) "èéæó"
HHVM
string(10) "èéæó"
string(10) "èéæó"
string(0) ""
string(10) "èéæó"

But this should be solveable by switching to ENT_SUBSTITUTE.

@jdorn What do you think? Waiting for a HHVM fix or using htmlspecialchars with ENT_SUBSTITUTE?

@jdorn
Copy link
Owner

jdorn commented Mar 15, 2014

I'm fine adding a check for ENT_SUBSTITUTE support and preferring that over ENT_IGNORE. There's already a check at https://github.com/jdorn/sql-formatter/blob/master/lib/SqlFormatter.php#L867 so adding another if condition isn't a big deal.

@h4cc
Copy link
Contributor Author

h4cc commented Mar 15, 2014

I am not sure if ENT_SUBSTITUTE will be a usefull parameter. If there is much binary data in the query, it will be encoded instead of skipped.

What about switching to htmlspecialchars? It will not output "Ã" instead of "Ã "what htmlentities does.

@jdorn
Copy link
Owner

jdorn commented Mar 15, 2014

ok, so to fix this, we would need to switch to htmlspecialchars and add an extra if statement to check for ENT_SUBSTITUTE support?

I'm not sure I would do this just for a HHVM bug, which will probably be fixed soon. Would this provide any benefit to normal PHP usage? I don't know too much about character encodings.

@h4cc
Copy link
Contributor Author

h4cc commented Mar 15, 2014

I dont think there is a quick but lasting fix here.
HHVMs htmlentites does not return HTML-Entites for UTF8 inputs, because HHVM is based on UTF8, not Latin1.
So basically HHVM's htmlentities with UTF8 behaves like the default htmlspecialchars, except for the Bug with the ENT_IGNORE flag.

There is also a hint we should look at here:
http://www.php.net/manual/en/function.htmlentities.php

ENT_IGNORE  Silently discard invalid code unit sequences instead of returning an empty string. Using this flag is discouraged as it » may have security implications. 

Summarized, we could leave everything as is and skip HHVM testing.
Or we could follow the overall recommondations and stop creating HTML-Entities for UTF8 chars. If we do so, we have to change some some tests and can get HHVM testing.

My opinion is, to switch to 'htmlspecialchars' and release this change as a 1.3.* version.

@garyemiller
Copy link

HHVMs htmlentites does not return HTML-Entites for UTF8 inputs, because HHVM is based on UTF8, not Latin1.

Lost me there. Does not matter UTF8 or Latin1, you do not want to emit a naked & in HTML because that is invalid HTML. Ditto < and >.

Per W3C:
http://www.w3.org/International/questions/qa-escapes#use

Sure only htmlspecialchars() is sufficient for valid html, but a lot of programmers use htmlentities() to make their HTML more robust and readable, even in a UTF8 environment.

Sadly, since htmlspecialchars() is currently broken in hhvm using it instead of htmlentitities() still leaves brokenness.

Regardless of the arguments pro or con, the goal should be to duplicate existing PHP behavior, not to rethink it.

@h4cc
Copy link
Contributor Author

h4cc commented Mar 20, 2014

In the end it is not critical, because this library works on HHVM as expected. Just the tests will not run exactly the same because of this one different behaviour. Lets see what the future brings.

Lost me there. Does not matter UTF8 or Latin1, you do not want to emit a naked & in HTML because that is invalid HTML. Ditto < and >.

HHVM will create html entites for "<>&" as intended. It will just not create html entites for "Ãà...".

@garyemiller
Copy link

In the end it is not critical, because this library works on HHVM as expected. 

Uh, no. I expect it to work as in the PHP documentation. See HHVM bug #2061

HHVM will create html entites for "<>&" as intended. 

Uh, no, see HHVM bug #2064

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants