Support for HHVM #62

h4cc · 2014-03-15T12:35:08Z

In case somebody else tries to add support for HHVM, this is the current state:

There is currently a problem with some test with unicode caracters:
https://travis-ci.org/h4cc/sql-formatter/jobs/20499499
Cause is the htmlentities function, which does not encode unicode characters with corresponding entities:
facebook/hhvm#1875

A solution could be replacing htmlentities with htmlspecialchars.
But even there is a small bug, because using ENT_IGNORE will simple skip all non ASCII chars:
facebook/hhvm#2071

Example:

<?php
$s = 'èéæÃ³';
var_dump(
    $s,
    htmlspecialchars($s, ENT_COMPAT, 'UTF-8'),
    htmlspecialchars($s, ENT_COMPAT | ENT_IGNORE, 'UTF-8'),
    htmlspecialchars($s, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8')
);

Output:

PHP
string(10) "èéæÃ³"
string(10) "èéæÃ³"
string(10) "èéæÃ³"
string(10) "èéæÃ³"
HHVM
string(10) "èéæÃ³"
string(10) "èéæÃ³"
string(0) ""
string(10) "èéæÃ³"

But this should be solveable by switching to ENT_SUBSTITUTE.

@jdorn What do you think? Waiting for a HHVM fix or using htmlspecialchars with ENT_SUBSTITUTE?

The text was updated successfully, but these errors were encountered:

jdorn · 2014-03-15T14:29:40Z

I'm fine adding a check for ENT_SUBSTITUTE support and preferring that over ENT_IGNORE. There's already a check at https://github.com/jdorn/sql-formatter/blob/master/lib/SqlFormatter.php#L867 so adding another if condition isn't a big deal.

h4cc · 2014-03-15T15:35:35Z

I am not sure if ENT_SUBSTITUTE will be a usefull parameter. If there is much binary data in the query, it will be encoded instead of skipped.

What about switching to htmlspecialchars? It will not output "Ã" instead of "Ã "what htmlentities does.

jdorn · 2014-03-15T16:09:43Z

ok, so to fix this, we would need to switch to htmlspecialchars and add an extra if statement to check for ENT_SUBSTITUTE support?

I'm not sure I would do this just for a HHVM bug, which will probably be fixed soon. Would this provide any benefit to normal PHP usage? I don't know too much about character encodings.

h4cc · 2014-03-15T18:37:24Z

I dont think there is a quick but lasting fix here.
HHVMs htmlentites does not return HTML-Entites for UTF8 inputs, because HHVM is based on UTF8, not Latin1.
So basically HHVM's htmlentities with UTF8 behaves like the default htmlspecialchars, except for the Bug with the ENT_IGNORE flag.

There is also a hint we should look at here:
http://www.php.net/manual/en/function.htmlentities.php

ENT_IGNORE  Silently discard invalid code unit sequences instead of returning an empty string. Using this flag is discouraged as it » may have security implications.

Summarized, we could leave everything as is and skip HHVM testing.
Or we could follow the overall recommondations and stop creating HTML-Entities for UTF8 chars. If we do so, we have to change some some tests and can get HHVM testing.

My opinion is, to switch to 'htmlspecialchars' and release this change as a 1.3.* version.

garyemiller · 2014-03-19T23:26:24Z

HHVMs htmlentites does not return HTML-Entites for UTF8 inputs, because HHVM is based on UTF8, not Latin1.

Lost me there. Does not matter UTF8 or Latin1, you do not want to emit a naked & in HTML because that is invalid HTML. Ditto < and >.

Per W3C:
http://www.w3.org/International/questions/qa-escapes#use

Sure only htmlspecialchars() is sufficient for valid html, but a lot of programmers use htmlentities() to make their HTML more robust and readable, even in a UTF8 environment.

Sadly, since htmlspecialchars() is currently broken in hhvm using it instead of htmlentitities() still leaves brokenness.

Regardless of the arguments pro or con, the goal should be to duplicate existing PHP behavior, not to rethink it.

h4cc · 2014-03-20T08:00:37Z

In the end it is not critical, because this library works on HHVM as expected. Just the tests will not run exactly the same because of this one different behaviour. Lets see what the future brings.

Lost me there. Does not matter UTF8 or Latin1, you do not want to emit a naked & in HTML because that is invalid HTML. Ditto < and >.

HHVM will create html entites for "<>&" as intended. It will just not create html entites for "Ãà...".

garyemiller · 2014-03-20T21:40:19Z

In the end it is not critical, because this library works on HHVM as expected.

Uh, no. I expect it to work as in the PHP documentation. See HHVM bug #2061

HHVM will create html entites for "<>&" as intended.

Uh, no, see HHVM bug #2064

dave1010 mentioned this issue Jan 16, 2015

Add hhvm (non-nightly) to Travis #72

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for HHVM #62

Support for HHVM #62

h4cc commented Mar 15, 2014

jdorn commented Mar 15, 2014

h4cc commented Mar 15, 2014

jdorn commented Mar 15, 2014

h4cc commented Mar 15, 2014

garyemiller commented Mar 19, 2014

h4cc commented Mar 20, 2014

garyemiller commented Mar 20, 2014

Support for HHVM #62

Support for HHVM #62

Comments

h4cc commented Mar 15, 2014

jdorn commented Mar 15, 2014

h4cc commented Mar 15, 2014

jdorn commented Mar 15, 2014

h4cc commented Mar 15, 2014

garyemiller commented Mar 19, 2014

h4cc commented Mar 20, 2014

garyemiller commented Mar 20, 2014