-
Notifications
You must be signed in to change notification settings - Fork 333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mathml #200
base: master
Are you sure you want to change the base?
Mathml #200
Conversation
…to admit them" This reverts commit 85eb33800d19a83143c447906b1c364bd4915cb5.
Yes please. I haven't looked at the patch yet, but we'll have to work out some solution to the Travis OOM problem. Maybe as simple as just bumping up the max memory size in PHP config. |
$E['XLINK.prefix'] = 'xlink'; | ||
|
||
$proprietary_att_wrs = array( | ||
'wrs:valign' => 'CDATA', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enum#middle,middle-baseline?
// Prefix used for xlink attrs; is not specified by the MathML DTD | ||
$E['XLINK.prefix'] = 'xlink'; | ||
|
||
$proprietary_att_wrs = array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are these attributes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dsi:*
and wrs:*
are a set of proprietary styling attributes which are pretty common in many MathML documents.
In fact, they appear in the standard W3C testing suite (see e.g. https://www.w3.org/Math/testsuite/mml2-testsuite/Presentation/DynamicExpressions/maction/mactionBhigh1.xml) which we use to validate the MathML module in HTMLPurifier.
Even thought they are not part of the standard, we believe them to be important enough to be validated just like the rest of the attributes.
$E['CommonAtt'] = array_merge( | ||
array( | ||
'xmlns' => 'Bool#http://www.w3.org/1998/Math/MathML', | ||
$E['XLINK.prefix'] . ':href' => 'CDATA', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks XSSable
|
||
$E['DefEncAtt'] = array( | ||
'encoding' => 'CDATA', | ||
'definitionurl' => 'CDATA' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks XSSable
'id' => 'CDATA', // MathML allows multiple elements with same ID | ||
'xref' => 'CDATA', | ||
'class' => 'CDATA', | ||
'style' => 'CDATA', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely XSSable
This is a really good start. However, there is too heavy use of CDATA, especially on fields which are known to be XSS-able if you have arbitrary string data. The next step is to go through the CDATA fields and audit them. Use existing attribute definitions as much as possible, and otherwise clamp down on possible values. |
I went and updated the most important fields that were previously CDATA, some new AttrDefs were added in order to do so. |
OK. This may take some time to review because I need to audit every remaining occurrence of CDATA to confirm that there isn't security risk. |
Hi @ezyang, Hope you are well. My name is Manuel, Plugin Manager at WIRIS. I'm, writing to see if there are some prevision to include this update in a new version of HTML Purifier and to see if we can help in any way to accelerate the integration process. Best, |
Hi there, i am going through multiple sanitizer libs to use it with CKeditor MathType / ChemType and i found this thread, can anyone please tell me how to sanitize MathType tags properly? It's getting little bit confusing. |
Just in case, if anyone is looking for sanitizer for MathMl or ChemType / MathType you can use this https://github.com/rohitcoder/html-sanitizer/ this is improved and upgraded version of https://github.com/tgalopin/html-sanitizer/ |
Hi,
My name is Xavier Ripoll, I'm a developer at MathType and colleague of @manuelwiris, who contacted you a couple of months ago asking about a MathML module.
As he stated, we are working on a MathML module of our own for HTML Purifier. Currently, the module has been completed and is ready for release. We share the concerns you raised regarding security so we did an exhaustive job implementing the specification. To back our work, we also created three suites of tests.
One of them is included within the
mathml
branch (the one intended to merge) and consists of the Mozilla MathML Torture Test.The two others are the W3C MathML Test Suite and 140+ custom MathML snippets. These are included only in the
testing
branch because of their size. If you prefer, though, they can be included in the main branch as well.All three suites of tests are passing in our local environment with all PHP versions. Nonetheless, in older PHP versions (<=5.5) the execution fails in Travis due to an out-of-memory exception we believe to be caused by the size of the
schema.ser
file. We'd appreciate any feedback from your side about what might be causing the error.You can check the results of the
mathml
(https://travis-ci.org/wiris/htmlpurifier/builds/453971161) andtesting
(https://travis-ci.org/wiris/htmlpurifier/builds/453986321) branches on Travis.Given all of this, we think HTML Purifier should benefit from the inclusion of the MathML module we propose.
Thank you,
Xavier
P.S. We have also detected some other issues with the HTML Purifier package, is the GitHub tracker the appropriate place to report these?