Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch: Failed comparison of activeFormattingElements entry with Marker #44

Open
gwicke opened this issue Nov 28, 2011 · 4 comments
Open
Assignees

Comments

@gwicke
Copy link
Contributor

gwicke commented Nov 28, 2011

On node 0.6.3 and html5 v0.3.5, parsing multiple page fragments with the same parser sometimes causes failures in the treebuilder, which are caused by a pointer-unequal marker element in reconstructActiveFormattingElements and elementInActiveFormattingElements. The problem is mainly triggered if the parsed fragments contain tables.

The following patch converts the pointer equality based check into a node type check, which fixes the problem for me:

--- treebuilder.js      2011-11-28 21:30:17.675749830 +0100
+++ /usr/lib/node_modules/html5/lib/html5/treebuilder.js        2011-11-28 20:21:38.067593170 +0100
@@ -224,9 +224,9 @@
        // Step 2 and 3: start with the last element
        var i = this.activeFormattingElements.length - 1;
        var entry = this.activeFormattingElements[i];
-       if(entry == HTML5.Marker || this.open_elements.indexOf(entry) != -1) return;
+       if(entry.type == HTML5.Marker.type || this.open_elements.indexOf(entry) != -1) return;

-       while(entry != HTML5.Marker && this.open_elements.indexOf(entry) == -1) {
+       while(entry.type != HTML5.Marker.type && this.open_elements.indexOf(entry) == -1) {
                i -= 1;
                entry = this.activeFormattingElements[i];
                if(!entry) break;
@@ -248,7 +248,7 @@
 b.prototype.elementInActiveFormattingElements = function(name) {
        var els = this.activeFormattingElements;
        for(var i = els.length - 1; i >= 0; i--) {
-               if(els[i] == HTML5.Marker) break;
+               if(els[i].type == HTML5.Marker.type) break;
                if(els[i].tagName.toLowerCase() == name) return els[i];
        }
        return false;
@aredridel
Copy link
Owner

Hm. It doesn't happen for me with this test case:

var HTML5 = require('../../lib/html5'),
    test = require('tap').test;

test("test 1", function(t) {
    var p = new HTML5.Parser()
    p.parse("<body><table><tr><td>Hello</td></tr></table</body>");
    p.parse("<body><table><tr><td>Hello</td></tr></table</body>");
    t.pass("Works");
    t.end();
});

Do you have test data that triggers it?

@gwicke
Copy link
Contributor Author

gwicke commented Nov 29, 2011

I can trigger it reliably when running through MediaWiki's 600-odd parser tests, but have not yet been able to reproduce it in a minimal test case in the shell. The error always occurs in the same test cases, all related to tables. The first string that triggers it is <body><table><caption> caption</caption><tr><td></td></tr></table></body>. This is the second table test case, following <body><table></table></body>. The tests preceding these tables are about unbalanced formatting elements.

I'll see if I can track down where the second marker is coming from, or if I can create a minimal test case.

@aredridel
Copy link
Owner

Awesome. Or just a link to the parser tests would rock, I'll poke at it too.

@gwicke
Copy link
Contributor Author

gwicke commented Dec 2, 2011

I just added the information needed to try it at https://www.mediawiki.org/wiki/Future/Parser_development#Trying_it_out. If you search for 'normalize' in the output you should find the error messages from the parser. I use less -R as a searchable viewer with color.

The code generating the error is all in parserTests.js. Basically, we use the stock html parser to parse and re-serialize the expected test output (for normalization). This part is where the failure occurs. We also use a hacked-up tree builder in a separate module as a backend for our wiki parser, but that should not affect the plain use of the npm module as the namespaces are separate.

@ghost ghost assigned aredridel Aug 21, 2013
wmfgerrit pushed a commit to wikimedia/mediawiki-services-parsoid that referenced this issue Nov 13, 2013
  This is needed until aredridel/html5#44
  is merged into the upstream "html5" module.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants