Replace AsciiDoctor/AsciiDoctorJ parser with simpler lightweight AsciiDoc parser #514

gerryhocks · 2015-11-19T14:05:10Z

This change replaces the existing AsciiDoc parser with a simpler AsciiDoc parser with no external dependencies.

This new parser generates a model of the source document which retains each character's original position, then proceeds to 'erase' the elements of the model that are recognized as AsciiDoc markup. The remaining characters are then converted into a RedPen document model.
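The erase-then-convert idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the class `ErasableText` and its methods `erase`, `eraseEnclosure`, `remaining` and `remainingOffsets` are hypothetical names chosen here. Each character keeps its original offset, so text that survives erasure can still be mapped back to its position in the source file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a character model that remembers original offsets
// and lets markup be "erased" without shifting the remaining characters.
class ErasableText {
    private final String source;
    private final boolean[] erased;

    ErasableText(String source) {
        this.source = source;
        this.erased = new boolean[source.length()];
    }

    // Mark the characters in [start, end) as markup.
    void erase(int start, int end) {
        for (int i = start; i < end; i++) {
            erased[i] = true;
        }
    }

    // Erase paired delimiters such as *bold*, keeping the enclosed text.
    void eraseEnclosure(String delimiter) {
        int open = source.indexOf(delimiter);
        while (open >= 0) {
            int close = source.indexOf(delimiter, open + delimiter.length());
            if (close < 0) {
                break; // unmatched delimiter: leave it as plain text
            }
            erase(open, open + delimiter.length());
            erase(close, close + delimiter.length());
            open = source.indexOf(delimiter, close + delimiter.length());
        }
    }

    // The characters that were not erased, in source order.
    String remaining() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            if (!erased[i]) sb.append(source.charAt(i));
        }
        return sb.toString();
    }

    // Original offsets of the surviving characters, parallel to remaining().
    List<Integer> remainingOffsets() {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < source.length(); i++) {
            if (!erased[i]) offsets.add(i);
        }
        return offsets;
    }

    public static void main(String[] args) {
        ErasableText text = new ErasableText("This is *bold* text.");
        text.eraseEnclosure("*");
        System.out.println(text.remaining());               // This is bold text.
        System.out.println(text.remainingOffsets().get(8)); // 9 ('b' was at offset 9)
    }
}
```

Marking characters as erased, rather than deleting them, is what lets the surviving text report positions in the original file, which a checker needs when it points at an error.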

The pom has been updated to remove asciidoctor/asciidoctorj and jruby related dependencies.

takahi-i · 2015-11-20T04:02:13Z

👍

takahi-i · 2015-11-20T04:21:32Z

It works almost as expected. The following are the topics to improve:

- Do not add sentences from comments: this parser extracts sentences from comments (// BLAH BLAH).
- Do not add sentences from tables: this parser appears to extract sentences from tables. For example, it extracts the following table as one sentence:

    |===
    |Hex |RGB |CMYK

    |ffffff または #ffffff
    |[255,255,255]
    |[0, 0, 0, 0] または [0, 0, 0, 0%]
    |===

It would also be great if you could add enough tests for the long internal methods, such as convertToSentences, convertModel, the Line constructor, eraseEnclosure, etc. I need these tests to understand and maintain the code.

gerryhocks · 2015-11-20T08:10:27Z

Hi Ito-san,

Thanks for having a look. Both tables and comments should already be ignored. For me, the following code prints one line "Potato":

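    // RedPen model classes used below (assumed package: cc.redpen.model)
    import cc.redpen.model.Document;
    import cc.redpen.model.Paragraph;
    import cc.redpen.model.Section;

    // createFileContent() is the test's helper that parses the text
    // with the AsciiDoc parser and returns the resulting Document.
    String sampleText = "// BLAH BLAH\n" +
            "Potato\n" +
            "|===\n" +
            "|Hex |RGB |CMYK nibble\n" +
            "\n" +
            "|ffffff または #ffffff asd asd\n" +
            "|[255,255,255]\n" +
            "|[0, 0, 0, 0] または [0, 0, 0, 0%]\n" +
            "|===\n";
    Document doc = createFileContent(sampleText);

    // Print every extracted sentence; the comment and the table should be skipped.
    for (Section section : doc) {
        for (Paragraph paragraph : section.getParagraphs()) {
            paragraph.getSentences().forEach(sentence ->
                    System.err.println(sentence.getContent()));
        }
    }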

Would it be possible to get the document you are testing with so I can see what you are seeing?

I will add some internal test cases for the parser so you can see how it works.

Best wishes,
Gerry

gerryhocks · 2015-11-25T16:33:53Z

Update to the AsciiDoc parser.

This should fix the reported issue. The problem was caused by a bug where a table block could mistakenly end an existing block.
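The class of bug described here can be illustrated with a small line-level filter. This is a hypothetical sketch, not the fix in this PR: the class `BlockFilter`, the method `keepLines`, and the choice to skip both `|===` tables and `----` listing blocks are assumptions made for illustration. The key point is that only the delimiter that opened a block may close it, so a `|===` line inside a listing block cannot end that block early.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: drop comment lines and non-prose delimited blocks
// before sentence extraction. Only a small, assumed subset of AsciiDoc.
class BlockFilter {
    // Delimiters of blocks whose content is not prose (assumed subset).
    private static final Set<String> DELIMITERS = Set.of("|===", "----");

    static List<String> keepLines(List<String> lines) {
        List<String> kept = new ArrayList<>();
        String openBlock = null; // delimiter of the block we are inside, or null
        for (String line : lines) {
            String trimmed = line.trim();
            if (openBlock != null) {
                // Only the matching delimiter closes the current block;
                // any other delimiter inside it is treated as block content.
                if (trimmed.equals(openBlock)) {
                    openBlock = null;
                }
                continue; // block content is never prose
            }
            if (DELIMITERS.contains(trimmed)) {
                openBlock = trimmed;
                continue;
            }
            if (trimmed.startsWith("//")) {
                continue; // single-line comment (//// comment blocks not handled)
            }
            kept.add(line);
        }
        return kept;
    }
}
```

If the inner check were "close on any known delimiter" instead of `trimmed.equals(openBlock)`, a `|===` line inside a `----` block would end it, and the rest of the listing would leak into the extracted sentences.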

The parser's inner classes have been moved out into separate top-level classes in a new cc.redpen.parser.asciidoc package. This is to reduce the apparent complexity of the AsciiDocParser class.

Some of the code has also been reworked to improve readability.

Additional tests have been added to the test source tree, in cc.redpen.parser.asciidoc, to exercise the inner workings of the AsciiDoc parser.

takahi-i · 2015-11-29T07:08:03Z

Thanks a lot for the update!

gerryhocks · 2015-11-29T08:27:46Z

I've pushed an update that resolves the conflicts with the upstream changes.

takahi-i · 2015-12-01T06:19:07Z

LGTM


gerryhocks added 3 commits November 17, 2015 16:15

Partially working new version of AsciiDoc parser

f25cb57

Creation of lightweight AsciiDoc parser

35d5018

Merge branch 'master' of https://github.com/recruit-tech/redpen

3b4902e

takahi-i self-assigned this Nov 20, 2015

gerryhocks added 2 commits November 25, 2015 17:24

Updates to AsciiDoc parser

f443479

Merge branch 'master' of https://github.com/recruit-tech/redpen

7d28b9e

Merge of recent changes to upstream

f571e62

takahi-i added a commit that referenced this pull request Dec 1, 2015

Merge pull request #514 from gerryhocks/master

c55388c

Replace AsciiDoctor/AsciiDoctorJ parser with simpler lightweight AsciiDoc parser

takahi-i merged commit c55388c into redpen-cc:master Dec 1, 2015

takahi-i mentioned this pull request Dec 1, 2015

redpen throws an exception when I use glossary list in asciidoc #509

Closed
