Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace AsciiDoctor/AsciiDoctorJ parser with simpler lightweight AsciiDoc parser #514

Merged
merged 6 commits into from
Dec 1, 2015

Conversation

gerryhocks
Copy link
Contributor

This change replaces the existing AsciiDoc parser with a simpler AsciiDoc parser with no external dependencies.

This new parser generates a model of the source document which retains each character's original position, then proceeds to 'erase' the elements of the model that are recognized as AsciiDoc markup. The remaining characters are then converted into a RedPen document model.

The pom has been updated to remove asciidoctor/asciidoctorj and jruby related dependencies.

@takahi-i takahi-i self-assigned this Nov 20, 2015
@takahi-i
Copy link
Member

👍

@takahi-i
Copy link
Member

It works almost expected. The followings are the topics to improve.

  1. not to add sentences in comments this parser extract sentences in comments (// BLAH BLAH)
  2. not to add sentences in tables this parser looks extract sentences in tables. For example this parser extract the following table as one sentence.
|===
|Hex |RGB |CMYK

|ffffff または #ffffff
|[255,255,255]
|[0, 0, 0, 0] または [0, 0, 0, 0%]
|===

And it would be great for you to add enough tests internal long methods such as convertToSentences, convertModel, Line constructor. eraseEnclosure etc. The tests are needed for me to understand and maintain the codes.

@gerryhocks
Copy link
Contributor Author

Hi Ito-san,

Thanks for having a look. Both tables and comments should already be ignored. For me, the following code prints one line "Potato":

    String sampleText = "// BLAH BLAH" +
                "\n" +
                "Potato" +
                "\n" +
                "|===\n" +
                "|Hex |RGB |CMYK nibble\n" +
                "\n" +
                "|ffffff または #ffffff asd asd\n" +
                "|[255,255,255]\n" +
                "|[0, 0, 0, 0] または [0, 0, 0, 0%]\n" +
                "|===\n";
        Document doc = createFileContent(sampleText);

        for (Section section : doc) {
            for (Paragraph paragraph : section.getParagraphs()) {
                paragraph.getSentences().forEach(sentence -> {
                    System.err.println(sentence.getContent());
                });
            }
        }

Would it be possible to get the document you are testing with so I can see what you are seeing?

I will add some internal test cases for the parser so you can see how it works.

Best wishes,
Gerry

@gerryhocks
Copy link
Contributor Author

Update to the AsciiDoc parser.

This should fix the reported issue. The problem was caused by an where a table block could, mistakenly, end an existing block.

The Parser's inner classes have been moved to external classes, and moved to a new cc.redpen.parser.asciidoc package. This is to reduce the apparent complexity of the AsciiDocParser class.

Some of the code has also been reworked to improve readability.

Additional tests have been added to the test branch in cc.redpen.parser.asciidoc to test the inner workings of the asciidoc parser.

@takahi-i
Copy link
Member

Thanks a lot for the update!

@gerryhocks
Copy link
Contributor Author

I've pushed an update that resolves the conflicts with the upstream changes

@takahi-i
Copy link
Member

takahi-i commented Dec 1, 2015

LGTM

takahi-i added a commit that referenced this pull request Dec 1, 2015
Replace AsciiDoctor/AsciiDoctorJ parser with simpler lightweight AsciiDoc parser
@takahi-i takahi-i merged commit c55388c into redpen-cc:master Dec 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants