Support Pandoc / R Markdown / Quarto special features #588

allefeld · 2024-09-13T23:07:45Z

Documents in Pandoc Markdown, including R Markdown and Quarto, have a few special features that currently interfere with spell checking. It would be great if the extension could be adapted to handle them.

There is usually a YAML block at the beginning (but it can also be elsewhere, and there can be more than one), enclosed between two lines containing only ---.

It appears that there is currently some YAML block support (?) because most keywords and many values are not highlighted, but some still are, e.g. katex in html-math-method: katex. A perfect solution would be to spellcheck only the values of YAML keys that end up as document text (e.g. title:) and to ignore the rest. But for simplicity, just ignoring the whole block would also be fine.

Text can include both inline (between $s) and display (between $$) LaTeX math, which in HTML output is rendered by, e.g., MathJax. Moreover, macros can be defined directly in the Markdown source and are expanded by Pandoc.

The best solution would be to ignore everything except text that is displayed, like the extension does in LaTeX mode. But for simplicity, it should be enough to ignore macro names starting with a backslash and environment names (for non-English documents).
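To illustrate the coarser option of skipping math spans entirely, here is a minimal sketch that locates inline and display math in a string. It is not Spell Right code: `MATH` and `findMathSpans` are hypothetical names, and the pattern only approximates Pandoc's dollar-math rules (the opening $ is followed by a non-space character, the closing $ is preceded by one and not followed by a digit). Escaped dollars (\$) are not handled.

```javascript
// Hypothetical sketch: find $...$ and $$...$$ spans so a checker could skip them.
// Display math is tried first; inline math may not span lines in this sketch.
const MATH = /\$\$[^]*?\$\$|\$(?=\S)[^$\n]+?(?<=\S)\$(?!\d)/g;

function findMathSpans(text) {
  return Array.from(text.matchAll(MATH), (m) => ({
    start: m.index,                 // offset of the opening delimiter
    end: m.index + m[0].length,     // offset just past the closing delimiter
    source: m[0],
  }));
}

const mathSample =
  "Cost is $5 and $10 today, but $x^2$ and $$\n\\int f\n$$ are math.";
const spans = findMathSpans(mathSample);
console.log(spans.map((s) => s.source));
```

The currency amounts are left alone because the candidate closing $ is preceded by a space. A checker that wants to keep text inside math (as LaTeX mode apparently does) would need more than this.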

As a workaround, I defined the following "ignore" regular expressions:

    "spellright.ignoreRegExpsByClass": {
        "markdown": [
            "/\\\\(?:begin|end){.*?}/g",
            "/\\\\[a-zA-Z]*\\(?/g",
            "/^---\\n[^]*?\\n---$/gm"
        ],
        "quarto": [
            "/\\\\(?:begin|end){.*?}/g",
            "/\\\\[a-zA-Z]*\\(?/g",
            "/^---\\n[^]*?\\n(?:---)$/gm"
        ]
    }

But according to the README such expressions "may have serious impact on performance", so maybe a built-in solution would be better?
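To show what the three patterns from the "markdown" list actually match, here is a small standalone sketch. It does not reproduce how Spell Right applies them internally: `parseRegExpString` and `maskIgnored` are hypothetical helpers, and blanking matches with spaces is just one way to make the effect visible.

```javascript
// Hypothetical helper: turn a "/pattern/flags" string into a RegExp.
function parseRegExpString(s) {
  const m = /^\/(.*)\/([a-z]*)$/s.exec(s);
  if (!m) throw new Error(`not a /pattern/flags string: ${s}`);
  return new RegExp(m[1], m[2]);
}

// The three patterns from the settings snippet, as JS string literals
// (the escaping is the same as in JSON).
const patterns = [
  "/\\\\(?:begin|end){.*?}/g",   // \begin{...} and \end{...}
  "/\\\\[a-zA-Z]*\\(?/g",        // \macro, plus an optional following "("
  "/^---\\n[^]*?\\n---$/gm",     // a block delimited by lines containing only ---
].map(parseRegExpString);

// Replace every match with spaces, keeping newlines so offsets stay aligned.
function maskIgnored(text, regexps) {
  return regexps.reduce(
    (t, re) => t.replace(re, (hit) => hit.replace(/[^\n]/g, " ")),
    text
  );
}

const sample = [
  "---",
  "title: Test",
  "html-math-method: katex",
  "---",
  "",
  "We have $\\sin(x)$ and \\begin{align}y\\end{align}.",
].join("\n");

const masked = maskIgnored(sample, patterns);
console.log(masked);
```

In the masked output the whole YAML block, \sin( and both environment markers are blanked, while the surrounding prose and the x and y inside the math remain.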

For some reason I don't understand, I had to include \\(? in the second regular expression, because otherwise e.g. \sin in \sin(x) is not ignored. More strangely, in such a case the spellcheck does not complain about \sin, but about ____ (four underscore characters).


connortwiegand · 2024-10-16T21:05:55Z

For an internal solution, it looks like the main files to modify would be lib/parsers/markdown.js and lib/doctype.js. Two ideas come to mind. One is to treat Quarto YAML headers as "comments", which may require adding comment support to the Markdown parser. The other is to add the suggested regexps as additional delimiters of a "code" block. I'm unsure whether that would impact performance significantly (my guess is not).
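As a rough sketch of what either idea would need first, the function below finds the line ranges of YAML metadata blocks so they could be treated as comments or code. It is not based on the actual code in lib/parsers/markdown.js: `findYamlBlocks` is a hypothetical name, and the rules follow my reading of Pandoc's behaviour (a block opens with a --- line that is at the top of the document or preceded by a blank line and is not followed by a blank line, and closes with --- or ...).

```javascript
// Hypothetical helper, not part of Spell Right: returns [startLine, endLine]
// pairs (0-based, inclusive) for each YAML metadata block in `text`.
function findYamlBlocks(text) {
  const lines = text.split("\n");
  const blocks = [];
  let i = 0;
  while (i < lines.length) {
    const opens =
      lines[i] === "---" &&
      (i === 0 || lines[i - 1].trim() === "") &&            // top of file or after a blank line
      i + 1 < lines.length && lines[i + 1].trim() !== "";   // not followed by a blank line
    if (opens) {
      // First closing delimiter after the opening line.
      const end = lines.findIndex((l, j) => j > i && (l === "---" || l === "..."));
      if (end !== -1) {
        blocks.push([i, end]);
        i = end + 1;
        continue;
      }
    }
    i += 1;
  }
  return blocks;
}

const doc = [
  "---", "title: A", "---", "",
  "Text with --- dash.", "",
  "---", "author: B", "...", "",
  "A rule follows.", "", "---", "", "More text.",
].join("\n");

const yamlBlocks = findYamlBlocks(doc);
console.log(yamlBlocks);
```

Unlike the regexp workaround, this also accepts ... as a closing line and leaves a standalone --- horizontal rule alone, at the cost of a line-by-line pass over the document.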

allefeld · 2024-10-17T18:08:12Z

I've been using these regexps for a while now, and I did not notice decreased performance. I still think this would be a good improvement, but basically I'm fine for now, so it might be enough to mention my regexps (or better versions) in the documentation.
