Support Pandoc / R Markdown / Quarto special features #588

Open
allefeld opened this issue Sep 13, 2024 · 2 comments

allefeld commented Sep 13, 2024

Documents in Pandoc Markdown, including R Markdown and Quarto, have a few special features that currently interfere with spell checking. It would be great if the extension could be adapted to handle them.

  • There is usually a YAML block at the beginning (but it can also be elsewhere and there can be more than one), included between two lines containing only ---.

    It appears that there is currently some YAML block support (?) because most keywords and many values are not highlighted, but there are still some that are, e.g. katex in html-math-method: katex. A perfect solution would be if only values of YAML keys which result in document text are spellchecked (e.g. title:) and the rest ignored. But for simplicity just ignoring the whole block would also be fine.

  • Text can include both inline (between $s) and display (between $$) LaTeX math, in HTML output rendered e.g. by MathJax. Moreover, macros can be defined directly in the Markdown source and are expanded by Pandoc.

    The best solution would be to ignore everything except text that is displayed, like the extension does in LaTeX mode. But for simplicity, it should be enough to ignore macro names starting with a backslash and environment names (for non-English documents).

As a workaround, I defined the following "ignore" regular expressions:

    "spellright.ignoreRegExpsByClass": {
        "markdown": [
            "/\\\\(?:begin|end){.*?}/g",
            "/\\\\[a-zA-Z]*\\(?/g",
            "/^---\\n[^]*?\\n---$/gm"
        ],
        "quarto": [
            "/\\\\(?:begin|end){.*?}/g",
            "/\\\\[a-zA-Z]*\\(?/g",
            "/^---\\n[^]*?\\n(?:---)$/gm"
        ]
    }

But according to the README such expressions "may have serious impact on performance", so maybe a built-in solution would be better?

For some reason I don't understand, I had to include \\(? in the second regular expression, because otherwise e.g. \sin in \sin(x) is not ignored. More strangely, in such a case the spellcheck does not complain about \sin, but about ____ (four underscore characters):

[Screenshot: the spell checker flags ____ instead of \sin]
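
To see what the three workaround expressions match, here is a minimal JavaScript sketch. It assumes the settings strings are JSON-decoded and then read as /pattern/flags; how Spell Right actually parses and applies them may differ. The helpers parseRegExpString and maskIgnored and the sample document are hypothetical, for illustration only.

```javascript
// Hypothetical helper: turn a "/pattern/flags" string (as it looks after
// JSON decoding) into a RegExp. Spell Right's own parsing may differ.
function parseRegExpString(text) {
  const match = /^\/(.*)\/([a-z]*)$/s.exec(text);
  if (!match) throw new Error(`Not a /pattern/flags string: ${text}`);
  return new RegExp(match[1], match[2]);
}

// The three expressions from the settings above, with JSON escaping removed.
const ignorePatterns = [
  String.raw`/\\(?:begin|end){.*?}/g`, // \begin{...} and \end{...}
  String.raw`/\\[a-zA-Z]*\(?/g`,       // macro names, plus an optional "("
  String.raw`/^---\n[^]*?\n---$/gm`,   // YAML block between two --- lines
].map(parseRegExpString);

// Hypothetical masking step: blank out every match, keeping line breaks and
// character offsets so the remaining text stays where it was.
function maskIgnored(text) {
  return ignorePatterns.reduce(
    (acc, re) => acc.replace(re, (m) => m.replace(/[^\n]/g, " ")),
    text
  );
}

const sample = [
  "---",
  "title: Demo",
  "html-math-method: katex",
  "---",
  "",
  "Let $\\sin(x)$ be given.",
  "\\begin{equation} a \\end{equation}",
].join("\n");

const masked = maskIgnored(sample);
console.log(masked);
```

In this sketch the whole YAML block, \sin( and both environment markers are blanked, while "be given" stays available for spell checking. Note that the third expression also hides title: values, which is the "ignore the whole block" trade-off described above.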


connortwiegand commented Oct 16, 2024

For an internal solution, it looks like the main files to modify would be lib/parsers/markdown.js and lib/doctype.js. Two ideas come to mind. One is to treat Quarto YAML headers as "comments", which may require adding comment support to the Markdown parser. The other is to add the suggested regexps as additional delimiters of a "code" block. I'm unsure whether that would impact performance significantly (my guess is that it would not).
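
As a rough illustration of the first idea, the sketch below locates YAML metadata blocks line by line so that a parser could skip them like comments. It follows the rules in the Pandoc manual: a block opens with --- and closes with --- or ..., the opening line must not be followed by a blank line, and a block that is not at the start of the document must be preceded by a blank line. The function name findYamlBlocks and its return shape are hypothetical; this is not Spell Right's parser API.

```javascript
// Hypothetical sketch: find Pandoc-style YAML metadata blocks by line index.
// Names and return shape are illustrative, not taken from Spell Right.
function findYamlBlocks(text) {
  const lines = text.split("\n");
  const blocks = [];
  let start = -1; // index of the opening "---" line, or -1 when outside a block

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trimEnd();
    if (start === -1) {
      // Opening delimiter: first line of the document or preceded by a blank
      // line, and not followed by a blank line (that would be a rule instead).
      const precededByBlank = i === 0 || lines[i - 1].trim() === "";
      const followedByText = i + 1 < lines.length && lines[i + 1].trim() !== "";
      if (line === "---" && precededByBlank && followedByText) start = i;
    } else if (line === "---" || line === "...") {
      blocks.push({ startLine: start, endLine: i });
      start = -1;
    }
  }
  return blocks;
}

const doc = [
  "---",
  "title: Demo",
  "---",
  "",
  "Body text.",
  "",
  "---",
  "author: X",
  "...",
  "More.",
].join("\n");

console.log(findYamlBlocks(doc));
// [ { startLine: 0, endLine: 2 }, { startLine: 6, endLine: 8 } ]

console.log(findYamlBlocks("Text\n\n---\n\nMore"));
// [] (a horizontal rule, not a metadata block)
```

A single pass over the lines like this is linear in document length, which is consistent with the guess that the performance impact would be small, though that has not been measured here.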

allefeld (Author) commented

I've been using these regexps for a while now, and I did not notice decreased performance. I still think this would be a good improvement, but basically I'm fine for now, so it might be enough to mention my regexps (or better versions) in the documentation.
