Fix premature HTML-escaping of inline code #393

mentalisttraceur · 2024-10-31T15:31:28Z

Fixes #372 by moving HTML-escaping of codespan from the parser into the HTML renderer.

Passes all tests, and test_ast_output is enhanced to cover this case. (I only ran pytest; are there other tests or any linters I should run?)

Before (parsing to AST wrongly shows the raw codespan content as &amp;, which is a bug for use cases that never render to HTML; for comparison, & outside of a codespan doesn't get escaped):

>>> import mistune
>>> md = mistune.Markdown()
>>> md('`&`')
[{'type': 'paragraph', 'children': [{'type': 'codespan', 'raw': '&amp;'}]}]

After (pure parse correctly shows raw code content is &):

>>> import mistune
>>> md = mistune.Markdown()
>>> md('`&`')
[{'type': 'paragraph', 'children': [{'type': 'codespan', 'raw': '&'}]}]

After (still works when rendering to HTML):

>>> md.renderer = mistune.HTMLRenderer()
>>> md('`&`')
'<p><code>&amp;</code></p>\n'
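To illustrate the idea behind the change, here is a minimal sketch using only the standard library. It is not Mistune's actual renderer code: render_html and render_plain are hypothetical names, and html.escape stands in for Mistune's own escaping helper. The AST literal is the one from the "After" output above.

```python
import html

# AST as produced after this change: 'raw' holds the literal source text.
ast = [{'type': 'paragraph', 'children': [{'type': 'codespan', 'raw': '&'}]}]


def render_html(tokens):
    """Hypothetical toy HTML renderer: escaping happens here, at render time."""
    out = []
    for tok in tokens:
        if tok['type'] == 'codespan':
            out.append('<code>' + html.escape(tok['raw']) + '</code>')
        elif tok['type'] == 'paragraph':
            out.append('<p>' + render_html(tok['children']) + '</p>\n')
    return ''.join(out)


def render_plain(tokens):
    """Hypothetical non-HTML consumer: uses the raw text untouched."""
    out = []
    for tok in tokens:
        if tok['type'] == 'codespan':
            out.append(tok['raw'])
        elif tok['type'] == 'paragraph':
            out.append(render_plain(tok['children']))
    return ''.join(out)


html_output = render_html(ast)
plain_output = render_plain(ast)
```

Because the parser leaves raw alone, the same AST can feed both an HTML renderer and a consumer that never produces HTML.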

Use-case example: I have a script which publishes files written in Markdown as Tumblr posts. It walks through Mistune's AST, generating Tumblr's "NPF". I need the raw contents, because Tumblr takes the raw text and does its own HTML rendering+escaping server-side. Currently, my script has to have extra handling for codespan tokens to unescape the characters escaped by mistune.util.escape.
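For reference, the kind of workaround this makes unnecessary looks roughly like the sketch below. It is an illustration, not my actual script: unescape_codespans is a hypothetical helper, and it assumes html.unescape from the standard library reverses everything the parser escaped.

```python
import html


def unescape_codespans(tokens):
    """Undo HTML-escaping of codespan 'raw' fields, in place (workaround sketch)."""
    for tok in tokens:
        if tok.get('type') == 'codespan':
            tok['raw'] = html.unescape(tok['raw'])
        # Recurse into nested tokens such as paragraph children.
        unescape_codespans(tok.get('children', []))
    return tokens


# AST as produced before this change: the codespan content is already escaped.
old_ast = [{'type': 'paragraph', 'children': [{'type': 'codespan', 'raw': '&amp;'}]}]
fixed = unescape_codespans(old_ast)
```

A workaround like this is also lossy in general: once the parser no longer escapes, running it would wrongly turn a literal `&amp;` written inside inline code into `&`.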

P.S. Special thanks to @torokati44 for finding the exact line in the parser that was causing the problem.

mentalisttraceur force-pushed the fix-codespan-ampersand branch 5 times, most recently from 75f0115 to 905b69e on October 31, 2024 15:50

fix: only HTML-escape codespan in HTML render

245c006

mentalisttraceur force-pushed the fix-codespan-ampersand branch from 905b69e to 245c006 on October 31, 2024 16:35

mentalisttraceur changed the title ~~Fix inline code getting HTML-escaped outside of HTML~~ Fix inline code getting HTML-escaped prematurely Oct 31, 2024

mentalisttraceur mentioned this pull request Nov 1, 2024

Fix premature (and incomplete) HTML-escaping of links #395

Merged

mentalisttraceur changed the title ~~Fix inline code getting HTML-escaped prematurely~~ Fix premature HTML-escaping of inline code Nov 1, 2024

veenstrajelmer mentioned this pull request Nov 1, 2024

Including a README.md with links that contain &s, "translates them" to &amp; omnilib/sphinx-mdinclude#19

Open

lepture merged commit 77f925e into lepture:master Nov 7, 2024
3 of 15 checks passed
