Fix premature HTML-escaping of inline code #393

Merged (1 commit) on Nov 7, 2024

mentalisttraceur
Contributor

mentalisttraceur commented on Oct 31, 2024

Fixes #372 by moving HTML-escaping of codespan from the parser into the HTML renderer.

Passes all tests, and test_ast_output is enhanced to cover this case. (I only ran pytest; are there other tests or any linters I should run?)

Before (parsing to AST wrongly shows the raw codespan content as &amp;, which is a bug for use cases that never render to HTML; for comparison, & outside of a codespan doesn't get escaped):

>>> import mistune
>>> md = mistune.Markdown()
>>> md('`&`')
[{'type': 'paragraph', 'children': [{'type': 'codespan', 'raw': '&amp;'}]}]

After (pure parse correctly shows raw code content is &):

>>> import mistune
>>> md = mistune.Markdown()
>>> md('`&`')
[{'type': 'paragraph', 'children': [{'type': 'codespan', 'raw': '&'}]}]

After (still works when rendering to HTML):

>>> md.renderer = mistune.HTMLRenderer()
>>> md('`&`')
'<p><code>&amp;</code></p>\n'
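The idea of escaping at render time rather than at parse time can be illustrated with a small standalone sketch. This is not Mistune's actual renderer code: `render_codespan` and `render_paragraph` are hypothetical helpers, and the standard library's `html.escape` stands in for `mistune.util.escape`. It only assumes the token shape shown in the AST output above.

```python
import html


def render_codespan(token):
    # Hypothetical helper: the raw text is escaped here, at render time,
    # so the AST itself can keep the unescaped source text.
    return "<code>" + html.escape(token["raw"]) + "</code>"


def render_paragraph(token):
    # Hypothetical helper: handles only codespan children, for illustration.
    inner = "".join(
        render_codespan(child)
        for child in token["children"]
        if child["type"] == "codespan"
    )
    return "<p>" + inner + "</p>\n"


# Token shape copied from the AST output shown in this PR description.
paragraph = {"type": "paragraph", "children": [{"type": "codespan", "raw": "&"}]}
rendered = render_paragraph(paragraph)
```

With this split, consumers that never produce HTML read `raw` directly, and only the HTML path pays for escaping.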

Use-case example: I have a script which publishes files written in Markdown as Tumblr posts. It walks through Mistune's AST, generating Tumblr's "NPF". I need the raw contents, because Tumblr takes the raw text and does its own HTML rendering+escaping server-side. Currently, my script has to have extra handling for codespan tokens to unescape the characters escaped by mistune.util.escape.
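For illustration, the kind of extra handling described above might look roughly like the following. This is a sketch, not the actual script: `unescape_codespans` is a hypothetical name, the input is a hand-written token list in the shape Mistune's AST uses, and `html.unescape` from the standard library is assumed to be an adequate inverse of `mistune.util.escape` for these characters.

```python
import html


def unescape_codespans(tokens):
    """Return a copy of the token list with codespan 'raw' text HTML-unescaped."""
    result = []
    for token in tokens:
        token = dict(token)  # shallow copy so the input is left untouched
        if token.get("type") == "codespan":
            token["raw"] = html.unescape(token["raw"])
        if "children" in token:
            token["children"] = unescape_codespans(token["children"])
        result.append(token)
    return result


# Hand-written AST in the pre-fix shape, where codespan text arrives escaped.
ast_before_fix = [
    {"type": "paragraph", "children": [{"type": "codespan", "raw": "&amp;"}]}
]
cleaned = unescape_codespans(ast_before_fix)
```

A workaround like this is lossy: if the original inline code literally contained the text `&amp;`, unescaping an already-raw value would corrupt it, which is one reason to fix the parser instead.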

P.S. Special thanks to @torokati44 for finding the exact line in the parser that was causing the problem.

mentalisttraceur force-pushed the fix-codespan-ampersand branch 5 times, most recently from 75f0115 to 905b69e, on October 31, 2024 15:50
mentalisttraceur changed the title from "Fix inline code getting HTML-escaped outside of HTML" to "Fix inline code getting HTML-escaped prematurely" on Oct 31, 2024
mentalisttraceur changed the title from "Fix inline code getting HTML-escaped prematurely" to "Fix premature HTML-escaping of inline code" on Nov 1, 2024
lepture merged commit 77f925e into lepture:master on Nov 7, 2024
3 of 15 checks passed
Successfully merging this pull request may close these issues.

Certain characters in inline code incorrectly parsed (e.g., &)