Alternate nested emphasis and strong emphasis delimiters in Markdown writer #10642
Comments
Pandoc's markdown parser can handle the sort of nested italics that seems sensible, for example
What it can't handle is a case where the entire phrase is nested:
But why on earth would someone write something like that? Can you point to some real-world examples?
I'm sorry that I wasn't clear, but I don't think we are on the same page. I first noticed this bug on Circadian rhythm - Wikipedia, which includes nested <i><i lang="la"><a href="https://en.wiktionary.org/wiki/circa#Latin" class="extiw" title="wikt:circa">circa</a></i></i> The problem is that the nested italics are written as It's not particularly relevant, but the reason these nested italics happen is that the inner Please let me know how I could have written my initial issue to avoid any confusion.
I see, your explanation about the template helps.
Sorry for not testing this out earlier, but the fix loses formatting information present in the original text. Instead of treating
Usually when emph is nested, the convention is to alternate between italics and non-italics. For example, if a book is called "Race in Huckleberry Finn" and another book that discusses this is called "A Commentary on [booktitle]", it will normally be formatted as "A Commentary on Race in Huckleberry Finn". That is why this choice was made.
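The alternation convention described above can be sketched in a few lines of Python. This is only an illustration of the typographic rule, not Pandoc's code: the `Emph` class and the `italic_runs` function are hypothetical names, and the example assumes the outer title is emphasized with the inner title nested inside it.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union


@dataclass
class Emph:
    """Hypothetical emphasis node, used only for this illustration."""
    children: List[Union[str, "Emph"]]


def italic_runs(inlines, italic: bool = False) -> Iterator[Tuple[str, bool]]:
    """Yield (text, is_italic) pairs; each emphasis level flips the italic state."""
    for node in inlines:
        if isinstance(node, str):
            yield node, italic
        else:
            # Entering an emphasis node toggles italics on or off.
            yield from italic_runs(node.children, not italic)


title = [Emph(["A Commentary on ", Emph(["Race in Huckleberry Finn"])])]
print(list(italic_runs(title)))
# prints [('A Commentary on ', True), ('Race in Huckleberry Finn', False)]
```

Under this rule a doubly nested emphasis ends up upright, which is the formatting loss the previous comment objects to.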
That's a good point! Thanks for the explanation! I guess I'll just preprocess the HTML to fix the nested italics, then.
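One way such a preprocessing step might look, as a rough sketch using only Python's standard `html.parser` module. It assumes that "fixing" means collapsing directly or indirectly nested `<i>` elements into a single level. The class and function names are hypothetical, and the inner tag's attributes (such as `lang="la"`) are dropped along with the tag.

```python
from html.parser import HTMLParser


class NestedItalicCollapser(HTMLParser):
    """Re-emit HTML, dropping any <i> opened while another <i> is already open."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.italic_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "i":
            self.italic_depth += 1
            if self.italic_depth > 1:
                return  # skip the redundant inner <i>
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "i" and self.italic_depth > 0:
            self.italic_depth -= 1
            if self.italic_depth > 0:
                return  # skip the matching inner </i>
        self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def collapse_nested_italics(html: str) -> str:
    parser = NestedItalicCollapser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(collapse_nested_italics('<i><i lang="la">circa</i></i>'))  # prints <i>circa</i>
```

The result could then be passed to Pandoc as ordinary HTML input. If the `lang` attribute matters, the sketch would need to move it to the outer tag instead of discarding it.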
Explain the problem.
The Markdown writers should alternate the delimiters of nested emphasis and strong emphasis so that they do not emit incorrectly formatted output.
I encountered nested `<i>` tags in the wild (they appear to be relatively common on Wikipedia), and I noticed that nested italics are rendered as strong emphasis instead of nested emphasis:
I would instead expect that `<i><i>A</i></i>` be converted to `_*A*_` or `*_A_*`. This syntax appears to be treated as nested emphasis according to both the Markdown specification (https://spec.commonmark.org/0.31.2/#emphasis-and-strong-emphasis) and Pandoc's own Markdown reader:
This issue is similar to #9521, but that bug report is asking for the formatting to be dropped. I am instead asking that Pandoc not try to "clean" the formatting here and simply write Markdown that it can itself read in. In addition, it is tagged with `format:HTML` and `reader` when this issue should instead be `format:Markdown` and `writer`.
I'm not sure how one should handle nested intraword emphasis, but given the limitations of Markdown, it might be best to consider that impossible to write without problems.
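To make the request concrete, here is a minimal Python sketch of a writer that alternates delimiters by nesting depth. It is not Pandoc's implementation (Pandoc is written in Haskell). The `Emph` class, the `DELIMS` tuple, and `write_inlines` are hypothetical names chosen for illustration, and intraword emphasis is not handled at all.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Emph:
    """Hypothetical stand-in for an emphasis node; not Pandoc's actual type."""
    children: List["Inline"]


Inline = Union[str, Emph]

# Delimiters to alternate between at successive nesting depths.
DELIMS = ("_", "*")


def write_inlines(inlines: List[Inline], depth: int = 0) -> str:
    """Render inlines, switching the emphasis delimiter at each nesting level."""
    out = []
    for node in inlines:
        if isinstance(node, str):
            out.append(node)
        else:
            delim = DELIMS[depth % len(DELIMS)]
            out.append(delim + write_inlines(node.children, depth + 1) + delim)
    return "".join(out)


nested = [Emph([Emph(["A"])])]
print(write_inlines(nested))  # prints _*A*_
```

Because adjacent levels never share a delimiter character, the output cannot collapse into `**A**` and be read back as strong emphasis.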
Pandoc version?
macOS on Apple Silicon (albeit an x86_64 executable running under Rosetta2)
pandoc 3.6.3-nightly-2025-02-24
Features: +server +lua
Scripting engine: Lua 5.4