Comment AST element? #1926
Comments
Hi @jondo, thanks for the feedback. Pandoc's internal AST doesn't have a representation for comments. For example, in HTML
and in LaTeX
Because of this there is no way to preserve comments right now. |
Exactly. Comments are just represented as raw HTML, so they don't appear in non-HTML formats. In principle, pandoc could add a native Comment element, but this would be quite an involved change. (Every reader and writer would need to be modified.) And even if we did this, we'd get complaints if we made it the default to convert comments to the target format, since some people may be relying on this not happening. |
I just came across this since I'm looking for similar functionality. To address one of your points, a flag like If at all possible, this would be a valuable feature. |
I am also after preservation of comments across formats. My current interest is in having latex comments converted to comments in odt output. Issue #1561 raises the related point of annotations, but I'm not sure how that could be applied to tex formats. |
As for:
and:
If someone can point me in the right direction, I can take a crack at these; comments would be a massive bonus for me, personally. |
You could write a pandoc filter that passes your HTML
comments through to latex as latex comments.
Something like
```
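import Text.Pandoc.JSON

main :: IO ()
main = toJSONFilter commentHtmlToTeX

-- Rewrite an HTML comment block as LaTeX comment lines.
-- Assumes a pandoc-types version in which RawBlock carries a String.
commentHtmlToTeX :: Block -> Block
commentHtmlToTeX (RawBlock (Format "html") ('<':'!':'-':'-':xs)) =
  RawBlock (Format "latex")
    (unlines $ map ("% "++) $ lines $ take (length xs - 3) xs)
commentHtmlToTeX x = x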
```
This would allow you to pass through comments without any
change in pandoc itself.
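
To make the string handling in that filter easier to check in isolation, here is a minimal sketch of just the text transformation, using only `base`. The name `htmlCommentToTeX` is hypothetical (it is not part of pandoc), and trimming the surrounding whitespace is an extra assumption that the filter above does not make.

```haskell
import Data.Char (isSpace)
import Data.List (isSuffixOf, stripPrefix)

-- Hypothetical helper, not part of pandoc: the String-level core of the
-- filter. Returns Nothing when the input is not a single HTML comment.
htmlCommentToTeX :: String -> Maybe String
htmlCommentToTeX raw = do
  let trimmed = trim raw
  rest <- stripPrefix "<!--" trimmed
  if "-->" `isSuffixOf` rest
    then Just (toTeX (take (length rest - 3) rest))
    else Nothing
  where
    -- Drop leading and trailing whitespace (an assumption of this sketch).
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse
    -- Prefix every line of the comment body with "% ".
    toTeX = unlines . map ("% " ++) . lines . trim
```

With this sketch, `htmlCommentToTeX "<!-- some comment -->"` gives `Just "% some comment\n"`, and input that is not a comment gives `Nothing`, so a filter could fall back to leaving the block unchanged.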
|
JGM--That is a clever solution! |
(Transferring/merging from #3187 as requested.) I realize most uses of Pandoc are one-way and display-format-oriented, but it is such a rich transformation system that it can be very valuable to capture all available information where possible in readers, rather than dropping or flattening it during reading. Pandoc readers already do this to a very large degree, and even for comments, the Markdown reader reads comments as raw HTML blocks which can be suppressed by default when going to other targets. The LaTeX reader, however, does not seem to have a way to preserve comments. It would be useful for (in my case) LaTeX <==> (extended) Markdown round tripping to be able to capture comments when reading LaTeX. I'm glad to handle them in my own desired way using my own filter scripts, but even that is not currently possible since they're elided on reading. It's not obvious why they could not also be parsed as raw TeX strings, starting with %, just as Markdown/HTML comments are raw HTML nodes internally enclosed by the text <!--…-->. In short, as a useful half-step still well short of supporting comments as a general new node type, it would be valuable for the LaTeX reader/writer, in particular, to support comments as raw TeX blocks, as Markdown/HTML do with raw HTML comments. |
+++ Jonathan Ragan-Kelley [Dec 04 16 17:19 ]:
> reading. It's not obvious why they could not also be parsed as raw TeX
> strings, starting with %, just as Markdown/HTML comments are raw HTML
> nodes internally enclosed by the text <!--…-->.
Unfortunately, that's not going to work by itself, because
raw tex gets rendered in Markdown, where the % will be
interpreted as a % sign.
The best we could do would be to have "comment" environments
(\begin{comment}...\end{comment}) included as raw LaTeX
(at least when --parse-raw is specified), instead of just
omitted. I don't know if that would help for you.
|
Note that the comment environment requires the verbatim package which pandoc is not currently depending on. |
For my uses, at least, I don't care that RawTeX passes through to Markdown, since I'm mostly interested in going this direction with my own filters in the pipeline. I'm sure I'm in the minority, but I think attention should be paid to uses of Pandoc as a semantic parsing and transformation engine, not just a black-box converter which must always give the desired output directly using only its own internal processes and defaults. And unfortunately the comment environment isn't sufficient, since I am round-tripping with standard LaTeX written by others. |
I don't entirely understand this statement, but it seems like "its own internal processes" refers to the pandoc AST, and "defaults" or being a "black-box converter" refers to customizability. The former is a design choice: however the AST is changed and improved, what pandoc can do will always be implied and limited by the AST. "Hacking" beyond what the AST allows is the job of pre/post-processors and filters. As for customizability, while pandoc already has seas of command-line options (so it is neither a black box nor limited to "defaults"), there will always be situations where that customizability is not enough.
It sounds like you're using pandoc beyond what it is designed for. But I'm curious: have you had success round-tripping with pandoc in some cases? You sound like you're already relying on this behavior from pandoc with success. A quote from the manual:
Going back to the comment issue, the "pandoc way" to do it is to make an "AST change" that defines a new comment element. (By the way, should this issue have the "AST Change" label?) So getting comments to work across formats is not unachievable (albeit difficult). But your expectation of pandoc in general (if I understand you correctly above) is a mission impossible. I have also pushed pandoc beyond what it is designed for, with success in some cases, but we're pretty much on our own (and on pandoc-discuss) in this case. |
I would find an AST representation of comments also very useful since comments are an element that is present in many text-based (e.g. HTML) and WYSIWYG (e.g. docx) formats. We would need to consider if we would/could support anchor+selection style of comments (which is usually visible as highlighted text, e.g. in Word) |
I'd also love comments to be useful. I often use them to structure documents visually (e.g. paragraph titles in comments) and make them easier to read over. It would be useful to still have those when converting from markdown to latex, for instance. Presumably having them in the AST would also make it easier to write filters that would e.g. convert HTML comments into word-doc comments, which would be useful when sharing with coauthors sometimes (I currently use the |
@jgm Could you please explain to me how that is a possible workaround if there's no AST representation for comments and filters, AFAIU, work on Pandoc's AST? My best guess is that most of the time there's a one-to-one match from |
The filter matches on a RawBlock and emits a RawBlock, yes. |
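
To illustrate why that works even without a Comment element, here is a toy model using only `base`. The types below imitate the shape of pandoc's AST but are not the real definitions from pandoc-types; `render`, `retag` and `stripComment` are hypothetical names, and the sketch only handles single-line comments.

```haskell
import Data.List (stripPrefix)

-- Toy model only: NOT the real pandoc-types definitions.
newtype Format = Format String deriving (Eq, Show)

data Block
  = Para String            -- ordinary content, understood by every writer
  | RawBlock Format String -- opaque text tagged with the format it belongs to
  deriving (Eq, Show)

-- A toy "writer": a raw block is emitted only when its tag matches the
-- output format; otherwise it is silently dropped.
render :: Format -> [Block] -> String
render target = concatMap go
  where
    go (Para s) = s ++ "\n"
    go (RawBlock fmt s)
      | fmt == target = s ++ "\n"
      | otherwise     = ""

-- Body of a single-line HTML comment, or Nothing if the text is not one.
stripComment :: String -> Maybe String
stripComment s = do
  rest <- stripPrefix "<!--" s
  body <- stripPrefix (reverse "-->") (reverse rest)
  Just (reverse body)

-- A toy "filter": retag an HTML comment as a LaTeX comment, so the
-- LaTeX writer no longer drops it.
retag :: Block -> Block
retag (RawBlock (Format "html") s)
  | Just body <- stripComment s =
      RawBlock (Format "latex") ("% " ++ unwords (words body))
retag b = b
```

In this model, rendering `[Para "Hello", RawBlock (Format "html") "<!-- note -->"]` to LaTeX drops the comment, while applying `map retag` first keeps it as `% note`. The comment never needs its own node type; it only needs a raw block whose format tag matches the target.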
I am using Pandoc 1.13.0.1 to convert Markdown to LaTeX. My Markdown documents contain comments of the form
<!-- some comment -->
that are currently stripped by Pandoc. It would be useful to create LaTeX comments instead:
% some comment