Comment AST element? #1926

Open
jondo opened this issue Feb 5, 2015 · 16 comments

Comments

@jondo

jondo commented Feb 5, 2015

I am using Pandoc 1.13.0.1 to convert Markdown to LaTeX. My Markdown documents contain comments of the form <!-- some comment -->, that are currently stripped by Pandoc.

It would be useful to create LaTeX comments instead: % some comment.

@rgaiacs
Contributor

rgaiacs commented Feb 5, 2015

Hi @jondo,

thanks for the feedback.

Pandoc's internal AST doesn't have a representation for comments. For example, in HTML

$ pandoc -f html -t json <<EOF
<!-- Foo -->
EOF
[{"unMeta":{}},[]]

and in LaTeX

$ pandoc -f latex -t json <<EOF
% Foo
EOF
[{"unMeta":{}},[]]

Because of this, there is currently no way to preserve comments.

@jgm
Owner

jgm commented Feb 8, 2015

Exactly. Comments are just represented as raw HTML, so they don't appear in non-HTML formats. In principle, pandoc could add a native Comment element, but this would be quite an involved change. (Every reader and writer would need to be modified.) And even if we did this, we'd get complaints if we made it the default to convert comments to the target format, since some people may be relying on this not happening.

@mhkeller

I just came across this since I'm looking for similar functionality. To address one of your points, a flag like --preserve-comments could keep it hidden for most users. My use case is I want to convert markdown files to word docs so people can more easily share them among non-markdown users.

If at all possible, this would be a valuable feature.

@ghost

ghost commented May 7, 2015

I am also after preservation of comments across formats. My current interest is in having latex comments converted to comments in odt output. Issue #1561 raises the related point of annotations, but I'm not sure how that could be applied to tex formats.

@philbarresi

As for:

In principle, pandoc could add a native Comment element, but this would be quite an involved change. (Every reader and writer would need to be modified.)

and:

To address one of your points, a flag like --preserve-comments could keep it hidden for most users

If someone can point me in the right direction, I can take a crack at these; comments would be a massive bonus for me, personally.

@jgm
Owner

jgm commented Jul 17, 2015 via email

@bamcdougall

JGM--That is a clever solution!

@jrk

jrk commented Dec 5, 2016

(Transferring/merging from #3187 as requested.)

I realize most uses of Pandoc are one-way and display-format-oriented, but it is such a rich transformation system that it can be very valuable to capture all available information where possible in readers, rather than dropping or flattening it during reading.

Pandoc readers already do this to a very large degree, and even for comments, the Markdown reader reads comments as raw HTML blocks which can be suppressed by default when going to other targets. The LaTeX reader, however, does not seem to have a way to preserve comments.

It would be useful for (in my case) LaTeX <==> (extended) Markdown round tripping to be able to capture comments when reading LaTeX. I'm glad to handle them in my own desired way using my own filter scripts, but even that is not currently possible since they're elided on reading. It's not obvious why they could not also be parsed as raw TeX strings, starting with %, just as Markdown/HTML comments are raw HTML nodes internally enclosed by the text <!---->.


In short, as a useful half-step still well short of supporting comments as a general new node type, it would be valuable for the LaTeX reader/writer, in particular, to support comments as Raw TeX blocks as Markdown/HTML do with raw HTML comments.

@jgm
Owner

jgm commented Dec 5, 2016 via email

@ickc
Contributor

ickc commented Dec 5, 2016

Note that the comment environment requires the verbatim package, which pandoc does not currently depend on.

@jrk

jrk commented Dec 6, 2016

For my uses, at least, I don't care that RawTeX passes through to Markdown, since I'm mostly interested in going this direction with my own filters in the pipeline. I'm sure I'm in the minority, but I think attention should be paid to uses of Pandoc as a semantic parsing and transformation engine, not just a black-box converter which must always give the desired output directly using only its own internal processes and defaults.

And unfortunately the comment environment isn't sufficient, since I am round-tripping with standard LaTeX written by others.

@ickc
Contributor

ickc commented Dec 6, 2016

@jrk

... not just a black-box converter which must always give the desired output directly using only its own internal processes and defaults.

I don't entirely understand this statement, but it seems that "its own internal processes" refers to the pandoc AST, and that "defaults" and being a "black-box converter" refer to customizability. The former is a design choice: however the AST is changed and improved, what pandoc can do will always be implied and limited by the AST. "Hacking" beyond what the AST allows is the job of pre/post-processors and filters. As for customizability, while pandoc already has a sea of command-line options (so it is neither a black box nor limited to "defaults"), there will always be situations where that customizability is not enough.

... since I am round-tripping with standard LaTeX written by others.

It sounds like you're using pandoc beyond what it is designed for. But I'm curious: have you round-tripped with pandoc successfully in some cases? It sounds like you're already relying on this behavior with success.

A quote from the manual:

Because pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. Pandoc attempts to preserve the structural elements of a document, but not formatting details such as margin size. And some document elements, such as complex tables, may not fit into pandoc’s simple document model. While conversions from pandoc’s Markdown to all formats aspire to be perfect, conversions from formats more expressive than pandoc’s Markdown can be expected to be lossy.

Going back to the comment issue, the "pandoc way" to do it is to make an "AST change" that defines a new comment element. (By the way, should this issue have the "AST Change" label?) So getting comments to work across formats is not unachievable (albeit difficult). But your expectation of pandoc in general (if I understand you correctly above) is a mission impossible. I have also pushed pandoc beyond what it is designed for, with success in some cases, but we're pretty much on our own (and on pandoc-discuss) in this case.

@jdittrich

I would also find an AST representation of comments very useful, since comments are an element that is present in many text-based (e.g. HTML) and WYSIWYG (e.g. docx) formats.

We would need to consider whether we would/could support the anchor+selection style of comments (which is usually visible as highlighted text, e.g. in Word).

@naught101

I'd also love comments to be useful. I often use them to structure documents visually (e.g. paragraph titles in comments) and to make them easier to read over. It would be useful to still have those when converting from Markdown to LaTeX, for instance.

Presumably having them in the AST would also make it easier to write filters that would, e.g., convert HTML comments into Word comments, which would sometimes be useful when sharing with coauthors (I currently use the todonotes LaTeX package, but that obviously doesn't work when converting to docx or similar).

Repository owner deleted a comment May 14, 2018
Repository owner deleted a comment Aug 22, 2018
Repository owner deleted a comment Nov 23, 2018
@mb21 mb21 changed the title Markdown to LaTeX: please keep comments Comment AST element? Dec 7, 2018
@mb21 mb21 removed the reader label Dec 7, 2018
@fgasperij

fgasperij commented May 17, 2020

You could write a pandoc filter [...]. This would allow you to pass through comments without any change in pandoc itself.

@jgm Could you please explain to me how that is a possible workaround if there's no AST representation for comments and filters, AFAIU, work on pandoc's AST?

My best guess is that, most of the time, there's a one-to-one match between RawBlocks and the elements of the input that have no corresponding output, such as HTML comments, since you have to identify them in order to exclude them. So a filter can detect them by checking their prefix. If this is the case, I think it's super useful to know, and I wonder why you chose not to include this fact in the docs (at least I wasn't able to find it).

@jgm
Owner

jgm commented May 18, 2020

The filter matches on a RawBlock and emits a RawBlock, yes.
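
A minimal sketch of such a filter, written in Python against pandoc's JSON AST using only the standard library. It assumes the reader has already produced RawBlock elements of format html for the comments (as the Markdown reader does), and that a raw block is serialized as {"t": "RawBlock", "c": [format, text]}. The names html_comment_to_latex and convert_comments are hypothetical helpers for illustration, not part of pandoc, and the sample document at the bottom is hand-written rather than real pandoc output.

```python
import json
import re

# Matches a raw block that consists of exactly one HTML comment.
COMMENT_RE = re.compile(r"^\s*<!--(.*?)-->\s*$", re.DOTALL)


def html_comment_to_latex(text):
    """Return the comment as '%'-prefixed LaTeX lines, or None if
    text is not a single HTML comment."""
    match = COMMENT_RE.match(text)
    if match is None or "-->" in match.group(1):
        return None
    lines = match.group(1).strip().splitlines() or [""]
    return "\n".join(("% " + line.strip()).rstrip() for line in lines)


def convert_comments(node):
    """Recursively walk a JSON AST, rewriting html-comment RawBlocks
    as latex RawBlocks and leaving everything else unchanged."""
    if isinstance(node, list):
        return [convert_comments(item) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "RawBlock" and node["c"][0] == "html":
            latex = html_comment_to_latex(node["c"][1])
            if latex is not None:
                return {"t": "RawBlock", "c": ["latex", latex]}
        return {key: convert_comments(value) for key, value in node.items()}
    return node


if __name__ == "__main__":
    # Hand-written sample; the top-level "meta"/"blocks" layout is an
    # assumption and differs between pandoc versions.
    sample = {
        "meta": {},
        "blocks": [{"t": "RawBlock", "c": ["html", "<!-- some comment -->"]}],
    }
    print(json.dumps(convert_comments(sample)))
```

To use this as an actual JSON filter, one would replace the demo at the bottom with json.load(sys.stdin) and json.dump(..., sys.stdout) and pass the script to pandoc with --filter. Inline raw HTML is deliberately left alone here, since a % in the middle of a LaTeX line would also comment out the rest of that line.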

15 participants