
Incorrectly escaped backslashes in the value of a fenced code block's attributes #8506

Open
wtbutler opened this issue Dec 26, 2022 · 11 comments

@wtbutler

I'm trying to use pandoc to convert a Markdown file to PDF, and I ran into an issue where it garbled the formatting attributes of code blocks. I narrowed the issue down to pandoc's conversion from Markdown to .tex files in general.

When using the command pandoc -s --listings tmp.md -o tmp.tex to convert the following markdown

```{backgroundcolor="\color{yellow!10}"}
"It's a beautiful day in the neiborhood"
```

to a .tex file in version 2.9.2.1, I get the expected output of

\begin{lstlisting}[backgroundcolor={\color{yellow!10}}]

as the start of the listing. However, on version 2.17.1.1, it starts

\begin{lstlisting}[backgroundcolor={\textbackslash color\{yellow!10\}}]

where the backslashes and braces in the formatting instructions are escaped as literal text instead of being passed through as LaTeX. I tried reproducing this with the online demo, but I couldn't figure out how to enable the --listings option, which is needed for these attribute values to appear in the output at all.

@wtbutler wtbutler added the bug label Dec 26, 2022
@wtbutler wtbutler changed the title Incorrectly escaped backslashes in the value of a fenced code block's attribute values Incorrectly escaped backslashes in the value of a fenced code block's attributes Dec 26, 2022
@jgm
Owner

jgm commented Dec 26, 2022

l. 431 of Text.Pandoc.Writers.LaTeX

        kvs <- mapM (\(k,v) -> (k,) <$>
                       stringToLaTeX TextString v) keyvalAttr

The call to stringToLaTeX escapes the attribute value the way a literal string would be escaped in LaTeX. That's not what you want here, because you mean for the attribute to contain raw LaTeX. Perhaps that will always be the case for listings attributes?
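To illustrate the behavior described above, here is a minimal Haskell sketch of a value escaper in the spirit of stringToLaTeX. The names escapeLaTeX and renderKV are hypothetical, and the escape table is an assumption that covers only the characters relevant to this issue; pandoc's real stringToLaTeX handles more cases and may differ in detail. With escaping on, the sketch reproduces the 2.17.1.1 output from the report; with escaping off, it reproduces the 2.9.2.1 output.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Hypothetical, simplified stand-in for pandoc's stringToLaTeX:
-- it only escapes the characters relevant to this issue.
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Escape a value so that LaTeX typesets it as literal text.
escapeLaTeX :: T.Text -> T.Text
escapeLaTeX = T.concatMap escapeChar
  where
    escapeChar :: Char -> T.Text
    escapeChar c = case c of
      '\\' -> "\\textbackslash "
      '{'  -> "\\{"
      '}'  -> "\\}"
      '_'  -> "\\_"
      '#'  -> "\\#"
      '%'  -> "\\%"
      '&'  -> "\\&"
      '$'  -> "\\$"
      _    -> T.singleton c

-- Render one listings key={value} option, escaping the value or not.
renderKV :: Bool -> (T.Text, T.Text) -> T.Text
renderKV escape (k, v) = k <> "={" <> value <> "}"
  where
    value = if escape then escapeLaTeX v else v

main :: IO ()
main = do
  let color = ("backgroundcolor", "\\color{yellow!10}") :: (T.Text, T.Text)
  -- prints: backgroundcolor={\textbackslash color\{yellow!10\}}
  TIO.putStrLn (renderKV True color)
  -- prints: backgroundcolor={\color{yellow!10}}
  TIO.putStrLn (renderKV False color)
  -- prints: caption={some\_code.c}
  TIO.putStrLn (renderKV True ("caption", "some_code.c"))
```

The same escaping that breaks \color is what makes caption="some_code.c" compile, which is the trade-off discussed in the rest of the thread.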

Looking at the history I see commit 0b3b774 and commit a55fb5f which fixed #6742.

@jgm
Owner

jgm commented Dec 26, 2022

It's tough to know how to handle this. If we escape, we'll run into problems like yours from people who want to use TeX commands in these attributes. If we don't, we'll run into problems like #6742. Perhaps we should have solved #6742 by telling the user to backslash-escape the _ in their attribute value. Or is that even necessary? (I didn't try running that code, with caption="some_code.c", through LaTeX to see if it compiles.)

@wtbutler
Author

Regarding whether escaping the underscore was necessary: it appears that it was, since

\begin{lstlisting}[caption={some_code.c}]
code here
\end{lstlisting}

gives the following output

Package hyperref Warning: Rerun to get /PageLabels entry.

! Missing $ inserted.
<inserted text> 
                $
l.56 \begin{lstlisting}[caption={some_code.c}]
                                              
? 

when compiled with xelatex.
The way I think about it intuitively is that if you're adding attributes that will be used by a specific system, pandoc should pass them to that system unaltered. That is, if you give a code block a caption because you know that LaTeX has a caption field, the value should be formatted as though it were written directly in that field. That's a roundabout way of saying that I think the user should be escaping the underscore in their attribute value, though I can understand why that user didn't want to do that and didn't expect it to work that way.

This might be feature creep, but would there be a way to have attribute="value" escape characters while attribute:="value" passes the value through literally? Or some other way to make pandoc pass a value unchanged to whatever is consuming it? (The specific operator syntax doesn't matter, obviously; if there's existing syntax for a similar operation, that works too.)
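The attribute:="value" idea could look something like the minimal Haskell sketch below. Everything here is hypothetical: pandoc has no such syntax, and the sketch assumes the key would arrive with its trailing ':' intact (e.g. basicstyle:), which it then uses to mark the value as passthrough instead of text to escape. The AttrValue type, classify, and the tiny escapeLaTeX are illustrative names, not pandoc APIs.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Hypothetical sketch: a trailing ':' on a key marks its value as raw
-- output-format code that should be passed through unescaped.
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical distinction between textual and passthrough values.
data AttrValue = Textual T.Text | Passthrough T.Text
  deriving (Show, Eq)

-- "key:" becomes ("key", Passthrough v); other keys stay Textual.
classify :: (T.Text, T.Text) -> (T.Text, AttrValue)
classify (k, v) = case T.stripSuffix ":" k of
  Just base -> (base, Passthrough v)
  Nothing   -> (k, Textual v)

-- Minimal illustrative escaper (not pandoc's stringToLaTeX).
escapeLaTeX :: T.Text -> T.Text
escapeLaTeX = T.concatMap esc
  where
    esc :: Char -> T.Text
    esc '\\' = "\\textbackslash "
    esc '{'  = "\\{"
    esc '}'  = "\\}"
    esc '_'  = "\\_"
    esc c    = T.singleton c

-- Escape textual values; emit passthrough values verbatim.
renderValue :: AttrValue -> T.Text
renderValue (Textual v)     = escapeLaTeX v
renderValue (Passthrough v) = v

-- Build the optional argument of \begin{lstlisting}[...].
renderListingsOpts :: [(T.Text, T.Text)] -> T.Text
renderListingsOpts kvs =
  "[" <> T.intercalate "," [k <> "={" <> renderValue v <> "}" | (k, v) <- map classify kvs] <> "]"

main :: IO ()
main = TIO.putStrLn $ renderListingsOpts
  [ ("caption", "some_code.c")
  , ("basicstyle:", "\\footnotesize\\ttfamily")
  ]
```

Under these assumptions it prints [caption={some\_code.c},basicstyle={\footnotesize\ttfamily}], so both the #6742 case and the raw-LaTeX case would work from one document. As jgm notes next, the catch is that pandoc's types would need a way to carry this distinction.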

@jgm
Owner

jgm commented Dec 26, 2022

Pandoc supports multiple output formats. So, your code block with attribute caption="some_code.c" could be used with LaTeX/listings, but it could also be used with other output formats.

The pandoc types don't currently give us a way to represent the difference you're suggesting between a "passthrough attribute" and a "textual attribute."

@wtbutler
Author

Drat. In that case, it would make more sense to me if pandoc didn't modify the fields at all, because if pandoc escapes them, the user loses a lot of freedom in those attributes. If the user has to escape, all of those things are still possible; the user just has to take escaping into account.

@jgm
Copy link
Owner

jgm commented Dec 27, 2022

It goes both ways, though: escaping gives the user the freedom to generate multiple output formats from the same source document.

I see both arguments, and I'm not sure right now what the best solution would be.

@wtbutler
Copy link
Author

I think that would be true if pandoc did more to map specific attribute names to specific output fields (other than startFrom). In my experiments (mostly with HTML), adding a caption attribute only produces a caption in LaTeX output. Because the attribute name is (or at least seems to be) specific to the output format, I think it makes more sense to assume that the attribute value will be parsed by that format as well.

@jgm
Copy link
Owner

jgm commented Dec 27, 2022

> because the attribute field name is (or at least seems to be) particular to the output format

I don't think there's anything about a caption attribute that is specifically connected with LaTeX. I can imagine many people making use of this. Even if support for it in other formats isn't built into pandoc, people customize using filters.

@jgm
Copy link
Owner

jgm commented Dec 27, 2022

In general, some escaping needs to be done for attribute values. In HTML/XML formats, for example, all attribute values have & changed to &amp;.
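For comparison, escaping an attribute value for HTML might look like the minimal sketch below. The function escapeHtmlAttr is hypothetical and assumes a double-quoted attribute; besides & it also escapes ", < and >, which may not match pandoc's HTML writer exactly. The caption value in main is made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Escape a value for use inside a double-quoted HTML attribute.
escapeHtmlAttr :: T.Text -> T.Text
escapeHtmlAttr = T.concatMap esc
  where
    esc :: Char -> T.Text
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc c   = T.singleton c

main :: IO ()
main =
  -- prints: <pre data-caption="Fish &amp; Chips &lt;v2&gt;">
  TIO.putStrLn $
    "<pre data-caption=\"" <> escapeHtmlAttr "Fish & Chips <v2>" <> "\">"
```

The point is that each writer needs its own escaping rules, so a single attribute value in the AST cannot be "raw" for every output format at once.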

@wtbutler
Copy link
Author

After some more experimentation with the formats that use attributes (at least LaTeX, Docx, PPTX, Ms, and HTML, according to the docs), LaTeX is indeed the only one that uses the caption attribute. HTML is the only other format that keeps the data at all, and it keeps it only as a data-caption attribute that isn't displayed.

@ptoboley
Copy link

I ran into this same issue when trying to control the font size in a verbatim block that wraps:

~~~~ {caption="Example output from tool" basicstyle="\footnotesize\ttfamily"}
...
~~~~

There's no way I can think of to escape this so that the right values end up in the output. That is, escaping is a one-way function.
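One way to see why no input can yield a raw command like \footnotesize: with an escaper of this shape, every backslash in the output starts an escape sequence, so it can only be followed by t (from \textbackslash), {, }, or _. The Haskell sketch below checks this by brute force over all short inputs from a small alphabet. It uses a hypothetical, simplified escapeLaTeX rather than pandoc's own function, so it is a sanity check of the idea, not a proof about pandoc's real escaper.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import Control.Monad (replicateM)
import Data.List (nub, sort)

-- Hypothetical simplified escaper in the style of stringToLaTeX.
escapeLaTeX :: T.Text -> T.Text
escapeLaTeX = T.concatMap esc
  where
    esc :: Char -> T.Text
    esc '\\' = "\\textbackslash "
    esc '{'  = "\\{"
    esc '}'  = "\\}"
    esc '_'  = "\\_"
    esc c    = T.singleton c

-- Characters that immediately follow a backslash in a piece of text.
followers :: T.Text -> String
followers t = [c | (b, c) <- T.zip t (T.drop 1 t), b == '\\']

-- Every input of length 0..n over the given alphabet.
inputsUpTo :: Int -> String -> [String]
inputsUpTo n alphabet = concatMap (`replicateM` alphabet) [0 .. n]

-- Distinct characters seen after a backslash in any escaped output.
observedFollowers :: String
observedFollowers =
  sort (nub (concatMap (followers . escapeLaTeX . T.pack) (inputsUpTo 3 "\\{}_fa")))

main :: IO ()
main = do
  -- prints: _t{}  (never 'f', so "\footnotesize" cannot appear)
  putStrLn observedFollowers
  -- prints: False
  print (any (T.isInfixOf "\\footnotesize" . escapeLaTeX . T.pack) (inputsUpTo 3 "\\{}_fa"))
```

The escaper is injective, so the original value can be recovered from its output. What cannot happen is an input whose escaped form is the raw LaTeX the user wanted.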
