Feature Request: markdown extension: native_tables #3154

Closed
ickc opened this issue Oct 8, 2016 · 3 comments

@ickc
Contributor

ickc commented Oct 8, 2016

How pandoc handles tables when table extensions and raw HTML are disabled

In test.md:

| test  | -ing  |  
| ----  | ----  |  
| 1 | 2 |  
| 3 | 4 |  

In command line: pandoc -t markdown-simple_tables-multiline_tables-grid_tables-pipe_tables-raw_html -o test-pandoc.md test.md

Output in test-pandoc.md:

<table>
<thead>
<tr class="header">
<th>test</th>
<th>-ing</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>2</td>
</tr>
<tr class="even">
<td>3</td>
<td>4</td>
</tr>
</tbody>
</table>

The interesting result is that even with all table extensions disabled and raw HTML not allowed, the table still "makes it through" in a markdown-to-markdown conversion.

Defying the expectation from this behavior

After noticing this behavior, I was prepared to use native HTML tables as a "5th pandoc table extension", much like pandoc's native spans and divs.

But running pandoc -o test-pandoc-pandoc.md test-pandoc.md produces an output (test-pandoc-pandoc.md) identical to the input (test-pandoc.md). Hence, pandoc no longer seems to recognize the table as a table; otherwise it would convert it into one of the 4 table syntaxes.

Running pandoc -o test-pandoc.tex test-pandoc.md confirms this. The output is not a table:

test

-ing

1

2

3

4

Suggestion

After seeing this behavior, I would find it handy if raw HTML tables could be used in pandoc as a 5th table extension. If a cell contains block content, the only way to write it currently seems to be grid_tables. But I don't use Emacs, and when I tested grid_tables with Unicode characters (Chinese in my case), the | characters would not align visually. (Should this be regarded as a bug, by the way? The manual says a monospace font is expected for the alignment, but as far as I know no font makes 1 Chinese character as wide as 1 English character. Chinese fonts do include English characters of matching width, but those are not really "English" as in ASCII; they are separate Unicode characters.)

Using a raw HTML table to write a table whose cells contain blocks seems much easier. But it only makes sense if it is treated as a native pandoc table extension.

As a sidenote, this is kind of similar to issue #3152: turning markdown extensions on and off seems non-trivial in pandoc.

@jgm
Owner

jgm commented Oct 10, 2016

> Suggestion
>
> After seeing this behavior, I find it handy if raw HTML table can be
> used in pandoc as the 5th table extension. If a cell is a block, it
> seems the only approach to do so currently is through grid_tables. But
> I don't use emacs, and when I tested grid_tables with unicode character
> (Chinese in my case), the | won't aligned visually (Should this be
> regarded as a bug by the way? In the manual it is mentioned monospace
> font is expected to be used for the alignment, but as far as I know
> there's no font that match the width of 1 English character to 1
> Chinese character. There's actually such English character in Chinese
> font, but they are not really "English" as in ASCII, but some kind of
> unicode characters as well.).

In Text.Pandoc.Pretty we do take into account widths of east
asian characters, but unfortunately, the table parsing code
doesn't seem to take this into account. You could put up an
issue for this.
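
For illustration, here is a minimal Python sketch of what "taking East Asian widths into account" can mean when measuring a table cell. It is not pandoc's code (Text.Pandoc.Pretty is Haskell). The names display_width and pad_cell are hypothetical, and counting only the Unicode East Asian Width classes W and F as two columns is an assumption about how a terminal or monospace font renders them.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of monospace columns that `text` occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn over the previous character
        # Assumption: Wide ('W') and Fullwidth ('F') characters take 2 columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_cell(text: str, columns: int) -> str:
    """Right-pad `text` with spaces up to `columns` display columns."""
    return text + " " * max(0, columns - display_width(text))


print(len("中文"), display_width("中文"))  # prints "2 4"
```

A table parser that measures cells with len() would see 2 columns here, while an editor showing the file draws 4, which is the kind of mismatch described above.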

> Using raw HTML table to write table with cell that is a block seems
> much easier. But it only make sense if it is treated as a native pandoc
> table extensions.

Yes. The problem is that an HTML table might contain
features that can't be represented in the pandoc table
model (colspan, for example, or a certain style attribute
on a cell, or borders, or headers on the left). Since
someone might have used the HTML table precisely because
they needed these features, we don't want to use the HTML
reader to translate the table to a pandoc Table, since that
would lose information. I suppose we could try to check for
unsupported features and convert the table if we could.
But this raises similar questions about other things.
Do we convert <i> to a pandoc Emph? Etc.
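
The "check for unsupported features" idea could be sketched roughly as follows, in Python using only the standard library's html.parser. This is a hypothetical illustration, not anything pandoc does: UNSUPPORTED_ATTRS is an assumed and deliberately incomplete list (it does not cover borders or headers on the left, for example), and TableFeatureCheck and unsupported_features are made-up names.

```python
from html.parser import HTMLParser

# Assumption: an illustrative, incomplete set of attributes that a simple
# table model could not represent.
UNSUPPORTED_ATTRS = {"colspan", "rowspan", "style"}


class TableFeatureCheck(HTMLParser):
    """Record table-related tags that carry attributes we cannot represent."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("table", "tr", "th", "td"):
            return
        for name, value in attrs:
            if name not in UNSUPPORTED_ATTRS:
                continue
            if name in ("colspan", "rowspan") and value == "1":
                continue  # a span of 1 is the default and loses nothing
            self.problems.append((tag, name))


def unsupported_features(html: str):
    """Return (tag, attribute) pairs that would be lost in a conversion."""
    checker = TableFeatureCheck()
    checker.feed(html)
    checker.close()
    return checker.problems
```

Under this sketch, a table would be converted only when unsupported_features returns an empty list, and left as raw HTML otherwise.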

@ickc
Contributor Author

ickc commented Oct 11, 2016

About the table width issue, I am no longer sure whether it is a bug or user error. I ran some short tests again, and the Chinese characters are treated as 2 English characters wide. I might go back to the original document where I hit the problem to see whether it is really a bug or user error.

Regarding raw HTML tables in Markdown: could a table extension, html_tables, be added alongside the 4 existing table extensions? It would treat HTML tables in Markdown as yet another native syntax, like pandoc's spans, divs, and small caps, so that they are not treated as raw_html and can be toggled on and off separately. The new extension could be listed under Non-pandoc extensions, so that the behavior is triggered only when markdown+html_tables is used.

@ickc
Contributor Author

ickc commented Oct 29, 2016

> I suppose we could try to check for
> unsupported features and convert the table if we could.
> But this raises similar questions about other things.
> Do we convert <i> to a pandoc Emph? Etc.

To make my suggestion clearer: I suggest an extension similar to native_spans and native_divs, called native_tables. Although the syntax is exactly the same as raw_html, these tables would be regarded as "special". The reason native_tables deserves a spot like native_spans and native_divs, while <i> does not, is:

  1. As discussed in Using all the features of tables in pandoc - Google Groups, none of the current 4 table extensions supports all "features/characteristics" of the internal AST.
  2. The most feature-rich table extension, grid_tables, is difficult to author and maintain; at least my text editors don't seem to support it, meaning it has to be done manually. In this case typing raw HTML might actually be easier to maintain. However, if it is only treated as raw_html, output to any other format will be wrong.
  3. I can't currently think of other pandoc markdown syntax that would be more convenient to write in raw_html than in native pandoc markdown. So native_tables would probably not set a precedent for demands to create other native_* features that let raw_html be parsed in the markdown reader.
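
To make the proposal concrete, here is a rough Python sketch of what "reading an HTML table as a table" could look like for the simple case. Everything in it is hypothetical: SimpleTableReader and read_table are made-up names, the header-plus-rows result is a simplified stand-in for pandoc's table model, and cells are reduced to plain text, so block content inside a cell is not handled.

```python
from html.parser import HTMLParser


class SimpleTableReader(HTMLParser):
    """Collect the text of <th>/<td> cells into a header row and body rows."""

    def __init__(self):
        super().__init__()
        self.header = []
        self.rows = []
        self._row = None      # cells of the <tr> currently open, if any
        self._cell = None     # text fragments of the cell currently open
        self._row_is_header = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._row_is_header = False
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []
            if tag == "th":
                self._row_is_header = True

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Assumption: the first row containing <th> cells is the header.
            if self._row_is_header and not self.header:
                self.header = self._row
            else:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def read_table(html: str):
    """Return (header, rows) for a simple HTML table."""
    reader = SimpleTableReader()
    reader.feed(html)
    reader.close()
    return reader.header, reader.rows
```

Fed the HTML table from the first comment, read_table returns the header ['test', '-ing'] and the rows [['1', '2'], ['3', '4']], which a writer could then emit in any output format.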

ickc changed the title from "Interesting behavior when using raw HTML table in markdown" to "Feature Request: markdown extension: native_tables" on Oct 29, 2016