Pipe Tables written by pandoc cannot be written as pipe tables again #3171

Closed
ickc opened this issue Oct 22, 2016 · 13 comments

Comments

@ickc
Contributor

ickc commented Oct 22, 2016

This is a bit related to #3154, but different.

Attached are some MWE:

The tables in test.txt were generated by pandoc. Both originally came from .docx files, which I converted to pipe tables with -t markdown-simple_tables-multiline_tables-grid_tables. But if I convert from markdown to markdown again, say with pandoc -f markdown -t markdown-multiline_tables-grid_tables -o test-pandoc.txt test.txt, the tables become HTML tables, as in test-pandoc.txt. I tried different combinations of extensions, but the writer insists on writing either multiline or grid tables, never pipe tables (or simple tables).

I noticed that this seems to happen only when the columns are wide. The generated HTML tables also contain <col width="33%" />.

@jgm
Owner

jgm commented Oct 22, 2016

This is only going to happen when you have pipe tables whose width exceeds the column width (see --columns). In that case, we could produce pipe tables as output, but only when --wrap=none is also selected, otherwise we'd have to create overly long lines.
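
The condition described here can be reproduced without the attached files. A minimal sketch, assuming pandoc is on the PATH; the file name wide.md and both helper functions are made up for illustration, and the exact output depends on the pandoc version:

```shell
# Write a three-line pipe table whose body row is far wider than 72 characters.
# $1 = output file (hypothetical name, e.g. wide.md)
make_wide_table() {
  cell=""
  i=0
  while [ "$i" -lt 30 ]; do
    cell="${cell}word "
    i=$((i + 1))
  done
  {
    printf '| A | B |\n'
    printf '|---|---|\n'
    printf '| %s| short |\n' "$cell"
  } > "$1"
}

# Convert Markdown to Markdown with pipe tables as the only plain-text
# table syntax left enabled in the writer.
# $1 = input file, $2 = value passed to --columns
roundtrip() {
  pandoc -f markdown \
    -t markdown-simple_tables-multiline_tables-grid_tables \
    --wrap=none --columns="$2" "$1"
}

# Usage (needs pandoc installed):
#   make_wide_table wide.md
#   roundtrip wide.md 72    # body row is wider than --columns
#   roundtrip wide.md 999   # body row fits within --columns
```

Comparing the two outputs shows whether the installed version falls back to an HTML table in the first case.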

@ickc
Contributor Author

ickc commented Oct 22, 2016

I see. When I opened this issue, I wasn't thinking about --wrap=none; I only wanted to produce an MWE.

Unfortunately, going back to the original command I used when I first discovered the issue, it does have --wrap=none:

find . -maxdepth 2 -mindepth 2 -iname "*.md" -exec pandoc \
  -f markdown+abbreviations+autolink_bare_uris+markdown_attribute+mmd_header_identifiers+mmd_link_attributes+mmd_title_block+tex_math_double_backslash-latex_macros \
  -t markdown+raw_tex-native_spans-simple_tables-multiline_tables-grid_tables-latex_macros \
  --normalize -s --wrap=none --atx-headers -o {} {} \;

@ickc
Contributor Author

ickc commented Oct 24, 2016

Just to make a more minimal working example:

Command used:

pandoc -t markdown-simple_tables-multiline_tables-grid_tables --wrap=none -s -o test.txt test.docx
pandoc -t markdown-simple_tables-multiline_tables-grid_tables --wrap=none -s -o test-2.txt test.txt

test.txt is a pipe table. test-2.txt becomes an HTML table.

So this is a demonstration that the pandoc markdown reader and writer combination is not idempotent.
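
The property being tested is that test.txt, itself pandoc output, should come back unchanged when fed through the same writer again. A sketch of that check as a reusable shell function; is_fixed_point is a name invented for this example, and the converter is passed in as arguments so that any command reading a file and writing to stdout can be tested:

```shell
# Succeed if running the given command on a file reproduces the file byte for byte.
# $1 = input file; remaining arguments = a command that takes the file as its
# last argument and writes the converted document to stdout.
is_fixed_point() {
  input=$1
  shift
  out=$(mktemp) || return 2
  if "$@" "$input" > "$out" && cmp -s "$input" "$out"; then
    status=0
  else
    status=1
  fi
  rm -f "$out"
  return "$status"
}

# Usage with the options from the commands above (needs pandoc and test.txt):
#   is_fixed_point test.txt pandoc -f markdown \
#     -t markdown-simple_tables-multiline_tables-grid_tables --wrap=none -s \
#     && echo "unchanged" || echo "changed"
```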

@jgm
Owner

jgm commented Oct 25, 2016

The column widths are added because the content is wider
than columnwidth (see --columns option), so explicit widths
are needed for correct rendering in latex and other formats.

I suppose the Markdown writer could be told that it can
use a (super-wide) pipe table when there are column widths,
in the case where text wrapping isn't set. This would be
a fairly simple change.

@ickc
Contributor Author

ickc commented Oct 28, 2016

Great advice:

pandoc -t markdown-simple_tables-multiline_tables-grid_tables --wrap=none --column=999 -s -o test-2.txt test.txt

now leads to the correct result.

However, from the manual, it seems --wrap=none alone should do.

As for a solution, I'm thinking that maybe --wrap=none could be deprecated in favor of a --column=infinity option. The writer would then check whether colwidth is infinity and, if so, treat it as "infinity".

@jgm
Owner

jgm commented Oct 29, 2016

I think it's worth separating these two things (columns and wrap). Infinity doesn't work anyway, since in multiline tables and grid tables we compute column widths relative to --columns; if we divide by infinity, we'll have infinitesimally narrow columns in the output!
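
The arithmetic behind this objection can be illustrated directly. The function below is not pandoc's code; it only assumes that a column's relative width is its character width divided by --columns, and shows the fraction shrinking toward zero as that divisor grows:

```shell
# Fraction of the line width taken by a column of $1 characters
# when --columns is $2 (illustrative arithmetic only).
width_fraction() {
  LC_ALL=C awk -v w="$1" -v c="$2" 'BEGIN { printf "%.4f\n", w / c }'
}

width_fraction 20 72        # prints 0.2778
width_fraction 20 999       # prints 0.0200
width_fraction 20 1000000   # prints 0.0000
```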

You might well want to use multiline tables, or even pipe tables with specified column widths, while not wrapping paragraphs.

@ickc
Contributor Author

ickc commented Oct 29, 2016

Ah, I didn't think about relative vs. absolute length. Right!

Sorry, I'm missing something: why is --column=[bigNumber] needed in addition to --wrap=none for the test files above?

@OJFord

OJFord commented Nov 4, 2016

I've just run into this too.

I suppose the Markdown writer could be told that it can
use a (super-wide) pipe table when there are column widths,
in the case where text wrapping isn't set.

This sounds good. --columns shouldn't apply to tables IMO.

OJFord added a commit to OJFord/final-year-project that referenced this issue Nov 4, 2016
Requires removing `--columns=80` pandoc wrap on linter.

See jgm/pandoc#3171.
@jgm
Owner

jgm commented Nov 5, 2016

+++ Ollie Ford [Nov 04 16 16:26 ]:

I've just run into this too.

I suppose the Markdown writer could be told that it can
use a (super-wide) pipe table when there are column widths,
in the case where text wrapping isn't set.

This sounds good. --columns shouldn't apply to tables IMO.

Well, we need some way to determine what the relative column
sizes will be in tables which have column widths. If we
used the table itself, then every table would take up the
whole text width, which isn't always desired. So --columns
needs to apply to tables.

@ickc
Contributor Author

ickc commented Nov 11, 2016

Playing with the MWE above again, if I run

$ pandoc -t native test.docx -o test-docx.native
$ pandoc -t native test.txt -o test-md.native -f markdown
$ diff test-docx.native test-md.native
1,2c1,4
< [Table [] [AlignDefault,AlignDefault,AlignDefault] [0.0,0.0,0.0]
<  []
---
> [Table [] [AlignDefault,AlignDefault,AlignDefault] [6.611570247933884e-2,0.5950413223140496,0.33884297520661155]
>  [[]
>  ,[]
>  ,[]]

So the difference between the first conversion (docx to md) and the second (md to md) is exactly the column widths. The docx reader assigns zero widths, while the markdown reader assigns nonzero relative widths.

I think it's worth separating these two things (columns and wrap). Infinity doesn't work anyway, since in multiline tables and grid tables we compute column widths relative to --columns; if we divide by infinity, we'll have infinitesimally narrow columns in the output!

I originally suggested that --column=infinity replace --wrap=none, which was wrong. How about providing only the first part, i.e. accepting --column=infinity in addition to numerical values? As a "side effect", the widths would then be 0, exactly what the docx reader outputs.

The obvious benefit is that I wouldn't have to specify an arbitrarily large number, which is ugly and not guaranteed to work.

@jgm
Owner

jgm commented Nov 13, 2016

+++ ickc [Nov 10 16 18:17 ]:

I originally suggested --column=infinity to replace --wrap=none, which
is wrong. How about it provides only the first part, i.e., given an
option of --column=infinity in addition to numerical value. This way,
the "side-effect" will make the widths 0, exactly the one that will be
output by the docx reader.

It's problematic because in some places we divide by columns.

@ickc
Contributor Author

ickc commented Nov 14, 2016

Oh, I see. You mean somewhere other than tables. But in that case, wouldn't --column=<huge number> be problematic too? I am not sure where else someone would use --column=<huge number> other than for plaintext-based output, and in those cases, would --column=infinity make sense?

If there's no way to remedy the situation, I think the only thing left is to add this to the MANUAL. I can make a pull request against MANUAL.txt if you want.

@jgm
Owner

jgm commented Nov 20, 2016

Manual already says:

If a pipe table contains a row whose printable content is wider than the column width (see --columns), then the cell contents will wrap, with the relative cell widths determined by the widths of the separator lines.
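
As an illustration of "relative cell widths determined by the widths of the separator lines", the sketch below computes each column's share of a separator line. It assumes the share is simply that column's character count divided by the total, which may not match every pandoc version; separator_fractions is a made-up helper, not a pandoc feature. Under that assumption, separator segments of 8, 72 and 41 characters would give the three widths shown in the diff above, though the attached test.txt is not reproduced in this thread, so those counts are an inference.

```shell
# Print each column's share of a pipe-table separator line.
# $1 = separator line, e.g. '|---|------|---|'
# Assumption: share = segment length / total length of all segments.
separator_fractions() {
  printf '%s\n' "$1" | LC_ALL=C awk -F'[|]' '
    {
      total = 0
      n = 0
      # Skip the empty fields before the first and after the last pipe.
      for (i = 1; i <= NF; i++) {
        if (length($i) > 0) {
          n++
          len[n] = length($i)
          total += len[n]
        }
      }
      for (j = 1; j <= n; j++) {
        printf "%s%.4f", (j > 1 ? " " : ""), len[j] / total
      }
      printf "\n"
    }'
}

separator_fractions '|---|------|---|'   # prints 0.2500 0.5000 0.2500
```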
