Explicit Figure element in Block #3177

Closed
jgm opened this issue Oct 23, 2016 · 65 comments · May be fixed by jgm/pandoc-types#83

Comments

@jgm
Owner

jgm commented Oct 23, 2016

Currently we represent figures in the AST using this hack: a figure is an Image whose title attribute starts with fig: and which is by itself in a Para.

Short of a full-featured figure environment in the AST, it would make sense to move to a less hacky representation: a Div with class figure containing the image (which need not have a title starting with fig:). This would involve changes to readers and writers.

Indeed, if we did this, we could support figures containing multiple images, via explicit Divs.
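A minimal sketch of the two representations discussed above, written as a filter-style step over pandoc's JSON AST (plain dicts, no library). The function names `is_implicit_figure` and `to_figure_div` are hypothetical, and the Div layout is only one way the proposal could be realized:

```python
def is_implicit_figure(block):
    """True for a Para holding exactly one Image whose title starts with 'fig:'."""
    if block.get("t") != "Para":
        return False
    inlines = block["c"]
    if len(inlines) != 1 or inlines[0].get("t") != "Image":
        return False
    # Image content is [attr, alt inlines, [src, title]].
    _attr, _alt, (_src, title) = inlines[0]["c"]
    return title.startswith("fig:")


def to_figure_div(block, extra_classes=()):
    """Wrap an implicit figure in a Div with class 'figure' (hypothetical helper).

    The 'fig:' prefix is dropped from the title, since the Div now marks the figure.
    Any other block is returned unchanged.
    """
    if not is_implicit_figure(block):
        return block
    attr, alt, (src, title) = block["c"][0]["c"]
    image = {"t": "Image", "c": [attr, alt, [src, title[len("fig:"):]]]}
    div_attr = ["", ["figure", *extra_classes], []]
    return {"t": "Div", "c": [div_attr, [{"t": "Para", "c": [image]}]]}
```

Because the figure is now a Div, extra classes or attributes can be attached to it directly rather than being encoded in the image title.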

@jgm
Owner Author

jgm commented Oct 23, 2016

Another advantage is that attributes could be added explicitly to the Div.

<div class="figure floatRight">
![my image](img.jpg){.imageclass}
</div>

See #3094.

@hrehfeld

hrehfeld commented Oct 24, 2016

Seconded, I just spent an hour figuring out why multiple images in a paragraph don't create a figure. Is there any way to create a figure with multiple images right now? (I guess creating Rawblocks will work?)

Relevant code is here? https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Writers/HTML.hs#L450 Why is the match only for one image and not multiple ones in the first place?

@jgm
Owner Author

jgm commented Oct 25, 2016

If multiple images were allowed, which one would form the figure's caption? (There is only one caption.) What would determine how the images are arrayed in the figure? In retrospect, some kind of more explicit syntax for figures would have been desirable, and maybe that's the direction we should move in.

@hrehfeld

hrehfeld commented Oct 25, 2016

Layout: Hm, I only use html and latex backends, but both of those handle multiple images in a figure in a reasonable way without extra specification. However, in latex IIRC it makes a difference if there is a SoftBreak between images (break vs. no break). IMHO it would be up to the writer/backend to break up figures if multiple images are not supported.

Caption: None unless explicitly stated? That's legal in both html and latex iirc. However, I believe Paras do not support extra data like attrs, so a div or figure node would be easier.

@jgm
Owner Author

jgm commented Oct 25, 2016

+++ Hauke Rehfeld [Oct 24 16 16:42 ]:

Seconded, I just spent an hour figuring out why multiple images in a
paragraph don't create a figure. Is there any way to create a figure
right now (I guess creating Rawblocks will work?)

You can paste the images together into one image, I suppose,
or use a filter.

Relevant code is here?
https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Writers/HTML.hs#L450
Why is the match only for one image and not multiple ones in
the first place?

Good question. I suppose that since I'm in a field where
pictorial figures aren't used much, I didn't realize how
common figures with multiple images and a single caption
are.

But how would it work to allow a paragraph with multiple
images (and nothing else) to form a figure? From which
image would the figure's caption be taken? What determines
how the images are arranged in the figure (side by side,
in a square, etc.)?

Really we need a more explicit syntax for figures if this
kind of thing is going to be allowed.

@jgm
Owner Author

jgm commented Feb 26, 2017

I think we need an explicit Block element for Figure.

@jgm jgm added the AST change label Feb 26, 2017
@mb21
Collaborator

mb21 commented Feb 26, 2017

The question is whether this figure element should only contain images, or if it should be a general floating-container more analogous to the LaTeX \begin{figure} and HTML5 figure elements (emphasis added):

Usually a <figure> is an image, illustration, diagram, code snippet, etc., that is referenced in the main flow of a document, but that can be moved to another part of the document or to an appendix without affecting the main flow.

If so, the figure element should contain a caption (multiple paragraphs allowed) and arbitrary block content:

Figure Attr [Block] [Block]
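
As a rough illustration of what such a node could carry, here is a sketch that builds one as a JSON-style dict, with the caption first and the body second. The `{"t": "Figure", "c": [...]}` layout simply follows the argument order of the constructor above; it is an assumption for illustration, not an encoding taken from pandoc-types. `make_figure` and `image` are hypothetical helpers.

```python
def image(src):
    """Build an Image inline with empty attributes, alt text, and title."""
    return {"t": "Image", "c": [["", [], []], [], [src, ""]]}


def make_figure(attr, caption, content):
    """Build a node for the proposed `Figure Attr [Block] [Block]`.

    attr    -- [identifier, classes, key-value pairs]
    caption -- list of blocks (several paragraphs are allowed)
    content -- list of arbitrary blocks
    The dict layout is a hypothetical encoding, not pandoc's actual JSON.
    """
    return {"t": "Figure", "c": [attr, list(caption), list(content)]}


# Two images sharing a single caption, which the one-image hack cannot express.
figure = make_figure(
    ["foo", ["right"], []],
    [{"t": "Para", "c": [{"t": "Str", "c": "Shared"}, {"t": "Space"}, {"t": "Str", "c": "caption."}]}],
    [{"t": "Plain", "c": [image("img.jpg"), image("img2.jpg")]}],
)
```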

@jgm
Owner Author

jgm commented Feb 26, 2017 via email

@jgm jgm added this to the pandoc 2.0 milestone Aug 15, 2017
@jgm
Owner Author

jgm commented Aug 20, 2017

Development on figures branch for pandoc, pandoc-types.

@jgm
Owner Author

jgm commented Aug 20, 2017

Thinking about explicit Markdown syntaxes for figures.

If we had a native syntax for divs, we could treat any div with a following caption as a figure:

;--- {#foo .right}
![my image](img.jpg){.imageclass}
![second image](img2.jpg)
;---
^^  [This is the optional short caption.]
    This is the long caption. It can span multiple blocks. 
    Syntax like footnotes.  

    Subsequent paragraphs indented.  We automatically treat a
    div as a figure if it is followed by a caption.  Or is it
    too confusing if the caption comes outside the div?

Another possibility would be to have a special kind of marking for figures, like:

!--- {#foo .right}
![my image](img.jpg){.imageclass}
![second image](img2.jpg)

[This is the optional short caption.]
This is the long caption. It can span multiple blocks. 
Syntax like footnotes.  

In this version, all the Para elements at the end of the
structure are treated as the caption,
so we don't need an explicit syntax to mark the caption.
!---

I'm trying to avoid syntaxes that require you to use the word figure.

I like the ^^ syntax for attaching captions; we might want to use this for tables as well (and code blocks) if we use it for this.

@jgm jgm changed the title from "Better handling of implicit figures" to "Explicit Figure element in Block" Aug 20, 2017
@jgm
Owner Author

jgm commented Aug 20, 2017

TODO on figures branch on pandoc and pandoc-citeproc

  • Finish updating writers to handle Figure
  • Update readers
    • RST
    • Markdown
    • LaTeX
    • HTML
    • Org/Blocks
    • MediaWiki
  • Get everything to compile
  • Update tests
  • Update ToJSON, FromJSON instance in pandoc-types to include Figure, Caption
  • Update Arbitrary in pandoc-types (it lacks Div, Figure, Span, probably others)
  • Markdown syntax, new extension?
  • Figure numbering and internal refs?

@jgm
Owner Author

jgm commented Aug 20, 2017

Having second thoughts about the type now. I suspect that allowing ANY kind of block content inside a figure is not going to work well in many output formats. E.g. in docbook, only certain elements are allowed inside a figure element. http://tdg.docbook.org/tdg/4.5/figure.html

Perhaps instead the contents should be limited to a list of images (or perhaps a list of lists of images, so they can be organized on lines? -- though it may be better to let the layout happen automatically, given width information).

Perhaps listings could go in figures as well?

@jgm
Owner Author

jgm commented Aug 20, 2017

I think I'm going to remove this from the 2.0 milestone as it still needs more thought.

@jgm jgm removed this from the pandoc 2.0 milestone Aug 20, 2017
@mb21
Collaborator

mb21 commented Aug 21, 2017

Having second thoughts about the type now. I suspect that allowing ANY kind of block content inside a figure is not going to work well in many output formats.

I still think taking the most general approach in the AST makes sense. There are always going to be some formats that don't support certain things, but that should be handled by the respective writers and the AST design shouldn't be held up by those. It would be great to have a general block figure element to output to HTML/ePUB/LaTeX...

@mb21
Collaborator

mb21 commented Aug 21, 2017

Concerning the caption syntax, I kind of prefer the second one, since it is clearly placed inside the figure/div element. A third variant:

;--- {#foo .right}
![my image](img.jpg){.imageclass}
![second image](img2.jpg)
;--
This is the long caption. It can span multiple blocks. 

Syntax like footnotes.  
;--
This is the optional short caption.
Since it's optional, it needs to go at the end in this syntax.
;---

@jgm
Owner Author

jgm commented Aug 22, 2017

What kinds of things do people really put in figures, besides images?

@mb21
Collaborator

mb21 commented Aug 23, 2017

Maybe the element I have in mind is more of a Float than a Figure.

Again the MDN extract posted above:

Usually a <figure> is an image, illustration, diagram, code snippet, etc., that is referenced in the main flow of a document, but that can be moved to another part of the document or to an appendix without affecting the main flow.

And from Wikibooks LaTeX/Floats:

Floats are containers for things in a document that cannot be broken over a page. LaTeX by default recognizes "table" and "figure" floats, but you can define new ones of your own (see Custom floats below). Floats are there to deal with the problem of the object that won't fit on the present page, and to help when you really don't want the object here just now.

Floats are not part of the normal stream of text, but separate entities, positioned in a part of the page to themselves (top, middle, bottom, left, right, or wherever the designer specifies). They always have a caption describing them and they are always numbered so they can be referred to from elsewhere in the text.

Usually it's tables and images that are floated, but it could also be source code, a poem, some sort of aside box etc. Even Docbook has a sidebar element.

Maybe the table AST element shouldn't have a caption, only the Figure element should have a caption. Current markdown table syntax with captions would be converted to Figure attr caption [Table a]. With the attr specifying whether the figure should float or be at that fixed position in the text, plus whether to list it in the list of figures/list of tables etc.
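
The conversion described here could look roughly like the sketch below. It assumes the five-field Table of that era (caption, alignments, widths, headers, rows) and a hypothetical `{"t": "Figure", "c": [attr, caption, content]}` node; `hoist_table_caption` is an illustrative name, not an existing pandoc function.

```python
def hoist_table_caption(block, attr=None):
    """Turn a captioned Table into `Figure attr caption [Table]`.

    Assumes Table content is [caption, alignments, widths, headers, rows].
    The Figure encoding is hypothetical. Non-tables and uncaptioned tables
    are returned unchanged.
    """
    if block.get("t") != "Table":
        return block
    caption, aligns, widths, headers, rows = block["c"]
    if not caption:
        return block  # nothing to hoist
    if attr is None:
        attr = ["", [], []]
    # The table keeps its cells but gives up its caption.
    bare_table = {"t": "Table", "c": [[], aligns, widths, headers, rows]}
    # The old caption is a list of inlines, so wrap it in a Plain block.
    figure_caption = [{"t": "Plain", "c": caption}]
    return {"t": "Figure", "c": [attr, figure_caption, [bare_table]]}
```

Float placement and "list of tables" membership would then live in `attr` on the Figure rather than on the Table itself.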

Summarizing, a Figure is an element that:

I think it would be great to have the figure type in the AST for pandoc 2.0. Writing the code for the writers and reference generators etc. can be done later.

@jgm
Owner Author

jgm commented Aug 23, 2017 via email

@mb21
Collaborator

mb21 commented Aug 26, 2017

I'm leaning currently towards the second option (general float/caption container). Use cases include floating more than just images (e.g. float two tables that share a caption), or having one figure with a caption, that contains subfigures (or images) with each having a caption, e.g:

It's probably true that it gets a bit trickier to consider all cases in all writers, but it is a more flexible option.

@despresc
Contributor

despresc commented Sep 7, 2020

The HTML writer currently has to deal with the fact that HTML headings only go up to h6. Its fallback when encountering a Header past 6 is to render it as a paragraph with the heading class.

So, eventually, the writers that have depth-limited figures and tables could keep track of the current figure depth. If they encounter too-deep nesting, they could convert the Figure to a Div containing its body and a Div caption (with appropriate classes), then attempt to render that. Otherwise they would render figures (and tables and galleries) however they're supported in the output.

Initially, of course, every writer would need to fall back in this way, except for figures with [Table...] and [Plain [Image...]] content, which would be rendered as tables and figures currently are. Then better support (for figures, subfigures, subtables, and galleries) could be added to the relevant outputs.
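
The depth-tracking fallback described above might be sketched like this. It assumes a hypothetical `{"t": "Figure", "c": [attr, caption, content]}` node and made-up names (`figure_to_div`, `limit_figure_depth`, `max_depth`); the class names `figure` and `caption` are placeholders for "appropriate classes".

```python
def figure_to_div(figure):
    """Rewrite a Figure as a Div holding its body plus a Div for the caption."""
    attr, caption, content = figure["c"]
    ident, classes, kvs = attr
    body = list(content)
    if caption:
        body.append({"t": "Div", "c": [["", ["caption"], []], list(caption)]})
    return {"t": "Div", "c": [[ident, ["figure", *classes], kvs], body]}


def limit_figure_depth(blocks, max_depth, depth=0):
    """Keep figures nested fewer than `max_depth` levels; turn deeper ones into Divs.

    Only recurses through Figure bodies; a fuller version would also walk
    Divs, block quotes, lists, and so on.
    """
    out = []
    for block in blocks:
        if block.get("t") == "Figure":
            attr, caption, content = block["c"]
            inner = limit_figure_depth(content, max_depth, depth + 1)
            block = {"t": "Figure", "c": [attr, caption, inner]}
            if depth >= max_depth:
                block = figure_to_div(block)
        out.append(block)
    return out
```

A writer with no figure support at all would call this with `max_depth=0`, which matches the "every writer falls back initially" starting point.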

@tarleb
Collaborator

tarleb commented May 6, 2021

See #6782 for important info on accessibility.

@tarleb
Collaborator

tarleb commented May 6, 2021

Noting that #5994 depends on this.

@tarleb
Collaborator

tarleb commented Mar 22, 2023

This was done in pandoc 3.
