Option --include-before-body does not work with docx writer #4720

Closed
rpuntaie opened this issue Jun 20, 2018 · 6 comments

Comments

@rpuntaie

Pandoc 2.1.3

pandoc -f rst -t docx --include-before-body foreword.docx --toc --reference-doc ref.docx tst.rst -o tst.docx

This doesn't seem to work with the docx writer.

The default reference.docx (pandoc --print-default-data-file reference.docx) does not appear to include the include-before variable. Are variables supported at all for docx?

@mb21
Collaborator

mb21 commented Jun 20, 2018

The docx writer doesn't use a template, so template variables won't work. You'll have to customize the reference.docx instead, see the MANUAL...

@mb21 mb21 closed this as completed Jun 20, 2018
@W1M0R

W1M0R commented Dec 9, 2020

My understanding of reference.docx is that it can only modify style, and not add content.

In my mind, there is code somewhere that checks whether the toc variable is present, and if so, it generates the table of contents (see #4645, especially #4645 (comment)). So why not code that can check for the include-before variable as specified here: https://pandoc.org/MANUAL.html#variables-set-automatically

A file passed to --include-before-body, written in the --from format (e.g. markdown), should then be processed by the docx writer and placed before the table of contents.

@jgm Would the architecture of the docx writer be able to support this scenario? Is @mb21 still correct in stating that this is not possible?

EDIT: I understand now that the content of include-before is included verbatim (https://pandoc.org/MANUAL.html#option--include-before-body) and that the --from format is not used. And since docx works with binary data, it will probably not be possible to include binary data verbatim. Perhaps it is possible to use OpenXML?

@jgm
Owner

jgm commented Dec 9, 2020

You are right, it might be possible to support --include-before-body in this way. However, the contents of the file specified by --include-before-body are included literally. So, if we did this, you'd have to write valid openxml, and moreover openxml that will be valid at the beginning of the docx body. I predict that lots of people would try to use this feature and fail, leading to many questions and bug reports we'd then have to deal with.

Perhaps we could allow the argument of --include-before-body to be another docx file, and extract the openxml content from it. But there are all kinds of complexities here: the openxml may refer to other parts of the docx container, and in general you can't just move content like that unless it's quite simple. So again we'd be looking at problems any time people did anything complex.
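
To illustrate the point about references to other parts of the container, here is a minimal sketch in Python using only the standard library. It builds a toy zip archive with the two parts that matter for the illustration (it is not a complete, valid Word file), then lists each r:id used in the body together with the target it resolves to in word/_rels/document.xml.rels. The XML snippets and the function names build_toy_docx and body_references are made up for this example and are not part of pandoc.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by OpenXML relationship ids and by the .rels part.
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hypothetical body: one paragraph containing a hyperlink that points at rId5.
DOCUMENT_XML = (
    '<w:document '
    'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" '
    f'xmlns:r="{R_NS}">'
    '<w:body><w:p><w:hyperlink r:id="rId5">'
    '<w:r><w:t>pandoc</w:t></w:r>'
    '</w:hyperlink></w:p></w:body></w:document>'
)

# Hypothetical relationships part: rId5 only has a meaning because of this entry.
RELS_XML = (
    f'<Relationships xmlns="{PKG_NS}">'
    f'<Relationship Id="rId5" Type="{R_NS}/hyperlink" '
    'Target="https://pandoc.org" TargetMode="External"/>'
    '</Relationships>'
)


def build_toy_docx() -> bytes:
    """Return a zip archive holding only the two parts needed for the demo."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", DOCUMENT_XML)
        zf.writestr("word/_rels/document.xml.rels", RELS_XML)
    return buf.getvalue()


def body_references(docx_bytes: bytes) -> dict:
    """Map each r:id found in the body to the target it resolves to."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        body = ET.fromstring(zf.read("word/document.xml"))
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter(f"{{{PKG_NS}}}Relationship")
    }
    rid_attr = f"{{{R_NS}}}id"
    used = [el.get(rid_attr) for el in body.iter() if el.get(rid_attr)]
    return {rid: targets.get(rid) for rid in used}


if __name__ == "__main__":
    print(body_references(build_toy_docx()))
```

If the body XML were copied verbatim into a different document, rId5 would either be missing from that document's relationships part or point at something unrelated, which is the kind of breakage described above.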

The best solution would be adding a templating feature to reference.docx, so you can include your boilerplate content there. I know that this has been discussed before, but I couldn't track down the discussion. Maybe you can find it.

@W1M0R

W1M0R commented Dec 9, 2020

Thanks for clearing that up for me @jgm. Perhaps you are referring to the discussion around issue #1612. The problem I want to resolve is in fact described by that issue, i.e. to determine the placement of the TOC, or more specifically, to introduce my own content before the TOC.

If --include-before-body is not a viable option for docx, then maybe there can be some other way, maybe a type of concat operation that pandoc can perform to join two documents of the same type. There are tools available for docx and pdf files, and probably for other binary formats as well, but it would be great to have this functionality in pandoc itself. Then one can add custom content in a separate docx and join it to the docx that pandoc generates. Such a concat would then probably have to bypass pandoc's native or internal representation (to avoid loss of formatting etc).

Either way, can we open this issue again, and perhaps mark it as an enhancement or feature request? (#4720)

Additionally, should I create a new issue to propose a concat feature as described above, or is it outside the scope of what pandoc is intended for?

EDIT: I found this filter, which probably satisfies my concat idea: https://github.com/pandoc/lua-filters/tree/master/include-files

EDIT: The include-files filter will not be sufficient, since the TOC would still be inserted at the front of the document.

EDIT: A python package that can concat docx files can be found here, which may serve as an implementation reference: https://pypi.org/project/docxcompose/

@aditivin

Hi @W1M0R, were you able to find a way to achieve this?

@W1M0R

W1M0R commented May 18, 2024

Hi @aditivin. Yes. I generate the document as usual using pandoc, e.g. body.docx. This generated document includes the toc. Then I generate another document using pandoc, e.g. cover.docx. This document typically uses a different reference.docx file and no toc. Then I have a manual build step that concatenates the two files.

The concatenation step can be achieved using:

  1. https://github.com/4teamwork/docxcompose
  2. https://github.com/unidoc/unioffice-examples/blob/master/document/merge-documents/main.go
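
The workflow above could be scripted roughly as follows. This is a sketch under assumptions, not a tested build script: the input file names (cover.md, body.md), the reference document names (cover-ref.docx, body-ref.docx) and the helper names pandoc_cmd and build are placeholders invented for the example. It assumes pandoc is on the PATH and that the docxcompose and python-docx packages are installed. The Composer usage follows the pattern shown in the docxcompose README.

```python
import subprocess


def pandoc_cmd(source, output, reference_doc, toc=False):
    """Build the pandoc argument list for one docx output."""
    cmd = ["pandoc", source, "-o", output, "--reference-doc", reference_doc]
    if toc:
        cmd.append("--toc")  # only the body document gets a table of contents
    return cmd


def build(cover_src="cover.md", body_src="body.md", combined="combined.docx"):
    """Generate cover.docx and body.docx with pandoc, then join them."""
    # Pass 1: the cover, with its own reference document and no TOC.
    subprocess.run(pandoc_cmd(cover_src, "cover.docx", "cover-ref.docx"), check=True)
    # Pass 2: the body, generated as usual with the TOC.
    subprocess.run(
        pandoc_cmd(body_src, "body.docx", "body-ref.docx", toc=True), check=True
    )

    # Third-party imports are kept local so that pandoc_cmd stays usable
    # even where docxcompose is not installed.
    from docx import Document
    from docxcompose.composer import Composer

    # The cover is the master document, so its content ends up before the TOC.
    composer = Composer(Document("cover.docx"))
    composer.append(Document("body.docx"))
    composer.save(combined)


if __name__ == "__main__":
    build()
```

Using the cover as the master document and appending the body is what places the custom content ahead of the table of contents. How the styles of the two documents interact after merging is worth checking in the result.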
