How to create a table of contents (document outline)? #127

Closed
Johann-S opened this issue Jun 15, 2019 · 20 comments

@Johann-S

Hi @Hopding,

Again thanks for your awesome lib 👍

Do you think it's possible to create a table of contents in a PDF with your library? If so, I would love to see how to do that.

Thanks 👍

BTW you should add a way to support your work 😉

@Hopding
Owner

Hopding commented Jun 16, 2019

Hello @Johann-S!

I'm not sure what precisely you mean by "table of contents". You can certainly write out some text on a page to outline the contents of your document. Are you wanting to do something more than that?

@Johann-S
Author

Yep, some PDFs have a table of contents embedded in them rather than on a separate page. These are sometimes called bookmarks ("signets" in French) too.

@Hopding
Owner

Hopding commented Jun 16, 2019

You mean like this?
[Image: Screen Shot 2019-06-16 at 7 22 26 AM]

@Johann-S
Author

Yep, exactly!

@Hopding changed the title from "[Question] Table of contents" to "How to create a table of contents (document outline)?" on Jun 16, 2019
@Hopding
Owner

Hopding commented Jun 16, 2019

Yes, this is definitely possible. As with page links, pdf-lib doesn't have a high-level API for it yet, so it requires a bit of lower-level code. But it works all the same!

I created an example script to demonstrate how to do it. Here's the resulting PDF, along with a screenshot previewing the outline panel: with_outline.pdf
[Image: Screen Shot 2019-06-16 at 7 52 26 AM]

And here's the script itself:

// ...imports omitted...

const PAGE_WIDTH = 500;
const PAGE_HEIGHT = 750;

const getPageRefs = (pdfDoc) => {
  const refs = [];
  pdfDoc.catalog.Pages.traverse((kid, ref) => {
    if (kid instanceof PDFPage) refs.push(ref);
  });
  return refs;
};

const createOutlineItem = (pdfDoc, title, parent, nextOrPrev, page, isLast = false) =>
  PDFDictionary.from(
    {
      Title: PDFString.fromString(title),
      Parent: parent,
      [isLast ? 'Prev' : 'Next']: nextOrPrev,
      Dest: PDFArray.fromArray(
        [
          page,
          PDFName.from('XYZ'),
          PDFNull.instance,
          PDFNull.instance,
          PDFNull.instance,
        ],
        pdfDoc.index,
      ),
    },
    pdfDoc.index,
  );

const pdfDoc = PDFDocumentFactory.create();

const [fontRef, font] = pdfDoc.embedStandardFont(StandardFonts.Helvetica);

const contentStream1 = pdfDoc.register(
  pdfDoc.createContentStream(
    drawText(font.encodeText('PAGE 1'), {
      font: 'Helvetica',
      size: 50,
      x: 175,
      y: PAGE_HEIGHT - 100,
    }),
  ),
);
const contentStream2 = pdfDoc.register(
  pdfDoc.createContentStream(
    drawText(font.encodeText('PAGE 2'), {
      font: 'Helvetica',
      size: 50,
      x: 175,
      y: PAGE_HEIGHT - 100,
    }),
  ),
);
const contentStream3 = pdfDoc.register(
  pdfDoc.createContentStream(
    drawText(font.encodeText('PAGE 3'), {
      font: 'Helvetica',
      size: 50,
      x: 175,
      y: PAGE_HEIGHT - 100,
    }),
  ),
);

const page1 = pdfDoc
  .createPage([PAGE_WIDTH, PAGE_HEIGHT])
  .addFontDictionary('Helvetica', fontRef)
  .addContentStreams(contentStream1);
const page2 = pdfDoc
  .createPage([PAGE_WIDTH, PAGE_HEIGHT])
  .addFontDictionary('Helvetica', fontRef)
  .addContentStreams(contentStream2);
const page3 = pdfDoc
  .createPage([PAGE_WIDTH, PAGE_HEIGHT])
  .addFontDictionary('Helvetica', fontRef)
  .addContentStreams(contentStream3);

pdfDoc.addPage(page1);
pdfDoc.addPage(page2);
pdfDoc.addPage(page3);

const pageRefs = getPageRefs(pdfDoc);

const outlinesDictRef = pdfDoc.index.nextObjectNumber();
const outlineItem1Ref = pdfDoc.index.nextObjectNumber();
const outlineItem2Ref = pdfDoc.index.nextObjectNumber();
const outlineItem3Ref = pdfDoc.index.nextObjectNumber();

const outlineItem1 = createOutlineItem(
  pdfDoc,
  'Page 1',
  outlinesDictRef,
  outlineItem2Ref,
  pageRefs[0],
);

const outlineItem2 = createOutlineItem(
  pdfDoc,
  'Page 2',
  outlinesDictRef,
  outlineItem3Ref,
  pageRefs[1],
);

const outlineItem3 = createOutlineItem(
  pdfDoc,
  'Page 3',
  outlinesDictRef,
  outlineItem2Ref,
  pageRefs[2],
  true,
);

const outlinesDict = PDFDictionary.from(
  {
    Type: PDFName.from('Outlines'),
    First: outlineItem1Ref,
    Last: outlineItem3Ref,
    Count: PDFNumber.fromNumber(3),
  },
  pdfDoc.index,
);

pdfDoc.index.assign(outlinesDictRef, outlinesDict);
pdfDoc.index.assign(outlineItem1Ref, outlineItem1);
pdfDoc.index.assign(outlineItem2Ref, outlineItem2);
pdfDoc.index.assign(outlineItem3Ref, outlineItem3);

pdfDoc.catalog.set('Outlines', outlinesDictRef);

const pdfBytes = PDFDocumentWriter.saveToBytes(pdfDoc);

fs.writeFileSync('./with_outline.pdf', pdfBytes);

This is, of course, a very simple document outline without any nesting. If you'd like to create something more complex, with multiple nested levels, you can certainly do so. However, I'll refer you to section 12.3.3 Document Outline and annex H.6 Outline Hierarchy Example of the PDF specification for the details.
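
For anyone attempting the nested case, here is a rough sketch of the bookkeeping only, in plain JavaScript with no pdf-lib calls. `linkOutline` and its integer ids are hypothetical names used for illustration; in a real document each id would be an indirect reference and each node an outline item dictionary. It assumes every item is displayed open, in which case an item's Count is simply the number of its descendants and the root's Count is the total number of items.

```javascript
// Flatten a nested outline into linked nodes, mirroring the keys of a PDF
// outline item dictionary (Parent, Prev, Next, First, Last, Count).
// Ids are plain integers here; they stand in for indirect references.
function linkOutline(tree) {
  const nodes = [{ id: 0, title: null }]; // id 0 plays the role of the /Outlines root

  // Links the children of `parentId` and returns how many descendants it has.
  const visit = (children, parentId) => {
    let descendants = 0;
    let prevId = null;
    for (const child of children) {
      const id = nodes.length;
      const node = { id, title: child.title, parent: parentId };
      nodes.push(node);
      if (prevId === null) {
        nodes[parentId].first = id; // first child at this level
      } else {
        node.prev = prevId; // link siblings in both directions
        nodes[prevId].next = id;
      }
      nodes[parentId].last = id; // overwritten until the final child
      descendants += 1 + visit(child.children || [], id);
      prevId = id;
    }
    // Assumption: all items are open, so Count is the number of descendants.
    if (descendants > 0) nodes[parentId].count = descendants;
    return descendants;
  };

  visit(tree, 0);
  return nodes;
}

const outlineNodes = linkOutline([
  { title: "Chapter 1", children: [{ title: "1.1" }, { title: "1.2" }] },
  { title: "Chapter 2" },
]);
console.log(outlineNodes);
```

Each resulting node could then be turned into a dictionary the same way `createOutlineItem` does above, with First, Last and Count added for items that have children.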

I hope this helps. Please let me know if you have any additional questions!

@Hopding closed this as completed Jun 16, 2019
@Johann-S
Author

Thanks @Hopding you're a PDF expert 👍

Ablu pushed a commit to Ablu/asciidoctor-pdf.js that referenced this issue Oct 25, 2019
Ideally this would be handled by Chrome during printing. However,
https://bugs.chromium.org/p/chromium/issues/detail?id=840455 is not
implemented yet (and cannot rely on metadata extracted from the asciidoc
format directly).

Therefore this implements it by introducing some kind of post-processing
using `pdf-lib`. If the `:toc:` property is set, the `.adoc` file is
scanned for sections (respecting the `:toclevels:` attribute) and an
outline is generated.

This only works if a ToC is also generated within the document (or
better: links exist to each section), because otherwise Chrome would not
generate the necessary `Dests` fields within the PDF.

Unfortunately, Chrome also has some bugs regarding umlauts in anchors,
leading to omission of the relevant `Dests` fields. Therefore a warning
is printed if an anchor cannot be located in the `Dests` field of the PDF
catalog.

Based upon
https://gitlab.pagedmedia.org/tools/pagedjs-cli/commit/df0d10bd1bb12d1e2077323e1fb5eec75da35a1e
which itself is based on @Hopding's comment at:
Hopding/pdf-lib#127 (comment)
Ablu pushed a commit to Ablu/asciidoctor-pdf.js that referenced this issue Oct 28, 2019
ggrossetie pushed a commit to ggrossetie/asciidoctor-web-pdf that referenced this issue Oct 28, 2019
ggrossetie pushed a commit to Ablu/asciidoctor-pdf.js that referenced this issue Oct 28, 2019
ggrossetie pushed a commit to ggrossetie/asciidoctor-web-pdf that referenced this issue Oct 29, 2019
idreyn pushed a commit to pubpub/pagedjs-cli that referenced this issue Nov 18, 2019
--outline-tags allows specifying the HTML tags which should be
considered for the outline. The tags are expected to be given in
order of hierarchy; for example, 'h1,h2' will trigger a generation
with h1 elements as top-level outline entries and h2 as their
children.

Ideally this would not be required if Chromium would add
this directly. So if these bugs are closed this can probably be
removed again:
- https://bugs.chromium.org/p/chromium/issues/detail?id=840455
- puppeteer/puppeteer#1778

This code is heavily based on @Hopding's comment at:
Hopding/pdf-lib#127 (comment)
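
To make the 'h1,h2' hierarchy idea concrete, here is a minimal sketch of turning a flat list of headings into nested outline entries. This is not the pagedjs-cli implementation; `headingsToTree` and the `{ tag, title }` input shape are assumptions made for illustration only.

```javascript
// Build a nested outline from a flat list of headings in document order.
// `tags` lists the heading tags from the top level down, e.g. ["h1", "h2"].
function headingsToTree(headings, tags) {
  const root = { title: null, children: [] };
  // stack[d] is the most recent node at depth d; stack[0] is the root.
  const stack = [root];
  for (const { tag, title } of headings) {
    const level = tags.indexOf(tag);
    if (level === -1) continue; // tag was not selected for the outline
    // A heading whose ancestors are missing attaches to the deepest available parent.
    const depth = Math.min(level, stack.length - 1);
    const node = { title, children: [] };
    stack[depth].children.push(node);
    stack.length = depth + 1; // forget deeper nodes from the previous branch
    stack.push(node);
  }
  return root.children;
}

const tree = headingsToTree(
  [
    { tag: "h1", title: "A" },
    { tag: "h2", title: "A1" },
    { tag: "h2", title: "A2" },
    { tag: "h1", title: "B" },
    { tag: "h3", title: "ignored" },
    { tag: "h2", title: "B1" },
  ],
  ["h1", "h2"]
);
console.log(JSON.stringify(tree, null, 2));
```

Here "A" and "B" become top-level entries, "A1" and "A2" become children of "A", "B1" becomes a child of "B", and the h3 is skipped because it is not in the tag list.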
@jackwshepherd

Hi @Hopding - seems to me this example might be based on an older version of pdf-lib? Could you let me know what changes I should think about for using with the latest version?

@feodormak

@jackwshepherd, I was facing the same problem, but this JS lib still seems to be the best for my needs. Did some deep inspecting of the current code and compared it to the older versions and managed to update @Hopding's solution.

Do pardon me if my code is inefficient as I'm still quite new to JS and I'm writing code to be run in Electron.

const { PDFDocument, PDFPageLeaf, PDFDict, PDFString, PDFArray, PDFName, PDFNull, PDFNumber } = require("pdf-lib");
const fs = require("fs");

async function createOutlines() {
  const doc = await PDFDocument.load(
    fs.readFileSync("##YOUR CURRENT FILE NAME##")
  );

  // Collect a reference to every page leaf, in document order.
  const getPageRefs = (pdfDoc) => {
    const refs = [];
    pdfDoc.catalog.Pages().traverse((kid, ref) => {
      if (kid instanceof PDFPageLeaf) refs.push(ref);
    });
    return refs;
  };

  // (PDFDocument, string, PDFRef, PDFRef, PDFRef, boolean)
  // Note: each item gets only one sibling link (Next, or Prev for the last item).
  // The PDF spec also expects a Prev entry on every item after the first.
  const createOutlineItem = (pdfDoc, title, parent, nextOrPrev, page, isLast = false) => {
    // Destination array: [page /XYZ left top zoom]; null keeps the viewer's current value.
    const array = PDFArray.withContext(pdfDoc.context);
    array.push(page);
    array.push(PDFName.of("XYZ"));
    array.push(PDFNull);
    array.push(PDFNull);
    array.push(PDFNull);

    const map = new Map();
    map.set(PDFName.Title, PDFString.of(title));
    map.set(PDFName.Parent, parent);
    map.set(PDFName.of(isLast ? "Prev" : "Next"), nextOrPrev);
    map.set(PDFName.of("Dest"), array);

    return PDFDict.fromMapWithContext(map, pdfDoc.context);
  };

  const pageRefs = getPageRefs(doc);

  // Reserve object references up front so the items can point at each other.
  const outlinesDictRef = doc.context.nextRef();
  const outlineItem1Ref = doc.context.nextRef();
  const outlineItem2Ref = doc.context.nextRef();
  const outlineItem3Ref = doc.context.nextRef();

  const outlineItem1 = createOutlineItem(
    doc,
    "Page 1",
    outlinesDictRef,
    outlineItem2Ref,
    pageRefs[0]
  );

  const outlineItem2 = createOutlineItem(
    doc,
    "Page 2",
    outlinesDictRef,
    outlineItem3Ref,
    pageRefs[1]
  );

  const outlineItem3 = createOutlineItem(
    doc,
    "Page 3",
    outlinesDictRef,
    outlineItem2Ref,
    pageRefs[2],
    true
  );

  const outlinesDictMap = new Map();
  outlinesDictMap.set(PDFName.Type, PDFName.of("Outlines"));
  outlinesDictMap.set(PDFName.of("First"), outlineItem1Ref);
  outlinesDictMap.set(PDFName.of("Last"), outlineItem3Ref);
  // Number of outline items; change this to match the number of items you add.
  outlinesDictMap.set(PDFName.of("Count"), PDFNumber.of(3));

  // Point the catalog's "Outlines" entry at the outlines root dictionary.
  doc.catalog.set(PDFName.of("Outlines"), outlinesDictRef);

  const outlinesDict = PDFDict.fromMapWithContext(outlinesDictMap, doc.context);

  // The outlines root dictionary. Refer to table H.3 in Annex H.6 of the PDF specification.
  doc.context.assign(outlinesDictRef, outlinesDict);

  // The actual outline items that will be displayed.
  doc.context.assign(outlineItem1Ref, outlineItem1);
  doc.context.assign(outlineItem2Ref, outlineItem2);
  doc.context.assign(outlineItem3Ref, outlineItem3);

  const file = await doc.save();

  fs.writeFileSync("##YOUR DESTINATION FILE NAME##", file);
}

createOutlines();

It is a lot of work for 3 outline items. I will be working on nested outlines next, since I need them. Happy to share the result with anyone who might need it when I'm done.

@Resurg3nt

I tried adapting your code to merge any number of PDFs, with the option of adding a bookmark for each PDF under a name passed as a command-line argument. The code is as follows. Only two bookmarks are added (out of the 4 expected in the demo I was running), and both have the same title. Any ideas where I've gone wrong?

const { PDFDocument, PDFPageLeaf, PDFDict, PDFString, PDFArray, PDFName, PDFNull, PDFNumber, StandardFonts, rgb } = require('pdf-lib');
const fs = require('fs');
const parameters = require('minimist')(process.argv.slice(2));

var mergeFiles = parameters["a"];
var mergeFLength = mergeFiles.length;
var bookMarkDescs = parameters["ab"];
var buffers = [];
var bookmarkPages = [];
var outlineItemArr = [];
var outlineItemRefsArr = [];
var pageCount = 0;

const getPageRefs = (pdfDoc) => {
    const refs = [];
    pdfDoc.catalog.Pages().traverse((kid, ref) => {
    if (kid instanceof PDFPageLeaf) refs.push(ref);
    });
    return refs;
};

const createOutlineItem = (pdfDoc, title, parent, nextOrPrev, page, isLast = false) => {
    let array = PDFArray.withContext(pdfDoc.context);
    array.push(page);
    array.push(PDFName.of("XYZ"));
    array.push(PDFNull);
    array.push(PDFNull);
    array.push(PDFNull);
    const map = new Map();
    map.set(PDFName.Title, PDFString.of(title));
    map.set(PDFName.Parent, parent);
    map.set(PDFName.of(isLast ? "Prev" : "Next"), nextOrPrev);
    map.set(PDFName.of("Dest"), array);

    return PDFDict.fromMapWithContext(map, pdfDoc.context);
}

for (var i = 0; i < mergeFLength; i++){	
	var fileBuffer = fs.readFileSync(mergeFiles[i]);
	buffers.push(fileBuffer);
}

if(buffers.length != 0){
	var output_file = mergePDFDocuments(buffers);
}

async function mergePDFDocuments(documents) {
	const mergedPdf = await PDFDocument.create();

	for (let document of documents) {
		document = await PDFDocument.load(document);
		if(bookmarkPages.length == 0){
			bookmarkPages.push(0);
			pageCount = pageCount + document.getPageCount();
			bookmarkPages.push(pageCount);			
		} else {
			pageCount = pageCount + document.getPageCount();
			bookmarkPages.push(pageCount);
		}
		const copiedPages = await mergedPdf.copyPages(document, document.getPageIndices());
		copiedPages.forEach((page) => mergedPdf.addPage(page));    
	}
	bookmarkPages.pop();
	
	var references = getPageRefs(mergedPdf);	
	
	const outlinesDictRef = mergedPdf.context.nextRef(); 
	
	for (var y = 0; y < bookmarkPages.length; y++){	
		var pdf_reference = references[bookmarkPages[y]];
		var bookmark_title = bookMarkDescs[y];
		var outlineItemRef = mergedPdf.context.nextRef();
		outlineItemRefsArr.push(outlineItemRef);
		if(bookmarkPages[bookmarkPages.length - 1] === bookmarkPages[y]){
			console.log("last value");
			var outlineItem_last = createOutlineItem(mergedPdf, bookmark_title, outlinesDictRef, outlineItemRef, pdf_reference, true);	
			outlineItemArr.push(outlineItem_last);			
		} else {
			var outlineItem = createOutlineItem(mergedPdf, bookmark_title, outlinesDictRef, outlineItemRef, pdf_reference);		
			outlineItemArr.push(outlineItem);
		}
	}
	
	var outlinesDictMap = new Map();
	var countOfBookmarks = outlineItemArr.length;
	var firstBookmark = outlineItemArr[0];
	var lastBookmark = outlineItemArr[countOfBookmarks-1];
	
	outlinesDictMap.set(PDFName.Type, PDFName.of("Outlines"));
	outlinesDictMap.set(PDFName.of("First"), firstBookmark);
	outlinesDictMap.set(PDFName.of("Last"), lastBookmark);
	outlinesDictMap.set(PDFName.of("Count"), PDFNumber.of(countOfBookmarks));	
	
	for (var z = 0; z < bookmarkPages.length; z++){	
		mergedPdf.context.assign(outlineItemRefsArr[z], outlineItemArr[z]);
	}		

	mergedPdf.catalog.set(PDFName.of("Outlines"),outlinesDictRef);
	
	const outlinesDict = PDFDict.fromMapWithContext(outlinesDictMap, mergedPdf.context);

	mergedPdf.context.assign(outlinesDictRef, outlinesDict);	

	fs.writeFileSync('output_merge.pdf', await mergedPdf.save());
}
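
As a side note on the page bookkeeping in the merge loop above (the `bookmarkPages` array), the start page of each merged file can be computed in isolation. This is only a sketch of that one step; `firstPageIndices` is a hypothetical helper name and it does not touch pdf-lib at all.

```javascript
// Given the page count of each source PDF (in merge order), return the
// zero-based index of each file's first page within the merged document.
function firstPageIndices(pageCounts) {
  const starts = [];
  let offset = 0;
  for (const count of pageCounts) {
    starts.push(offset); // this file starts where the previous ones ended
    offset += count;
  }
  return starts;
}

console.log(firstPageIndices([3, 2, 4])); // [ 0, 3, 5 ]
```

The page counts would come from `document.getPageCount()` for each loaded file, and each resulting index selects the matching entry from `getPageRefs(mergedPdf)`.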

@lillallol

@Resurg3nt @feodormak @jackwshepherd @Hopding @Johann-S
I created a module with a high-level API that uses pdf-lib to add an outline to PDFs that lack one. You can find it here.

@zql365747776

Regarding `array.push(PDFName.of("XYZ"));`: I want to set the position for the XYZ destination, e.g. x=0, y=3, z=0. What should I do?
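
One possible approach, sketched under assumptions: in the PDF specification an XYZ destination has the form `[page /XYZ left top zoom]`, so the three `PDFNull` entries in the code above can be replaced with numbers. The helper below is hypothetical (`xyzDestination` is not part of pdf-lib); it takes the pdf-lib module as its `pdf` argument and uses only `PDFArray.withContext`, `PDFName.of`, `PDFNumber.of` and `PDFNull`, which already appear in this thread.

```javascript
// Build an explicit /XYZ destination array: [page /XYZ left top zoom].
// `pdf` is expected to be the pdf-lib module, e.g. require("pdf-lib");
// it is passed in so this helper has no hard dependency of its own.
function xyzDestination(pdf, context, pageRef, { left = null, top = null, zoom = null } = {}) {
  // A null entry tells the viewer to keep its current value for that parameter.
  const toPdf = (value) => (value === null ? pdf.PDFNull : pdf.PDFNumber.of(value));
  const dest = pdf.PDFArray.withContext(context);
  dest.push(pageRef);
  dest.push(pdf.PDFName.of("XYZ"));
  dest.push(toPdf(left));
  dest.push(toPdf(top));
  dest.push(toPdf(zoom));
  return dest;
}

// Hypothetical usage inside createOutlineItem:
//   map.set(PDFName.of("Dest"),
//     xyzDestination(require("pdf-lib"), pdfDoc.context, page, { left: 0, top: 3 }));
```

Two details from the specification are worth keeping in mind: `left` and `top` are page coordinates that get placed at the upper-left corner of the viewer window, so with the usual bottom-left origin `top: 3` is near the bottom of the page; and a zoom of 0 means the same as null (keep the current zoom).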

@avinashsingh953

for (var y = 0; y < bookmarkPages.length; y++) {
    var pdf_reference = references[bookmarkPages[y]];
    var bookmark_title = bookMarkDescs[y];
    var outlineItemRef = mergedPdf.context.nextRef();
    outlineItemRefsArr.push(outlineItemRef);
    if (bookmarkPages[bookmarkPages.length - 1] === bookmarkPages[y]) {
        console.log("last value");
        var outlineItem_last = createOutlineItem(mergedPdf, bookmark_title, outlinesDictRef, outlineItemRef, pdf_reference, true);
        outlineItemArr.push(outlineItem_last);
    } else {
        var outlineItem = createOutlineItem(mergedPdf, bookmark_title, outlinesDictRef, outlineItemRef, pdf_reference);
        outlineItemArr.push(outlineItem);
    }
}

You are not doing this right. I had the same issue: you are not mapping the item references correctly.

Check my implementation for the above lines of code. Hope this helps.

const outlinesDictRef = mergedPdf.context.nextRef();
// Allocate one ref per outline item up front so that siblings can point at each other.
const outlineItemRefs = pageIndexes.map(() => mergedPdf.context.nextRef());
pageIndexes = pageIndexes.map((p, i) => {
    // Assumes at least two entries, so the last item always has a previous sibling.
    const isLast = i === (pageIndexes.length - 1);
    return {
        ...p,
        outlineItem: createOutlineItem(
            mergedPdf,
            p.name,
            outlinesDictRef,
            // Non-last items link to the next sibling; the last item links back to the previous one.
            isLast ? outlineItemRefs[i - 1] : outlineItemRefs[i + 1],
            references[p.pageNumber],
            isLast
        ),
        outlineItemRef: outlineItemRefs[i],
        isLast: isLast
    };
});
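The mapping that matters here is the sibling chain. Per the PDF specification, the outlines dictionary carries First, Last and Count, and each outline item can carry Prev and Next. As a minimal sketch of that bookkeeping, independent of pdf-lib, the function below computes the links for a flat list of titles using array indexes in place of object references. The names OutlineLink, FlatOutline and linkFlatOutline are hypothetical and exist only for this illustration.

```typescript
// Hypothetical shapes for illustration; these are not pdf-lib types.
interface OutlineLink {
  title: string;
  prev: number | null; // index of the previous sibling, null for the first item
  next: number | null; // index of the next sibling, null for the last item
}

interface FlatOutline {
  first: number | null; // index the outlines dictionary's First entry would point at
  last: number | null;  // index the outlines dictionary's Last entry would point at
  count: number;        // value of the Count entry for a flat, fully visible outline
  items: OutlineLink[];
}

function linkFlatOutline(titles: readonly string[]): FlatOutline {
  const items: OutlineLink[] = titles.map((title, i) => ({
    title,
    prev: i > 0 ? i - 1 : null,
    next: i < titles.length - 1 ? i + 1 : null,
  }));
  return {
    first: items.length > 0 ? 0 : null,
    last: items.length > 0 ? items.length - 1 : null,
    count: items.length,
    items,
  };
}

const outline = linkFlatOutline(["Page 1", "Page 2", "Page 3"]);
console.log(outline.items[1]); // the middle item links to both neighbours
```

Turning this into real PDF objects would mean swapping each index for the corresponding ref obtained from context.nextRef(), as the snippets in this thread do.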

@devnoname120

See a better implementation here:
https://github.com/marp-team/marp-cli/blob/9e0eff5f9d9530577458e93769cd2b0000958a7d/src/utils/pdf.ts

The function you are looking for is setOutline with the following prototype:

async function setOutline(doc: PDFDocument, outlines: readonly PDFOutline[])

@yekaiLiu2022

yekaiLiu2022 commented Aug 11, 2023

Hi, thanks for the good implementation! I tried using your pdf.ts and setOutline to add some bookmarks to my PDF. The bookmark links work: they can be clicked and jump to the right pages. The only issue is that the bookmark titles are all invisible, as if transparent, in Adobe's PDF reader. I also tried other viewers (WPS and the Chrome browser), and none of them show the title text for any bookmark added to the PDF. My code is below. Could you please help me find the cause? Thanks!

import fs from 'fs';
import { PDFDocument, PDFRef, rgb } from 'pdf-lib';
import { setOutline, PDFOutline } from './pdf';

async function addPageNumbersAndContentsIndexToPDF1() {

const { PDFDocument, rgb } = require('pdf-lib');
// const pdfDoc = await PDFDocument.load(pdfBuffer);

const uint8Array = fs.readFileSync('123.pdf')
const pdfDoc = await PDFDocument.load(uint8Array)
const font = await pdfDoc.embedFont('Helvetica'); 

const pages = pdfDoc.getPages();
const outlines: PDFOutline[] = []; // Create an empty outlines array

for (let i = 0; i < pages.length; i++) {
  const pageIndex = i + 1;
  const page = pages[i];
  const { width, height } = page.getSize();

  // Add visible "Page X" text to the top of each page
  // as page numbers
  page.drawText(`Page ${pageIndex}`, {
    x: width / 2 - 40,  // adjust as per requirements
    y: height - 30,     // adjust to place text at top
    size: 12,
    font: font,
    color: rgb(0, 0, 0),
  });


  // Create an outline for each page
  outlines.push({
    title:'Page',
    to: i,
    italic: true,
    bold: true,
  });
}

// Add the outlines to the PDF document
await setOutline(pdfDoc, outlines);

const pdfBytes = await pdfDoc.save();

fs.writeFileSync("pdf.pdf", pdfBytes);
return Buffer.from(pdfBytes);

}

@yekaiLiu2022

Oh, never mind, I figured it out: the problem was that I wasn't importing the pdf-lib packages from the internal code... Thanks.

@Siddharth-Tiwari1712

Can you please help me with this implementation? I'm kind of stuck.

@Siddharth-Tiwari1712

Siddharth-Tiwari1712 commented Sep 10, 2023

Can you share a working Node.js repo? That would help me a lot. @yekaiLiu2022 @devnoname120

@WindrunnerMax

I implemented an example using Node.js.

To demonstrate generality, I used additional libraries to generate a PDF and successfully added bookmarks/outlines.

Thanks for the awesome lib 👍

@zwjjiaozhu

Hi, I have the same problem. Can you tell me how you solved it? Thank you.

@zwjjiaozhu

I figured it out:

import { PDFString } from 'pdf-lib'

// Inside createOutline:
// ...

// Before: PDFHexString.fromText(outline.title)
PDFString.of(outline.title) // this works
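For readers wondering how the two calls differ: PDF allows a string object to be written either as a literal in parentheses or as hexadecimal in angle brackets, and a text string such as a bookmark title may be UTF-16BE with a leading FEFF byte order mark. The sketch below only formats those two spellings so they can be compared. The helper names are hypothetical, and nothing here is claimed to be pdf-lib's actual implementation. The literal form shown is only safe for plain ASCII titles.

```typescript
// Hypothetical helpers for illustration; they format PDF string syntax as text
// and do not call pdf-lib.

// Literal string: wrap in parentheses and backslash-escape \, ( and ).
function toLiteralString(text: string): string {
  const escaped = text.replace(/[\\()]/g, (ch) => "\\" + ch);
  return "(" + escaped + ")";
}

// Hex string: UTF-16BE code units, prefixed with the FEFF byte order mark.
function toUtf16HexString(text: string): string {
  let hex = "FEFF";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i).toString(16).toUpperCase();
    hex += ("0000" + unit).slice(-4); // pad each code unit to four hex digits
  }
  return "<" + hex + ">";
}

console.log(toLiteralString("Page (1)")); // prints "(Page \(1\))"
console.log(toUtf16HexString("Hi"));      // prints "<FEFF00480069>"
```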
