This repository has been archived by the owner on Dec 9, 2018. It is now read-only.

Comparison

WANMIN LIU edited this page May 29, 2014 · 23 revisions

This page compares common approaches to presenting PDF files online. For more details, see the paper [Online publishing via pdf2htmlEX](http://coolwanglu.github.io/pdf2htmlEX/doc/tb108wang.html).

Basic Info

| | Convert to HTML 5 | Parse by JS | Convert to image | Convert to HTML 4 | Adobe PDF plugin | Other plugins |
| --- | --- | --- | --- | --- | --- | --- |
| Example | pdf2htmlEX | PDF.js | pdftoppm (poppler), Google Docs | pdftohtml (poppler) | Adobe PDF Plugin | N/A |
| Briefing | PDF elements are converted into the corresponding or closest HTML elements | The PDF file is loaded, parsed and rendered by JavaScript | PDF pages are converted into images and shown in web pages | Similar to "Convert to HTML 5", but with far fewer features | Official plugin | Non-official PDF plugins, Flash-based plugins or others |
| Open source | Yes | Some (pdf.js) | Poppler is open source. Google Docs may be based on poppler as well, because they showed the same errors. | Some (pdftohtml) | No | Maybe |
| Free | Yes | Some | Some | Some | Yes | Some |

Note: Free and/or open source tools exist for every approach except the Adobe PDF plugin.
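As a rough illustration of the converters named in the table, the sketch below builds (but does not run) a basic command line for each of them. It assumes the tools are installed and on `PATH`. The function name `conversion_commands`, the file names and the 150 DPI default are hypothetical choices made for this example, and only the simplest invocation forms are shown.

```python
def conversion_commands(pdf_path, out_stem, dpi=150):
    """Return argv lists for the converters named in the table (not executed)."""
    return {
        # pdf2htmlEX <input.pdf> [<output.html>]
        "html5": ["pdf2htmlEX", pdf_path, out_stem + "-html5.html"],
        # pdftoppm -png -r <dpi> <input.pdf> <output-prefix>: one PNG per page
        "image": ["pdftoppm", "-png", "-r", str(dpi), pdf_path, out_stem],
        # pdftohtml <input.pdf> <output-name>
        "html4": ["pdftohtml", pdf_path, out_stem + "-html4"],
    }


if __name__ == "__main__":
    # Hypothetical file names, printed for inspection only.
    for approach, argv in conversion_commands("paper.pdf", "paper").items():
        print(approach, " ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` once the corresponding tool is available.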

Performance

| | Convert to HTML 5 (pdf2htmlEX) | Parse by JS | Convert to image | Convert to HTML 4 | Adobe PDF plugin | Other plugins |
| --- | --- | --- | --- | --- | --- | --- |
| Processing (server-side) | Normal, one time | None | Slow, one time | Fast, one time | None | None, usually |
| Loading (client-side) | Fast | Fast | Slow | Fast | Fast | Fast |
| Rendering (client-side) | Fast | Slow | Fast | Fast | Fast | Fast, usually |
| Network cost | Small <sup>1</sup> | Small | Large <sup>2</sup> | Small | Small | Small |

1: HTTP compression is required
2: Can be huge if a higher resolution is needed
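The two notes above can be made concrete with a small calculation. The sketch below estimates the uncompressed size of one page image at several resolutions, and measures how well gzip shrinks repetitive HTML. The function names, the US Letter page size and the sample markup are assumptions made for illustration. Real page images are stored compressed (PNG or JPEG), and real converter output is less repetitive than the sample, so only the trends carry over: pixel count grows with the square of the resolution, and markup-heavy HTML compresses well.

```python
import gzip


def raw_page_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Uncompressed size of one rendered page (3 bytes per pixel for RGB)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def gzip_ratio(data):
    """Compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(data)) / len(data)


if __name__ == "__main__":
    # Doubling the resolution quadruples the number of pixels.
    for dpi in (72, 144, 288):
        print(dpi, "DPI:", raw_page_bytes(8.5, 11, dpi), "bytes")

    # Hypothetical markup with many repeated class names.
    sample = b'<div class="t m0 x1 h2 y3 ff1 fs0">text</div>\n' * 2000
    print("gzip ratio:", round(gzip_ratio(sample), 3))
```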

Browser Requirements

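| | Convert to HTML 5 (pdf2htmlEX) | Parse by JS | Convert to image | Convert to HTML 4 | Adobe PDF plugin | Other plugins |
| --- | --- | --- | --- | --- | --- | --- |
| HTML 5 | Yes | Yes, usually | No | No | No | No |
| CSS | Yes | Yes | No | Yes | No | No |
| JavaScript | No | Yes | No | No | No | No |
| Third-party plugin | No | No | No | No | Yes | Yes |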

Features

| | Convert to HTML 5 (pdf2htmlEX) | Parse by JS | Convert to image | Convert to HTML 4 | Adobe PDF plugin | Other plugins |
| --- | --- | --- | --- | --- | --- | --- |
| Full PDF feature support | No, but usually enough | Maybe | Yes | No | Yes | Maybe |
| Text extraction (select/copy/search) | Yes | Yes, with text layer | No, usually <sup>1</sup> | Yes | Yes | Maybe |
| Embedded fonts | Yes | Yes | Yes | No | Yes | Yes, usually |
| Links | Yes | Yes | No, usually <sup>2</sup> | Yes | Yes | Maybe |
| Accurate rendering (layout/spacing) | Yes, usually <sup>3</sup> | Yes | Yes | No | Yes | Yes, usually |
| Read while loading | Yes | Yes | Yes | Yes | No | Maybe |

1: Text extraction can be supported with a text layer
2: Links can be handled with JavaScript
3: Some PDF elements cannot be converted into HTML losslessly
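Note 1 refers to a text layer: selectable but invisible text positioned on top of the page image. The sketch below shows one way such a layer could be generated. The function `page_with_text_layer` and its `(text, left, top, font size)` input format are hypothetical, and a real implementation would also need to match fonts and glyph widths so that the selection lines up with the image.

```python
from html import escape


def page_with_text_layer(image_url, width_px, height_px, words):
    """Wrap a page image in a div and overlay transparent, selectable text.

    words: iterable of (text, left_px, top_px, font_px) tuples
    (a hypothetical layout format, e.g. taken from a PDF text extractor).
    """
    spans = "\n".join(
        f'  <span style="position:absolute;left:{left}px;top:{top}px;'
        f'font-size:{size}px;color:transparent;white-space:pre">'
        f"{escape(text)}</span>"
        for text, left, top, size in words
    )
    return (
        f'<div style="position:relative;width:{width_px}px;'
        f"height:{height_px}px;"
        f"background:url('{escape(image_url)}') no-repeat\">\n"
        f"{spans}\n</div>"
    )


if __name__ == "__main__":
    # One word placed 72px from the left and 90px from the top of the page.
    print(page_with_text_layer("page1.png", 612, 792, [("Hello", 72, 90, 12)]))
```

The text is transparent, so the reader sees only the image, while the browser can still select, copy and search the overlaid spans.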

Development

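| | Convert to HTML 5 (pdf2htmlEX) | Parse by JS | Convert to image | Convert to HTML 4 | Adobe PDF plugin | Other plugins |
| --- | --- | --- | --- | --- | --- | --- |
| Customizable UI/Theme | Yes | Yes | Yes | Yes | No | No, usually <sup>1</sup> |
| Extensible | Yes | Yes | Yes | Yes | No | Maybe <sup>2</sup> |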

1: Some plugins offer commercially licensed versions with a customizable UI
2: Some plugins provide an API
