Commit e398601

jakesebright authored and stweil committed
Include ALTO in list of supported output formats
1 parent 1f5fb15 commit e398601

2 files changed: +2 -1 lines changed

README.md

+1 -1
@@ -24,7 +24,7 @@ and GitHub's log of [contributors](https://github.com/tesseract-ocr/tesseract/gr
 
 Tesseract has **unicode (UTF-8) support**, and can **recognize more than 100 languages** "out of the box".
 
-Tesseract supports **various output formats**: plain-text, hocr(html), pdf, tsv, invisible-text-only pdf.
+Tesseract supports **various output formats**: plain text, hOCR (HTML), PDF, invisible-text-only PDF, TSV. The master branch also has experimental support for ALTO (XML) output.
 
 You should note that in many cases, in order to get better OCR results, you'll need to **[improve the quality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) of the image** you are giving Tesseract.
 

doc/tesseract.1.asc

+1
@@ -90,6 +90,7 @@ OPTIONS
 contains a list of variables and their values, one per line, with a
 space separating variable from value. Interesting config files
 include: +
+* `alto` - Output in ALTO format (file extension `.xml`).
 * `hocr` - Output in hOCR format (file extension `.hocr`).
 * `pdf` - Output PDF (file extension `.pdf`).
 * `tsv` - Output TSV (file extension `.tsv`).
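
For reviewers who want to try the new config file, the following is a minimal sketch of how the config names listed in this hunk map to a command line and an output file name. It assumes the usual `tesseract imagename outputbase [configfile...]` invocation and a build that already contains this commit. The helper names `build_command` and `expected_output_path`, and the file names `page.png` and `out`, are hypothetical and not part of Tesseract.

```python
import shlex

# Config file names and output extensions, copied from the man page hunk above.
OUTPUT_CONFIGS = {
    "alto": ".xml",
    "hocr": ".hocr",
    "pdf": ".pdf",
    "tsv": ".tsv",
}


def build_command(image, output_base, config):
    """Return the argument list for one tesseract run with the given config file.

    Hypothetical helper: it only assembles arguments and does not run anything.
    """
    if config not in OUTPUT_CONFIGS:
        raise ValueError(f"unknown output config: {config!r}")
    return ["tesseract", image, output_base, config]


def expected_output_path(output_base, config):
    """Return the file name the run is expected to write, per the documented extensions."""
    return output_base + OUTPUT_CONFIGS[config]


if __name__ == "__main__":
    cmd = build_command("page.png", "out", "alto")
    # Print the shell-quoted command instead of executing it.
    print(shlex.join(cmd))
    print(expected_output_path("out", "alto"))
```

To actually run the OCR step, the returned list could be passed to `subprocess.run(cmd, check=True)`, assuming a `tesseract` binary with ALTO support is on `PATH`.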
