Glossary
Glossary of terms from the domain of image processing/OCR as used within the OCR-D framework
A block is a polygon inside a page.
The semantics or function of a block such as heading, page number, column, print space...
TODO
See Glyph
See TextLine
Reading order is the intended order of regions within a document.
See Region
See Glyph
A TextLine is a block of text without line breaks.
A word is a sequence of glyphs not containing any word-bounding whitespace.
Ground Truth (GT) in the context of OCR-D consists of transcriptions in PAGE-XML format combined with the original image.
We distinguish different usage scenarios for GT:
TODO
TODO
Most LSTM models are trained on pairs of a line image and its line transcription. These pairs can be generated from the PAGE-XML of the Ground Truth.
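As a rough illustration of how such line-level pairs could be pulled out of PAGE-XML, the sketch below collects the image file name, the bounding box of each TextLine and its transcription. It assumes the 2019-07-15 version of the PAGE namespace; the sample document and the helper names (`parse_points`, `bounding_box`, `line_samples`) are made up for this example, and cropping the actual line image is left out.

```python
import xml.etree.ElementTree as ET

# Assumption: 2019-07-15 PAGE schema; other versions use a different date suffix.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Hypothetical minimal PAGE-XML document, for illustration only.
SAMPLE_PAGE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page_0001.tif" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1">
      <Coords points="100,100 900,100 900,300 100,300"/>
      <TextLine id="r1_l1">
        <Coords points="100,100 900,100 900,190 100,190"/>
        <TextEquiv><Unicode>Erste Zeile</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1_l2">
        <Coords points="100,200 900,200 900,300 100,300"/>
        <TextEquiv><Unicode>Zweite Zeile</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
""".strip()


def parse_points(points):
    """Turn a PAGE 'points' attribute ("x,y x,y ...") into a list of (x, y) tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def bounding_box(polygon):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def line_samples(page_xml):
    """Return (image file, line bounding box, transcription) for every transcribed TextLine."""
    root = ET.fromstring(page_xml)
    image = root.find("pc:Page", NS).get("imageFilename")
    samples = []
    for line in root.iterfind(".//pc:TextLine", NS):
        # Only the TextEquiv directly below the line, not those of nested words or glyphs.
        text = line.findtext("pc:TextEquiv/pc:Unicode", default=None, namespaces=NS)
        coords = line.find("pc:Coords", NS)
        if text is None or coords is None:
            continue  # skip lines without transcription or outline
        samples.append((image, bounding_box(parse_points(coords.get("points"))), text))
    return samples


samples = line_samples(SAMPLE_PAGE)
```

Each resulting tuple names the region of the page image to crop and the text that belongs to it; a training pipeline would cut the line image using that box (or the full polygon).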
Binarization means converting all colors in an image to either black or white.
Controlled term: binarized (comments of a mets:file), preprocessing/optimization/binarization (step in ocrd-tool.json)
See Felix Niklas' interactive demo
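To make the definition concrete, here is a minimal sketch of binarization with a single global threshold. The fixed threshold of 128 and the function name `binarize` are assumptions for illustration only; practical binarization methods usually choose thresholds adaptively (for example Otsu or Sauvola).

```python
def binarize(gray, threshold=128):
    """Map every 8-bit gray value to 0 (black) or 255 (white).

    `gray` is a list of rows, each a list of integers in 0..255.
    The fixed global threshold is an illustrative assumption.
    """
    return [[255 if value >= threshold else 0 for value in row] for row in gray]


# Tiny hypothetical grayscale image: bright background, dark strokes on the right.
gray_page = [
    [250, 240, 30],
    [245, 90, 20],
    [200, 130, 127],
]
binary_page = binarize(gray_page)
```

Values just above and below the threshold (130 and 127 in the last row) end up on opposite sides, which is why the choice of threshold matters for faint or uneven print.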
Manipulating an image so that it is rectangular, all text lines are parallel to the bottom/top edge of the page, and creases, folds and curving of the page into the spine of the book have been corrected.
Controlled term: preprocessing/optimization/dewarping
See Matt Zucker's entry on Dewarping.
Remove artifacts such as smudges, ink blots, underlinings etc. from an image.
Controlled term: preprocessing/optimization/despeckling
Rotate image so that all text lines are horizontal.
Controlled term: preprocessing/optimization/deskewing
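One simple way to think about deskewing is to estimate a single skew angle and rotate by its negative. The sketch below assumes that line baselines are already known as pairs of end points, which is an assumption of this example and not something the glossary prescribes; the function names are hypothetical.

```python
import math
from statistics import median


def estimate_skew_degrees(baselines):
    """Median angle (in degrees) of baselines given as ((x1, y1), (x2, y2)) pairs."""
    angles = [
        math.degrees(math.atan2(y2 - y1, x2 - x1))
        for (x1, y1), (x2, y2) in baselines
    ]
    return median(angles)


def rotate_point(point, degrees, center=(0.0, 0.0)):
    """Rotate a point around `center` by `degrees`, in the same coordinate system."""
    theta = math.radians(degrees)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (
        center[0] + dx * math.cos(theta) - dy * math.sin(theta),
        center[1] + dx * math.sin(theta) + dy * math.cos(theta),
    )


# Hypothetical baselines of three slightly slanted text lines.
baselines = [((0, 0), (100, 5)), ((0, 50), (100, 55)), ((0, 100), (100, 106))]
skew = estimate_skew_degrees(baselines)

# Rotating by the negative skew makes the first baseline horizontal again.
corrected_end = rotate_point((100, 5), -skew)
```

Using the median keeps a single badly detected line from dominating the estimate; an actual deskewing step would apply the same rotation to every pixel of the image.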
Detecting the font type used in the document. This can happen before or after an initial OCR run.
Controlled term: recognition/font-identification
ISSUE: https://github.com/OCR-D/spec/issues/41
Controlled term:
- gray_normalized (comments in file)
- preprocessing/optimization/cropping (step)
Gray normalization is similar to binarization but instead of a purely bitonal image, the output can also contain shades of gray to avoid inadvertently combining glyphs when they are very close together.
Document analysis is the detection of structure on the document level to create a table of contents.
Detects the reading order of blocks.
Detecting the print space in a page, as opposed to the margins. It is a form of block segmentation.
Controlled term: preprocessing/optimization/cropping
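As a deliberately naive sketch of the idea, the print space can be approximated by the bounding box of all foreground pixels in a binarized page. This assumes a clean image in which every black pixel belongs to the print space; real pages contain noise, marginalia and borders that a practical method has to handle. The function names are hypothetical.

```python
def print_space_bbox(binary, foreground=0):
    """Bounding box (x_min, y_min, x_max, y_max) of all foreground pixels, or None."""
    rows = [y for y, row in enumerate(binary) if foreground in row]
    if not rows:
        return None  # empty page: nothing to crop to
    cols = [
        x for x in range(len(binary[0]))
        if any(row[x] == foreground for row in binary)
    ]
    return (min(cols), min(rows), max(cols), max(rows))


def crop(binary, bbox):
    """Cut the image down to the given inclusive bounding box."""
    x0, y0, x1, y1 = bbox
    return [row[x0:x1 + 1] for row in binary[y0:y1 + 1]]


W, B = 255, 0  # white background, black foreground
page = [
    [W, W, W, W, W, W],
    [W, W, B, B, W, W],
    [W, W, B, W, B, W],
    [W, W, W, W, W, W],
    [W, W, W, W, W, W],
]
bbox = print_space_bbox(page)
cropped = crop(page, bbox)
```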
See Cropping
Segmentation means detecting areas within an image.
Specific segmentation algorithms are labelled by the semantics of the regions they detect, not the semantics of the input; for example, an algorithm that detects blocks is called block segmentation.
Segment an image into blocks. Also determines whether this is a text or non-text block (e.g. images).
Controlled term:
- SEG-BLOCK (USE)
- layout/segmentation/region (step)
Determine the type of a detected block.
Segment blocks into textlines.
Controlled term:
- SEG-LINE (USE)
- layout/segmentation/line (step)
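A classic, simple approach to line segmentation is a horizontal projection profile: count the foreground pixels in every row of a binarized block and treat each run of non-empty rows as one text line. The sketch below assumes a deskewed block with clear gaps between lines; the function name and the `min_pixels` parameter are made up for this example.

```python
def segment_lines(block, foreground=0, min_pixels=1):
    """Return (first_row, last_row) for each run of rows containing text.

    `block` is a binarized image as a list of rows; a row counts as text
    when it has at least `min_pixels` foreground pixels.
    """
    profile = [sum(1 for value in row if value == foreground) for row in block]
    lines = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_pixels and start is None:
            start = y  # a text line begins
        elif count < min_pixels and start is not None:
            lines.append((start, y - 1))  # the line ended on the previous row
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))  # line runs to the bottom edge
    return lines


W, B = 255, 0  # white background, black foreground
block = [
    [W, W, W, W],
    [B, B, W, B],
    [B, W, B, B],
    [W, W, W, W],
    [W, B, B, W],
    [W, B, B, B],
]
text_lines = segment_lines(block)
```

Raising `min_pixels` is one way to keep isolated specks from being reported as lines, at the cost of possibly missing very short ones.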
Controlled term:
- SEG-LINE (USE)
- layout/segmentation/word (step)
Segment a textline into glyphs.
Controlled term: SEG-GLYPH
TODO wrong use of document analysis
The software repository contains all document analysis algorithms developed during the project including tests. It will also contain the documentation and installation instructions for deploying a document analysis workflow.
Contains all the ground truth data.
TODO wrong use of document analysis
The research data repository contains the results of all activities during document analysis. At a minimum, it contains the end results of every processed document and its full provenance. The research data repository must be available locally.
TODO wrong use of document analysis
Contains all trained (OCR) models for document analysis. The model repository must be available locally. Ideally, a publicly available model repository will be developed.
The OCR-D project divided the various elements of an OCR workflow into six modules.
TODO
TODO
TODO
TODO
TODO
TODO