write languages parameter value to hocr output file

### expected

write languages parameter value to hocr output file

```html
<html ...>
 <head>
  
  <meta name='ocr-languages' content='deu+eng+rus'/>
 </head>
```
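a minimal sketch of how an hocr editor could read the proposed field, in Python. `ocr-languages` is the meta name proposed in this issue, not something tesseract writes today, and `read_ocr_languages` is a hypothetical helper. it uses the stdlib `html.parser` instead of regex:

```python
from html.parser import HTMLParser


class OcrLanguagesMetaReader(HTMLParser):
    """Collects the content of <meta name='ocr-languages'> (the field proposed in this issue)."""

    def __init__(self):
        super().__init__()
        self.languages = None

    def handle_starttag(self, tag, attrs):
        # self-closing <meta .../> tags also end up here, because
        # HTMLParser.handle_startendtag calls handle_starttag by default
        if tag == "meta":
            attributes = dict(attrs)
            if attributes.get("name") == "ocr-languages":
                self.languages = attributes.get("content")


def read_ocr_languages(hocr_text):
    """Return the languages as a list, e.g. ['deu', 'eng', 'rus'], or None if the meta tag is missing."""
    reader = OcrLanguagesMetaReader()
    reader.feed(hocr_text)
    reader.close()
    if not reader.languages:
        return None
    # tesseract joins multiple languages with "+"
    return reader.languages.split("+")
```

for a file with the head above, `read_ocr_languages(open('dst.hocr', encoding='utf-8').read())` would return `['deu', 'eng', 'rus']`; for hocr files written without the field it returns `None`.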

### why

this is useful for OCR proofreading with [hocr-editors](https://github.com/zacharywhitley/awesome-ocr/pull/7):
an editor can run tesseract again on selected regions with the original tesseract arguments, instead of guessing them
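to re-run tesseract on one region, an editor could take the region's bbox from the hocr `title` attribute, crop the image to that box, and call tesseract on the crop with the original `-l` value. a rough Python sketch of the two non-imaging steps; `parse_bbox` and `build_rerun_command` are hypothetical helper names, and the cropping itself is left out:

```python
import re

# matches the bbox property inside an hocr title attribute,
# e.g. "bbox 36 92 582 118; x_wconf 95"
BBOX_PATTERN = re.compile(r"\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hocr title attribute, or None if there is no bbox."""
    match = BBOX_PATTERN.search(title)
    if match is None:
        return None
    return tuple(int(value) for value in match.groups())


def build_rerun_command(region_image_path, languages):
    """Build a tesseract argv for an already cropped region image (hypothetical helper).

    `languages` is the original -l value, e.g. "deu+eng+rus".
    output goes to stdout ("-") in hocr format, like the original call in this issue.
    """
    return ["tesseract", region_image_path, "-", "-l", languages, "hocr"]
```

the bbox is `x0 y0 x1 y1` in image pixels, which matches the `(left, upper, right, lower)` box that Pillow's `Image.crop` expects; the resulting list could then be passed to `subprocess.run(..., capture_output=True, text=True)` if tesseract is installed.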

### alternative

write all tesseract arguments to the hocr output file,
but this is harder to parse

```html
<html ...>
 <head>
  
  <meta name='ocr-arguments' content='tesseract src.jpg - -l deu+eng+rus hocr'/>
 </head>
```

this would also preserve other CLI arguments like
<code>--oem 1 --psm 6 --tessdata-dir [tessdata_best](https://github.com/tesseract-ocr/tessdata_best)</code>,
assuming tesseract is run again in the same workdir (otherwise relative paths like the tessdata dir would break)
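to show why a full command line is harder to consume, a minimal Python sketch that splits the proposed `ocr-arguments` content with `shlex` and picks out only the options mentioned in this issue. `parse_ocr_arguments` is a hypothetical helper; any other option (for example `-c var=value`) would end up in the positional list, so a real editor would need to know tesseract's full option syntax:

```python
import shlex


def parse_ocr_arguments(command_line):
    """Extract selected options from a tesseract command line string (hypothetical helper).

    only handles the options mentioned in this issue: -l, --oem, --psm, --tessdata-dir.
    everything else (image, output base, config names, unknown options) is returned as positional.
    """
    tokens = shlex.split(command_line)
    options_with_value = {"-l", "--oem", "--psm", "--tessdata-dir"}
    options = {}
    positional = []
    # skip the program name "tesseract"
    index = 1
    while index < len(tokens):
        token = tokens[index]
        if token in options_with_value and index + 1 < len(tokens):
            # known option: the next token is its value
            options[token] = tokens[index + 1]
            index += 2
        else:
            positional.append(token)
            index += 1
    return options, positional
```

for `tesseract src.jpg - --oem 1 --psm 6 -l deu+eng+rus hocr` this returns `{'--oem': '1', '--psm': '6', '-l': 'deu+eng+rus'}` and `['src.jpg', '-', 'hocr']`. compared with a dedicated `ocr-languages` field, the editor has to reimplement part of tesseract's argument parsing just to get the languages.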

### workaround

guess the languages parameter value
from `lang='...'` attributes in the hocr output file

parse the main language from
`<p class='ocr_par' id='[^']+' lang='([^']+)'`
and parse extra languages from
`<span class='ocrx_word' id='[^']+' title='[^']+' lang='([^']+)'`

but this workaround can fail: it can return the languages in the wrong order
(my impression is that the order of languages does matter for tesseract),
and it cannot find languages that were requested but never detected on the page

> parse

yeah i know, "parsing" xml with regex is bad.
this is just an example; in a real app i would use a proper xml parser.
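the same workaround with a real parser instead of regex, as a minimal Python sketch using the stdlib `html.parser`. `guess_languages` is a hypothetical helper, and the ranking (most frequent paragraph language first, then the other languages in document order) is an assumed heuristic, so it inherits the problems described above:

```python
from collections import Counter
from html.parser import HTMLParser


class LangAttributeCollector(HTMLParser):
    """Collects lang attributes of ocr_par and ocrx_word elements, in document order."""

    def __init__(self):
        super().__init__()
        self.paragraph_langs = []  # lang of each ocr_par
        self.all_langs = []  # lang of each ocr_par and ocrx_word, in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        lang = attributes.get("lang")
        if not lang:
            return
        if "ocr_par" in classes:
            self.paragraph_langs.append(lang)
            self.all_langs.append(lang)
        elif "ocrx_word" in classes:
            self.all_langs.append(lang)


def guess_languages(hocr_text):
    """Heuristically guess the -l value, or None if no lang attributes are found.

    main language: the most frequent paragraph language.
    extra languages: all other languages, in order of first appearance.
    """
    collector = LangAttributeCollector()
    collector.feed(hocr_text)
    collector.close()
    if not collector.all_langs:
        return None
    if collector.paragraph_langs:
        main = Counter(collector.paragraph_langs).most_common(1)[0][0]
    else:
        main = collector.all_langs[0]
    extras = []
    for lang in collector.all_langs:
        if lang != main and lang not in extras:
            extras.append(lang)
    return "+".join([main] + extras)
```

for an hocr file where the paragraphs are `deu` and a `rus` word appears before an `eng` word, this returns `deu+rus+eng` even if tesseract was called with `-l deu+eng+rus`, which is exactly the ordering problem.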

### example

if tesseract was called like
`tesseract src.jpg - -l deu+eng+rus hocr >dst.hocr`,
then the main language is `deu`
and the extra languages are `eng` and `rus`
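going the other way, splitting the `-l` value into the main language and the extra languages is trivial once the value is stored in the file. a minimal Python sketch; `split_languages` is a hypothetical helper that follows the first-is-main convention used in this issue:

```python
def split_languages(languages):
    """Split a tesseract -l value like "deu+eng+rus" into (main, extras)."""
    # ignore empty parts, e.g. from a trailing "+"
    parts = [part for part in languages.split("+") if part]
    if not parts:
        raise ValueError("empty languages value")
    return parts[0], parts[1:]
```

`split_languages("deu+eng+rus")` returns `("deu", ["eng", "rus"])`.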

### keywords

- get tesseract languages parameter value from hocr file

write languages parameter value to hocr output file #4455