Add an extra parameter (in API and CLI) to include specific input reader params #554

Open
sheikgit opened this issue Apr 5, 2024 · 1 comment

sheikgit commented Apr 5, 2024

Hi,
Maybe this is already a feature, but when I use ocrmypdf as the input reader I need a way to make invoice2data pass extra parameters to it: "deskew", and also "redo_ocr" and "force_ocr".
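
A minimal sketch of one way such options could be collected on the command line, assuming a hypothetical repeatable `--reader-option KEY=VALUE` flag. Neither that flag nor the helper `parse_reader_options` exists in invoice2data today; both are made up for illustration, and the sketch only builds the dictionary that would be forwarded to the reader.

```python
import argparse


def parse_reader_options(pairs):
    """Turn ["deskew=true", "force_ocr=true"] into a dict of reader options."""
    options = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        lowered = value.lower()
        if lowered in ("true", "false"):
            options[key] = lowered == "true"  # booleans such as deskew
        else:
            options[key] = value  # everything else stays a string
    return options


def build_parser():
    # Stand-in parser for illustration, not the real invoice2data CLI.
    parser = argparse.ArgumentParser(prog="invoice2data-sketch")
    parser.add_argument("--input-reader", default="ocrmypdf")
    # Hypothetical flag: repeatable KEY=VALUE pairs for the chosen reader.
    parser.add_argument(
        "--reader-option", action="append", default=[], metavar="KEY=VALUE"
    )
    return parser


args = build_parser().parse_args(
    [
        "--input-reader", "ocrmypdf",
        "--reader-option", "deskew=true",
        "--reader-option", "force_ocr=true",
    ]
)
reader_kwargs = parse_reader_options(args.reader_option)
print(reader_kwargs)
```

The sketch does no validation of the option names or their combinations; ocrmypdf itself may reject some of them when used together.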

Cheers,

Adrián


ballatom commented Nov 3, 2024

I support this request.

I have noticed that, in some cases, Tesseract recognizes more text with --psm 1 than with --psm 6 (the default for invoice2data, defined in src/invoice2data/input/tesseract.py).

One possibility would be to call invoice2data with '--psm 1', which would override the default defined in src/invoice2data/input/tesseract.py for that parameter only.
# invoice2data --input-reader tesseract --psm 1...
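
A rough sketch of how a single-parameter override like this could work. The `--psm` flag on the invoice2data side, the `DEFAULT_TESSERACT_OPTIONS` table, and `build_tesseract_command` are all hypothetical; the default of 6 is taken from the comment above, not checked against the source. The sketch only builds the argument list and does not run Tesseract.

```python
import argparse

# Assumed default for illustration; the real value lives in
# src/invoice2data/input/tesseract.py and may differ.
DEFAULT_TESSERACT_OPTIONS = {"psm": "6"}


def build_tesseract_command(overrides=None):
    """Merge user overrides into the defaults and build the argument list."""
    options = dict(DEFAULT_TESSERACT_OPTIONS)
    # Only values the user actually supplied replace a default.
    options.update({k: v for k, v in (overrides or {}).items() if v is not None})
    command = ["tesseract", "stdin", "stdout"]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


# Stand-in parser for illustration, not the real invoice2data CLI.
parser = argparse.ArgumentParser(prog="invoice2data-sketch")
parser.add_argument("--input-reader", default="tesseract")
parser.add_argument("--psm", type=int, default=None)  # hypothetical flag

args = parser.parse_args(["--input-reader", "tesseract", "--psm", "1"])
print(build_tesseract_command({"psm": args.psm}))
# → ['tesseract', 'stdin', 'stdout', '--psm', '1']
```

Leaving `--psm` off the command line keeps the default untouched, which matches the "for that parameter only" behaviour described above.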

Another possibility would be to define default parameter values in a .conf file. IMO, this would be far less practical, as --psm 3 works in most cases and only needs to be overridden occasionally in my experience.
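
For comparison, a minimal sketch of the .conf-file variant using Python's standard `configparser`. The `[tesseract]` section name, the `psm` key, and the helper `resolve_option` are assumptions made up for this example; invoice2data does not currently read such a file.

```python
import configparser

# Hypothetical contents of a .conf file holding reader defaults.
CONF_TEXT = """
[tesseract]
psm = 3
"""


def resolve_option(config, section, name, cli_value=None):
    """Prefer the command-line value, then the config file, else None."""
    if cli_value is not None:
        return str(cli_value)
    return config.get(section, name, fallback=None)


config = configparser.ConfigParser()
config.read_string(CONF_TEXT)

print(resolve_option(config, "tesseract", "psm"))               # from the file
print(resolve_option(config, "tesseract", "psm", cli_value=1))  # CLI wins
```

The two approaches are not exclusive: a file can hold the usual defaults while a command-line flag handles the occasional one-off override.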

I would be happy to provide help in testing but I have no experience in Python.

Marc
