This script converts the PDF files in a given directory to TXT using the Microsoft Cognitive Services OCR API. It requires an active Azure subscription, because calling the API needs a subscription key.
On Ubuntu, create a new Python 3 virtual environment and install the packages listed in requirements.txt.
Within the virtual environment, run: python main.py --dirpath /path/to/dir
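For orientation, here is a minimal sketch of how a command-line entry point with a --dirpath option might walk a directory and write one TXT file per PDF. It is not the actual contents of main.py. The names find_pdfs, convert_directory and placeholder_ocr are hypothetical, and the OCR step is a stand-in that makes no call to Azure. Writing each TXT file next to its PDF is also an assumption.

```python
import argparse
from pathlib import Path
from typing import Callable, List, Optional


def find_pdfs(dirpath: Path) -> List[Path]:
    """Return the PDF files directly inside dirpath, sorted by name."""
    return sorted(
        p for p in dirpath.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def convert_directory(dirpath: Path, ocr_pdf: Callable[[Path], str]) -> List[Path]:
    """Run ocr_pdf on each PDF and write the result to a .txt file beside it."""
    written = []
    for pdf in find_pdfs(dirpath):
        txt = pdf.with_suffix(".txt")  # assumption: output sits next to the input
        txt.write_text(ocr_pdf(pdf), encoding="utf-8")
        written.append(txt)
    return written


def placeholder_ocr(pdf: Path) -> str:
    # Hypothetical stand-in for the real OCR request, which would need
    # the Azure subscription key. No network call is made here.
    return f"(no OCR performed for {pdf.name})"


def main(argv: Optional[List[str]] = None) -> None:
    parser = argparse.ArgumentParser(description="Convert PDFs in a directory to TXT.")
    parser.add_argument("--dirpath", required=True, type=Path,
                        help="directory containing the PDF files")
    args = parser.parse_args(argv)
    for txt in convert_directory(args.dirpath, placeholder_ocr):
        print(txt)
```

Calling main(["--dirpath", "/path/to/dir"]) would process that directory. Passing the OCR step in as a function keeps the directory handling testable without a subscription key.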