This is a simple tool for splitting a document into sentences.
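To illustrate what sentence splitting means, here is a minimal sketch in Python. It is not the tool's own logic, which this README does not describe; the function name `split_sentences` and the regex rule are assumptions made for the example.

```python
import re

# Assumed rule: a sentence ends at '.', '!' or '?' when the next
# non-space character is an uppercase letter, a digit, or a quote.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9\"'])")


def split_sentences(text: str) -> list[str]:
    """Naive rule-based splitter (illustrative only, hypothetical helper)."""
    parts = _BOUNDARY.split(text.strip())
    # Drop empty pieces, e.g. when the input is an empty string.
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    sample = "This is a simple tool. It splits a document into sentences! Does it work?"
    for sentence in split_sentences(sample):
        print(sentence)
```

A rule this simple splits wrongly after abbreviations such as "Dr. Smith", which is one reason to use a dedicated tool.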
-
Create and activate a virtual environment
virtualenv venv
source venv/bin/activate
-
Install dependencies
pip install -r requirements.txt
-
Run the tool
uvicorn main:app --reload --port 8080
-
Open the browser and go to http://localhost:8080/tokenizer
You are good to go!
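If the page does not load, a quick script can confirm whether the server is answering. This is a sketch using only the Python standard library; the helper names `tokenizer_url` and `is_reachable` are hypothetical, and it assumes the page answers a plain GET request, as it does when opened in a browser.

```python
import http.client
import urllib.error
import urllib.request


def tokenizer_url(host: str = "localhost", port: int = 8080) -> str:
    """Build the URL from the run step (port 8080, path /tokenizer)."""
    return f"http://{host}:{port}/tokenizer"


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET request to `url` succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, or bad response.
        return False


if __name__ == "__main__":
    url = tokenizer_url()
    state = "is reachable" if is_reachable(url) else "is not reachable"
    print(url, state)
```

If it reports the URL as not reachable, check that the uvicorn command is still running and that the port matches the `--port` value.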
-
If you are using Windows
- Install Tesseract OCR from https://github.com/UB-Mannheim/tesseract/wiki
- Install Python 3.10.4 or above from https://www.python.org/downloads/
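The Python requirement above can be checked before installing dependencies. This is a small sketch; `MIN_VERSION` and `meets_minimum` are names chosen for the example and are not part of this project.

```python
import sys

# Minimum version stated in this README.
MIN_VERSION = (3, 10, 4)


def meets_minimum(version=None, minimum=MIN_VERSION) -> bool:
    """Compare a (major, minor, micro) version against the minimum.

    When `version` is None, the running interpreter's version is used.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:3]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {found} is new enough")
    else:
        print(f"Python {found} is too old, 3.10.4 or above is required")
```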
-
If you are using Linux
- Install Python 3.10.4 or above from https://www.python.org/downloads/
- Modify tesseract install location inside ./src/tesseract.py
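To find the path to put into ./src/tesseract.py, a short script can look for the Tesseract executable. This is a sketch only: the contents of ./src/tesseract.py are not shown in this README, `find_tesseract` is a hypothetical helper, and the candidate paths are assumed common install locations that may differ on your machine.

```python
import shutil
from pathlib import Path
from typing import Optional

# Assumed common install locations; adjust these for your system.
CANDIDATE_PATHS = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "/usr/bin/tesseract",
    "/usr/local/bin/tesseract",
]


def find_tesseract(candidates=CANDIDATE_PATHS) -> Optional[str]:
    """Return the path to the tesseract executable, or None if not found."""
    # Prefer whatever is on PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the assumed locations.
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


if __name__ == "__main__":
    location = find_tesseract()
    print(location or "tesseract was not found, install it first")
```

The printed path is the value to use as the install location.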
Name: Sagnik Das
Email: [email protected]
For suggestions and contributions, please visit here
If you like my work, please star it here