ferry-python

Website scraping scripts for the FerryWave project.

Requirements

Python

You will need Python 3.9.6 or later to run this set of scripts.
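
As a quick way to confirm the interpreter meets this requirement, a small check along the following lines could be run before the scripts. This is only a sketch: the names MINIMUM_PYTHON and python_is_supported are illustrative and are not part of this repository.

```python
import sys

# Minimum interpreter version stated in this README.
MINIMUM_PYTHON = (3, 9, 6)


def python_is_supported(version_info=None):
    """Return True if the given (or the running) interpreter is new enough.

    This helper is hypothetical and only illustrates the version requirement.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor, micro); tuples compare element by element.
    return tuple(version_info[:3]) >= MINIMUM_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        raise SystemExit("Python 3.9.6 or later is required to run these scripts.")
```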

The venv folder contains a Python environment with the necessary libraries. Before running the scripts, activate this environment from the repository root with the source command: source venv/bin/activate

If the environment does not work for you, the file requirements.txt lists all the necessary Python packages. You can install them on your machine with pip: pip install -r requirements.txt

Credentials

You need to export the MongoDB password for the 'python-user' account into the local environment. You can do it like this: export MONGODB_PASSWORD=<password>
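
For illustration, a script could read the exported variable roughly as follows. This is a minimal sketch and not necessarily how the scraping scripts do it: the helper name get_mongodb_password is an assumption, and only the variable name MONGODB_PASSWORD comes from this README.

```python
import os


def get_mongodb_password(environ=None):
    """Return the MongoDB password from the environment.

    Hypothetical helper: fails early with a clear message if the
    variable was not exported, instead of failing later at connect time.
    """
    env = os.environ if environ is None else environ
    password = env.get("MONGODB_PASSWORD")
    if not password:
        raise RuntimeError(
            "MONGODB_PASSWORD is not set; export it before running the scripts."
        )
    return password
```

Passing a dictionary as environ makes the helper easy to try out without touching the real environment.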

Tesseract

Tesseract is an optical character recognition (OCR) engine that is used to scrape some of the timetable data for the FerryWave website. You will need to install Tesseract OCR on your machine in order to run scraping for all the websites.

Follow the official documentation for installation steps for your OS (choose version 5):

https://github.com/tesseract-ocr/tesseract#installing-tesseract

Don't forget to install at least one full language package as well (preferably English): https://tesseract-ocr.github.io/tessdoc/Installation.html

After the installation you will have to provide the path to the Tesseract executable file. Change the appropriate line in settings.py
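
If you are unsure where the executable was installed, a small helper like the one below can locate it so that the result can be copied into settings.py. This is a sketch under assumptions: the function name resolve_tesseract_path is hypothetical, and the name of the setting in settings.py is not shown here because this README does not state it.

```python
import shutil
from pathlib import Path


def resolve_tesseract_path(configured_path=None):
    """Return the path to the Tesseract executable as a string.

    Hypothetical helper. If configured_path is given, it must point to an
    existing file. Otherwise the executable named "tesseract" is looked up
    on the PATH.
    """
    if configured_path:
        candidate = Path(configured_path).expanduser()
        if candidate.is_file():
            return str(candidate)
        raise FileNotFoundError(f"No Tesseract executable at {candidate}")

    found = shutil.which("tesseract")
    if found is None:
        raise FileNotFoundError(
            "Tesseract was not found on PATH; install it or pass its path explicitly."
        )
    return found


if __name__ == "__main__":
    print(resolve_tesseract_path())
```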

Running the scraper

Running update_database.py runs the scraping for all defined website destinations, clears the database, and pushes the new data into the database.
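
The sequence described above (scrape, clear, push) can be sketched as follows. This is not the code in update_database.py: the function and the in-memory collection are illustrative stand-ins. The only outside assumption is that the real collection object offers delete_many and insert_many, as a PyMongo collection does.

```python
class InMemoryCollection:
    """Tiny stand-in for a database collection, for illustration only."""

    def __init__(self, documents=None):
        self.documents = list(documents or [])

    def delete_many(self, query):
        # Only the "match everything" case is modelled; the query is ignored.
        self.documents.clear()

    def insert_many(self, documents):
        self.documents.extend(documents)


def update_database(scrapers, collection):
    """Scrape every destination, clear the collection, push the new data.

    scrapers is a list of zero-argument callables, each returning a list
    of records (dicts). Returns the number of records pushed.
    """
    # Scrape everything first, so a failing scraper raises before
    # any existing data has been deleted.
    records = []
    for scrape in scrapers:
        records.extend(scrape())

    collection.delete_many({})
    if records:  # skip the insert when nothing was scraped
        collection.insert_many(records)
    return len(records)


if __name__ == "__main__":
    collection = InMemoryCollection([{"route": "old"}])
    pushed = update_database([lambda: [{"route": "A"}], lambda: [{"route": "B"}]], collection)
    print(pushed, collection.documents)
```

Collecting all records before clearing is a design choice of this sketch, made so that a scraping error does not leave the database empty.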
