FlyScrapper is a Python-based tool that automates gathering flight details from airline websites. Using `pyppeteer` for browser automation and `easyocr` for optical character recognition, FlyScrapper navigates through the booking sections of the specified URLs, logs in by recognizing the captcha, and scrapes available flight information such as prices, routes, and timings.
- Asynchronous web scraping for efficiency
- Automated captcha solving for login processes
- Flexibility to specify flight search parameters
- Ability to scrape multiple websites concurrently
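As a rough illustration of the asynchronous, concurrent scraping listed above, the sketch below runs one task per URL with `asyncio.gather`. It is not FlyScrapper's actual code: `scrape_site` and `scrape_all` are hypothetical names, the URLs are placeholders, and the `asyncio.sleep` call stands in for the real pyppeteer page work.

```python
import asyncio


async def scrape_site(url: str) -> dict:
    """Hypothetical stand-in for one site's login-and-scrape routine."""
    # A real implementation would drive a pyppeteer page here; the sleep
    # only simulates waiting on network I/O.
    await asyncio.sleep(0.1)
    return {"url": url, "flights": []}


async def scrape_all(urls: list[str]) -> list[dict]:
    """Run one scrape task per URL concurrently and collect the results."""
    # return_exceptions=True returns failures as values instead of raising
    # the first one, so results from the other sites are still collected.
    results = await asyncio.gather(
        *(scrape_site(url) for url in urls), return_exceptions=True
    )
    return [r for r in results if not isinstance(r, Exception)]


if __name__ == "__main__":
    example_urls = ["https://example.com/a", "https://example.com/b"]
    print(asyncio.run(scrape_all(example_urls)))
```

Because the tasks wait on I/O at the same time, the total run time is close to that of the slowest site rather than the sum of all sites.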
To use FlyScrapper, you need to have Python installed on your machine. You can then set up the project's environment with the following steps:
# Clone the repository
git clone https://github.com/RZAsadi/fly-scrapper.git
cd fly-scrapper
# It's recommended to use a virtual environment
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
# Install the dependencies
pip install -r requirements.txt
To start using FlyScrapper, edit the `urls` list in the script so that it contains the login URLs of the flight websites you wish to scrape. You can also customize the `fly_info` dictionary to set your desired flight search parameters.
Run the script:
python scrapper.py
By default, FlyScrapper is configured to search for one-way flights from Mashhad (MHD) to Tehran (THR) using the current date. You can modify the parameters by changing the `fly_info` dictionary within the script:
inf = {
    'fromCity': 'MHD',
    'toCity': 'THR',
    'wayType': 'OneWay',
    'flyDate': '1402/10/11'  # Jalali (Solar Hijri) date as YYYY/MM/DD, the format used with jdatetime
}
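Since a mistyped parameter would only show up once the browser is already running, it can help to check the dictionary first. The following is a minimal sketch using only the standard library; `validate_search_params` is a hypothetical helper, not part of FlyScrapper, and the rules it applies (three-letter uppercase city codes, a Jalali date with months 1 to 6 having 31 days and the rest at most 30) are assumptions based on the example values above.

```python
import re

REQUIRED_KEYS = ("fromCity", "toCity", "wayType", "flyDate")
DATE_PATTERN = re.compile(r"(\d{4})/(\d{2})/(\d{2})")
AIRPORT_PATTERN = re.compile(r"[A-Z]{3}")


def validate_search_params(params: dict) -> list[str]:
    """Return a list of problems found in the parameters (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in params]
    if problems:
        return problems

    for key in ("fromCity", "toCity"):
        if not AIRPORT_PATTERN.fullmatch(str(params[key])):
            problems.append(f"{key} should be a three-letter code such as 'MHD'")

    match = DATE_PATTERN.fullmatch(str(params["flyDate"]))
    if not match:
        problems.append("flyDate should look like YYYY/MM/DD")
    else:
        month, day = int(match.group(2)), int(match.group(3))
        # Jalali months 1-6 have 31 days; months 7-12 have at most 30.
        max_day = 31 if month <= 6 else 30
        if not (1 <= month <= 12 and 1 <= day <= max_day):
            problems.append("flyDate has an out-of-range month or day")
    return problems
```

An empty list means the parameters passed these basic checks; it does not guarantee that a given website offers the route or date.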
The `Reader` from `easyocr` is set to recognize both English (`en`) and Farsi/Persian (`fa`). If you need other languages, or only one of these, adjust the `Reader` initialization:
reader = Reader(lang_list=['en', 'fa'], gpu=False) # Set to `gpu=True` if you want to use GPU acceleration
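With `fa` enabled, recognized digits may come back as Persian or Arabic-Indic characters, and `reader.readtext(image, detail=0)` returns a list of text fragments rather than a single string. The sketch below shows one way such output might be cleaned up before it is typed into a captcha field. `normalize_ocr_text` is a hypothetical helper, and the assumption that the captcha consists of digits without spaces is ours, not something the project states.

```python
# Map Persian (U+06F0-U+06F9) and Arabic-Indic (U+0660-U+0669) digits to ASCII.
DIGIT_TABLE = {
    **{0x06F0 + i: str(i) for i in range(10)},
    **{0x0660 + i: str(i) for i in range(10)},
}


def normalize_ocr_text(fragments: list[str]) -> str:
    """Join OCR text fragments, drop all whitespace, and convert digits to ASCII."""
    joined = "".join(fragments)
    # str.split() with no arguments splits on any run of whitespace.
    without_spaces = "".join(joined.split())
    return without_spaces.translate(DIGIT_TABLE)
```

For example, the fragments `["\u06f1\u06f2 ", "3\u0664"]` (Persian one and two, ASCII three, Arabic-Indic four) are reduced to a single ASCII digit string.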
This project is licensed under the MIT License. See the `LICENSE` file for details.
- This project is not affiliated with any of the flight websites it accesses and is intended for educational purposes only.
This software is for educational purposes only. Using this script to scrape websites might be against the Terms of Service of the websites. Use it responsibly and ethically.
Always confirm that you are permitted to scrape the websites you intend to target, and respect their `robots.txt` file and terms of service.