FlyScrapper

FlyScrapper is a Python tool that automates gathering flight details from airline websites. It uses pyppeteer for browser automation and easyocr for optical character recognition (OCR): for each configured URL it logs in by reading the login captcha with OCR, navigates to the booking section, and scrapes the available flights, including prices, routes, and departure times.

Features

Asynchronous web scraping for efficiency
Automated captcha solving for login processes
Flexibility to specify flight search parameters
Ability to scrape multiple websites concurrently
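The asynchronous, multi-site design listed above can be pictured with a minimal asyncio sketch. It assumes each site is handled by its own coroutine; `scrape_site` and `scrape_all` are hypothetical names, and the `asyncio.sleep` call only stands in for the real pyppeteer work (login, captcha, search, parsing), which is not reproduced here.

```python
import asyncio


async def scrape_site(url: str, fly_info: dict) -> dict:
    # Hypothetical stand-in for the per-site work in scraper.py
    # (open the login page, solve the captcha, search, parse results).
    await asyncio.sleep(0.1)  # placeholder for browser I/O
    return {
        "url": url,
        "from": fly_info["fromCity"],
        "to": fly_info["toCity"],
        "flights": [],  # real code would fill this with scraped results
    }


async def scrape_all(urls: list, fly_info: dict) -> list:
    # Start every site at once; gather() returns results in the same order as urls.
    # return_exceptions=True keeps one failing site from cancelling the others.
    return await asyncio.gather(
        *(scrape_site(url, fly_info) for url in urls),
        return_exceptions=True,
    )
```

Usage: `results = asyncio.run(scrape_all(urls, fly_info))`. Because the sites are awaited together, the total wait is roughly that of the slowest site rather than the sum of all of them.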

Installation

To use FlyScrapper, you need to have Python installed on your machine. You can then set up the project's environment with the following steps:

# Clone the repository
git clone https://github.com/RZAsadi/fly-scrapper.git
cd fly-scrapper

# It's recommended to use a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

# Install the dependencies
pip install -r requirements.txt

Usage

To start using FlyScrapper, edit the urls list in the script so that it contains the login URLs of the flight websites you want to scrape. You can also customize the fly_info dictionary to set your flight search parameters.

Run the script:

python scraper.py

Configuration

Flight Search Parameters

By default, FlyScrapper searches for one-way flights from Mashhad (MHD) to Tehran (THR) on the current date. To change these parameters, edit the fly_info dictionary in the script, for example to search on a fixed date:

fly_info = {
    'fromCity': 'MHD',  # origin (IATA code)
    'toCity': 'THR',    # destination (IATA code)
    'wayType': 'OneWay',
    'flyDate': '1402/10/11'  # Solar Hijri (Jalali) date as used by jdatetime, YYYY/MM/DD
}
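The flyDate value uses the Solar Hijri (Jalali) calendar; the comment above refers to the jdatetime package for this. As a dependency-free illustration only, the sketch below converts a Gregorian date with the widely used arithmetic 33-year-cycle algorithm and builds a fly_info dictionary for today. `gregorian_to_jalali` is a hypothetical helper, not part of scraper.py, and the arithmetic cycle is an approximation that agrees with the official calendar for present-day dates.

```python
import datetime


def gregorian_to_jalali(d: datetime.date) -> str:
    """Convert a Gregorian date to a 'YYYY/MM/DD' Solar Hijri (Jalali) string."""
    g_days_before_month = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]
    gy, gm, gd = d.year, d.month, d.day
    # Include this year's Feb 29 in the leap-day count only after February.
    gy2 = gy + 1 if gm > 2 else gy
    days = (355666 + 365 * gy + (gy2 + 3) // 4 - (gy2 + 99) // 100
            + (gy2 + 399) // 400 + gd + g_days_before_month[gm - 1])
    jy = -1595 + 33 * (days // 12053)  # 12053 days = one 33-year cycle (8 leap years)
    days %= 12053
    jy += 4 * (days // 1461)           # 1461 days = four years, one of them leap
    days %= 1461
    if days > 365:
        jy += (days - 1) // 365
        days = (days - 1) % 365
    if days < 186:                     # months 1-6 have 31 days
        jm, jd = 1 + days // 31, 1 + days % 31
    else:                              # months 7-12 have 30 days (Esfand may have 29)
        jm, jd = 7 + (days - 186) // 30, 1 + (days - 186) % 30
    return f"{jy:04d}/{jm:02d}/{jd:02d}"


# Search parameters for today's date, in the format fly_info expects.
fly_info = {
    'fromCity': 'MHD',
    'toCity': 'THR',
    'wayType': 'OneWay',
    'flyDate': gregorian_to_jalali(datetime.date.today()),
}
```

For reference, the example date 1402/10/11 corresponds to 1 January 2024, and Nowruz on 20 March 2024 maps to 1403/01/01.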

OCR Language

The Reader from easyocr is set to recognize both English (en) and Farsi/Persian (fa). If you need other languages or only one of these, adjust the Reader initialization:

reader = Reader(lang_list=['en', 'fa'], gpu=False) # Set to `gpu=True` if you want to use GPU acceleration
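easyocr's `readtext` returns a plain list of text fragments when called with `detail=0`. Because Persian is enabled, digits in a captcha may come back as Persian or Arabic-Indic numerals. The sketch below normalizes them to ASCII, assuming a numeric captcha (an assumption; the captcha format is not documented here). `normalize_captcha_text` is a hypothetical helper, not part of scraper.py.

```python
from typing import Iterable

# Map Persian (U+06F0-U+06F9) and Arabic-Indic (U+0660-U+0669) digits to ASCII.
_DIGIT_MAP = {
    **{0x06F0 + i: str(i) for i in range(10)},
    **{0x0660 + i: str(i) for i in range(10)},
}


def normalize_captcha_text(fragments: Iterable[str]) -> str:
    """Join OCR fragments, convert non-ASCII digits, and keep only 0-9."""
    text = "".join(fragments).translate(_DIGIT_MAP)
    return "".join(ch for ch in text if ch in "0123456789")
```

Usage: `code = normalize_captcha_text(reader.readtext("captcha.png", detail=0))`, where `captcha.png` is a hypothetical screenshot of the captcha element.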

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This project is not affiliated with any of the flight websites it accesses and is intended for educational purposes only.

Disclaimer

This software is for educational purposes only. Scraping websites with this script may violate their Terms of Service. Use it responsibly and ethically.


Always make sure you are permitted to scrape the websites you target, and respect their robots.txt files and Terms of Service.
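Python's standard library can check robots.txt rules before a site is added to the urls list. The sketch below uses `urllib.robotparser` on rules already held in memory; the `is_allowed` name and the `FlyScrapper` user-agent string are assumptions for illustration.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines: Iterable[str], url: str, user_agent: str = "FlyScrapper") -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory; no network access
    return parser.can_fetch(user_agent, url)
```

For a live site, create the parser with the site's robots.txt URL (or call `set_url`) and then `read()` instead of `parse()`.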

