This repository contains educational example scrapers for popular web scraping targets using the Scrapfly web scraping API and Python.
Most scrapers use a simple web scraping stack:
- Python version 3.10+
- Scrapfly's Python SDK for sending HTTP requests, bypassing blocking, and parsing HTML with the built-in parsel selector.
- asyncio for writing concurrent code using the async/await syntax.
- JMESPath and nested-lookup for JSON parsing when needed.
- loguru for logging.
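
As a rough illustration of how the asyncio part of this stack fits together, here is a minimal sketch that fetches several pages concurrently and parses a JSON payload from each. It is not taken from any scraper in this repository: `fetch_page`, `parse_product`, and `scrape_all` are hypothetical names, the URLs are placeholders, and `fetch_page` is an offline stub standing in for a real request sent through the SDK.

```python
import asyncio
import json


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request (not the SDK's API).

    Returns canned JSON so the sketch runs without network access.
    """
    await asyncio.sleep(0.01)  # simulate network latency
    return json.dumps({"url": url, "product": {"name": "example", "price": 9.99}})


def parse_product(body: str) -> dict:
    """Pull the fields of interest out of the JSON payload."""
    data = json.loads(body)
    return {
        "url": data["url"],
        "name": data["product"]["name"],
        "price": data["product"]["price"],
    }


async def scrape_all(urls: list[str]) -> list[dict]:
    """Fetch all URLs concurrently; gather keeps results in input order."""
    bodies = await asyncio.gather(*(fetch_page(url) for url in urls))
    return [parse_product(body) for body in bodies]


if __name__ == "__main__":
    # Placeholder URLs, used only for illustration.
    urls = [f"https://example.com/product/{i}" for i in range(3)]
    for item in asyncio.run(scrape_all(urls)):
        print(item)
```

In a real scraper the stub would be replaced by an actual request, and the plain dictionary lookups could be swapped for JMESPath or nested-lookup queries when the JSON is deeply nested.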
To learn more about web scraping, see our full tutorials on how to scrape these targets (and many others) in the scrapeguide directory.
Below is the list of available web scrapers for the supported domains along with their scrape guide, sample datasets, and status. 👇
This repository contains educational reference material that illustrates how accessible web scraping can be. The provided programs are not intended for production web scraping. That said, the Scrapfly team constantly updates and improves this code for the best experience.
Scrapfly does not offer legal advice. As always, consult a lawyer when creating programs that interact with other people's websites. That said, here is a good general introduction to what NOT to do:
- Do not store PII (personally identifiable information) of EU citizens who are protected by GDPR.
- Do not scrape and repurpose entire public datasets which can be protected by database protection laws in some countries.
- Do not scrape at rates that could damage the website and scrape only publicly available data.
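
One way to keep request rates modest is to cap how many requests run at once and pause briefly after each one. The sketch below shows this with an `asyncio.Semaphore`. It is an assumption-laden illustration, not a recommendation of specific limits: `MAX_CONCURRENCY`, `DELAY_SECONDS`, `polite_fetch`, `scrape_politely`, and the `stats` tracker are all hypothetical, and `fetch_page` is an offline stub rather than a real request.

```python
import asyncio

MAX_CONCURRENCY = 2   # assumed limit; choose a value suited to the target site
DELAY_SECONDS = 0.05  # assumed pause after each request

# Tracks how many requests are in flight, to show the cap is respected.
stats = {"active": 0, "peak": 0}


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # simulate network latency
    stats["active"] -= 1
    return f"<html>{url}</html>"


async def polite_fetch(url: str, semaphore: asyncio.Semaphore) -> str:
    """Fetch one URL while holding a semaphore slot, then pause."""
    async with semaphore:
        body = await fetch_page(url)
        await asyncio.sleep(DELAY_SECONDS)  # hold the slot to space requests out
        return body


async def scrape_politely(urls: list[str], max_concurrency: int = MAX_CONCURRENCY) -> list[str]:
    """Fetch all URLs with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(polite_fetch(url, semaphore) for url in urls))


if __name__ == "__main__":
    # Placeholder URLs, used only for illustration.
    urls = [f"https://example.com/page/{i}" for i in range(6)]
    pages = asyncio.run(scrape_politely(urls))
    print(len(pages), "pages fetched, peak concurrency:", stats["peak"])
```

Holding the semaphore slot during the pause means the delay applies per slot, so overall throughput is bounded by both the concurrency cap and the delay.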