A collection of product image scrapers for various websites.
This script scrapes product images from the websites listed below by their product IDs. Those product IDs can then be used to get early links or early PIDs for each website.
When the command is run, it looks for new products, saves any new ones into a database, and sends you a Discord webhook for each new product found.
Support for more websites is yet to come.
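The "save new products, then notify" flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `products` table layout, the function names, and the default color and footer values are all assumptions. The embed fields (`title`, `description`, `color`, `image`, `footer`) follow Discord's webhook JSON format.

```python
import json
import sqlite3
import urllib.request


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per (site, pid) pair.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "site TEXT NOT NULL, pid TEXT NOT NULL, image_url TEXT, "
        "PRIMARY KEY (site, pid))"
    )
    conn.commit()


def save_if_new(conn: sqlite3.Connection, site: str, pid: str, image_url: str) -> bool:
    """Store the product and return True only if it was not already known."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO products (site, pid, image_url) VALUES (?, ?, ?)",
        (site, pid, image_url),
    )
    conn.commit()
    # rowcount is 0 when the primary key already existed and the insert was ignored.
    return cur.rowcount == 1


def build_webhook_payload(site: str, pid: str, image_url: str,
                          color: int = 0x5865F2, footer: str = "Craper") -> dict:
    """Build a Discord webhook body with a single embed (values are illustrative)."""
    return {
        "embeds": [{
            "title": f"New product on {site}",
            "description": f"PID: {pid}",
            "color": color,
            "image": {"url": image_url},
            "footer": {"text": footer},
        }]
    }


def send_webhook(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Discord webhook URL and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "User-Agent": "craper-sketch"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

In this sketch a scraper would call `save_if_new(...)` for every product it finds and call `send_webhook(...)` only when it returns `True`, so each product is announced once.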
Website name | Command parameter | Website URL |
---|---|---|
Footpatrol | footpatrol | https://www.footpatrol.com/ |
Size | size | https://www.size.co.uk/ |
JDSports (EU) | jdsports | https://www.jdsports.co.uk/ |
TheHipStore | thehipstore | https://www.thehipstore.co.uk/ |
Solebox | solebox | https://solebox.com/ |
Snipes | snipes | https://snipes.com/ |
Onygo | onygo | https://onygo.com/ |
Courir | courir | https://www.courir.com/ |
Python 3.9+ is required!
- Clone this repository
git clone https://github.com/rtunazzz/Craper
- Create required files
./bin/config.sh
- Add your webhooks, footer & color preferences into the craper/config/config.json file.
- (Optional) Add proxies to the craper/config/proxies.txt file.
If you're struggling with setting up these configuration files, I recommend checking out these examples!
Proxy usage is not required but recommended for websites that ban often, such as Solebox, Snipes or Onygo.
Make sure to have everything set up properly before installing.
python setup.py install
Then you can go ahead and start using the command:
# Show the usage info
craper -h
# Start a Footpatrol scraper
craper footpatrol
# Start 10 Footpatrol scrapers, each scraping 100 product IDs
craper footpatrol -t10 -n100
# Start one scraper with proxies, starting from pid 01925412
craper solebox -pt 1 -s 01925412
# Start 10 Size scrapers, each scraping 5 product IDs, starting from pid 10
craper size -t10 -n5 -s 10
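To make the `-t`, `-n` and `-s` options more concrete, here is one way the product IDs could be divided between scrapers. It is only a sketch under an assumption: that each scraper takes a consecutive, non-overlapping block of IDs after the starting PID. The tool itself may split the work differently, and `split_pid_ranges` is a hypothetical helper, not part of the package.

```python
def split_pid_ranges(start: str, threads: int, per_thread: int) -> list[list[str]]:
    """Give each scraper its own consecutive block of product IDs.

    The starting PID is kept as a string so that leading zeros
    (for example "01925412") survive the arithmetic.
    """
    width = len(start)   # pad every generated PID back to this width
    first = int(start)
    ranges = []
    for index in range(threads):
        begin = first + index * per_thread
        block = [str(pid).zfill(width) for pid in range(begin, begin + per_thread)]
        ranges.append(block)
    return ranges


if __name__ == "__main__":
    # Mirrors `craper size -t10 -n5 -s 10` under the assumption stated above.
    for block in split_pid_ranges("10", threads=10, per_thread=5):
        print(block[0], "to", block[-1])
```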
If you'd like to contribute, feel free to open a pull request!
Adding sites should be relatively easy. All you need to do is add a model (ideally in a separate file) to the models directory. Afterwards, import it in the init file so that it can be easily imported into the main scraper.py file. Finally, update the SITES variable and that should be it!
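As a rough picture of the steps above, a new site could look something like the sketch below. Everything in it is hypothetical: the class name `ExampleStore`, its attributes, the image URL pattern, and the assumption that `SITES` maps a command parameter to a model class. Check the existing models and the real `SITES` variable for the actual shape before copying anything.

```python
from dataclasses import dataclass


@dataclass
class ExampleStore:
    """Hypothetical model for a new site (would live in its own file under models)."""

    name: str = "examplestore"  # command parameter, e.g. `craper examplestore`
    base_url: str = "https://images.example.com/products"  # placeholder URL

    def image_url(self, pid: str) -> str:
        # Build the image URL for a single product ID (pattern is made up).
        return f"{self.base_url}/{pid}.jpg"


# Assumed shape of the SITES variable: command parameter -> model class.
SITES = {
    "examplestore": ExampleStore,
}


def get_model(site: str):
    """Look up and instantiate the model for a command parameter."""
    try:
        return SITES[site]()
    except KeyError:
        raise ValueError(f"Unsupported site: {site}") from None
```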