This project demonstrates web scraping from different kinds of webpages:
- Static pages with tables
- Pages with pagination
- Pages with AJAX/JS pagination

The robots.txt file of each site (where one exists) has been obeyed.
Reasonable delays between requests have been implemented so as not to overload the websites.

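As a rough illustration of the robots.txt and delay handling described above, here is a minimal sketch using only the Python standard library. It is not taken from the project's code: `USER_AGENT`, `DEFAULT_DELAY_SECONDS`, and the helper names are hypothetical, and the project may implement these checks differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical values for illustration; the project's real settings may differ.
USER_AGENT = "example-scraper"
DEFAULT_DELAY_SECONDS = 1.0


def build_robot_rules(robots_txt):
    """Parse the text of a robots.txt file into a rule set."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def is_allowed(rules, url):
    """Return True if the rules permit USER_AGENT to fetch the URL."""
    return rules.can_fetch(USER_AGENT, url)


def polite_delay(rules):
    """Use the site's Crawl-delay if it declares one, else the default."""
    delay = rules.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else DEFAULT_DELAY_SECONDS
```

A scraper built this way would skip any URL for which `is_allowed` returns `False` and call `time.sleep(polite_delay(rules))` between consecutive requests.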
The application can scrape data from the following sites:
- books.toscrape.com - A scraping sandbox that resembles an e-commerce website
- scrapethissite.com - A scraping sandbox that contains a page with AJAX
- worldometers.info - A statistics site that publishes data on population and other topics, such as the COVID-19 pandemic
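
For paginated pages, one common approach is to follow the "next" link until none remains. The sketch below shows only the link-finding step, using the standard library. It assumes the pager marks its link as `<li class="next"><a href="...">`, which is an assumption about the sandbox markup and not something this README specifies; `NextLinkParser` and `find_next_page` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Record the href of the first <a> found inside an <li class="next">."""

    def __init__(self):
        super().__init__()
        self._in_next_item = False
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        css_classes = (attributes.get("class") or "").split()
        if tag == "li" and "next" in css_classes:
            self._in_next_item = True
        elif tag == "a" and self._in_next_item and self.next_href is None:
            self.next_href = attributes.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_next_item = False


def find_next_page(page_url, html):
    """Return the absolute URL of the next page, or None on the last page."""
    parser = NextLinkParser()
    parser.feed(html)
    if parser.next_href is None:
        return None
    # Pager links are often relative, so resolve them against the current URL.
    return urljoin(page_url, parser.next_href)
```

A crawl loop would fetch a page, scrape it, call `find_next_page`, and stop once it returns `None`. Pages that paginate through AJAX/JS usually need a different approach, such as calling the underlying data endpoint directly.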
The scraped data can be saved as either a CSV or an XLSX file.
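
One possible way to support both output formats is to choose the writer from the file extension. This is a sketch under assumptions, not the project's actual export code: `save_rows` is a hypothetical helper, and the XLSX branch assumes the third-party `openpyxl` package is installed.

```python
import csv
from pathlib import Path


def save_rows(rows, header, path):
    """Write a header and rows to a .csv or .xlsx file, chosen by extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # newline="" lets the csv module control line endings itself.
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
    elif suffix == ".xlsx":
        # Assumption: openpyxl is available; it is imported only when needed.
        from openpyxl import Workbook

        workbook = Workbook()
        sheet = workbook.active
        sheet.append(list(header))
        for row in rows:
            sheet.append(list(row))
        workbook.save(str(path))
    else:
        raise ValueError(f"Unsupported output format: {suffix!r}")
    return path
```

For example, `save_rows(rows, ["title", "price"], "books.csv")` writes a CSV file, and passing `"books.xlsx"` writes a spreadsheet with the same contents.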