
A Python application that scrapes data from various types of pages


shashwatshrma/Web-Scraper


Web Scraper

This project demonstrates web scraping from different kinds of webpages:

  • Static pages with tables

  • Pages with pagination

  • Pages with AJAX/JS Pagination

  • The robots.txt file of each site, where one exists, has been obeyed.

  • Reasonable delays are applied between requests so as not to overload the websites.
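
The two courtesy rules above (honouring robots.txt and spacing out requests) could be sketched as follows. This is only an illustration using the standard library, not the project's actual code: the function names `build_robot_rules` and `polite_urls`, the one-second default delay, and the `sleep` parameter are all assumptions made for the example.

```python
import time
from urllib import robotparser


def build_robot_rules(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse the text of a robots.txt file into a rule checker."""
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_urls(urls, rules, user_agent="*", delay_seconds=1.0, sleep=time.sleep):
    """Yield only the URLs that robots.txt allows, pausing between them.

    The delay value is an assumption; pick one that suits the target site.
    """
    first = True
    for url in urls:
        if not rules.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt, so skip it
        if not first:
            sleep(delay_seconds)  # wait before handing out the next URL
        first = False
        yield url
```

A caller would fetch each yielded URL in turn, so the pause falls between consecutive requests. Passing `sleep` as a parameter makes the delay easy to replace when testing.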

Features

The application can scrape data from the following sites:

  1. books.toscrape.com - A scraping sandbox that resembles an e-commerce website
  2. scrapethissite.com - A scraping sandbox that contains a page with AJAX
  3. worldometers.info - A statistics site that publishes data on population and other topics, such as the COVID-19 pandemic
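
For paginated sites such as the sandboxes listed above, one common approach is to follow each page's "next" link until none remains. The sketch below assumes the link is marked up as `<li class="next"><a href="...">`, which is an assumption about the target page rather than something this README states. The names `NextLinkParser`, `find_next_url` and `crawl_pages`, and the `fetch` callable, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Record the href of the first <a> found inside <li class="next">."""

    def __init__(self):
        super().__init__()
        self._in_next_item = False
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and "next" in (attrs.get("class") or "").split():
            self._in_next_item = True
        elif tag == "a" and self._in_next_item and self.next_href is None:
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_next_item = False


def find_next_url(page_url, html):
    """Return the absolute URL of the next page, or None on the last page."""
    parser = NextLinkParser()
    parser.feed(html)
    if parser.next_href is None:
        return None
    return urljoin(page_url, parser.next_href)


def crawl_pages(start_url, fetch, max_pages=50):
    """Yield (url, html) for each page, following "next" links.

    `fetch` is any callable mapping a URL to its HTML text; `max_pages`
    is a safety cap chosen for this example.
    """
    url = start_url
    seen = set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        url = find_next_url(url, html)
```

Keeping `fetch` as a parameter separates navigation from networking, so a delay or a robots.txt check can be added inside it without changing the loop.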

The scraped data can be saved as either a CSV or an XLSX file.
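
As a rough illustration of the two output formats, the sketch below picks the writer from the file extension. It is not taken from the project: `save_rows` is a hypothetical helper, the CSV branch uses the standard `csv` module, and the XLSX branch assumes the third-party `openpyxl` package is installed.

```python
import csv
from pathlib import Path


def save_rows(rows, header, path):
    """Write a header and data rows to a .csv or .xlsx file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # newline="" lets the csv module control line endings itself
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
    elif suffix == ".xlsx":
        # Assumption: openpyxl is available; imported lazily so that
        # CSV-only use does not require it.
        from openpyxl import Workbook

        workbook = Workbook()
        sheet = workbook.active
        sheet.append(list(header))
        for row in rows:
            sheet.append(list(row))
        workbook.save(str(path))
    else:
        raise ValueError(f"unsupported output format: {suffix!r}")
    return path
```

For example, `save_rows(rows, ["title", "price"], "books.csv")` writes a CSV file, and changing the extension to `.xlsx` switches to the spreadsheet branch.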
