Educational Web Scraper Projects

This repository contains web scraper projects intended solely for educational purposes. The code provided in this repository may facilitate bypassing website security measures, which could be deemed a violation of the terms of service of some websites.

Please use these projects responsibly: comply with each website's terms of service and respect its robots.txt file. The robots.txt file tells web robots which parts of a site they may and may not access, and it helps site administrators control what crawlers visit and index. To learn more about robots.txt, please refer to the official documentation.
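As a minimal sketch of what "respecting robots.txt" can look like in practice, Python's standard library ships urllib.robotparser, which can check whether a path is allowed before you fetch it. The base URL below is a placeholder, not a site used by this repository:

```python
from urllib.robotparser import RobotFileParser

# Placeholder site; substitute the website you intend to scrape.
BASE_URL = "https://example.com"

rp = RobotFileParser()
rp.set_url(f"{BASE_URL}/robots.txt")
rp.read()  # download and parse the site's robots.txt

url = f"{BASE_URL}/some/page.pdf"
if rp.can_fetch("*", url):  # "*" checks rules that apply to any user agent
    print(f"Allowed to fetch {url}")
else:
    print(f"robots.txt disallows fetching {url}")
```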

I urge you to use these projects exclusively for educational purposes and respect the website's terms of service and the robots.txt file.

Usage of stringpdf-scraper.py

  1. Rename secret_file.py to secret.py and edit its contents, adding your own API access key.
  2. Add your URLs to the main function of stringpdf-scraper.py (see the sketch below).
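As a rough illustration only (the variable name API_KEY, the scrape helper, and the shape of main below are assumptions, not the repository's actual code), the two steps might end up looking like this:

```python
# secret.py -- hypothetical contents; keep whatever names the script actually imports.
API_KEY = "your-api-access-key-here"
```

```python
# stringpdf-scraper.py -- sketch of the relevant part, not the real implementation.
from secret import API_KEY  # assumes the script imports the key from secret.py

def main():
    # Add the URLs you want to scrape here.
    urls = [
        "https://example.com/document-1",
        "https://example.com/document-2",
    ]
    for url in urls:
        scrape(url, API_KEY)  # hypothetical helper standing in for the script's scraping logic

if __name__ == "__main__":
    main()
```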
