Web Scraping Script

This repository contains the Python code I used to gather data from a novel translation archive website. It consists of five Python files: four supporting modules (logger, pool, proxer, novelparser) and one main script (main).

logger
This file contains a Logger class that records warnings and errors raised at runtime and writes them to a file called logs.log.
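As a rough illustration of the idea, here is a minimal sketch of such a class built on the standard logging module. The method names (warning, error), the constructor argument, and the log line format are assumptions made for this example and may not match the actual logger.py.

```python
import logging


class Logger:
    """Hypothetical wrapper that appends warnings and errors to a log file."""

    def __init__(self, path="logs.log"):
        # One named logger per path, so separate instances do not share handlers.
        self._log = logging.getLogger(f"scraper.{path}")
        self._log.setLevel(logging.WARNING)
        self._log.propagate = False
        if not self._log.handlers:  # avoid duplicate lines on re-instantiation
            handler = logging.FileHandler(path, encoding="utf-8")
            handler.setFormatter(
                logging.Formatter("%(asctime)s %(levelname)s %(message)s")
            )
            self._log.addHandler(handler)

    def warning(self, message):
        self._log.warning(message)

    def error(self, message):
        self._log.error(message)
```

With this sketch, Logger().warning("page 3 returned no titles") would append one timestamped WARNING line to logs.log.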
pool
This file defines a function, create_pool(), which Proxer calls to generate a pool (a set) of random proxies and headers. The pool is later passed as a parameter to the HTTP GET request so that the bot looks more like a normal user and is less likely to get blocked.
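The pool could be built roughly as follows. This is only a sketch: the proxy addresses are placeholders from a documentation-only IP range, the User-Agent strings are examples, and the parameters and the shape of each pool entry are assumptions rather than what pool.py actually does.

```python
import random

# Placeholder values for illustration only; a real pool would load working proxies.
PROXIES = ["203.0.113.10:8080", "203.0.113.11:3128", "203.0.113.12:8000"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]


def create_pool(size=5, proxies=PROXIES, user_agents=USER_AGENTS):
    """Return `size` randomly chosen proxy/header combinations."""
    pool = []
    for _ in range(size):
        address = random.choice(proxies)
        pool.append({
            # Scheme-to-proxy-URL mapping, the shape most HTTP clients expect.
            "proxies": {"http": f"http://{address}", "https": f"http://{address}"},
            "headers": {"User-Agent": random.choice(user_agents)},
        })
    return pool
```

Pairing a proxy with a header in each entry keeps the two in step, so a given request presents one consistent identity.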
proxer
The Proxer class in this file keeps track of the IP and header rotation for main, performs the HTTP request, and returns the HTML response.
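A minimal sketch of the rotation logic is shown below. It assumes each pool entry is a dict with "proxies" and "headers" keys, and it uses urllib from the standard library so the example has no dependencies; the real proxer.py may use a different HTTP client and a different rotation policy. The fetch_html helper and the fetch parameter are hypothetical names introduced for this example.

```python
import itertools
import urllib.request


def fetch_html(url, proxies, headers, timeout=10):
    """Send one GET request through the given proxy and return the body as text."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    request = urllib.request.Request(url, headers=headers)
    with opener.open(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


class Proxer:
    """Hypothetical sketch: use the next pool entry for every request."""

    def __init__(self, pool, fetch=fetch_html):
        # cycle() starts over from the first entry once the pool is exhausted.
        self._entries = itertools.cycle(pool)
        self._fetch = fetch

    def open_site(self, url):
        entry = next(self._entries)
        return self._fetch(url, entry["proxies"], entry["headers"])
```

Passing the fetch function in as a parameter makes it possible to exercise the rotation without touching the network.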
novelparser
Contains the BeautifulSoup code that finds the desired information, including the maximum number of pages, the titles on each page, and the details of each novel.
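To give a feel for this kind of parsing, here is a small sketch using BeautifulSoup (the bs4 package). The function names and the CSS selectors (div.pagination a, a.novel-title) are invented for illustration; the real selectors depend on the target site's markup and are not taken from novelparser.py.

```python
from bs4 import BeautifulSoup


def get_max_pages(html):
    """Return the largest page number among the pagination links (1 if none)."""
    soup = BeautifulSoup(html, "html.parser")
    numbers = [
        int(link.get_text(strip=True))
        for link in soup.select("div.pagination a")  # hypothetical selector
        if link.get_text(strip=True).isdigit()       # skip "Next", "Prev", etc.
    ]
    return max(numbers, default=1)


def get_titles(html):
    """Return the text of every title link on a listing page."""
    soup = BeautifulSoup(html, "html.parser")
    return [link.get_text(strip=True) for link in soup.select("a.novel-title")]
```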
main
Contains a loop that fetches the HTML response using Proxer().open_site() and parses it using various functions from NovelParser.
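The overall flow of such a loop might look like the sketch below. The fetching and parsing steps are passed in as functions so the example stays self-contained; scrape, get_max_pages, get_titles, and url_for_page are hypothetical names, and the actual main.py may be organized differently.

```python
def scrape(open_site, get_max_pages, get_titles, url_for_page):
    """Fetch every listing page and collect the titles found on each."""
    # The first page tells us how many pages there are in total.
    first_page = open_site(url_for_page(1))
    max_pages = get_max_pages(first_page)

    titles = list(get_titles(first_page))
    for page in range(2, max_pages + 1):
        html = open_site(url_for_page(page))
        titles.extend(get_titles(html))
    return titles
```

In practice, open_site would be the method of a Proxer instance and the two parsing functions would come from the parser module.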


syafa-kh/NU_WebScrap
