Multi-threaded web scraper to download all the tutorials from www.learncpp.com and convert them to PDF files concurrently.
Please support learncpp.com here: https://www.learncpp.com/about/
Get the image
docker pull amalrajan/learncpp-download:latest
And run the container
docker run --rm --name=learncpp-download --mount type=bind,destination=/app/learncpp,source=/home/amalr/temp/downloads amalrajan/learncpp-download
Replace /home/amalr/temp/downloads with a local path on your system where you want the files downloaded.
- Python 3.10.12
- wkhtmltopdf
  - Debian based: sudo apt install wkhtmltopdf
  - macOS: brew install Caskroom/cask/wkhtmltopdf
  - Windows: choco install wkhtmltopdf
(or simply download it the old-fashioned way). I wouldn't recommend using Windows, as the fonts come out a bit weird. Unless, of course, you have a thing for weird stuff.
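Before running the scraper, you can sanity-check that wkhtmltopdf is actually on your PATH. A minimal sketch (the `has_binary` helper is illustrative, not part of this repository):

```python
import shutil


def has_binary(name: str) -> bool:
    """Return True if `name` resolves to an executable on PATH."""
    return shutil.which(name) is not None


# Warn up front rather than failing mid-crawl during PDF conversion.
if not has_binary("wkhtmltopdf"):
    print("wkhtmltopdf not found on PATH; PDF conversion will fail")
```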
Clone the repository
git clone https://github.com/amalrajan/learncpp-download.git
Install Python dependencies
cd learncpp-download
pip install -r requirements.txt
Run the script
scrapy crawl learncpp
You'll find the downloaded files inside the learncpp directory under the repository root.
Go to settings.py and set DOWNLOAD_DELAY to a higher value. The default is 0; try setting it to 0.2.
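For reference, the relevant line in settings.py would look something like this (a sketch; Scrapy interprets DOWNLOAD_DELAY as the number of seconds to wait between consecutive requests to the same site):

```python
# settings.py (excerpt)
# Scrapy waits this many seconds between requests to the same domain.
# 0 means no throttling; 0.2 adds a 200 ms pause per request.
DOWNLOAD_DELAY = 0.2
```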
That's the way it is. You can, however, reduce the concurrency factor in learncpp.py:
self.executor = ThreadPoolExecutor(
    max_workers=192
)  # Limit to 192 concurrent PDF conversions
Change max_workers to a lower value. The default is 192.
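As a sketch, a more conservative pool might look like the following (the worker count of 16 is just an example; the right value depends on your CPU and memory, not a project recommendation):

```python
from concurrent.futures import ThreadPoolExecutor

# A gentler concurrency limit than the default 192.
executor = ThreadPoolExecutor(
    max_workers=16  # e.g. cap PDF conversions at 16 at a time
)

# Usage sketch: submit work to the pool and collect results.
results = list(executor.map(lambda n: n * n, range(5)))
executor.shutdown()
```

Fewer workers means slower overall conversion but a much smaller memory footprint, since each in-flight wkhtmltopdf conversion holds its page in memory.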
Feel free to open a new issue here: https://github.com/amalrajan/learncpp-download/issues. Don't forget to attach those console logs.