luiseduardobr1/hathitrustPDF


Hathi Trust Digital Library - Complete PDF Download

Download an entire book (or publication) in PDF from Hathi Trust Digital Library without "partner login" requirement.

Motivation

Hathi Trust Digital Library is a good place to find old publications digitized from various university libraries. However, it restricts full-PDF downloads to partner universities, most of which are American. This code attempts to democratize knowledge by allowing complete public domain works to be downloaded as PDFs from the Hathi Trust website.

Features

  • Multi-threaded download of PDF pages, which are then merged into a single file.
  • Smart download of pages, skipping already downloaded pages.
  • Supports the two most common link formats:
  • Book splicing, which allows downloading only part of the book.
  • Bulk download of multiple books.
  • Attempts to avoid Error 429 (Too Many Requests) from Hathi Trust.
    • If the error occurs, the thread will sleep for 5 seconds and try again.
    • Works in most cases, but not always.
  • Each download is attempted 3 times before giving up.
    • Users are notified of the failure and can choose to re-download the missing pages before the final merge.
    • The retry count is configurable via the --retries option.
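
The retry and skip behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration and not the code in hathitrustPDF.py: the names `fetch_with_retries` and `pages_to_download`, the `page{N}.pdf` file naming, and the choice to treat any non-200 status as a failed attempt are all assumptions made for the example. Only the defaults (3 attempts, a 5-second sleep on Error 429) come from the feature list.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional, Tuple

TOO_MANY_REQUESTS = 429


def fetch_with_retries(
    fetch: Callable[[], Tuple[int, bytes]],
    retries: int = 3,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch() up to `retries` times; return the body, or None on failure.

    `fetch` is a hypothetical callable returning (status_code, body).
    """
    for attempt in range(1, retries + 1):
        status, body = fetch()
        if status == 200:
            return body
        if status == TOO_MANY_REQUESTS and attempt < retries:
            # Rate limited: pause before the next attempt.
            sleep(delay)
    # Every attempt failed; the caller can report the page as missing.
    return None


def pages_to_download(begin: int, end: int, out_dir: Path) -> List[int]:
    """Return the page numbers in [begin, end] that have no file on disk yet.

    The page{N}.pdf naming scheme is an assumption for this sketch.
    """
    return [
        page
        for page in range(begin, end + 1)
        if not (out_dir / f"page{page}.pdf").exists()
    ]
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a real downloader would simply use the default.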

Requirements

Usage

usage: hathitrustPDF.py [-h] [-l LINK] [-i INPUT_FILE] [-t THREAD_COUNT] [-r RETRIES] [-b BEGIN] [-e END] [-k]
                        [-o OUTPUT_PATH] [-v] [-V]

PDF Downloader and Merger

options:
  -h, --help            show this help message and exit
  -l LINK, --link LINK  HathiTrust book link
  -i INPUT_FILE, --input-file INPUT_FILE
                        File with list of links formatted as link,output_path
  -t THREAD_COUNT, --thread-count THREAD_COUNT
                        Number of download threads
  -r RETRIES, --retries RETRIES
                        Number of retries for failed downloads
  -b BEGIN, --begin BEGIN
                        First page to download
  -e END, --end END     Last page to download
  -k, --keep            Keep downloaded pages
  -o OUTPUT_PATH, --output-path OUTPUT_PATH
                        Output file path
  -v, --verbose         Enable verbose mode
  -V, --version         show program's version number and exit
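
The help text above says the file passed to --input-file holds one book per line, formatted as link,output_path. The sketch below shows one way such a file could be read. It is an illustration under assumptions, not the script's actual parser: the `read_jobs` name, the use of the `csv` module, the handling of blank lines, and the example.org links are all made up for the example.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_jobs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse `link,output_path` rows, skipping empty lines."""
    jobs = []
    for row in csv.reader(lines):
        if not row:
            # csv.reader yields an empty list for an empty line.
            continue
        if len(row) != 2:
            raise ValueError(f"expected link,output_path but got: {row!r}")
        link, output_path = (field.strip() for field in row)
        jobs.append((link, output_path))
    return jobs


# Placeholder links, not real HathiTrust URLs.
sample = io.StringIO(
    "https://example.org/book-one,book_one.pdf\n"
    "\n"
    "https://example.org/book-two,out/book_two.pdf\n"
)
for link, output_path in read_jobs(sample):
    print(link, "->", output_path)
```

Using `csv.reader` rather than a plain `split(",")` lets a quoted field contain a comma, which may or may not match how the script itself reads the file.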
