
Python code to download all publicly available archives of Reddit comments


CAProjects/reddit_comments_download


reddit_comments_download

Python code to download all publicly available archives of Reddit comments from https://files.pushshift.io/reddit/comments/

The code has only been tested with Python 3.8.2.

The Python code does the following:

  • Loops through a JSON list of archives that I created myself (rc_filelist.json)
  • Checks whether each file already exists in the download location; if it does not, downloads it
  • If the file exists, checks its SHA hash
    • if the hash does not match, it re-downloads the file
    • if the hash matches, it moves on to the next file
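The steps above can be sketched as follows. This is a minimal illustration, not the repository's actual code. It makes four assumptions, all hypothetical:

  • the JSON list is an array of objects with "filename" and "sha256" keys
  • SHA-256 is the hash being compared (the README does not say which SHA variant is used)
  • each archive is served at the Pushshift URL plus its file name
  • the helper names (sha256_of_file, needs_download, sync_archives) are invented for this sketch

```python
import hashlib
import json
import os
import urllib.request

# Assumption: archives are served from this directory (URL taken from the README).
BASE_URL = "https://files.pushshift.io/reddit/comments/"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so multi-gigabyte archives are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_sha256):
    """Return True if the file is missing or its SHA-256 does not match the expected value."""
    if not os.path.exists(path):
        return True
    return sha256_of_file(path) != expected_sha256.lower()


def sync_archives(list_path, loc):
    """Download every archive in the (hypothetical) JSON list that is missing or corrupt."""
    with open(list_path, encoding="utf-8") as f:
        entries = json.load(f)  # assumed shape: [{"filename": ..., "sha256": ...}, ...]
    for entry in entries:
        path = os.path.join(loc, entry["filename"])
        if needs_download(path, entry["sha256"]):
            # Missing file or hash mismatch: (re-)download it.
            urllib.request.urlretrieve(BASE_URL + entry["filename"], path)
        # Otherwise the hash matches and the loop moves on to the next file.
```

Usage would be something like sync_archives("rc_filelist.json", loc). Re-running is safe: a partially downloaded file fails the hash check and is fetched again.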

To use, edit the variable loc = 'F:\\LOCATION\\TO\\DOWNLOAD\\FILES\\TO\\' so that it points to the folder you want to download the archives to. Keep the double backslash at the end.
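The trailing double backslash probably matters because the script builds each file's path by concatenating loc with the archive name. This is an assumption; the README does not show how the path is built. Inside a normal Python string literal, '\\' is a single backslash character. Without it, the folder and file names would be glued together.

The sketch below illustrates this with ntpath, the standard-library module for Windows-style paths. Its join() inserts the separator only when it is missing. The archive name is just an example.

```python
import ntpath

# Each '\\' in a normal string literal is one backslash character.
loc = 'F:\\LOCATION\\TO\\DOWNLOAD\\FILES\\TO\\'
loc_no_slash = 'F:\\LOCATION\\TO\\DOWNLOAD\\FILES\\TO'
name = 'RC_2005-12.bz2'  # example archive name

with_sep = loc + name              # F:\LOCATION\TO\DOWNLOAD\FILES\TO\RC_2005-12.bz2
without_sep = loc_no_slash + name  # F:\LOCATION\TO\DOWNLOAD\FILES\TORC_2005-12.bz2 (wrong)
joined = ntpath.join(loc_no_slash, name)  # separator added automatically
```

Using ntpath.join (or os.path.join when running on Windows) would make the trailing backslash optional, since join only adds a separator when one is missing.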

If you do not want all the archives, edit rc_filelist.json so that it contains only the archives you want to download.
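Instead of editing rc_filelist.json by hand, you could also write out a subset with a few lines of Python. This is a minimal sketch with these assumptions:

  • the file is a JSON array of objects with a "filename" key (a hypothetical shape)
  • archive names follow Pushshift's RC_YYYY-MM pattern
  • the helper name filter_filelist is invented for this sketch

```python
import json


def filter_filelist(src, dst, prefixes):
    """Write to dst only the entries of src whose filename starts with one of the prefixes."""
    with open(src, encoding="utf-8") as f:
        entries = json.load(f)  # assumed shape: [{"filename": ..., ...}, ...]
    wanted = [e for e in entries if e["filename"].startswith(tuple(prefixes))]
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(wanted, f, indent=2)
    return len(wanted)
```

For example, first copy the original list to rc_filelist_full.json. Then run filter_filelist("rc_filelist_full.json", "rc_filelist.json", ["RC_2015-"]) to keep only the 2015 archives, so the script still reads rc_filelist.json.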

After downloading, re-run the script to make sure every file downloaded correctly and is complete.
