reddit_comments_download

Python code to download all publicly available archives of Reddit comments from https://files.pushshift.io/reddit/comments/

The code has only been tested with Python 3.8.2.

The Python code does the following:

  • Loop through a JSON list I created myself
  • Check whether each file already exists
  • If it exists, check the SHA hash of the file
    • If the hash does not match, re-download the file
    • If the hash matches, move on to the next file
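The steps above can be sketched roughly as follows. This is a minimal illustration, not the script in this repository: it assumes the hash is SHA-256, that each list entry is a dict with hypothetical `name` and `sha256` keys, and that a missing file is simply downloaded. The helper names (`sha256_of_file`, `needs_download`, `sync_archives`) are made up for this example.

```python
import hashlib
import os
import urllib.request

BASE_URL = "https://files.pushshift.io/reddit/comments/"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large archives do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_sha256):
    """True if the file is missing or its hash differs from the expected one."""
    if not os.path.exists(path):
        return True
    return sha256_of_file(path) != expected_sha256.lower()


def sync_archives(entries, loc, fetch=urllib.request.urlretrieve):
    """Download every entry that is missing or corrupt; return the names fetched.

    `entries` is assumed to be a list of {"name": ..., "sha256": ...} dicts.
    `fetch` can be swapped out, for example to test without network access.
    """
    downloaded = []
    for entry in entries:
        target = os.path.join(loc, entry["name"])
        if needs_download(target, entry["sha256"]):
            fetch(BASE_URL + entry["name"], target)
            downloaded.append(entry["name"])
    return downloaded
```

Because files with a matching hash are skipped, running a loop like this a second time only fetches whatever failed the first time.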

To use the script, edit the variable loc = 'F:\\LOCATION\\TO\\DOWNLOAD\\FILES\\TO\\' so that it points to the folder where you want the archives saved. Keep the double backslash at the end.
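A short sketch of why the trailing double backslash matters, assuming the script builds each target path by plain string concatenation (an assumption; `build_target` and the file name `RC_example.zst` are hypothetical). In a normal Python string, `\\` is an escape for a single backslash, so the value ends with one path separator.

```python
def build_target(loc, filename):
    # Assumed behaviour: the folder and file name are simply joined as strings,
    # so the folder must already end with a path separator.
    return loc + filename


loc = 'F:\\LOCATION\\TO\\DOWNLOAD\\FILES\\TO\\'

# Each "\\" in the source is one backslash in the actual string.
print(build_target(loc, "RC_example.zst"))
# F:\LOCATION\TO\DOWNLOAD\FILES\TO\RC_example.zst
```

Note that a raw string such as `r'F:\LOCATION\'` is not a drop-in alternative, because a raw string literal cannot end with a single backslash.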

If you do not want all the archives, edit rc_filelist.json so that it contains only the archives you want to download.

After downloading, run the script again to make sure all files downloaded correctly and are complete.
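Trimming the list by hand works, but it could also be done with a few lines of Python. The sketch below assumes the JSON file is a list of objects with a `name` key and that file names follow an `RC_YYYY-MM` pattern. Both are assumptions about the file layout, so check rc_filelist.json before adapting it. The function names are hypothetical.

```python
import json


def keep_matching(entries, year):
    """Keep only entries whose (assumed) name contains the given year."""
    return [entry for entry in entries if f"_{year}-" in entry["name"]]


def rewrite_filelist(src_path, dst_path, year):
    """Read a file list, keep one year's archives, and write the result."""
    with open(src_path, encoding="utf-8") as handle:
        entries = json.load(handle)
    kept = keep_matching(entries, year)
    with open(dst_path, "w", encoding="utf-8") as handle:
        json.dump(kept, handle, indent=2)
    return len(kept)
```

Writing to a separate output path keeps the original list intact in case the full set is needed later.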