Stack Exchange Backup

Download all your posts on the Stack Exchange network as Markdown files via a Python script talking to the Stack Exchange API.

Showcase

Installation

Either download the repository as a ZIP file and extract it, or install Git (recommended) and do a git clone of the project:
```
git clone https://github.com/9ao9ai9ar/stack-exchange-backup.git
```
Install Python 3.12 or newer. See the support section for additional information.
Enter the directory you just extracted/cloned:
```
cd stack-exchange-backup
```
All steps hereafter assume operations under said directory.
Create and activate a virtual environment (strongly recommended).
1. If you're on Windows and have kept the defaults when installing Python using the official installer, the Python Launcher py should be installed alongside it, and you can issue the following command in a command-line shell to create a virtual environment:
```
py -3.12 -m venv .venv 
```
  Otherwise, check the Python version on your PATH with python -V and, if it meets the minimum required Python version, create a virtual environment by executing:
```
python -m venv .venv
```
  On Linux, venv is usually in its own package separate from the Python installation, in which case consult your distribution's documentation on how to install venv before proceeding.
2. After a virtual environment has been created, you need to activate it. For Linux and macOS, the command is:
```
. .venv/bin/activate
```
  and for Windows it is:
```
.\.venv\Scripts\activate
```
  Note that if you're using PowerShell on Windows, you'd first have to enable script execution in order to activate the virtual environment:
```
Set-ExecutionPolicy -ExecutionPolicy AllSigned -Scope CurrentUser
```
Install stack-exchange-backup as a local Python package using pip:
```
# Install from the requirements file first if you want a reproducible build.
python -m pip install -r requirements/prod.txt
python -m pip install .
```
You may have to ensure pip is installed because, like venv, pip doesn't come bundled with Python in most Linux distributions.
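Before running the install commands, a quick sanity check of the interpreter can save some confusion. The following is a minimal sketch, not part of this project: the function name `check_environment` is made up for illustration, and it only relies on standard-library attributes (`sys.version_info`, `sys.prefix`, `sys.base_prefix`) and `importlib.util.find_spec`.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in this README


def check_environment(version_info=sys.version_info):
    """Return a dict of booleans describing the current interpreter.

    Hypothetical helper for illustration only; it is not shipped with
    stack-exchange-backup.
    """
    return {
        "python_ok": tuple(version_info[:2]) >= MIN_PYTHON,
        # Inside a venv, sys.prefix points at the venv while
        # sys.base_prefix still points at the base installation.
        "in_venv": sys.prefix != sys.base_prefix,
        # find_spec returns None when the module cannot be located.
        "pip_available": importlib.util.find_spec("pip") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If `in_venv` comes back as `no`, the virtual environment has probably not been activated in the current shell.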

In the future, I may consider publishing the program either as a package on PyPI or as a self-contained executable, so that the installation guide can be simpler than it already is.

Usage

Remember to always activate the virtual environment first!

(.venv) $ python -m stackexchange.backup --help
usage: backup.py [-h] --account-id ACCOUNT_ID [--no-meta] [--out-dir OUT_DIR] [--request-key REQUEST_KEY] [--rps RPS]

options:
  -h, --help            show this help message and exit
  --account-id ACCOUNT_ID
                        account ID
  --no-meta             do not back up meta posts
  --out-dir OUT_DIR     output directory (default: q_and_a)
  --request-key REQUEST_KEY
                        request key
  --rps RPS             requests per second limit (default: 20)

ACCOUNT_ID: the ID of a Stack Exchange account. Note that this is NOT a per-site user ID. To acquire the ACCOUNT_ID of a user:
1. Go to the user's profile page on one of the Stack Exchange network sites and click on either the View all link next to Communities or the Network profile link in the dropdown under Profiles.
2. On the new web page that is just opened, note the URL segment after users consists of a number: this is the ACCOUNT_ID of the user (1 in the case of Jeff Atwood).
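If you prefer to copy the whole network profile URL rather than pick the number out by eye, the extraction described in the steps above can be sketched as follows. The helper name `account_id_from_url` and the example URL shape are assumptions for illustration; the script itself only takes the numeric `--account-id`.

```python
import re
from urllib.parse import urlparse


def account_id_from_url(url: str) -> int:
    """Return the number that follows the 'users' segment of a profile URL.

    Hypothetical helper, not part of stack-exchange-backup.
    """
    path = urlparse(url).path
    match = re.search(r"/users/(\d+)", path)
    if match is None:
        raise ValueError(f"no numeric segment after 'users' in {url!r}")
    return int(match.group(1))


if __name__ == "__main__":
    # Assumed shape of a network profile URL, used here only as sample input.
    print(account_id_from_url("https://stackexchange.com/users/1/jeff-atwood"))
```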
OUT_DIR: the folder to download your files to, can be either a relative path or an absolute path.
REQUEST_KEY: we provide a default request key only for your convenience. As per this FAQ on Stack Apps, it is advisable that users bring their own request keys. To access the API without a request key, provide an empty string as the value to this option.
RPS: requests per second, a soft limit imposed on the running program. It is stated in the docs in no uncertain terms that the Stack Exchange API considers 30+ requests per second per IP to be very abusive, and will thus ban any rogue IP from making further requests to the API for an indefinite period of time. Due to the nature of floating-point arithmetic and the limitations of the current implementation, do not assume it is an exact upper bound on the number of requests the program will make within any one-second period.
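To make the idea of a soft requests-per-second limit concrete, here is a minimal sketch of one way such a throttle can be written. It is not the implementation used by this program; the class name `Throttle` and its injectable `clock` and `sleep` parameters are assumptions chosen to keep the example testable. It simply spaces consecutive requests at least `1 / rps` seconds apart.

```python
import time


class Throttle:
    """Space calls at least 1/rps seconds apart (illustrative sketch only)."""

    def __init__(self, rps: float, clock=time.monotonic, sleep=time.sleep):
        if rps <= 0:
            raise ValueError("rps must be positive")
        self._interval = 1.0 / rps
        self._clock = clock
        self._sleep = sleep
        # The first request is allowed immediately.
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next request may be sent; return the time slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self._interval
        return delay


if __name__ == "__main__":
    throttle = Throttle(rps=20)
    for i in range(3):
        throttle.wait()
        print(f"request {i} would be sent now")
```

Because the interval is a floating-point value and sleeping is never exact, a throttle like this keeps the average rate near the target rather than guaranteeing a hard cap in every one-second window.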

Format

NOTE: The output directory structure, filenames as well as the Markdown content layout format, are still subject to change without prior notice. If the output format is modified, this README will be updated to reflect the changes.

The output directory has the following structure:

+---<stack exchange site 1 hostname>
|   +---answers
|   |       <question id associated with answer 1 id>.md
|   |       <question id associated with answer 2 id>.md
|   |       ...
|   |
|   \---questions
|           <question 1 id>.md
|           <question 2 id>.md
|           ...
|
+---<stack exchange site 2 hostname>
|   +---answers
|   |       <question id associated with answer 1 id>.md
|   |       <question id associated with answer 2 id>.md
|   |       ...
|   |
|   \---questions
|           <question 1 id>.md
|           <question 2 id>.md
|           ...
|
...
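Given the layout above, a backup can be inspected with a few lines of standard-library code. The sketch below counts the Markdown files per site; the function name `summarize_backup` is made up for illustration, and since the layout is subject to change, treat it as an example rather than a supported interface.

```python
from pathlib import Path


def summarize_backup(out_dir: Path) -> dict:
    """Count .md files under <site>/questions and <site>/answers.

    Hypothetical helper, not part of stack-exchange-backup.
    """
    summary = {}
    for site_dir in sorted(Path(out_dir).iterdir()):
        if not site_dir.is_dir():
            continue
        counts = {}
        for kind in ("questions", "answers"):
            kind_dir = site_dir / kind
            # A site may have only questions or only answers.
            counts[kind] = (
                sum(1 for _ in kind_dir.glob("*.md")) if kind_dir.is_dir() else 0
            )
        summary[site_dir.name] = counts
    return summary


if __name__ == "__main__":
    root = Path("q_and_a")  # default output directory per the usage text
    if root.is_dir():
        for site, counts in summarize_backup(root).items():
            print(site, counts)
```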

Each Markdown file will represent either a question or an answer, depending on whether it is under a questions directory or an answers directory. If the Markdown file represents a question, then the question creator will be you. Otherwise, if the Markdown file represents an answer, the question creator will not be you, but the creator of one of the answers included in the Markdown file will be you. More specifically, each Markdown file will have the following format (text that is inside angle brackets, such as <this>, represents text that will vary for each Markdown file):

Question downloaded from <question link>
Question asked by <username for question creator> on <question date> at <question time>.
Number of up votes: <number of up votes for question>
Number of down votes: <number of down votes for question>
Score: <overall score associated with the question (number of up votes - number of down votes)>

# <question title>
<question body>

<loop through 1 to i if there are comments for the question>

### Comment <i>
Comment made by <username for comment i creator> on <comment i date> at <comment i time>.
Comment score: <number of up votes for comment i>

<comment i body>

<loop through 1 to j if there are answers for the question>

## Answer <j>
Answer given by <username for answer j creator> on <answer j date> at <answer j time>.
This <is/is not> the accepted answer.
Number of up votes: <number of up votes for answer j>
Number of down votes: <number of down votes for answer j>
Score: <overall score associated with answer j (number of up votes - number of down votes)>

<answer j body>

<loop through 1 to k if there are comments for answer j>

### Comment <k>
Comment made by <username for comment k creator> on <comment k date> at <comment k time>.
Comment score: <number of up votes for comment k>

<comment k body>
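As a concrete reading of the template above, the sketch below fills in the question header for one made-up post. The function name `render_question_header`, its parameters, and the sample values are all assumptions for illustration; the program's actual rendering code and date formatting may differ.

```python
def render_question_header(link, author, date, time, up_votes, down_votes,
                           title, body):
    """Render the top part of a question file following the README template.

    Illustrative only; not the project's rendering code.
    """
    score = up_votes - down_votes  # score = up votes - down votes
    lines = [
        f"Question downloaded from {link}",
        f"Question asked by {author} on {date} at {time}.",
        f"Number of up votes: {up_votes}",
        f"Number of down votes: {down_votes}",
        f"Score: {score}",
        "",
        f"# {title}",
        body,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # All values below are invented sample data.
    print(render_question_header(
        link="https://example.com/questions/1",
        author="alice",
        date="2024-01-01",
        time="12:00:00",
        up_votes=5,
        down_votes=2,
        title="Sample title",
        body="Sample body.",
    ))
```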

FAQ

Are deleted posts included in the backup?

No. The public API does not provide a way to retrieve deleted posts, even when authenticated.

Are favorites/bookmarks/saves included in the backup?

No. When public favorites, also briefly known as bookmarks, got reworked into private saves, it was done without coordinated changes to the API, so it became impossible to query a user's saves through the API.

Are Area 51 posts included in the backup?

No. Area 51 is not adequately supported in the API, and very few people are affected by this lack of support.

Are articles included in the backup?

No. Being a part of collectives, articles are only supported on Stack Overflow, and fewer than 100 articles have been published to date since the beta release of collectives in 2021. Therefore, I have concluded it's not worth the effort to add support for backing up articles, despite them still being queryable through the /users/{ids}/posts endpoint after /articles has been removed from the public API.

Related Projects

Stack Exchange API

mhdadk/stack-exchange-backup

The original repository from which this fork is derived. I'd like to thank its author, Mahmoud Abdelkhalek, whose well-commented code sped up my process of grokking the Stack Exchange API. While conceptually simple, the API has documentation that scatters related topics all over the place, explains some of them insufficiently, and contains numerous bugs.

StackExchangeBackupLaravel

StackExchangeBackupLaravel allows exporting a somewhat complete data footprint of a user on the Stack Exchange network, but its outputs are in JSON rather than Markdown, and they are zipped and uploaded to Amazon S3 by default. By contrast, Stack Exchange Backup is simple and straightforward: everything is downloaded to the local machine only, installation is easier, and the documentation is more thorough.

Stack Exchange Data Explorer

The Stack Exchange Data Explorer (SEDE) is an open source tool for running arbitrary queries against public data from the Stack Exchange network. There are ready-made queries to back up your posts on the Stack Exchange network either as a single HTML file or as a CSV file. Unfortunately, neither is a one-stop solution for producing individual source files in Markdown format. Moreover, to use the SEDE service, you'd either have to log in or solve some CAPTCHAs first, and the data is only updated weekly, as opposed to the data returned by the API, which is updated about once a minute.

Pippim Website

Converts your Stack Exchange posts to your own website, hosted for free on GitHub Pages. Requires you to manually run the aforementioned SEDE query that outputs as a CSV file beforehand.

Stack Exchange Data Dump

This is a quarterly dump of all user-contributed data on the Stack Exchange network. According to an announcement made in July 2024, the data dumps will no longer be uploaded to the Internet Archive; instead, they will be provided from a section of the user profile on each Stack Exchange site. Therefore, this method of backup has a few major downsides:

Being locked behind a login wall.
Being incomplete, meaning the data dump you download is only for the specific site from which you initiated the request.
Being complete, meaning that the download size may be humongous, and to get only your data, you'd have to do some non-trivial parsing of the downloaded XML files yourself.

Thankfully, there exists the Stack Exchange data dump downloader and transformer project that aims to overcome these pain points.
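To give a sense of the parsing mentioned above, here is a minimal sketch that streams through a posts XML file and keeps only the rows owned by one user. It assumes the dump stores each post as a `<row>` element with `Id` and `OwnerUserId` attributes, and that `OwnerUserId` is the per-site user ID rather than the network account ID; verify both against the dump you actually download. The function name `iter_own_posts` is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_own_posts(xml_source, site_user_id: int):
    """Yield the attributes of every <row> owned by site_user_id.

    xml_source may be a filename or a binary file object. The attribute
    names used here are assumptions about the dump schema.
    """
    wanted = str(site_user_id)
    # iterparse streams the file, so a large dump is not loaded at once.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "row" and elem.get("OwnerUserId") == wanted:
            yield dict(elem.attrib)
        # Drop the element's contents once it has been processed.
        elem.clear()


if __name__ == "__main__":
    # Tiny invented sample standing in for a real posts file.
    sample = (
        b'<posts>'
        b'<row Id="1" PostTypeId="1" OwnerUserId="42" />'
        b'<row Id="2" PostTypeId="2" OwnerUserId="7" />'
        b'</posts>'
    )
    for post in iter_own_posts(io.BytesIO(sample), 42):
        print(post["Id"])
```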

Development

In addition to the dev dependencies, this project relies on the following tools:

uv (install standalone executable)
security-constraints (uv tool install security-constraints)
Pyright (npm install pyright after installing Node.js)

Before each commit, you should run the appropriate release script for your shell.

To help you in your experimentation with the Stack Exchange APIs through the documentation webpages, I have compiled a list of the parameter types and their associated icons as below:

Strings
Numbers
Dates
Lists
Keys
Access Tokens

Except for numbers and dates, the icons are not explained anywhere in the documentation, but if you open the inspector in your web browser, say when you're on this page, and check the <input> nodes, you'll see that the class attributes include string-type, number-type, etc., which give you enough of a hint as to how the values should be entered.
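The same inspection can be done on saved page source instead of in the browser inspector. The sketch below uses the standard-library `html.parser` to collect the `*-type` class of each `<input>` node. The class name `InputTypeCollector`, the sample HTML, and the `name` attributes in it are assumptions for illustration; only the `string-type` and `number-type` class names come from the text above.

```python
from html.parser import HTMLParser


class InputTypeCollector(HTMLParser):
    """Map each <input> name to the prefix of its '*-type' class.

    Illustrative sketch; the real documentation pages may be structured
    differently.
    """

    def __init__(self):
        super().__init__()
        self.types = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        kinds = [c[: -len("-type")] for c in classes if c.endswith("-type")]
        if kinds:
            self.types[attr.get("name") or ""] = kinds[0]


if __name__ == "__main__":
    # Invented sample markup standing in for a documentation page.
    sample = (
        '<input name="site" class="string-type">'
        '<input name="page" class="input number-type">'
    )
    collector = InputTypeCollector()
    collector.feed(sample)
    print(collector.types)
```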

Support

It is my policy to strive to support all non-end-of-life stable releases of Python. However, indispensable features I rely on are sometimes not supported on older Python versions without backporting. Therefore, I can only promise my code will run on the latest bugfix release of Python. If you are a Windows user and want to use an older, supported Python release, do note that the official website does not provide binaries for the security releases. I therefore encourage you to install it through a conda distribution or a package manager instead, to benefit from the continuing security fixes.
