Speedrun.com API Scraper

A series of executables to collect all data available from speedrun.com. A version of the collected data has been published here!

⬇️ Installation

Clone the repository and compile the binaries with the following commands:

$ git clone git@github.com:alexmerren/speedruncom-scraper.git
...
$ cd speedruncom-scraper
$ make all
...

This project requires:

🚀 Usage

The compiled binaries can be executed to scrape data from the speedrun.com API. The following command retrieves data for all runs, all leaderboards, all games, and all users (who have contributed to leaderboards) on speedrun.com:

./dist/games-list && ./dist/games-data && ./dist/leaderboards-data && ./dist/users-list && ./dist/users-data && ./dist/runs-data

Alternatively, there is a Makefile target to run all executables in order:

make run
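
For readers who want to drive the binaries from Go rather than from the shell, the chained command above can be sketched as follows. This is a minimal illustration, not part of the repository: the `pipeline` variable, the `runAll` function, and the `-run` flag are hypothetical names, and the sketch assumes the binaries have already been built into `./dist`. Like `&&`, it stops at the first binary that exits with an error.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"os/exec"
)

// pipeline lists the binaries in the order given in the Usage section.
// (Hypothetical variable; the repository itself uses a Makefile target.)
var pipeline = []string{
	"./dist/games-list",
	"./dist/games-data",
	"./dist/leaderboards-data",
	"./dist/users-list",
	"./dist/users-data",
	"./dist/runs-data",
}

// runAll executes each binary in turn and returns at the first failure,
// mirroring the behaviour of chaining shell commands with &&.
func runAll(paths []string) error {
	for _, p := range paths {
		cmd := exec.Command(p)
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Run(); err != nil {
			return fmt.Errorf("%s failed: %w", p, err)
		}
	}
	return nil
}

func main() {
	// Without -run the program only prints the planned order (a dry run).
	run := flag.Bool("run", false, "execute the binaries instead of printing the order")
	flag.Parse()

	if !*run {
		for i, p := range pipeline {
			fmt.Printf("%d. %s\n", i+1, p)
		}
		return
	}
	if err := runAll(pipeline); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Running the sketch with no flags prints the six steps; passing `-run` executes them in sequence.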

NOTE: The scraping process may repeat API calls. A local HTTP cache serves repeated calls locally instead of through the rate-limited API. The cache is saved as httpcache.db.

🏃 Executables

| Path | Description | Pre-requisite(s) |
| --- | --- | --- |
| ./dist/games-list | Retrieve all game IDs and other data (e.g. total number of runs for a game) for verification in other executables. Retrieve other miscellaneous pieces of data such as platforms, developers, genres, etc. | None |
| ./dist/games-data | Retrieve data on categories, levels, variables, values, etc. for all game IDs retrieved in games-list. | ./dist/games-list |
| ./dist/leaderboards-data | Retrieve leaderboard data for all games retrieved in games-list. Note: This can fail for games with a high number of runs; use supplementary-leaderboard-data in this case. | ./dist/games-data |
| ./dist/supplementary-leaderboard-data | Retrieve leaderboard data for all category/level/variable/value combinations of a game. This executable is tailored to retrieve data for games with an extremely high number of runs, e.g. Subway Surfers. It will be extremely inefficient for games with a high count of unique category/level/variable/value combinations. | None |
| ./dist/users-list | Compile a list of all unique users found on all leaderboards of all games, including both submitters and verifiers. | ./dist/leaderboards-data |
| ./dist/users-data | Retrieve non-PII data for all unique users compiled in users-list. | ./dist/users-list |
| ./dist/runs-data | Retrieve all runs for all unique users compiled in users-list. This should be all runs on speedrun.com! | ./dist/users-list |
| ./dist/world-record-data | Retrieve world record data for all valid category/level/variable/value combinations of a game. This is experimental, and has a delay of 1s applied to every request to ensure the V2 API is not rate limited externally. | None |
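
To see why the combination-based executables scale poorly, consider how the number of variable/value combinations grows. The sketch below is illustrative only: the `variable` type, the `combinations` function, and the example IDs are hypothetical and are not taken from the repository or the speedrun.com API. It enumerates every assignment of one value per variable, which is the cartesian product of the value lists.

```go
package main

import "fmt"

// variable is a simplified, hypothetical stand-in for a leaderboard
// variable and the values it may take.
type variable struct {
	ID     string
	Values []string
}

// combinations returns every assignment of one value per variable.
// Each map goes from variable ID to the chosen value.
func combinations(vars []variable) []map[string]string {
	// Start with a single empty assignment and extend it one variable at a time.
	result := []map[string]string{{}}
	for _, v := range vars {
		var next []map[string]string
		for _, partial := range result {
			for _, val := range v.Values {
				combo := make(map[string]string, len(partial)+1)
				for k, x := range partial {
					combo[k] = x
				}
				combo[v.ID] = val
				next = append(next, combo)
			}
		}
		result = next
	}
	return result
}

func main() {
	// Made-up variables for one category/level pair.
	vars := []variable{
		{ID: "platform", Values: []string{"ios", "android"}},
		{ID: "mode", Values: []string{"score", "coins", "no-coins"}},
	}
	combos := combinations(vars)
	fmt.Println(len(combos)) // 2 * 3 = 6 leaderboard requests for this pair
}
```

Because the count is the product of the number of values per variable, repeated for every category/level pair, a game with many variables needs far more requests than one request per leaderboard. With a 1s delay per request, as world-record-data applies, the total runtime in seconds is at least the number of combinations.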

📝 Documentation

All documentation can be found in the docs directory.

💭 Feedback and Contribution

Any improvements or requests can be raised via GitHub Issues. Any development conversations can be found on GitHub Discussions.