This repository contains a list of HTTP user agents used by robots, crawlers, and spiders. I regularly maintain this list based on my own logs.
If you are using Ruby, the Voight-Kampff and isbot libraries provide access to this data.
Other systems for detecting robots, crawlers, and spiders that you may want to consider include isBot (Node.js), Crawler-Detect (PHP), BrowserDetector (PHP), and browscap (JSON files).
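Since each entry in the list carries a regular-expression pattern, checking a user-agent string against the data is straightforward. Here is a minimal sketch in Python; the `crawlers` variable is illustrative sample data in this repository's format, not the full list:

```python
import json
import re

def is_crawler(user_agent, crawler_list):
    """Return True if user_agent matches any crawler pattern in the list."""
    return any(re.search(entry["pattern"], user_agent)
               for entry in crawler_list)

# Illustrative sample in the format used by this repository;
# in practice, load the full JSON file instead.
crawlers = [
    {"pattern": "rogerbot",
     "addition_date": "2014/02/28",
     "url": "http://moz.com/help/pro/what-is-rogerbot-"},
]

print(is_crawler("Mozilla/5.0 (compatible; rogerbot/1.0)", crawlers))  # True
print(is_crawler("Mozilla/5.0 (Windows NT 10.0)", crawlers))           # False
```

Note that `re.search` (not `re.match`) is used, since the pattern can occur anywhere in the user-agent string.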
The list is under an MIT License. Versions prior to Nov 7, 2016 were under a CC-SA license.
I welcome additions contributed as pull requests.
A pull request should:
- contain a single addition
- specify a discriminating syntactic fragment (for example "totobot", not the full user-agent string "Mozilla/5 totobot v20131212.alpha1")
- contain the pattern (a generic regular expression), the discovery date (year/month/day), and the official URL of the robot
- result in a valid JSON file (don't forget the comma between items)
Example:
{
  "pattern": "rogerbot",
  "addition_date": "2014/02/28",
  "url": "http://moz.com/help/pro/what-is-rogerbot-"
}
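Before opening a pull request, you can check that the file still parses and that every entry meets the requirements above. A minimal sketch, assuming the list is stored in a file whose path you pass in (the filename below is just an example):

```python
import json
import re

def validate(path):
    """Check that the file is valid JSON and each entry is well-formed.

    Raises an exception on malformed JSON, an invalid regular
    expression, or a missing required field; otherwise returns the
    number of entries.
    """
    with open(path) as f:
        entries = json.load(f)  # fails loudly on a missing comma, etc.
    for entry in entries:
        re.compile(entry["pattern"])   # pattern must be a valid regex
        assert "addition_date" in entry, entry
        assert "url" in entry, entry
    return len(entries)

# Example usage (path is illustrative):
# print(validate("crawler-user-agents.json"))
```

Running a check like this locally catches the most common mistake, a forgotten comma between items, before the pull request is opened.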
--Martin