
Commit

Improves detection for various bots (#7596)
* Adds detection for Viber Url Downloader
* Adds detection for Zeno
* Adds detection for Barracuda Sentinel
* Improves detection for Repo Lookout
* Removes duplicate regex for Siteimprove
* Removes Radio Zeno: the Zeno user agent is a bot, confirmed by IP address

ref #7595
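
The Repo Lookout change can be illustrated with a small sketch. It assumes rules are tried in file order and the first match wins, which is consistent with the fixture's category moving from Crawler to Security Checker; the `classify` helper and the rule lists are hypothetical and are not the library's actual matcher, which may wrap patterns or apply flags differently. The user agent is the fixture string without its contact suffix.

```python
import re

# Hypothetical rule lists: (pattern, category), in the order they appear in regexes/bots.yml.
OLD_RULES = [
    (r"RepoLookoutBot/[\d.]+", "Security Checker"),  # earlier, version-specific entry
    (r"RepoLookoutBot", "Crawler"),                  # later fallback entry removed by this commit
]
NEW_RULES = [
    (r"RepoLookoutBot/v?[\d.]+", "Security Checker"),  # now tolerates a leading "v"
]


def classify(user_agent, rules):
    """Return the category of the first rule whose pattern matches (assumed semantics)."""
    for pattern, category in rules:
        if re.search(pattern, user_agent):
            return category
    return None


UA = "RepoLookoutBot/v1.1.0-209-g2b273e8"

# The old version-specific pattern fails on the "v" after the slash,
# so the later fallback entry decides the category.
print(classify(UA, OLD_RULES))
# The updated pattern matches directly, so no fallback entry is needed.
print(classify(UA, NEW_RULES))
```

Under these assumptions the old rules classify the fixture through the fallback entry, while the new single rule handles both `RepoLookoutBot/1.0` and `RepoLookoutBot/v1.1.0` style versions.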
liviuconcioiu committed Feb 19, 2024
1 parent 35efad7 commit 00bd479
Showing 4 changed files with 53 additions and 35 deletions.
29 changes: 28 additions & 1 deletion Tests/fixtures/bots.yml
@@ -5769,7 +5769,7 @@
user_agent: RepoLookoutBot/v1.1.0-209-g2b273e8 (abuse reports to [email protected])
bot:
name: Repo Lookout
- category: Crawler
+ category: Security Checker
url: https://www.repo-lookout.org/
producer:
name: Crissy Field GmbH
@@ -6835,3 +6835,30 @@
user_agent: WebAuthn Adoption Study (Contact [email protected])
bot:
name: Generic Bot
-
user_agent: ViberUrlDownloader
bot:
name: Viber Url Downloader
category: Service Agent
url: https://www.viber.com/
producer:
name: Viber Media S.à r.l.
url: https://www.viber.com/
-
user_agent: Zeno
bot:
name: Zeno
category: Crawler
url: https://github.com/internetarchive/Zeno
producer:
name: The Internet Archive
url: https://archive.org/
-
user_agent: Barracuda Sentinel (EE)
bot:
name: Barracuda Sentinel
category: Service Agent
url: https://sentinel.barracudanetworks.com/
producer:
name: Barracuda Networks, Inc.
url: https://www.barracudanetworks.com/
13 changes: 0 additions & 13 deletions Tests/fixtures/podcasting.yml
@@ -8815,19 +8815,6 @@
model: ''
os_family: iOS
browser_family: Unknown
-
user_agent: 'Zeno'
os: [ ]
client:
type: mobile app
name: Radio Zeno
version: ""
device:
type: ""
brand: ""
model: ''
os_family: Unknown
browser_family: Unknown
-
user_agent: 'Zune/4.8'
os: [ ]
42 changes: 25 additions & 17 deletions regexes/bots.yml
@@ -2172,14 +2172,6 @@
name: 'Siteimprove GmbH'
url: 'https://siteimprove.com/'

- regex: 'Image size by Siteimprove\.com'
name: 'Siteimprove'
category: 'Search bot'
url: 'https://siteimprove.com/'
producer:
name: 'Siteimprove GmbH'
url: 'https://siteimprove.com/'

- regex: 'CATExplorador'
name: 'CATExplorador'
category: 'Search bot'
@@ -3220,7 +3212,7 @@
name: 'New Work SE'
url: 'https://www.xing.com/'

-  - regex: 'RepoLookoutBot/[\d.]+'
+  - regex: 'RepoLookoutBot/v?[\d.]+'
name: 'Repo Lookout'
category: 'Security Checker'
url: 'https://www.repo-lookout.org/'
@@ -3492,14 +3484,6 @@
name: 'Lumar'
url: 'https://www.lumar.io/'

- regex: 'RepoLookoutBot'
name: 'Repo Lookout'
category: 'Crawler'
url: 'https://www.repo-lookout.org/'
producer:
name: 'Crissy Field GmbH'
url: 'https://www.crissyfield.de/'

- regex: 'researchscan\.comsys\.rwth-aachen\.de'
name: 'Research Scan'
category: 'Crawler'
@@ -3994,6 +3978,30 @@
name: 'EMDASH SAS'
url: 'https://www.fontradar.com/'

- regex: 'ViberUrlDownloader'
name: 'Viber Url Downloader'
category: 'Service Agent'
url: 'https://www.viber.com/'
producer:
name: 'Viber Media S.à r.l.'
url: 'https://www.viber.com/'

- regex: '^Zeno$'
name: 'Zeno'
category: 'Crawler'
url: 'https://github.com/internetarchive/Zeno'
producer:
name: 'The Internet Archive'
url: 'https://archive.org/'

- regex: 'Barracuda Sentinel'
name: 'Barracuda Sentinel'
category: 'Service Agent'
url: 'https://sentinel.barracudanetworks.com/'
producer:
name: 'Barracuda Networks, Inc.'
url: 'https://www.barracudanetworks.com/'

# Generic detections
- regex: 'nuhk|grub-client|Download Demon|SearchExpress|Microsoft URL Control|borg|altavista|dataminr\.com|tweetedtimes\.com|teoma|oegp|http%20client|htdig|mogimogi|larbin|scrubby|searchsight|semanticdiscovery|snappy|vortex(?!(?: Build|Plus))|zeal(?!ot)|dataparksearch|findlinks|BrowserMob|URL2PNG|ZooShot|GomezA|Google SketchUp|Read%20Later|7Siters|centuryb\.o\.t9|InterNaetBoten|EasyBib AutoCite|Bidtellect|tomnomnom/meg|cortex|Re-re Studio|adreview|AHC/|NameOfAgent|Request-Promise|ALittle Client|Hello,? world|wp_is_mobile|0xAbyssalDoesntExist|Anarchy99|daumoa,damoa,daum,daumos,duamoa,duam,duamos|^revolt|nvd0rz|xfa1|Hakai|gbrmss|fuck-your-hp|IDBTE4M CODE87|Antoine|Insomania|Hells-Net|b3astmode|Linux Gnu \(cow\)|Test Certificate Info|iplabel|Magellan|TheSafex?Internetx?Search|kirkland-signature|^xenu|^ZmEu|^(?:chrome|firefox|Zeus)$'
name: 'Generic Bot'
4 changes: 0 additions & 4 deletions regexes/client/mobile_apps.yml
@@ -2179,10 +2179,6 @@
name: 'Yapa'
version: '$1'

- regex: '^Zeno$'
name: 'Radio Zeno'
version: ''

- regex: 'Zune/(\d+\.[.\d]+)'
name: 'Zune'
version: ''
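
A short sketch of how the relocated `^Zeno$` pattern and the new `Barracuda Sentinel` pattern behave on the fixture user agents. It assumes plain, unwrapped `re.search` semantics; the helper names are hypothetical, and the strings other than the two fixture user agents are made up for contrast.

```python
import re

# Patterns copied from the regexes/bots.yml hunk; compiled without extra flags (an assumption).
ZENO = re.compile(r"^Zeno$")
BARRACUDA = re.compile(r"Barracuda Sentinel")


def is_zeno(user_agent):
    """True only when the whole user agent is exactly 'Zeno' (anchored at both ends)."""
    return ZENO.search(user_agent) is not None


def is_barracuda_sentinel(user_agent):
    """True when the product name appears anywhere in the user agent (unanchored)."""
    return BARRACUDA.search(user_agent) is not None


# The anchors keep the bot rule from catching longer strings that merely contain "Zeno".
print(is_zeno("Zeno"))            # fixture user agent
print(is_zeno("Radio Zeno/1.0"))  # made-up string, not matched by the anchored pattern

# The unanchored pattern accepts the "(EE)" suffix seen in the fixture.
print(is_barracuda_sentinel("Barracuda Sentinel (EE)"))
```

Because the same anchored pattern previously lived in `regexes/client/mobile_apps.yml`, moving it to `regexes/bots.yml` changes only how the exact string `Zeno` is labeled, not which strings match.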
