Adds detection for various libraries, bots and operating systems #7498

Merged 30 commits on Nov 14, 2023
Changes from 18 commits
Commits
afec4fa
Adds detection for HTML Parser
liviuconcioiu Oct 31, 2023
a94bd66
Improves detection for Python urllib
liviuconcioiu Oct 31, 2023
72522d8
Adds detection for msray
liviuconcioiu Oct 31, 2023
c152c3e
Adds detection for Slim
liviuconcioiu Oct 31, 2023
6f91cdd
Adds detection for Fuzz Faster U Fool
liviuconcioiu Oct 31, 2023
835c83b
Adds detection for Matomo
liviuconcioiu Oct 31, 2023
e9344c7
Improves detection for generic bots
liviuconcioiu Oct 31, 2023
6fd5560
Adds detection for Prometheus
liviuconcioiu Oct 31, 2023
6288ae7
Improves detection for generic bots
liviuconcioiu Oct 31, 2023
6b885b0
Adds detection for ArchiveBot
liviuconcioiu Oct 31, 2023
3005171
Adds detection for MADBbot
liviuconcioiu Oct 31, 2023
34d38f0
Adds detection for Kali
liviuconcioiu Oct 31, 2023
bacc7a8
Adds detection for Oracle Linux
liviuconcioiu Oct 31, 2023
c0e2d82
Improves version detection for TencentOS
liviuconcioiu Oct 31, 2023
0cc4ed6
Improves version detection for CentOS
liviuconcioiu Oct 31, 2023
526f99e
Merge branch 'master' into devices
liviuconcioiu Oct 31, 2023
d083876
Merge branch 'master' into devices
liviuconcioiu Nov 7, 2023
101a380
Move links from comment to url and update some links
liviuconcioiu Nov 7, 2023
56f7b80
Fix regex for Oracle Linux
liviuconcioiu Nov 7, 2023
62c436c
Fix regex for CentOS
liviuconcioiu Nov 7, 2023
977172b
Improve detection for generic bots
liviuconcioiu Nov 7, 2023
ecebf68
Revert "Fix regex for CentOS"
liviuconcioiu Nov 7, 2023
2da1748
Revert "Improves version detection for CentOS"
liviuconcioiu Nov 7, 2023
6d7128c
Change names
liviuconcioiu Nov 7, 2023
c11f5e9
Merge branch 'master' into devices
liviuconcioiu Nov 7, 2023
ec5f393
Merge branch 'master' into devices
liviuconcioiu Nov 9, 2023
f88b6c2
Remove newline
liviuconcioiu Nov 9, 2023
9d586f7
Merge branch 'master' into devices
liviuconcioiu Nov 13, 2023
2030287
Merge branch 'master' into devices
liviuconcioiu Nov 13, 2023
fd832d5
Improves version detection for iOS and macOS
liviuconcioiu Nov 13, 2023
4 changes: 3 additions & 1 deletion Parser/OperatingSystem.php
@@ -87,6 +87,7 @@ class OperatingSystem extends AbstractParser
'INF' => 'Inferno',
'JME' => 'Java ME',
'KOS' => 'KaiOS',
'KAL' => 'Kali',
'KAN' => 'Kanotix',
'KNO' => 'Knoppix',
'KTV' => 'KreaTV',
@@ -119,6 +120,7 @@ class OperatingSystem extends AbstractParser
'OBS' => 'OpenBSD',
'OWR' => 'OpenWrt',
'OTV' => 'Opera TV',
'ORA' => 'Oracle Linux',
'ORD' => 'Ordissimo',
'PAR' => 'Pardus',
'PCL' => 'PCLinuxOS',
@@ -203,7 +205,7 @@ class OperatingSystem extends AbstractParser
'ORD', 'TOS', 'RSO', 'DEE', 'FRE', 'MAG', 'FEN', 'CAI', 'PCL', 'HAS',
'LOS', 'DVK', 'ROK', 'OWR', 'OTV', 'KTV', 'PUR', 'PLA', 'FUC', 'PAR',
'FOR', 'MON', 'KAN', 'ZEN', 'LND', 'LNS', 'CHN', 'AMZ', 'TEN', 'CST',
-'NOV', 'ROU', 'ZOR', 'RED',
+'NOV', 'ROU', 'ZOR', 'RED', 'KAL', 'ORA',
],
'Mac' => ['MAC'],
'Mobile Gaming Console' => ['PSP', 'NDS', 'XBX'],
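
Reading aid for the hunks above: a minimal Python stand-in for the two PHP arrays, limited to entries visible in this diff. The `GNU/Linux` key is an assumption inferred from the `family:` values in the fixtures, and `family_of` is a hypothetical helper, not part of the library.

```python
# Stand-in for the two PHP arrays edited in Parser/OperatingSystem.php.
# Only entries visible in the diff are listed; the real arrays are much longer.
OPERATING_SYSTEMS = {
    "KAL": "Kali",
    "KAN": "Kanotix",
    "ORA": "Oracle Linux",
    "ORD": "Ordissimo",
}

# Family name -> member short codes. "GNU/Linux" is inferred from the fixtures;
# the list shows only the tail that the diff touches.
OS_FAMILIES = {
    "GNU/Linux": ["NOV", "ROU", "ZOR", "RED", "KAL", "ORA"],
    "Mac": ["MAC"],
}


def family_of(short_code):
    """Return the family whose member list contains short_code, or None."""
    for family, codes in OS_FAMILIES.items():
        if short_code in codes:
            return family
    return None


if __name__ == "__main__":
    for code in ("KAL", "ORA"):
        print(code, OPERATING_SYSTEMS[code], family_of(code))
```

A new short code needs both entries: without the family entry, the fixtures' `family: GNU/Linux` expectation would have nothing to resolve against.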
31 changes: 30 additions & 1 deletion Tests/Parser/Client/fixtures/library.yml
@@ -431,7 +431,6 @@
type: library
name: cri-o
version: 1.16.1

-
user_agent: go-containerregistry/v0.11.0
client:
@@ -528,3 +527,33 @@
type: library
name: Axios
version: "1.2.0"
-
user_agent: HTMLParser/1.6
client:
type: library
name: HTML Parser
version: "1.6"
-
user_agent: python-urllib3/1.26.9
client:
type: library
name: Python urllib
version: 1.26.9
-
user_agent: msray-plus
client:
type: library
name: msray
version: ""
-
user_agent: Slim Framework
client:
type: library
name: Slim
version: ""
-
user_agent: Fuzz Faster U Fool v1.5.0-dev
client:
type: library
name: Fuzz Faster U Fool
version: 1.5.0
26 changes: 25 additions & 1 deletion Tests/Parser/fixtures/oss.yml
@@ -3356,7 +3356,7 @@
os:
name: TencentOS
short_name: TEN
-version: 4.14.105
+version: "3"
platform:
family: GNU/Linux
-
@@ -3926,3 +3926,27 @@
version: "14.1"
platform:
family: Mac
-
user_agent: python-requests/2.7.0 CPython/2.7.15 Linux/4.16.0-kali2-amd64
os:
name: Kali
short_name: KAL
version: "2"
platform:
family: GNU/Linux
-
user_agent: python-requests/2.6.0 CPython/2.7.5 Linux/4.1.12-124.15.4.el7uek.x86_64
os:
name: Oracle Linux
short_name: ORA
version: "7"
platform: x64
family: GNU/Linux
-
user_agent: python-requests/2.7.0 CPython/2.7.3 Linux/2.6.18-308.el5
os:
name: CentOS
short_name: CES
version: "5"
platform:
family: GNU/Linux
45 changes: 45 additions & 0 deletions Tests/fixtures/bots.yml
@@ -5828,3 +5828,48 @@
name: phpMyAdmin
category: Service Agent
url: https://www.phpmyadmin.net/
-
user_agent: Matomo/4.15.1
bot:
name: Matomo
category: Service Agent
url: https://github.com/matomo-org/matomo
producer:
name: InnoCraft Ltd
url: https://matomo.org/
-
user_agent: CustomUserAgent/1.0
bot:
name: Generic Bot
-
user_agent: Prometheus/2.40.5
bot:
name: Prometheus
category: Service Agent
url: https://github.com/prometheus/prometheus
producer:
name: The Linux Foundation
url: https://www.cncf.io/
-
user_agent: firefox
bot:
name: Generic Bot
-
user_agent: Chrome
bot:
name: Generic Bot
-
user_agent: ArchiveTeam ArchiveBot/20220523.4a672db (wpull 2.0.3) and not Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36
bot:
name: ArchiveBot
category: Crawler
url: https://wiki.archiveteam.org/index.php?title=ArchiveBot
producer:
name: ArchiveTeam
url: https://wiki.archiveteam.org/
-
user_agent: MADBbot/0.1 (Gathering webpages for data analytics; https://madb.zapto.org/bot.html; [email protected])
bot:
name: MADBbot
category: Crawler
url: https://madb.zapto.org/bot.html
33 changes: 31 additions & 2 deletions regexes/bots.yml
@@ -2035,7 +2035,7 @@
- regex: 'RSSRadio \(Push Notification Scanner;support@dorada\.co\.uk\)'
name: 'RSSRadio Bot'

- regex: '(A6-Indexer|nuhk|TsolCrawler|Yammybot|Openbot|Gulper Web Bot|grub-client|Download Demon|SearchExpress|Microsoft URL Control|borg|altavista|dataminr.com|tweetedtimes.com|TrendsmapResolver|teoma|blitzbot|oegp|furlbot|http%20client|polybot|htdig|mogimogi|larbin|scrubby|searchsight|seekbot|semanticdiscovery|snappy|vortex(?!(?: Build|Plus))|zeal(?!ot)|fast-webcrawler|converacrawler|dataparksearch|findlinks|BrowserMob|HttpMonitor|ThumbShotsBot|URL2PNG|ZooShot|GomezA|Google SketchUp|Read%20Later|RackspaceBot|robots|SeopultContentAnalyzer|7Siters|centuryb.o.t9|InterNaetBoten|EasyBib AutoCite|Bidtellect|tomnomnom/meg|My User Agent|cortex|CF-UC User Agent|Re-re Studio|adreview|AHC/|NameOfAgent|Request-Promise|ALittle Client|Hello,? world|wp_is_mobile|0xAbyssalDoesntExist|Anarchy99|daumoa,damoa,daum,daumos,duamoa,duam,duamos|^revolt|nvd0rz|xfa1|Hakai|gbrmss|fuck-your-hp|IDBTE4M CODE87|Antoine|Insomania|Hells-Net|b3astmode|Linux Gnu \(cow\)|custom_user_agent|Test Certificate Info|iplabel|Magellan)'
- regex: '(A6-Indexer|nuhk|TsolCrawler|Yammybot|Openbot|Gulper Web Bot|grub-client|Download Demon|SearchExpress|Microsoft URL Control|borg|altavista|dataminr.com|tweetedtimes.com|TrendsmapResolver|teoma|blitzbot|oegp|furlbot|http%20client|polybot|htdig|mogimogi|larbin|scrubby|searchsight|seekbot|semanticdiscovery|snappy|vortex(?!(?: Build|Plus))|zeal(?!ot)|fast-webcrawler|converacrawler|dataparksearch|findlinks|BrowserMob|HttpMonitor|ThumbShotsBot|URL2PNG|ZooShot|GomezA|Google SketchUp|Read%20Later|RackspaceBot|robots|SeopultContentAnalyzer|7Siters|centuryb.o.t9|InterNaetBoten|EasyBib AutoCite|Bidtellect|tomnomnom/meg|My User Agent|cortex|CF-UC User Agent|Re-re Studio|adreview|AHC/|NameOfAgent|Request-Promise|ALittle Client|Hello,? world|wp_is_mobile|0xAbyssalDoesntExist|Anarchy99|daumoa,damoa,daum,daumos,duamoa,duam,duamos|^revolt|nvd0rz|xfa1|Hakai|gbrmss|fuck-your-hp|IDBTE4M CODE87|Antoine|Insomania|Hells-Net|b3astmode|Linux Gnu \(cow\)|custom_user_agent|Test Certificate Info|iplabel|Magellan|CustomUserAgent)'
name: 'Generic Bot'

- regex: '^sentry'
@@ -3472,6 +3472,35 @@
category: 'Service Agent'
url: 'https://www.phpmyadmin.net/'

- regex: 'Matomo/([\d+.]+)'
name: 'Matomo'
category: 'Service Agent'
url: 'https://github.com/matomo-org/matomo'
producer:
name: 'InnoCraft Ltd'
url: 'https://matomo.org/'

- regex: 'Prometheus/([\d+.]+)'
name: 'Prometheus'
category: 'Service Agent'
url: 'https://github.com/prometheus/prometheus'
producer:
name: 'The Linux Foundation'
url: 'https://www.cncf.io/'

- regex: 'ArchiveTeam ArchiveBot'
name: 'ArchiveBot'
category: 'Crawler'
url: 'https://wiki.archiveteam.org/index.php?title=ArchiveBot'
producer:
name: 'ArchiveTeam'
url: 'https://wiki.archiveteam.org/'

- regex: 'MADBbot/([\d+.]+)'
name: 'MADBbot'
category: 'Crawler'
url: 'https://madb.zapto.org/bot.html'

# Generic detections
-- regex: '[a-z0-9\-_]*((?<!cu|power[ _]|m[ _])bot(?![ _]TAB|[ _]?5[0-9]|[ _]Senior|[ _]Junior)|crawler|crawl|checker|archiver|transcoder|spider)([^a-z]|$)'
+- regex: '[a-z0-9\-_]*((?<!cu|power[ _]|m[ _])bot(?![ _]TAB|[ _]?5[0-9]|[ _]Senior|[ _]Junior)|crawler|crawl|checker|archiver|transcoder|spider|^firefox|^chrome)([^a-z]|$)'
name: 'Generic Bot'
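
A rough check of the new bot rules against the user agents added in Tests/fixtures/bots.yml. This is a sketch in Python's `re`, not the parser itself. It assumes rules are tried in file order with the first match winning, and that matching is case-insensitive. The generic rule appears in a reduced form holding only the two new alternatives, because Python's `re` does not accept the mixed-width lookbehind used in the full pattern. `BOT_RULES` and `detect_bot` are illustrative names.

```python
import re

# (pattern, name) pairs copied from the hunk above, in file order.
BOT_RULES = [
    (r"Matomo/([\d+.]+)", "Matomo"),
    (r"Prometheus/([\d+.]+)", "Prometheus"),
    (r"ArchiveTeam ArchiveBot", "ArchiveBot"),
    (r"MADBbot/([\d+.]+)", "MADBbot"),
    # Reduced generic rule: only the ^firefox / ^chrome alternatives added here.
    (r"(^firefox|^chrome)([^a-z]|$)", "Generic Bot"),
]


def detect_bot(user_agent):
    """Return the name of the first rule that matches, or None."""
    for pattern, name in BOT_RULES:
        if re.search(pattern, user_agent, re.IGNORECASE):
            return name
    return None


if __name__ == "__main__":
    samples = [
        "Matomo/4.15.1",
        "Prometheus/2.40.5",
        "firefox",
        "Chrome",
        # Hypothetical browser user agent, not taken from the fixtures.
        "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/119.0",
    ]
    for ua in samples:
        print(repr(ua), "->", detect_bot(ua))
```

Because of the `^` anchors, the new alternatives only fire when the user agent starts with `firefox` or `chrome`, so a regular browser string beginning with `Mozilla/5.0` is not caught by them.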
68 changes: 51 additions & 17 deletions regexes/client/libraries.yml
@@ -5,75 +5,109 @@
# @license http://www.gnu.org/licenses/lgpl.html LGPL v3 or later
###############

- regex: 'Fuzz Faster U Fool v(\d+[\.\d]+)'
name: 'Fuzz Faster U Fool'
version: '$1'
url: 'https://github.com/ffuf/ffuf'

- regex: 'Slim Framework'
name: 'Slim'
version: ''
url: 'https://www.slimframework.com/'

- regex: 'msray-plus'
name: 'msray'
version: ''
url: 'https://github.com/super-l/msray'

- regex: 'HTMLParser(?:/(\d+[\.\d]+))?'
name: 'HTML Parser'
version: '$1'
url: 'https://htmlparser.sourceforge.net/'

# got - a nodejs library
- regex: '^got(?:/(\d+\.[.\d]+))? \('
name: 'got'
version: '$1'
url: 'https://github.com/sindresorhus/got'

-# Typhoeus (https://github.com/typhoeus/typhoeus)
+# Typhoeus
- regex: 'Typhoeus'
name: 'Typhoeus'
version: ''
url: 'https://github.com/typhoeus/typhoeus'

-# req (https://github.com/imroc/req)
+# req
- regex: 'req/v([\.\d]+)'
name: 'req'
version: '$1'
url: 'https://github.com/imroc/req'

-# quic-go (https://github.com/lucas-clemente/quic-go)
+# quic-go
- regex: 'quic-go-HTTP/3'
name: 'quic-go'
version: ''
url: 'https://github.com/lucas-clemente/quic-go'

-# Azure Data Factory (https://azure.microsoft.com/en-us/products/data-factory/)
+# Azure Data Factory
- regex: 'azure-data-factory(?:/(\d+[\.\d]+))?'
name: 'Azure Data Factory'
version: '$1'
url: 'https://azure.microsoft.com/en-us/products/data-factory/'

-# Dart (https://dart.dev/)
+# Dart
- regex: 'Dart(?:/(\d+[\.\d]+))?'
name: 'Dart'
version: '$1'
url: 'https://dart.dev/'

-# r-curl (https://github.com/jeroen/curl)
+# r-curl
- regex: 'r-curl(?:/(\d+[\.\d]+))?'
name: 'r-curl'
version: '$1'
url: 'https://github.com/jeroen/curl'

-# HTTPX (https://www.python-httpx.org/)
+# HTTPX
- regex: 'python-httpx(?:/(\d+[\.\d]+))?'
name: 'HTTPX'
version: '$1'
url: 'https://www.python-httpx.org/'

-# fasthttp (https://github.com/valyala/fasthttp)
+# fasthttp
- regex: 'fasthttp(?:/(\d+[\.\d]+))?'
name: 'fasthttp'
version: '$1'
url: 'https://github.com/valyala/fasthttp'

-# GeoIP Update (https://github.com/maxmind/geoipupdate)
+# GeoIP Update
- regex: 'geoipupdate(?:/(\d+[\.\d]+))?'
name: 'GeoIP Update'
version: '$1'
url: 'https://github.com/maxmind/geoipupdate'

-# PHP cURL Class (https://github.com/php-curl-class/php-curl-class)
+# PHP cURL Class
- regex: 'PHP-Curl-Class(?:/(\d+[\.\d]+))?'
name: 'PHP cURL Class'
version: '$1'
url: 'https://github.com/php-curl-class/php-curl-class'

-# cPanel HTTP Client (https://www.cpanel.net/)
+# cPanel HTTP Client
- regex: 'Cpanel-HTTP-Client(?:/(\d+[\.\d]+))?'
name: 'cPanel HTTP Client'
version: '$1'
url: 'https://www.cpanel.net/'

-# AnyEvent HTTP (http://software.schmorp.de/pkg/AnyEvent)
+# AnyEvent HTTP
- regex: 'AnyEvent-HTTP(?:/(\d+[\.\d]+))?'
name: 'AnyEvent HTTP'
version: '$1'
url: 'http://software.schmorp.de/pkg/AnyEvent'

-# SlimerJS (https://www.slimerjs.org/)
+# SlimerJS
- regex: 'SlimerJS/(\d+[\.\d]+)'
name: 'SlimerJS'
version: '$1'
url: 'https://www.slimerjs.org/'

- regex: 'Wget(?:/(\d+[\.\d]+))?'
name: 'Wget'
@@ -101,7 +135,7 @@
version: '$1'
url: 'https://pypi.org/project/httplib2/'

-- regex: 'Python-urllib(?:/?(\d+[\.\d]+))?'
+- regex: 'Python-urllib3?(?:/?(\d+[\.\d]+))?'
name: 'Python urllib'
version: '$1'

@@ -142,12 +176,12 @@
- regex: 'HTTP_Request2(?:/(\d+[\.\d]+))?'
name: 'HTTP_Request2'
version: '$1'
-url: 'http://pear.php.net/package/http_request2'
+url: 'https://pear.php.net/package/http_request2'

- regex: 'Mechanize(?:/(\d+[\.\d]+))?'
name: 'Mechanize'
version: '$1'
-url: 'http://github.com/sparklemotion/mechanize/'
+url: 'https://github.com/sparklemotion/mechanize'

- regex: 'aiohttp(?:/(\d+[\.\d]+))?'
name: 'aiohttp'
@@ -188,7 +222,7 @@
- regex: 'RestSharp/(\d+[\.\d]+)'
name: 'RestSharp'
version: '$1'
-url: 'http://restsharp.org/'
+url: 'https://github.com/restsharp/RestSharp'

- regex: 'scalaj-http/(\d+[\.\d]+)'
name: 'ScalaJ HTTP'
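
To see what the `Python-urllib3?` change buys, the old and new patterns can be run side by side against the `python-urllib3/1.26.9` fixture, together with a few of the new library rules. This is a sketch using Python's `re` with case-insensitive matching as an assumption about how the rules are applied; `version_of` is a hypothetical helper, not a library function.

```python
import re

OLD_URLLIB = r"Python-urllib(?:/?(\d+[\.\d]+))?"
NEW_URLLIB = r"Python-urllib3?(?:/?(\d+[\.\d]+))?"


def version_of(pattern, user_agent):
    """Return capture group 1 of the first match, or '' if it did not capture."""
    match = re.search(pattern, user_agent, re.IGNORECASE)
    # lastindex is None when the pattern has no group or the group was skipped.
    if match is None or match.lastindex is None:
        return ""
    return match.group(1)


if __name__ == "__main__":
    ua = "python-urllib3/1.26.9"
    print("old:", repr(version_of(OLD_URLLIB, ua)))
    print("new:", repr(version_of(NEW_URLLIB, ua)))
    print(repr(version_of(r"Fuzz Faster U Fool v(\d+[\.\d]+)",
                          "Fuzz Faster U Fool v1.5.0-dev")))
    print(repr(version_of(r"HTMLParser(?:/(\d+[\.\d]+))?", "HTMLParser/1.6")))
```

With the old pattern the trailing `3` is consumed as the start of a version that never completes, so the optional group is skipped and the version comes back empty. The new pattern absorbs the `3` into the name first.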
18 changes: 16 additions & 2 deletions regexes/oss.yml
@@ -5,6 +5,20 @@
# @license http://www.gnu.org/licenses/lgpl.html LGPL v3 or later
###############

##########
# Oracle Linux (https://www.oracle.com/linux/)
##########
- regex: '.+.el(\d+(?:[_\.]\d+)*)uek'
name: 'Oracle Linux'
version: '$1'

##########
# Kali (https://www.kali.org/)
##########
- regex: '.+kali(\d)'
name: 'Kali'
version: '$1'

##########
# PICO OS (https://www.picoxr.com/global/software/pico-os)
##########
@@ -50,7 +64,7 @@
##########
# TencentOS (https://github.com/Tencent/TencentOS-kernel)
##########
-- regex: 'Linux/(\d+[\.\d]*).+tlinux'
+- regex: '.+tlinux(\d)'
name: 'TencentOS'
version: '$1'

@@ -555,7 +569,7 @@
name: 'CentOS Stream'
version: '$1'

-- regex: '.+.el(\d+(?:[_\.]\d+)*).(?:centos|x86_64)'
+- regex: '.+.el(\d+(?:[_\.]\d+)*)'
name: 'CentOS'
version: '$1'
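
A rough sanity check of the Oracle Linux, Kali and CentOS rules against the user agents added in Tests/Parser/fixtures/oss.yml. This is a sketch in Python's `re`, not the parser itself. It assumes rules are tried top to bottom with the first match winning, that matching is case-insensitive, and it omits every other rule in the file. `OS_RULES` and `detect_os` are illustrative names.

```python
import re

# (pattern, name) pairs copied from the hunks above, in file order.
OS_RULES = [
    (r".+.el(\d+(?:[_\.]\d+)*)uek", "Oracle Linux"),
    (r".+kali(\d)", "Kali"),
    (r".+.el(\d+(?:[_\.]\d+)*)", "CentOS"),
]


def detect_os(user_agent):
    """Return (name, version) from the first matching rule, or None."""
    for pattern, name in OS_RULES:
        match = re.search(pattern, user_agent, re.IGNORECASE)
        if match:
            return name, match.group(1)
    return None


if __name__ == "__main__":
    fixtures = [
        "python-requests/2.7.0 CPython/2.7.15 Linux/4.16.0-kali2-amd64",
        "python-requests/2.6.0 CPython/2.7.5 Linux/4.1.12-124.15.4.el7uek.x86_64",
        "python-requests/2.7.0 CPython/2.7.3 Linux/2.6.18-308.el5",
    ]
    for ua in fixtures:
        print(detect_os(ua), "<-", ua)
```

The Oracle Linux user agent also matches the broadened CentOS pattern, which is presumably why the `uek` rule sits at the top of the file.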
