URL malware analysis with machine learning, plus a Volatility plugin to extract URLs from memory

xophidia/MalwareUrlAnalyzer

MalwareUrlAnalyzer

MalwareUrlAnalyzer is a Python 3 tool that classifies a URL, or a list of URLs, using the following three algorithms:

  • LogisticRegression
  • RandomForest
  • Naive Bayes

POI (Point Of Interest) indicates whether a URL is worth investigating. By default, POI is True only if all 3 algorithms predict that the URL is interesting (1.0).
The number of predictions that must equal 1.0 can be set with the -r option.
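
The POI rule can be sketched as a simple vote count. This is only an illustration: the function name `point_of_interest` and the "at least `required`" comparison are assumptions, since the README only states that the number of 1.0 values is configurable.

```python
from typing import Sequence


def point_of_interest(predictions: Sequence[float], required: int = 3) -> bool:
    """Return True when at least `required` classifiers predicted 1.0.

    Assumption: ">=" is used, so three positive votes still count with -r 2.
    """
    positives = sum(1 for p in predictions if p == 1.0)
    return positives >= required


if __name__ == "__main__":
    # Order follows the tool's output: Naive Bayes, Logistic Regression, Random Forest
    votes = [0.0, 1.0, 1.0]
    print(point_of_interest(votes))              # all three must agree
    print(point_of_interest(votes, required=2))  # relaxed threshold, as with -r 2
```

With the default threshold, a URL flagged by only two of the three models is not reported as a POI, which matches the `luru.icu/js/filters.php` row in the sample output further down.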

Features :

  • The models were trained on "characteristic" URLs and not on domains.
    The ipynb file is being "cleaned up" ...

  • MalwareUrlAnalyzer is fast and can parse large datasets in just seconds (27998 URLs in 9 s).

  • MalwareUrlAnalyzer has only been tested on Linux (Ubuntu 20.04) so far.

  • The results can be displayed, or exported in JSON and protobuf formats.

  • A Volatility plugin extracts URLs from the memory of the mapped processes and space.
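
To give an idea of what extracting URLs from memory involves, here is a minimal sketch that scans a raw byte buffer with a regular expression. It does not use the Volatility API and is not the plugin's actual logic, which this README does not show; `URL_PATTERN` and `extract_urls` are hypothetical names.

```python
import re
from typing import List

# Hypothetical pattern: an http(s) scheme followed by printable, non-space ASCII.
URL_PATTERN = re.compile(rb"https?://[\x21-\x7e]{4,}")


def extract_urls(buffer: bytes) -> List[str]:
    """Return the unique URL-like ASCII strings found in buffer, in order of appearance."""
    seen = {}
    for match in URL_PATTERN.finditer(buffer):
        seen.setdefault(match.group().decode("ascii"), None)
    return list(seen)


if __name__ == "__main__":
    # Fake memory region: URLs surrounded by NUL bytes and unrelated text
    region = b"\x00\x01http://example.com/a.php?x=1\x00junk https://test.org/path\x00"
    for url in extract_urls(region):
        print(url)
```

The extracted strings could then be written to a file and passed to MalwareUrlAnalyzer.py with the -f option.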

It is based on the following dataset :

Install

Structure :

models\ contains zip files and ML models
template\ contains the proto file for export
volatility_plugins\ contains the Volatility plugin and its readme

export_pb2.py: result of the compilation of the proto file
MalwareUrlAnalyzer.py: main script

To use the models, unzip them and place them in the same directory as MalwareUrlAnalyzer.py, or use the -p option to specify their path.

python MalwareUrlAnalyzer.py URL
python MalwareUrlAnalyzer.py -p models URL
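
Before running the tool, it can help to confirm that the unzipped models are where the -p option points. The sketch below is a hypothetical helper, not part of the project; the three file names are taken from the zip command shown in the integrity step of this README.

```python
from pathlib import Path
from typing import List

# File names as they appear in the README's zip command
MODEL_FILES = ("lr.joblib", "nb.joblib", "randomforest_tfidf.joblib")


def missing_models(model_dir: str = ".") -> List[str]:
    """Return the model file names that are absent from model_dir."""
    base = Path(model_dir)
    return [name for name in MODEL_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_models("models")
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All models found")
```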
  1. Decompress all models
7z e models_all.zip 
  2. Check file integrity (md5sum models_all.z*)
# zip -s 10m -r -9 models_all.zip lr.joblib nb.joblib randomforest_tfidf.joblib

14ec71a3d2bbefdcf08697e903d60311  models_all.z01
9aba3c3a98c3b7ee512492da6197c65e  models_all.z02
3059a7b8d105162155d74d34193f9d86  models_all.z03
27c49785e4e02917d98f8bec54306006  models_all.z04
fb8815014ff95534183818245edf47e2  models_all.z05
5778deba3d1660a3e856a2f550bffee4  models_all.z06
d44b0286477e9aeeb8b6ec8af4cef029  models_all.zip
  3. Install the following packages (argparse and time are part of the Python standard library and need no separate installation):
  • pandas
  • numpy
  • joblib
  • argparse
  • time
  • tqdm
  • scikit-learn
pip install -r requirements.txt 
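
The md5sum integrity check above can also be scripted, for example on a system without md5sum. This is a minimal sketch using only the standard library; `md5_of` and `verify` are hypothetical helper names, and the checksums are copied from the list above.

```python
import hashlib
from pathlib import Path

# Checksums as published in this README
EXPECTED = {
    "models_all.z01": "14ec71a3d2bbefdcf08697e903d60311",
    "models_all.z02": "9aba3c3a98c3b7ee512492da6197c65e",
    "models_all.z03": "3059a7b8d105162155d74d34193f9d86",
    "models_all.z04": "27c49785e4e02917d98f8bec54306006",
    "models_all.z05": "fb8815014ff95534183818245edf47e2",
    "models_all.z06": "5778deba3d1660a3e856a2f550bffee4",
    "models_all.zip": "d44b0286477e9aeeb8b6ec8af4cef029",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(expected, directory="."):
    """Return the names whose file is missing or whose MD5 differs."""
    failures = []
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file() or md5_of(path) != checksum:
            failures.append(name)
    return failures


if __name__ == "__main__":
    bad = verify(EXPECTED, "models")  # "models" is an assumed location
    print("OK" if not bad else "Failed: " + ", ".join(bad))
```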

Use

(base) xophidia@ubuntu:~/Desktop$ python MalwareUrlAnalyzer.py -h
usage: MalwareUrlAnalyzer.py [-h] [-f F] [-p P] [-o] [-i] [strings [strings ...]]

Detect Url Benign/Malware

positional arguments:
  strings     The string to analyze

optional arguments:
  -h, --help  show this help message and exit
  -f F        file to analyze 
  -r {2,3}    Set Point Of Interest (True/False) 2: [2 True] 3: [3 True]
  -p P        where models are. If no -p path is ./
  -o          print result Json format
  -e          Export result into protobuf format - filename export
  -i, --info  Print information
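
For readers who want to script around the tool, the options above could be declared with argparse roughly as follows. This is a reconstruction from the help text, not the project's actual code: the default of 3 for -r and the assumption that -e takes a file name are guesses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="MalwareUrlAnalyzer.py", description="Detect Url Benign/Malware"
    )
    parser.add_argument("strings", nargs="*", help="The string to analyze")
    parser.add_argument("-f", help="file to analyze")
    # Assumption: all three models must agree unless -r 2 is given
    parser.add_argument("-r", type=int, choices=[2, 3], default=3,
                        help="Set Point Of Interest (True/False)")
    parser.add_argument("-p", default="./", help="where models are")
    parser.add_argument("-o", action="store_true", help="print result Json format")
    # Assumption: -e expects the name of the export file
    parser.add_argument("-e", metavar="FILENAME",
                        help="Export result into protobuf format")
    parser.add_argument("-i", "--info", action="store_true", help="Print information")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-p", "models", "-r", "2", "example.com/a"])
    print(args.p, args.r, args.strings)
```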

You can analyse one file or one string.

python MalwareUrlAnalyzer.py -p models -f temp.csv

 
 ███▄ ▄███▓ ▄▄▄       ██▓     █     █░ ▄▄▄       ██▀███  ▓█████  █    ██  ██▀███   ██▓     ▄▄▄       ███▄    █  ▄▄▄       ██▓    ▓██   ██▓▒███████▒▓█████  ██▀███  
▓██▒▀█▀ ██▒▒████▄    ▓██▒    ▓█░ █ ░█░▒████▄    ▓██ ▒ ██▒▓█   ▀  ██  ▓██▒▓██ ▒ ██▒▓██▒    ▒████▄     ██ ▀█   █ ▒████▄    ▓██▒     ▒██  ██▒▒ ▒ ▒ ▄▀░▓█   ▀ ▓██ ▒ ██▒
▓██    ▓██░▒██  ▀█▄  ▒██░    ▒█░ █ ░█ ▒██  ▀█▄  ▓██ ░▄█ ▒▒███   ▓██  ▒██░▓██ ░▄█ ▒▒██░    ▒██  ▀█▄  ▓██  ▀█ ██▒▒██  ▀█▄  ▒██░      ▒██ ██░░ ▒ ▄▀▒░ ▒███   ▓██ ░▄█ ▒
▒██    ▒██ ░██▄▄▄▄██ ▒██░    ░█░ █ ░█ ░██▄▄▄▄██ ▒██▀▀█▄  ▒▓█  ▄ ▓▓█  ░██░▒██▀▀█▄  ▒██░    ░██▄▄▄▄██ ▓██▒  ▐▌██▒░██▄▄▄▄██ ▒██░      ░ ▐██▓░  ▄▀▒   ░▒▓█  ▄ ▒██▀▀█▄  
▒██▒   ░██▒ ▓█   ▓██▒░██████▒░░██▒██▓  ▓█   ▓██▒░██▓ ▒██▒░▒████▒▒▒█████▓ ░██▓ ▒██▒░██████▒ ▓█   ▓██▒▒██░   ▓██░ ▓█   ▓██▒░██████▒  ░ ██▒▓░▒███████▒░▒████▒░██▓ ▒██▒
░ ▒░   ░  ░ ▒▒   ▓▒█░░ ▒░▓  ░░ ▓░▒ ▒   ▒▒   ▓▒█░░ ▒▓ ░▒▓░░░ ▒░ ░░▒▓▒ ▒ ▒ ░ ▒▓ ░▒▓░░ ▒░▓  ░ ▒▒   ▓▒█░░ ▒░   ▒ ▒  ▒▒   ▓▒█░░ ▒░▓  ░   ██▒▒▒ ░▒▒ ▓░▒░▒░░ ▒░ ░░ ▒▓ ░▒▓░
░  ░      ░  ▒   ▒▒ ░░ ░ ▒  ░  ▒ ░ ░    ▒   ▒▒ ░  ░▒ ░ ▒░ ░ ░  ░░░▒░ ░ ░   ░▒ ░ ▒░░ ░ ▒  ░  ▒   ▒▒ ░░ ░░   ░ ▒░  ▒   ▒▒ ░░ ░ ▒  ░ ▓███/@Xophidia_2021  ░  ░▒ ░ ▒░
░      ░     ░   ▒     ░ ░     ░   ░    ░   ▒     ░░   ░    ░    ░░░ ░ ░   ░░   ░   ░ ░     ░   ▒      ░   ░ ░   ░   ▒     ░ ░    ▒ ▒ ░░  ░ ░ ░ ░ ░   ░     ░░   ░ 
       ░         ░  ░    ░  ░    ░          ░  ░   ░        ░  ░   ░        ░         ░  ░      ░  ░         ░       ░  ░    ░  ░ ░ ░       ░ ░       ░  ░   ░     
                                                                                 
Based on :
    - https://acris.aalto.fi/ws/portalfiles/portal/16859732/urlset.csv.zip
    - https://www.unb.ca/cic/datasets/url-2016.html
    - extract URL from OTX Alienvault
    - http://205.174.165.80/CICDataset/ISCX-URL-2016/
    
     In the order : ("Naive Bayes | LogisticRegression | RandomForest")
     pynb file : 

Analysis of file temp.csv in progress
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 23/23 [01:45<00:00,  4.58s/it]
                                                    Naive Bayes  Logistic Regression  Random Forest    POI
https://www.jeuxvideo.com                                   0.0                  0.0            1.0  False
https://www.lemonde.fr/pixels/live/2021/07/21/p...          0.0                  0.0            0.0  False
amazon.fr/ref=nav_logo                                      1.0                  1.0            1.0   True
github.com/Invoke-IR/ForensicPosters                        0.0                  0.0            0.0  False
scikit-learn.org/stable/modules/generated/sklea...          0.0                  0.0            0.0  False
jonashartley.com/hilaryolsen/wp-includes/random...          1.0                  1.0            1.0   True
apk.mirror                                                  0.0                  0.0            0.0  False
www.rfc-editor.org/rfc/rfc2350.txt                          0.0                  0.0            0.0  False
cert.pl/en/posts/2018/07/dissecting-smoke-loader/           0.0                  0.0            0.0  False
www.kaggle.com/victorambonati/unsupervised-anom...          0.0                  0.0            1.0  False
xsso.xjpakmdcfuqe.ru/e5718ce3090cb9e30634085055...          1.0                  1.0            1.0   True
081.ftphosting.pw/user81249/4918/0124.txt                   0.0                  0.0            0.0  False
fusu.icu/ajax/7z.php?ext=me                                 1.0                  1.0            1.0   True
keke.icu/ajax/7z.php?ext=me                                 1.0                  1.0            1.0   True
luru.icu/js/filters.php                                     0.0                  1.0            1.0  False
luru.icu/js/facebook.js?1555768638150                       1.0                  1.0            1.0   True
keke.icu/app/7za.exe?id=6986                                1.0                  1.0            1.0   True
www.jeuxvideo.com                                           0.0                  0.0            1.0  False
www.youtube.com/watch?v=55iZ8qFE2MM                         0.0                  0.0            1.0  False
de.letscompareonline.com/cgi-bin/ztEE/                      1.0                  1.0            1.0   True
rakikuma.com/cgi-bin/K/                                     1.0                  1.0            1.0   True
pacificgroup.ws/paradisesuiting.com/closed_module           1.0                  1.0            1.0   True

real	0m5.973s
user	0m5.386s
sys	0m0.574s
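
A table like the one above can be serialised to JSON in a few lines. The sketch below only illustrates the idea behind the -o option; the field names and the `to_json` helper are hypothetical, and the tool's real JSON layout may differ.

```python
import json
from typing import Dict, Sequence

# Column names as printed by the tool
COLUMNS = ("Naive Bayes", "Logistic Regression", "Random Forest")


def to_json(results: Dict[str, Sequence[float]], required: int = 3) -> str:
    """Serialise {url: (nb, lr, rf)} predictions to a JSON array of records."""
    payload = []
    for url, preds in results.items():
        row = {"url": url}
        row.update(zip(COLUMNS, preds))
        # POI is True when enough models predict 1.0 (assumed ">=" rule)
        row["POI"] = sum(1 for p in preds if p == 1.0) >= required
        payload.append(row)
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    sample = {
        "fusu.icu/ajax/7z.php?ext=me": (1.0, 1.0, 1.0),
        "luru.icu/js/filters.php": (0.0, 1.0, 1.0),
    }
    print(to_json(sample))
```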

python MalwareUrlAnalyzer.py -p models https://www.developpez.com/actu/316946/L-UE-envisage-de-rendre-les-transferts-de-bitcoins-plus-tracables-en-exigeant-la-collecte-d-informations-sur-le-destinataire-et-l-expediteur/


Analysis of URL ['https://www.developpez.com/actu/316946/L-UE-envisage-de-rendre-les-transferts-de-bitcoins-plus-tracables-en-exigeant-la-collecte-d-informations-sur-le-destinataire-et-l-expediteur/'] in progress
Naive Bayes  Logistic Regression  Random Forest    POI
0.0                 0.0              0.0         False

License

All the code in this project is licensed under the GNU Lesser General Public License.
