Code repository for the paper *Adversarial Deep Learning for Robust Detection of Binary Encoded Malware*, A. Al-Dujaili et al., 2018.

- The visualization tool from the paper *On Visual Hallmarks of Robustness to Adversarial Malware* can be found here.
- A series of related blog posts can be found here.
- The dataset can be shared upon request: please fill in the form https://goo.gl/forms/hn5Dfiset1Y1BkMr1 and we will send you a link to the dataset.
All the required packages are specified in the YAML files under `helper_files`. If you have `conda` installed, you can just `cd` to the main directory and execute the following with `osx_environment.yml` or `linux_environment.yml` on OS X or Linux, respectively:

```
conda env create -f ./helper_files/(osx|linux)_environment.yml
```

This will create an environment called `nn_mal`.
To activate this environment, execute:

```
source activate nn_mal
```
Note: If you're going to use losswise, you may run into an issue with a single `print` line whose argument is not enclosed in parentheses; just add the parentheses if this error shows up and you're good to go.
Note: If you're running the code on macOS with CUDA, then, according to Pytorch.org, "macOS Binaries don't support CUDA, install from source if CUDA is needed".
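
You can quickly check whether CUDA is visible to your PyTorch install with the standard `torch.cuda.is_available()` call:

```python
import torch

# Expect False on macOS binary installs, per the note above.
print(torch.cuda.is_available())
```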
- Configure your experiment as desired by modifying the `parameters.ini` file. Among the things you may want to specify:
  - dataset filepath
  - GPU device, if any
  - name of the experiment
  - training method (inner maximizer)
  - evasion method

  Note: In case you do not have access to the dataset, you can still run the code on a synthetic dataset of 8-dimensional binary feature vectors, whose bits are set with probability 0.2 for the malicious class and 0.8 for the benign class; a minimal generator sketch is shown below.
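
  For illustration, here is a sketch of how such a synthetic set could be generated with NumPy. This is our own illustration of the description above, not necessarily the repository's actual generator:

  ```python
  import numpy as np

  def make_synthetic_dataset(n_per_class=1000, n_features=8, seed=0):
      """Binary feature vectors whose bits are 1 with probability 0.2
      for malicious samples and 0.8 for benign ones."""
      rng = np.random.RandomState(seed)
      malicious = (rng.rand(n_per_class, n_features) < 0.2).astype(np.float32)
      benign = (rng.rand(n_per_class, n_features) < 0.8).astype(np.float32)
      X = np.vstack([malicious, benign])
      y = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])  # 1 = malicious
      return X, y
  ```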
- Execute `framework.py`:

  ```
  python framework.py
  ```
Note: the experiments can all be logged and monitored using losswise. To activate logging, set `losswise_api_key` to your API key in `parameters.ini` and set `is_losswise` to `True`.
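
If you would rather toggle this from a script, a sketch with the standard-library `configparser` could look like the following; the `general` section name is our placeholder assumption, so use whichever section of `parameters.ini` actually holds these keys:

```python
import configparser

config = configparser.ConfigParser()
config.read("parameters.ini")
# "general" is a placeholder; substitute the section that holds these keys.
config["general"]["losswise_api_key"] = "YOUR_API_KEY"
config["general"]["is_losswise"] = "True"
with open("parameters.ini", "w") as f:
    config.write(f)
```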
In order to reproduce the results in the paper, set the filepaths to the malicious and benign saved feature vectors (these can be re-generated with `generate_vectors.py`) and execute the `run_experiments.py` script:

```
python run_experiments.py
```
Results (accuracy metrics, bscn measures, and evasion rates) will be populated under the (to-be-generated) `result_files` directory, while the trained models will be saved under `helper_files`.
The results can be compiled into LaTeX tables saved under `result_files` by running the function `create_tex_tables()` in `utils/script_functions.py` with a valid filepath to the result files. By default, you can do the following:

```
cd utils/
python script_functions.py
```
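
To point it at a non-default location instead, a call along the following lines should work; passing the path as the first positional argument is our assumption, so check the actual signature of `create_tex_tables()` in `utils/script_functions.py`:

```python
# Hypothetical usage sketch; verify the argument against the real
# signature in utils/script_functions.py.
from script_functions import create_tex_tables  # assumes cwd is utils/

create_tex_tables("../result_files")
```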
NOTE: On Linux, you may run into trouble running `source` from within Python's `os.system()`. A workaround is to replace the `os.system()` command in `run_experiments.py` with the following line:

```
os.system('/bin/bash -c "source activate nn_mal; python framework.py"')
```
If you make use of this code and you'd like to cite us, please consider the following:

```
@article{al2018adversarial,
  title={Adversarial Deep Learning for Robust Detection of Binary Encoded Malware},
  author={Al-Dujaili, Abdullah and Huang, Alex and Hemberg, Erik and O'Reilly, Una-May},
  journal={arXiv preprint arXiv:1801.02950},
  year={2018}
}
```