Use the link here to download the extracted raw data, the processed data, and the generated data. [Dropbox]
If you downloaded the data this way, you can go directly to Step 4.
Step1. Data Preparation
Download the UGR16 data from the website, for example april.week3.csv.uniqblacklistremoved. Then write its path to config.ini.
Delete any existing data under the data/ folder so that you start from a clean state.
Step2. Analyze the occurrence of the data and extract the raw data of some users. The output of this step is data/raw_data/*.csv
cd dataset/
`python -e`
Step3. Clean and fill in the raw data to generate the cleaned data. Then decide the bin size.
`python -fw`
Step4. Modify the config file to choose the baseline, or override the choice later on the command line.
Step5. Modify the config file to choose the validation model, then use KL divergence to validate the generated data.
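As a rough illustration of the KL check in Step 5, the sketch below bins two samples with a fixed bin size and computes KL(P || Q) between the resulting histograms. The function names, the sample values, the bin size, and the smoothing constant `eps` are all assumptions made for this example; the repository's own validation code may work differently.

```python
import math
from collections import Counter


def histogram(values, bin_size):
    """Count values per bin, where the bin index is value // bin_size."""
    return Counter(int(v // bin_size) for v in values)


def kl_divergence(p_values, q_values, bin_size, eps=1e-9):
    """KL(P || Q) over the union of bins, with additive smoothing for empty bins."""
    p_hist = histogram(p_values, bin_size)
    q_hist = histogram(q_values, bin_size)
    bins = set(p_hist) | set(q_hist)
    p_total = sum(p_hist.values()) + eps * len(bins)
    q_total = sum(q_hist.values()) + eps * len(bins)
    kl = 0.0
    for b in bins:
        p = (p_hist[b] + eps) / p_total
        q = (q_hist[b] + eps) / q_total
        kl += p * math.log(p / q)
    return kl


# Hypothetical byte counts: "raw" stands for real data, "gen" for generated data.
raw_bytes = [100, 250, 300, 1200]
gen_bytes = [120, 900, 1500, 1600]

same = kl_divergence(raw_bytes, raw_bytes, bin_size=500)
different = kl_divergence(raw_bytes, gen_bytes, bin_size=500)
print(same, different)
```

A lower value means the generated histogram is closer to the raw one. The result depends on the bin size chosen in Step 3, so the same bin size should be used for both samples.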
Refer to the config.ini file. Three categories of settings are considered so far.
1. [Default], making sure the basic functionality can work.
userlist: IP users to be considered and learned from.
baseline: model choice.
2. [GENERATE], describing generation-relevant settings.
save_to_csv: whether to write the generated data to disk.
gen_users: IP users to be generated as the sip.
original_date, gen_daynumber: the start date of the generated data and how many days to generate.
3. [VALIDATE], describing validation-relevant settings.
test_set: location of the file to be tested.
raw_compare, gen_compare: 2 groups of experiments.
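To make the three categories concrete, here is a minimal sketch that parses a config with these keys using Python's standard `configparser`. Every value below (addresses, date format, paths, booleans) is a made-up placeholder, and the section name is written as `[DEFAULT]` because that is the spelling `configparser` treats as its default section; the real config.ini may differ on both counts.

```python
import configparser

# Hypothetical values for illustration only; the real config.ini may differ.
SAMPLE_CONFIG = """
[DEFAULT]
userlist = 192.0.2.1,192.0.2.2
baseline = baseline1

[GENERATE]
save_to_csv = True
gen_users = 192.0.2.1
original_date = 2016-04-11
gen_daynumber = 5

[VALIDATE]
test_set = data/gen_data/example.csv
raw_compare = True
gen_compare = False
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_CONFIG)

# Basic settings.
users = [u.strip() for u in config["DEFAULT"]["userlist"].split(",")]
baseline = config["DEFAULT"]["baseline"]

# Generation settings, converted to Python types.
save_to_csv = config["GENERATE"].getboolean("save_to_csv")
gen_days = config["GENERATE"].getint("gen_daynumber")

# Values in [DEFAULT] are also visible from every other section.
baseline_seen_from_generate = config["GENERATE"]["baseline"]

print(users, baseline, save_to_csv, gen_days)
```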
| | extract target data from the whole raw data
| | turn the raw data into the cleaned data
| | 3 classes: the implementations of baseline 1 and 2, and the parent class that defines the interface
| | entry point that wraps the experiments
| | test modules
| |+raw_data/: 10 csv files, one per user, holding the data of day 1.
| |+clean_data/: 10 csv files, extended with supporting columns such as hour, delta time and #B-1
| |+gen_data/: 2 csv files, one each for baseline 1 and 2. Each contains 5 days of generated data.
| |-exp_record.txt: experiment parameter settings and processing time
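The supporting columns mentioned for clean_data/ could be derived along these lines. This is only a sketch under stated assumptions: the input field names `timestamp` and `bytes`, the output names `hour`, `delta_t` and `prev_bytes`, the timestamp format, and the choice to fill the first row with zeros are all hypothetical, and `#B-1` is read here as the byte count of the previous record.

```python
from datetime import datetime


def add_support_columns(rows, time_format="%Y-%m-%d %H:%M:%S"):
    """Add hour, delta time and previous byte count to time-sorted rows."""
    cleaned = []
    prev_time = None
    prev_bytes = 0  # assumed filler for the first row, which has no predecessor
    for row in rows:
        ts = datetime.strptime(row["timestamp"], time_format)
        delta = 0.0 if prev_time is None else (ts - prev_time).total_seconds()
        cleaned.append(
            {**row, "hour": ts.hour, "delta_t": delta, "prev_bytes": prev_bytes}
        )
        prev_time, prev_bytes = ts, row["bytes"]
    return cleaned


# Hypothetical records for a single user.
raw_rows = [
    {"timestamp": "2016-04-11 09:00:00", "bytes": 120},
    {"timestamp": "2016-04-11 09:00:30", "bytes": 900},
    {"timestamp": "2016-04-11 10:15:00", "bytes": 64},
]

clean_rows = add_support_columns(raw_rows)
for r in clean_rows:
    print(r)
```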
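GMM: 7 clusters for #B.
Baseline2 rule:
argmax_B P(B | T, B-1), where P(B | T, B-1) ∝ P(T|B) * P(B-1|B) * P(B)
(this factorization assumes T and B-1 are conditionally independent given B)
Theoretically, the very first sample to be generated should be drawn from the marginal distribution P(B), because no previous value B-1 exists for it yet.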
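The Baseline2 rule can be sketched with plain counting over discrete bins. The code below is not the repository's implementation: the function names, the toy training triples, and the smoothing constant `alpha` are assumptions for illustration, and T and B are treated as already-binned integers.

```python
import random
from collections import Counter, defaultdict


def fit(samples):
    """Collect counts from (T, B_prev, B) triples."""
    prior = Counter()                    # counts for P(B)
    t_given_b = defaultdict(Counter)     # counts for P(T | B)
    prev_given_b = defaultdict(Counter)  # counts for P(B-1 | B)
    for t, b_prev, b in samples:
        prior[b] += 1
        t_given_b[b][t] += 1
        prev_given_b[b][b_prev] += 1
    return prior, t_given_b, prev_given_b


def predict(model, t, b_prev, alpha=1e-6):
    """Return argmax_B of P(T|B) * P(B-1|B) * P(B)."""
    prior, t_given_b, prev_given_b = model
    total = sum(prior.values())

    def score(b):
        n = prior[b]
        # alpha keeps unseen combinations from zeroing the product;
        # the smoothed terms are scores, not exactly normalized probabilities.
        p_t = (t_given_b[b][t] + alpha) / (n + alpha)
        p_prev = (prev_given_b[b][b_prev] + alpha) / (n + alpha)
        return p_t * p_prev * (n / total)

    return max(prior, key=score)


def sample_first(model, rng):
    """Draw the first B from the marginal P(B), since no B-1 exists yet."""
    prior = model[0]
    return rng.choices(list(prior), weights=list(prior.values()))[0]


# Toy (T, B_prev, B) triples: T is an hour bin, B a byte-count bin.
training = [(9, 0, 1), (9, 1, 1), (9, 1, 2), (20, 2, 2), (20, 2, 2), (20, 2, 0)]
model = fit(training)

first_b = sample_first(model, random.Random(0))
next_evening = predict(model, t=20, b_prev=2)
next_morning = predict(model, t=9, b_prev=0)
print(first_b, next_evening, next_morning)
```

Generation would then alternate: draw the first value with `sample_first`, and feed each predicted B back in as `b_prev` for the next step.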
| Date and Time | Plan |
| --- | --- |
| 5/21-6/7 (2 weeks) | Clarify the dataset characteristics and build the first demo of our own approach (NN). |
| 6/8-7/1 (3 weeks) | Refine the NN model and evaluate it. |
| 7/1-7/15 (2 weeks) | Add novel features to our model. |
| 7/15-7/31 (2 weeks) | Implement simple proof-of-concept applications of our approach. |
| 8/1-9/1 (4 weeks) | Write the paper and run supplementary experiments if the existing ones are not enough. |