- Goal of this File:
---------------------
- This file contains a description of the commits made in this
repository.
- Sorted chronologically, from oldest to newest.
- Note: any commit related to the code is denoted by
'commit name'_code.
*************** Commits Description **************
1) creating_sample_code:
--------------------------------
So far, I have managed to create one sample from the CSV files
of the SpecCence data.
The sample has the shape 29 x width, where width is the number of
lines in the CSV files. This sample is then reshaped into a column vector
of shape (29 x width, 1).
- Recall that each line represents a window of 125 ms.
I have not yet checked the time gaps between consecutive windows.
This will be done in the next phase.
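A minimal sketch of the sample construction described above, assuming each CSV line holds exactly the 29 spectral values of one 125 ms window, with no header or timestamp column (the real SpecCence files may contain extra columns that would have to be dropped first). The function name csv_to_column_sample is hypothetical, and the row-major flattening order is an assumption, since the note does not say which order was used.

```python
import io

import numpy as np

N_BINS = 29  # values per CSV line, i.e. the "29" in 29 x width


def csv_to_column_sample(csv_source):
    """Hypothetical helper: read a header-less CSV (one 125 ms window per line)
    and return the sample reshaped into a (29 * width, 1) column vector."""
    rows = np.loadtxt(csv_source, delimiter=",", ndmin=2)  # shape (width, 29)
    if rows.shape[1] != N_BINS:
        raise ValueError(f"expected {N_BINS} values per line, got {rows.shape[1]}")
    sample = rows.T  # shape (29, width): one column per window
    return sample.reshape(-1, 1)  # row-major flattening, shape (29 * width, 1)


# Usage with a tiny in-memory CSV of 3 windows; a real call would pass a file path.
demo_csv = io.StringIO("\n".join(",".join(["0.5"] * N_BINS) for _ in range(3)))
column = csv_to_column_sample(demo_csv)
print(column.shape)  # (87, 1)
```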
---------------------------------------------------------------
2) adding_header_none_code:
------------------------------
Need to pass header = None when the CSV file does not have a header;
otherwise the first line of data is read as the header.
3) data_per_csv_file_code:
-----------------------------
- The construction of the data per CSV file (i.e. per hour)
is done.
- The dataloader also works.
4) saving_data_npy:
--------------------
In this step, I have created a function which takes a CSV file (1 hour duration),
checks the time continuity of the frames, and saves these frames in .npy format
(NumPy's binary file format).
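The continuity check and the .npy saving can be sketched as follows, assuming the timestamps are available in seconds as a separate array and that "continuous" means consecutive windows exactly 125 ms apart, within a small tolerance. The function names, the tolerance and the slice file names are hypothetical; the note does not say how gaps were handled, so in this version every gap simply starts a new slice.

```python
import os
import tempfile

import numpy as np

WINDOW_S = 0.125  # each CSV line covers a 125 ms window


def split_continuous(times, step=WINDOW_S, tol=1e-3):
    """Return (start, end) index pairs of runs in which consecutive
    timestamps are `step` seconds apart, within `tol`."""
    times = np.asarray(times, dtype=float)
    if times.size == 0:
        return []
    gaps = np.diff(times)
    # Index of the first window after each gap that is not exactly one step.
    breaks = np.flatnonzero(np.abs(gaps - step) > tol) + 1
    starts = np.concatenate(([0], breaks))
    ends = np.concatenate((breaks, [times.size]))
    return list(zip(starts.tolist(), ends.tolist()))


def save_continuous_slices(times, spec, out_dir, prefix="train_spec"):
    """Save each continuous run of rows of `spec` to its own .npy file."""
    paths = []
    for k, (start, end) in enumerate(split_continuous(times)):
        path = os.path.join(out_dir, f"{prefix}_{k}.npy")
        np.save(path, spec[start:end])
        paths.append(path)
    return paths


# Usage: five windows with a 0.75 s jump after the third one.
with tempfile.TemporaryDirectory() as tmp:
    t = [0.0, 0.125, 0.25, 1.0, 1.125]
    saved = save_continuous_slices(t, np.zeros((5, 29)), tmp)
    saved_shapes = [np.load(p).shape for p in saved]
print(saved_shapes)  # [(3, 29), (2, 29)]
```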
5) Organizing_to_class:
--------------------------
Transformed the helper_function into a class and cleaned up
main_data_building.
6) Adding_feature_make_directory:
----------------------------------
In the class 'Class_Dataset_Construction.py', added an option
to automatically create the dataset folder, after checking whether
it already exists.
Also fixed a bug concerning the file_name variable:
it is now defined before the loop.
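A minimal sketch of the folder option, using the standard library's os.makedirs with exist_ok=True so that an already existing folder is not an error. The helper name ensure_dataset_dir is hypothetical and not taken from 'Class_Dataset_Construction.py'.

```python
import os
import tempfile


def ensure_dataset_dir(root):
    """Hypothetical helper: create the dataset folder if it is missing.
    Returns True if the folder was created, False if it already existed."""
    existed = os.path.isdir(root)
    os.makedirs(root, exist_ok=True)  # also creates missing parent folders
    return not existed


with tempfile.TemporaryDirectory() as tmp:
    dataset_root = os.path.join(tmp, "my_dataset")  # name chosen by the user
    first = ensure_dataset_dir(dataset_root)
    second = ensure_dataset_dir(dataset_root)
print(first, second)  # True False
```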
7) Data_Scannnig_Construction:
----------------------
The Main_Data_Scannning.py:
This file takes the dataset directory and checks for file existence in all
sensor directories.
It uses the itertools module so we can avoid nested loops while testing
for file existence in the different sensor directories.
It is just a demo to see how itertools.product works.
Then I refactored the class 'Class_Dataset_Construction.py' so that it takes into account:
- the creation of directory/sensor_name/train_spec_id.npy,
where sensor_name is also a subdirectory;
- the train_spec_id.npy naming scheme: train_spec_day_hour_slice.
In conclusion, the class 'Class_Dataset_Construction.py' creates a proper dataset per sensor, all in one root directory whose name
is chosen by the user.
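The existence check over all sensor directories can be sketched with itertools.product, which yields every (sensor, file) pair and so replaces two nested for-loops. The directory layout root/sensor_name/file_name, the sensor and file names, the helper names and the plain (unpadded) number formatting in spec_file_name are assumptions for illustration.

```python
import itertools
import os
import tempfile


def missing_files(root, sensors, file_names):
    """Return the (sensor, file_name) pairs whose file does not exist
    under root/sensor/. One loop over the product replaces two nested loops."""
    return [
        (sensor, name)
        for sensor, name in itertools.product(sensors, file_names)
        if not os.path.isfile(os.path.join(root, sensor, name))
    ]


def spec_file_name(day, hour, slice_idx):
    """Hypothetical helper for the train_spec_day_hour_slice naming scheme."""
    return f"train_spec_{day}_{hour}_{slice_idx}.npy"


# Usage: "day1.csv" exists for sensor_a only (sensor names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    for sensor in ("sensor_a", "sensor_b"):
        os.makedirs(os.path.join(tmp, sensor))
    open(os.path.join(tmp, "sensor_a", "day1.csv"), "w").close()
    missing = missing_files(tmp, ["sensor_a", "sensor_b"], ["day1.csv"])
print(missing)  # [('sensor_b', 'day1.csv')]
print(spec_file_name(3, 14, 0))  # train_spec_3_14_0.npy
```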
8) Data_Building_small_changes:
------------------------------------
Instead of creating one directory per sensor, I now also save the sensor id in a .npy file.
The sensor id is created using the command np.full(self.__width, sensor_index),
which creates a vector of length self.__width filled with the value
sensor_index (an integer).
So we have only one global directory for the created dataset.
Naming convention:
- train_id_xx.npy for the sensor index, of length 10^4
- train_time_xx.npy for the time stamps, of length 10^4
- train_spec_xx.npy for the spectrograms, of shape (10^4 x 29)
where xx = SensorIndex_month_day_hour_slice
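A sketch of how one slice could be written as the three aligned files of this naming convention, with the sensor id vector built by np.full as described. The function save_slice and its argument list are hypothetical; only the file prefixes, the xx pattern and the np.full call come from the note. The demo uses 4 windows instead of 10^4 to stay small.

```python
import os
import tempfile

import numpy as np


def save_slice(out_dir, sensor_index, month, day, hour, slice_idx, times, spec):
    """Hypothetical helper: save one slice as train_id_xx.npy, train_time_xx.npy
    and train_spec_xx.npy, with xx = SensorIndex_month_day_hour_slice."""
    width = spec.shape[0]  # number of 125 ms windows in the slice
    times = np.asarray(times)
    if times.shape[0] != width:
        raise ValueError("times and spec must have the same number of windows")
    ids = np.full(width, sensor_index)  # one sensor label per window
    xx = f"{sensor_index}_{month}_{day}_{hour}_{slice_idx}"
    for kind, arr in (("id", ids), ("time", times), ("spec", spec)):
        np.save(os.path.join(out_dir, f"train_{kind}_{xx}.npy"), arr)
    return xx


with tempfile.TemporaryDirectory() as tmp:
    xx = save_slice(tmp, 2, 5, 17, 9, 0, np.arange(4) * 0.125, np.zeros((4, 29)))
    files = sorted(os.listdir(tmp))
    ids = np.load(os.path.join(tmp, f"train_id_{xx}.npy"))
print(files)  # ['train_id_2_5_17_9_0.npy', 'train_spec_2_5_17_9_0.npy', 'train_time_2_5_17_9_0.npy']
print(ids)  # [2 2 2 2]
```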
9) Data_Building_multiple_csv_files:
------------------------------------
Extended 'Class_Dataset_Construction.py' to handle multiple CSV files
per day.
10) Audio2Vec_Encoder_Part:
----------------------------
In this part I have implemented the encoder part of audio2vec.
I am having an issue with the size of the data when passing it through the conv layer.
Working on solving it.
11) Audio2Vec_Encoder_Part_2:
---------------------------------
The dimension issue has been solved.
Now committing the solution.
How I solved it:
I have added a padding option to the convolution operation
so we don't lose dimensions.
Also, in the max pooling operation, I set the kernel size to 1 with a
stride of 2, so that only the frequency dimension is reduced and not the time
dimension, since the time dimension is already 1 and there is nothing to
reduce.
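The fix can be checked with the usual output-size formula for a convolution or max-pooling layer (dilation 1, floor rounding), out = floor((n + 2 * padding - kernel) / stride) + 1, as given for example in PyTorch's Conv2d and MaxPool2d documentation. The note does not name the framework or the convolution kernel size; the 3 x 3 kernel below is an assumption, while the 29 frequency bins and the time length of 1 come from the notes above.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length along one axis of a conv or max-pool layer
    (dilation 1, floor rounding). A value below 1 means the layer cannot run."""
    return (n + 2 * padding - kernel) // stride + 1


FREQ, TIME = 29, 1  # 29 frequency bins; the time axis has length 1

# Assumed 3 x 3 convolution: without padding the time axis would become invalid.
print(out_size(TIME, kernel=3))             # -1
print(out_size(TIME, kernel=3, padding=1))  # 1
print(out_size(FREQ, kernel=3, padding=1))  # 29

# Max pooling with kernel 1 and stride 2: frequency shrinks, time stays at 1.
print(out_size(FREQ, kernel=1, stride=2))   # 15
print(out_size(TIME, kernel=1, stride=2))   # 1
# For comparison, a kernel of 2 with stride 2 would reduce the time axis to 0.
print(out_size(TIME, kernel=2, stride=2))   # 0
```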