Commit 46297ec

movad arch

Signed-off-by: Leonardo Rossi <[email protected]>
1 parent b431f7c

37 files changed: +3003 −0 lines changed

.gitignore (+5 lines)

```
output
data
__pycache__
tmp_curve_video.mp4
*.pyc
```

README.md (+175 lines)

## [Memory-augmented Online Video Anomaly Detection (MOVAD)](https://arxiv.org/abs/2302.10719)

Official PyTorch implementation of **MOVAD**.

We propose **MOVAD**, a brand new architecture for online (frame-level) video
anomaly detection.

![MOVAD Architecture](images/arch.jpg)

Authors: Leonardo Rossi, Vittorio Bernuzzi, Tomaso Fontanini,
Massimo Bertozzi, Andrea Prati.

[IMP Lab](http://implab.ce.unipr.it/) -
Dipartimento di Ingegneria e Architettura

University of Parma, Italy

## Abstract

The ability to understand the surrounding scene is of paramount importance
for Autonomous Vehicles (AVs).

This paper presents a system capable of working online and in real time,
with guaranteed response times, giving an immediate response to anomalies
arising around the AV while exploiting only the videos captured by a
dash-mounted camera.

Our architecture, called MOVAD, relies on two main modules:
a short-term memory module that extracts information related to the ongoing
action, implemented by a Video Swin Transformer adapted to work in an online
scenario, and a long-term memory module that also considers remote past
information, thanks to the use of a Long Short-Term Memory (LSTM) network.

We evaluated the performance of our method on the Detection of Traffic
Anomaly (DoTA) dataset, a challenging collection of dash-mounted camera
videos of accidents.

After an extensive ablation study, MOVAD reaches an AUC score of
82.11%, surpassing the current state of the art by +2.81 AUC.
## Usage

### Installation
```bash
$ git clone -b icip https://github.com/IMPLabUniPr/movad.git
$ cd movad
$ mkdir -p pretrained  # target directory for the pretrained weights
$ wget https://github.com/SwinTransformer/storage/releases/download/v1.0.4/swin_base_patch244_window1677_sthv2.pth -O pretrained/swin_base_patch244_window1677_sthv2.pth
$ conda env create -n movad_env --file environment.yml
$ conda activate movad_env
```
### Download the DoTA dataset

Please download the dataset from the
[official website](https://github.com/MoonBlvd/Detection-of-Traffic-Anomaly)
and save it inside the `data/dota` directory.

You should obtain the following structure:

```
data/dota
├── annotations
│   ├── 0qfbmt4G8Rw_000306.json
│   ├── 0qfbmt4G8Rw_000435.json
│   ├── 0qfbmt4G8Rw_000602.json
│   ...
├── frames
│   ├── 0qfbmt4G8Rw_000072
│   ├── 0qfbmt4G8Rw_000306
│   ├── 0qfbmt4G8Rw_000435
│   ...
└── metadata
    ├── metadata_train.json
    ├── metadata_val.json
    ├── train_split.txt
    └── val_split.txt
```
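A quick way to sanity-check the layout above before training is a small script like the following (a minimal sketch; the `check_dota_layout` helper is illustrative and not part of this repository):

```python
from pathlib import Path

# Expected top-level entries of the DoTA dataset root, per the tree above.
# For `annotations` and `frames` we only check that the directory exists,
# since their per-video contents vary; for `metadata` we check the four files.
EXPECTED = {
    "annotations": None,
    "frames": None,
    "metadata": ["metadata_train.json", "metadata_val.json",
                 "train_split.txt", "val_split.txt"],
}

def check_dota_layout(root):
    """Return a list of missing paths under the dataset root."""
    root = Path(root)
    missing = []
    for name, children in EXPECTED.items():
        d = root / name
        if not d.is_dir():
            missing.append(str(d))
            continue
        for child in children or []:
            if not (d / child).exists():
                missing.append(str(d / child))
    return missing
```

Calling `check_dota_layout("data/dota")` should return an empty list when everything is in place.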
### Train
```bash
python main.py --config cfgs/v1_1.yml --output output/v1_1/ --phase train --epochs 100 --epoch -1
```

### Eval
```bash
python main.py --config cfgs/v1_1.yml --output output/v1_1/ --phase test --epoch 10
```

### Play: generate video
```bash
python main.py --config cfgs/v1_1.yml --output output/v1_1/ --phase play --epoch 100
```
## Results

### Table 1

Effectiveness of the memory modules.

| # | Short-term | Long-term | AUC | Conf |
|:---:|:---:|:---:|:---:|:---:|
| 1 |   |   | 66.53 | [conf](cfgs/v0_1.yml) |
| 2 | X |   | 74.46 | [conf](cfgs/v2_3.yml) |
| 3 |   | X | 68.76 | [conf](cfgs/v1_1.yml) |
| 4 | X | X | 79.21 | [conf](cfgs/v1_3.yml) |

### Figure 2

Short-term memory module.

| Name | Conf |
|:---:|:---:|
| NF 1 | [conf](cfgs/v1_1.yml) |
| NF 2 | [conf](cfgs/v1_2.yml) |
| NF 3 | [conf](cfgs/v1_3.yml) |
| NF 4 | [conf](cfgs/v1_4.yml) |
| NF 5 | [conf](cfgs/v1_5.yml) |

### Figure 3

Long-term memory module.

| Name | Conf |
|:---:|:---:|
| w/out LSTM | [conf](cfgs/v2_1.yml) |
| LSTM (1 cell) | [conf](cfgs/v2_2.yml) |
| LSTM (2 cells) | [conf](cfgs/v1_3.yml) |
| LSTM (3 cells) | [conf](cfgs/v2_3.yml) |
| LSTM (4 cells) | [conf](cfgs/v2_4.yml) |

### Figure 4

Video clip length (VCL).

| Name | Conf |
|:---:|:---:|
| 4 frames | [conf](cfgs/v3_1.yml) |
| 8 frames | [conf](cfgs/v1_3.yml) |
| 12 frames | [conf](cfgs/v3_2.yml) |
| 16 frames | [conf](cfgs/v3_3.yml) |

### Table 2

Comparison with the state of the art.

| # | Method | Input | AUC | Conf |
|:---:|:---:|:---:|:---:|:---:|
| 9 | Ours (MOVAD) | RGB (320x240) | 80.11 | [conf](cfgs/v4_1.yml) |
| 10 | Ours (MOVAD) | RGB (640x480) | 82.11 | [conf](cfgs/v4_2.yml) |
## License

See the [GPL v2](./LICENSE) license.

## Acknowledgement

This research benefits from the HPC (High Performance Computing) facility
of the University of Parma, Italy.

## Citation

If you find our work useful in your research, please cite:

```
@misc{https://doi.org/10.48550/arxiv.2302.10719,
  doi = {10.48550/ARXIV.2302.10719},
  url = {https://arxiv.org/abs/2302.10719},
  author = {Rossi, Leonardo and Bernuzzi, Vittorio and Fontanini, Tomaso and Bertozzi, Massimo and Prati, Andrea},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, F.1.1, 68-02, 68-04, 68-06, 68T07, 68T10, 68T45},
  title = {Memory-augmented Online Video Anomaly Detection},
  publisher = {arXiv},
  year = {2023},
  copyright = {Creative Commons Attribution Share Alike 4.0 International}
}
```

cfgs/v0_1.yml (+16 lines)

```yaml
NF: 1
VCL: 8
apply_softmax: true
batch_size: 8
class_weights: (0.3, 0.7)
data_mean: [.5, .5, .5]
data_path: ./data/dota
data_std: [.5, .5, .5]
dataset: Dota
dropout: 0.3
image_shape: [720, 1280]
input_shape: [240, 320]
lr: 0.0001
pretrained: 'pretrained/swin_base_patch244_window1677_sthv2.pth'
transformer_type: 'swin_base_patch244_window1677_sthv2'
vertical_flip_prob: 0.5
```
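One detail worth noting in these configs: a YAML parser reads `class_weights: (0.3, 0.7)` as the plain string `"(0.3, 0.7)"`, because parentheses are not YAML sequence syntax. The repository's config loader is not shown in this commit; a minimal sketch of one way to recover the tuple (the `parse_class_weights` helper is hypothetical) is:

```python
import ast

def parse_class_weights(raw):
    """Convert a tuple-style config value into a tuple of floats.

    Accepts either the string form produced by YAML (e.g. "(0.3, 0.7)")
    or an already-parsed sequence.
    """
    if isinstance(raw, str):
        # Safely evaluate the Python-literal string without using eval().
        raw = ast.literal_eval(raw)
    return tuple(float(w) for w in raw)
```

For example, `parse_class_weights("(0.3, 0.7)")` yields the tuple `(0.3, 0.7)`.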

cfgs/v1_1.yml (+18 lines)

```yaml
NF: 1
VCL: 8
apply_softmax: true
batch_size: 8
class_weights: (0.3, 0.7)
data_mean: [.5, .5, .5]
data_path: ./data/dota
data_std: [.5, .5, .5]
dataset: Dota
dropout: 0.3
image_shape: [720, 1280]
input_shape: [240, 320]
lr: 0.0001
pretrained: 'pretrained/swin_base_patch244_window1677_sthv2.pth'
rnn_cell_num: 2
rnn_state_size: 1024
transformer_type: 'swin_base_patch244_window1677_sthv2'
vertical_flip_prob: 0.5
```

cfgs/v1_2.yml (+18 lines)

```yaml
NF: 2
VCL: 8
apply_softmax: true
batch_size: 8
class_weights: (0.3, 0.7)
data_mean: [.5, .5, .5]
data_path: ./data/dota
data_std: [.5, .5, .5]
dataset: Dota
dropout: 0.3
image_shape: [720, 1280]
input_shape: [240, 320]
lr: 0.0001
pretrained: 'pretrained/swin_base_patch244_window1677_sthv2.pth'
rnn_cell_num: 2
rnn_state_size: 1024
transformer_type: 'swin_base_patch244_window1677_sthv2'
vertical_flip_prob: 0.5
```

cfgs/v1_3.yml (+18 lines)

```yaml
NF: 3
VCL: 8
apply_softmax: true
batch_size: 2
class_weights: (0.3, 0.7)
data_mean: [.5, .5, .5]
data_path: ./data/dota
data_std: [.5, .5, .5]
dataset: Dota
dropout: 0.3
image_shape: [720, 1280]
input_shape: [240, 320]
lr: 0.0001
pretrained: 'pretrained/swin_base_patch244_window1677_sthv2.pth'
rnn_cell_num: 2
rnn_state_size: 1024
transformer_type: 'swin_base_patch244_window1677_sthv2'
vertical_flip_prob: 0.5
```

cfgs/v1_4.yml (+18 lines)

```yaml
NF: 4
VCL: 8
apply_softmax: true
batch_size: 8
class_weights: (0.3, 0.7)
data_mean: [.5, .5, .5]
data_path: ./data/dota
data_std: [.5, .5, .5]
dataset: Dota
dropout: 0.3
image_shape: [720, 1280]
input_shape: [240, 320]
lr: 0.0001
pretrained: 'pretrained/swin_base_patch244_window1677_sthv2.pth'
rnn_cell_num: 2
rnn_state_size: 1024
transformer_type: 'swin_base_patch244_window1677_sthv2'
vertical_flip_prob: 0.5
```

cfgs/v1_5.yml (+18 lines)

```yaml
NF: 5
VCL: 8
apply_softmax: true
batch_size: 8
class_weights: (0.3, 0.7)
data_mean: [.5, .5, .5]
data_path: ./data/dota
data_std: [.5, .5, .5]
dataset: Dota
dropout: 0.3
image_shape: [720, 1280]
input_shape: [240, 320]
lr: 0.0001
pretrained: 'pretrained/swin_base_patch244_window1677_sthv2.pth'
rnn_cell_num: 2
rnn_state_size: 1024
transformer_type: 'swin_base_patch244_window1677_sthv2'
vertical_flip_prob: 0.5
```

cfgs/v2_1.yml (+16 lines)

```yaml
NF: 3
VCL: 8
apply_softmax: true
batch_size: 8
class_weights: (0.3, 0.7)
data_mean: [.5, .5, .5]
data_path: ./data/dota
data_std: [.5, .5, .5]
dataset: Dota
dropout: 0.3
image_shape: [720, 1280]
input_shape: [240, 320]
lr: 0.0001
pretrained: 'pretrained/swin_base_patch244_window1677_sthv2.pth'
transformer_type: 'swin_base_patch244_window1677_sthv2'
vertical_flip_prob: 0.5
```

cfgs/v2_2.yml (+18 lines)

```yaml
NF: 3
VCL: 8
apply_softmax: true
batch_size: 8
class_weights: (0.3, 0.7)
data_mean: [.5, .5, .5]
data_path: ./data/dota
data_std: [.5, .5, .5]
dataset: Dota
dropout: 0.3
image_shape: [720, 1280]
input_shape: [240, 320]
lr: 0.0001
pretrained: 'pretrained/swin_base_patch244_window1677_sthv2.pth'
rnn_cell_num: 1
rnn_state_size: 1024
transformer_type: 'swin_base_patch244_window1677_sthv2'
vertical_flip_prob: 0.5
```

cfgs/v2_3.yml (+18 lines)

```yaml
NF: 3
VCL: 8
apply_softmax: true
batch_size: 8
class_weights: (0.3, 0.7)
data_mean: [.5, .5, .5]
data_path: ./data/dota
data_std: [.5, .5, .5]
dataset: Dota
dropout: 0.3
image_shape: [720, 1280]
input_shape: [240, 320]
lr: 0.0001
pretrained: 'pretrained/swin_base_patch244_window1677_sthv2.pth'
rnn_cell_num: 3
rnn_state_size: 1024
transformer_type: 'swin_base_patch244_window1677_sthv2'
vertical_flip_prob: 0.5
```

cfgs/v2_4.yml (+18 lines)

```yaml
NF: 3
VCL: 8
apply_softmax: true
batch_size: 8
class_weights: (0.3, 0.7)
data_mean: [.5, .5, .5]
data_path: ./data/dota
data_std: [.5, .5, .5]
dataset: Dota
dropout: 0.3
image_shape: [720, 1280]
input_shape: [240, 320]
lr: 0.0001
pretrained: 'pretrained/swin_base_patch244_window1677_sthv2.pth'
rnn_cell_num: 4
rnn_state_size: 1024
transformer_type: 'swin_base_patch244_window1677_sthv2'
vertical_flip_prob: 0.5
```

cfgs/v3_1.yml (+18 lines)

```yaml
NF: 3
VCL: 4
apply_softmax: true
batch_size: 8
class_weights: (0.3, 0.7)
data_mean: [.5, .5, .5]
data_path: ./data/dota
data_std: [.5, .5, .5]
dataset: Dota
dropout: 0.3
image_shape: [720, 1280]
input_shape: [240, 320]
lr: 0.0001
pretrained: 'pretrained/swin_base_patch244_window1677_sthv2.pth'
rnn_cell_num: 2
rnn_state_size: 1024
transformer_type: 'swin_base_patch244_window1677_sthv2'
vertical_flip_prob: 0.5
```
