The goal of this project is to build a learning system that returns the locations of the fundamental heart sounds (S1 and S2).
To train the model, we need annotated labels for S1 and S2 in human heart sound recordings. We were fortunate to obtain medical data from PhysioNet; all of it can be found here.
We used two model types: a MobileNetV3+SSD architecture and a custom ViT architecture built on the Transformer encoder.
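As a rough sketch of the first type (a minimal example, not the project's exact implementation), torchvision 0.10 ships an SSDLite detector with a MobileNetV3-Large backbone that can be configured for two foreground classes (S1 and S2) plus background:

```python
import torch
import torchvision

# Minimal sketch, not the project's exact model: SSDLite with a MobileNetV3-Large
# backbone from torchvision, configured for background + S1 + S2.
model = torchvision.models.detection.ssdlite320_mobilenet_v3_large(
    pretrained=False,          # detector weights from scratch
    num_classes=3,             # background + S1 + S2
    pretrained_backbone=True,  # ImageNet-pretrained backbone
)
model.eval()

# Dummy forward pass with one 3-channel 320x320 "spectrogram image".
with torch.no_grad():
    detections = model([torch.rand(3, 320, 320)])
print(detections[0]["boxes"].shape, detections[0]["labels"])
```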
---
Environment: Python 3.9.7 and CUDA 11.4.

```
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
```
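A quick sanity check (a minimal sketch) that the pinned versions installed correctly and that the GPU is visible:

```python
import torch
import torchvision
import torchaudio

# Verify the pinned versions and CUDA availability.
print(torch.__version__, torchvision.__version__, torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())
```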
---
ViT
If the preliminary ViT model performed well, we planned to additionally label the auscultation locations and learn them through the CLS token, but the model is currently underfitting.
The code for training and testing can be found in python/example.ipynb.
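As a rough illustration of the ViT idea above (a minimal sketch with assumed dimensions, not the notebook's actual model), a Transformer encoder over spectrogram patches with a learnable CLS token could look like this:

```python
import torch
import torch.nn as nn

class HeartSoundViT(nn.Module):
    """Minimal ViT-style sketch: the patch tokens can feed an S1/S2 head,
    while the CLS token is kept free for a later auscultation-location head."""

    def __init__(self, patch_dim=256, embed_dim=256, depth=6, heads=8, num_patches=196):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, patches):                       # patches: (B, num_patches, patch_dim)
        x = self.patch_embed(patches)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return x[:, 0], x[:, 1:]                      # CLS token, patch tokens

# Dummy usage: a batch of 2 spectrograms, each cut into 196 flattened 16x16 patches.
cls_out, patch_out = HeartSoundViT()(torch.rand(2, 196, 256))
```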
- When we created our custom dataset, we started the class labels from 1.
- We split the data into training, validation, and test sets in a 0.6/0.2/0.2 ratio (as sketched below), and model performance was evaluated on the test set.
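A minimal sketch of that split, assuming `full_dataset` stands in for the custom heart-sound dataset and the seed is arbitrary:

```python
import torch
from torch.utils.data import random_split

n = len(full_dataset)                  # `full_dataset` is a placeholder for the custom dataset
n_train = int(0.6 * n)
n_valid = int(0.2 * n)
n_test = n - n_train - n_valid         # remainder goes to the test set

train_set, valid_set, test_set = random_split(
    full_dataset, [n_train, n_valid, n_test],
    generator=torch.Generator().manual_seed(42),   # assumed seed for reproducibility
)
```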
We have not trained with every parameter combination yet, but here is what we have seen so far.
Type | SR (Hz) | Channels | Filter | Time masking | Pretrain (ImageNet) | Aug | X (wl = SR/X) | mAP@0.5
---|---|---|---|---|---|---|---|---
MnetSSD | 8000 | 1 | High, Low | ✔ | ✔ | | 15 | 0.802
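For context on the table's preprocessing columns, here is a hedged sketch of one possible pipeline: high- and low-pass filtering, a spectrogram whose window length is SR/X, and SpecAugment-style time masking. The cutoff frequencies, mel scaling, hop length, and masking width are assumptions, not values taken from the table.

```python
import torch
import torchaudio.functional as F
import torchaudio.transforms as T

SR = 8000                      # "SR" column
X = 15                         # "X (wl = SR/X)" column
WIN_LENGTH = SR // X           # ~533-sample window

def preprocess(waveform: torch.Tensor) -> torch.Tensor:
    # High- and low-pass filtering ("Filter" column); cutoffs are placeholders.
    waveform = F.highpass_biquad(waveform, SR, cutoff_freq=20.0)
    waveform = F.lowpass_biquad(waveform, SR, cutoff_freq=800.0)
    # Log-mel spectrogram with window length SR/X (mel scaling is an assumption).
    spec = T.MelSpectrogram(
        sample_rate=SR, n_fft=WIN_LENGTH, win_length=WIN_LENGTH, hop_length=WIN_LENGTH // 2
    )(waveform)
    spec = T.AmplitudeToDB()(spec)
    # Time masking ("Time masking" column); mask width is a placeholder.
    return T.TimeMasking(time_mask_param=20)(spec)

spec = preprocess(torch.rand(1, SR * 5))   # 5 seconds of single-channel audio at 8 kHz
print(spec.shape)
```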