First upload
Source, pretrained models, and experimental setup for SleepEDF-SC and SleepEDF-ST
pquochuy committed Aug 23, 2019
1 parent d236028 commit c02e831
Showing 469 changed files with 15,310 additions and 3 deletions.
Binary file added .DS_Store
Binary file not shown.
83 changes: 80 additions & 3 deletions README.md
@@ -1,4 +1,81 @@
# sleep_transfer_learning
This is the place holder for the source code and the pretrained models associated with the work "Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning"

The source code and the pretrained models are being prepared and will be made available soon.

# Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning

This repository contains the source code, pretrained models, and experimental setup for the manuscript:

- Huy Phan, Oliver Y. Chén, Philipp Koch, Zongqing Lu, Ian McLoughlin, Alfred Mertins, and Maarten De Vos. [__Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning.__](https://arxiv.org/abs/1907.13177) _arXiv preprint arXiv:1907.13177_, 2019

<img src="figure/Sleep_Transfer.png" class="center" alt="Sleep Transfer Learning" width="450"/>

## Data Preparation with Matlab:
-------------

### SeqSleepNet
- Change directory to `seqsleepnet/`
- Run `preprare_data_sleepedf_sc.m` to prepare the SleepEDF-SC data (the path to the data must be set; see the comments in the script). The generated `.mat` files are stored in the `mat/` directory.
- Run `genlist_sleepedf_sc.m` to generate the lists of SleepEDF-SC files for network training, based on the data split in `data_split_sleepedf_sc.mat`. The generated files are stored in the `tf_data/` directory.
- Run `preprare_data_sleepedf_st.m` to prepare the SleepEDF-ST data (the path to the data must be set; see the comments in the script). The generated `.mat` files are stored in the `mat/` directory.
- Run `genlist_sleepedf_st.m` to generate the lists of SleepEDF-ST files for network training, based on the data split in `data_split_sleepedf_st.mat`. The generated files are stored in the `tf_data/` directory.

### DeepSleepNet (likewise)

## Network training and evaluation with TensorFlow:
-------------
### SeqSleepNet
- Change directory to `seqsleepnet/tensorflow/seqsleepnet/`
- Run the example bash scripts:

  - `finetune_all.sh`: finetune an entire pretrained network
  - `finetune_softmax_SPB.sh`: finetune the softmax layer + sequence processing block (SPB)
  - `finetune_softmax_EPB.sh`: finetune the softmax layer + epoch processing block (EPB)
  - `finetune_softmax.sh`: finetune the softmax layer only
  - `train_scratch.sh`: train a network from scratch

_Note_: when the `--pretrained_model` parameter is empty, the network is trained from scratch. Otherwise, the specified pretrained model is loaded and finetuned with the strategy given by the `--finetune_mode` parameter.
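
The interplay of the two flags can be sketched as follows. This is a minimal Python illustration, not the repository's training code: the mode names (`all`, `softmax_spb`, `softmax_epb`, `softmax`) and block names are hypothetical labels for the four strategies listed above, and the values actually accepted by `--finetune_mode` are defined in the training scripts.

```python
import argparse

# Hypothetical mapping from finetuning strategy to the network parts that are
# updated; the mode and block names are illustrative, not the repository's values.
TRAINABLE_PARTS = {
    "all": {"epoch_processing_block", "sequence_processing_block", "softmax"},
    "softmax_spb": {"sequence_processing_block", "softmax"},
    "softmax_epb": {"epoch_processing_block", "softmax"},
    "softmax": {"softmax"},
}


def training_plan(argv):
    """Return (initialization, trainable parts) implied by the two flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--pretrained_model", default="")
    parser.add_argument("--finetune_mode", default="all", choices=sorted(TRAINABLE_PARTS))
    args = parser.parse_args(argv)
    if not args.pretrained_model:
        # Empty path: random initialization, every part is trained from scratch.
        return "scratch", TRAINABLE_PARTS["all"]
    # Otherwise: load the pretrained weights and update only the selected parts.
    return args.pretrained_model, TRAINABLE_PARTS[args.finetune_mode]


print(training_plan([]))
print(training_plan(["--pretrained_model", "./pretrained/model", "--finetune_mode", "softmax"]))
```

Freezing everything except the selected blocks is what distinguishes the strategies; the path `./pretrained/model` is a placeholder.
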
### DeepSleepNet (likewise)

_Note_: the DeepSleepNet pretrained models are large. They were uploaded separately and can be downloaded from [https://zenodo.org/record/3375235](https://zenodo.org/record/3375235).

## Evaluation
After training/finetuning and testing the network on test data:

- Change path to `seqsleepnet/` or `deepsleepnet/`
- Refer to `examples_evaluation.m` for examples of how to calculate the performance metrics.
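
For reference, the reported metrics can be sketched in a few lines of Python. This assumes the standard definitions (accuracy, macro-averaged F1, Cohen's kappa, and class-wise sensitivity/selectivity averaged over classes, with selectivity taken as precision); the repository's `calculate_overall_metrics` and `calculate_classwise_sens_sel` may differ in details. `overall_metrics` is a hypothetical name, and its return order mirrors the MATLAB functions.

```python
from collections import Counter


def overall_metrics(y_true, y_pred, classes):
    """Return (accuracy, macro F1, Cohen's kappa, mean sensitivity, mean selectivity)."""
    n = len(y_true)
    pairs = Counter(zip(y_true, y_pred))  # confusion-matrix cells (true, predicted)
    true_count = Counter(y_true)          # row sums
    pred_count = Counter(y_pred)          # column sums
    acc = sum(pairs[(c, c)] for c in classes) / n
    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = sum(true_count[c] * pred_count[c] for c in classes) / (n * n)
    kappa = (acc - p_e) / (1 - p_e)
    sens, sel, f1 = [], [], []
    for c in classes:
        tp = pairs[(c, c)]
        recall = tp / true_count[c] if true_count[c] else 0.0     # sensitivity
        precision = tp / pred_count[c] if pred_count[c] else 0.0  # selectivity
        sens.append(recall)
        sel.append(precision)
        f1.append(2 * precision * recall / (precision + recall) if tp else 0.0)
    k = len(classes)
    return acc, sum(f1) / k, kappa, sum(sens) / k, sum(sel) / k


# toy example with two classes; for sleep staging use classes=[1, 2, 3, 4, 5]
print(overall_metrics([1, 1, 2, 2], [1, 2, 2, 2], classes=[1, 2]))
```
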

## Some results:
-------------
- Finetuning results with _SeqSleepNet_:

![seqsleepnet_results](figure/seqsleepnet_finetuning.png)

- Finetuning results with _DeepSleepNet_:

![deepsleepnet_results](figure/deepsleepnet_finetuning.png)

Environment:
-------------
- Matlab v7.3 (for data preparation)
- Python3
- TensorFlow GPU, versions 1.4 to 1.14 (for network training and evaluation)
- numpy
- scipy
- sklearn
- h5py

## Note on the SleepEDF Expanded Database:

The SleepEDF expanded database can be downloaded from https://physionet.org/content/sleep-edfx/1.0.0/. The latest version of this database contains 153 recordings from 78 subjects in the SC subset. The experiments were intentionally conducted with the __previous version__ of the SC subset, which contains __20 subjects__, to simulate a small cohort. If you download the new version, make sure to use only the 20 subjects __SC400-SC419__.
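
A small Python sketch of the subject filter, assuming the PhysioNet naming scheme `SC4ssNE?-PSG.edf` (with `ss` the subject number and `N` the night) and that `SC400-SC419` denotes subjects `00` to `19`. The function name `select_sc_subset` is hypothetical; it applies equally to PSG and hypnogram file names.

```python
import re

# Sleep-EDF SC file names start with SC4 + two-digit subject number + night digit.
SC_NAME = re.compile(r"^SC4(\d{2})(\d)")


def select_sc_subset(filenames, max_subject=19):
    """Keep only files from subjects SC400-SC419 (the previous 20-subject release)."""
    kept = []
    for name in filenames:
        match = SC_NAME.match(name)
        if match and int(match.group(1)) <= max_subject:
            kept.append(name)
    return kept


files = ["SC4001E0-PSG.edf", "SC4192E0-PSG.edf", "SC4201E0-PSG.edf", "ST7011J0-PSG.edf"]
print(select_sc_subset(files))  # → ['SC4001E0-PSG.edf', 'SC4192E0-PSG.edf']
```
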

For the ST subset, the experiments were conducted with the 22 placebo recordings. Refer to https://physionet.org/content/sleep-edfx/1.0.0/ST-subjects.xls to identify the right recordings and subjects.

The experiments only used the __in-bed__ parts (from _light off_ time to _light on_ time) of the recordings to avoid the dominance of the Wake stage, as suggested in:

- S. A. Imtiaz and E. Rodriguez-Villegas, __An open-source toolbox for standardized use of PhysioNet Sleep EDF Expanded Database__. _Proc. EMBC_, pp. 6014-6017, 2015.

Meta information (e.g., the _light off_ and _light on_ times used to extract the __in-bed__ parts from the whole day-night recordings) is provided in `sleepedfx_meta`.
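
Cropping a recording to its in-bed part amounts to converting the two times into 30-s epoch indices. The sketch below assumes both times are given in seconds from the recording start and keeps every epoch that overlaps the in-bed interval; the actual format in `sleepedfx_meta` may differ, and `in_bed_epochs` is a hypothetical helper.

```python
import math

EPOCH_SEC = 30  # Sleep-EDF hypnograms are scored in 30-s epochs


def in_bed_epochs(lights_off_sec, lights_on_sec, n_epochs, epoch_sec=EPOCH_SEC):
    """Return the half-open epoch range [first, last) of the in-bed part.

    Both times are assumed to be seconds from the recording start; every epoch
    that overlaps the lights-off .. lights-on interval is kept.
    """
    first = max(0, math.floor(lights_off_sec / epoch_sec))
    last = min(n_epochs, math.ceil(lights_on_sec / epoch_sec))
    return first, last


# lights off 1 h 15 s into the recording, lights on after 30000 s
first, last = in_bed_epochs(3615, 30000, n_epochs=2000)
print(first, last)  # → 120 1000
```

The in-bed labels are then `labels[first:last]`, and the signal is cropped to the same epoch range.
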

Contact:
-------------
Huy Phan <br>
Email: huy.phan{at}ieee.org or h.phan{at}kent.ac.uk
Binary file added deepsleepnet/.DS_Store
Binary file not shown.
97 changes: 97 additions & 0 deletions deepsleepnet/compute_sleepedfsc_performance.m
@@ -0,0 +1,97 @@
function [acc, f1, kappa, mean_sens, mean_sel] = compute_sleepedfsc_performance(ret_path)

seq_len = 20;
Nfold = 20;
yh = cell(Nfold,1);
yt = cell(Nfold,1);
mat_path = './mat/sleepedf_sc/';
% load data split
load('./data_split_sleepedf_sc.mat');

for fold = 1 : Nfold
fold
test_s = test_sub{fold};
sample_size = [];
for i = 1 : numel(test_s)
i
for night = 1 : 2
sname = ['n', num2str(test_s(i),'%02d'), '_', num2str(night), '_eeg.mat'];
% subject 13 does not have 2 nights
if(~exist([mat_path, sname], 'file'))
continue
end
load([mat_path,sname], 'label');
% number of actual network outputs: positions at the recording end that
% do not constitute a full sequence are excluded
sample_size = [sample_size; numel(label) - (seq_len - 1)];
% pool ground-truth labels of all test subjects
yt{fold} = [yt{fold}; double(label)];
end
end


if(~exist([ret_path, 'n', num2str(fold),'/test_ret.mat'],'file'))
error('Result file does not exist: %s', [ret_path, 'n', num2str(fold),'/test_ret.mat']);
end

load([ret_path, 'n', num2str(fold),'/test_ret.mat']);
% as we shifted by one PSG epoch when generating sequences, L (sequence
% length) decisions are available for each PSG epoch. This segment is
% to aggregate the decisions to derive the final one.
score_ = cell(1,seq_len);
for n = 1 : seq_len
score_{n} = softmax(squeeze(score(n,:,:)));
end
score = score_;
clear score_;

count = 0;
for i = 1 : numel(test_s)
for night = 1 : 2
sname = ['n', num2str(test_s(i),'%02d'), '_', num2str(night), '_eeg.mat'];
if(~exist([mat_path, sname], 'file'))
continue
end
count = count + 1;
% start and end positions of current test subject's output
start_pos = sum(sample_size(1:count-1)) + 1;
end_pos = sum(sample_size(1:count-1)) + sample_size(count);
score_i = cell(1,seq_len);
for n = 1 : seq_len
% pad with ones (zero in the log domain) the positions that are not
% covered by a full sequence
score_i{n} = [ones(seq_len-1,5); score{n}(start_pos:end_pos, :)];
score_i{n} = circshift(score_i{n}, -(seq_len - n), 1);
end

% multiplicative probabilistic smoothing for aggregation,
% which is equivalent to summation in the log domain
fused_score = log(score_i{1});
for n = 2 : seq_len
fused_score = fused_score + log(score_i{n});
end

% the final output labels via likelihood maximization
yhat = zeros(1,size(fused_score,1));
for k = 1 : size(fused_score,1)
[~, yhat(k)] = max(fused_score(k,:));
end

% pool outputs of all test subjects
yh{fold} = [yh{fold}; double(yhat')];
end
end
end

yh = cell2mat(yh);
yt = cell2mat(yt);

[acc, kappa, f1, ~, spec] = calculate_overall_metrics(yt, yh);
[sens, sel] = calculate_classwise_sens_sel(yt, yh);
mean_sens = mean(sens);
mean_sel = mean(sel);
end
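
The fusion step above can be sketched in Python for a single recording. It assumes 0-based class indices (add 1 to match the MATLAB labels) and probabilities given per sequence and per position (`seq_probs[s][n]`); `fuse_sequence_scores` is a hypothetical name. As in the MATLAB code, each PSG epoch's class probabilities are multiplied across all sequences that cover it (a sum in the log domain), and epochs near the recording ends simply receive fewer decisions.

```python
import math


def fuse_sequence_scores(seq_probs, n_epochs):
    """Fuse overlapping sequence outputs into one label per PSG epoch.

    seq_probs[s][n] is the class-probability vector output at position n of the
    s-th sequence, which covers epochs s .. s + seq_len - 1 (shift of one epoch).
    """
    seq_len = len(seq_probs[0])
    if len(seq_probs) != n_epochs - (seq_len - 1):
        raise ValueError("expected one sequence per one-epoch shift")
    n_classes = len(seq_probs[0][0])
    # Start every epoch at log(1) = 0, like the padding with ones in the MATLAB code.
    fused = [[0.0] * n_classes for _ in range(n_epochs)]
    for s, sequence in enumerate(seq_probs):
        for n, probs in enumerate(sequence):
            for c, p in enumerate(probs):
                # Product of probabilities == sum of log-probabilities.
                fused[s + n][c] += math.log(p)
    # Likelihood maximization: the class with the largest fused score
    # (ties go to the lowest index, as with MATLAB's max).
    return [max(range(n_classes), key=row.__getitem__) for row in fused]


# seq_len = 2, three PSG epochs -> two sequences, two classes
seq_probs = [[[0.9, 0.1], [0.4, 0.6]], [[0.3, 0.7], [0.2, 0.8]]]
print(fuse_sequence_scores(seq_probs, 3))  # → [0, 1, 1]
```
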

93 changes: 93 additions & 0 deletions deepsleepnet/compute_sleepedfst_performance.m
@@ -0,0 +1,93 @@
function [acc, f1, kappa, mean_sens, mean_sel] = compute_sleepedfst_performance(ret_path)

seq_len = 20;
Nfold = 11;
yh = cell(Nfold,1);
yt = cell(Nfold,1);
mat_path = './mat/sleepedf_st/';
% load data split
load('./data_split_sleepedf_st.mat');

for fold = 1 : Nfold
fold
test_s = test_sub{fold};
sample_size = [];
for i = 1 : numel(test_s)
i
sname = ['n', num2str(test_s(i),'%02d'), '_eeg.mat'];
if(~exist([mat_path, sname], 'file'))
continue
end
load([mat_path,sname], 'label');
% number of actual network outputs: positions at the recording end that
% do not constitute a full sequence are excluded
sample_size = [sample_size; numel(label) - (seq_len - 1)];
% pool ground-truth labels of all test subjects
yt{fold} = [yt{fold}; double(label)];
end


if(~exist([ret_path, 'n', num2str(fold),'/test_ret.mat'],'file'))
error('Result file does not exist: %s', [ret_path, 'n', num2str(fold),'/test_ret.mat']);
end

load([ret_path, 'n', num2str(fold),'/test_ret.mat']);
% as we shifted by one PSG epoch when generating sequences, L (sequence
% length) decisions are available for each PSG epoch. This segment is
% to aggregate the decisions to derive the final one.
score_ = cell(1,seq_len);
for n = 1 : seq_len
score_{n} = softmax(squeeze(score(n,:,:)));
end
score = score_;
clear score_;

count = 0;
for i = 1 : numel(test_s)
sname = ['n', num2str(test_s(i),'%02d'), '_eeg.mat'];
if(~exist([mat_path, sname], 'file'))
continue
end
count = count + 1;
% start and end positions of current test subject's output
start_pos = sum(sample_size(1:count-1)) + 1;
end_pos = sum(sample_size(1:count-1)) + sample_size(count);
score_i = cell(1,seq_len);
for n = 1 : seq_len
% pad with ones (zero in the log domain) the positions that are not
% covered by a full sequence
score_i{n} = [ones(seq_len-1,5); score{n}(start_pos:end_pos, :)];
score_i{n} = circshift(score_i{n}, -(seq_len - n), 1);
end

% multiplicative probabilistic smoothing for aggregation,
% which is equivalent to summation in the log domain
fused_score = log(score_i{1});
for n = 2 : seq_len
fused_score = fused_score + log(score_i{n});
end

% the final output labels via likelihood maximization
yhat = zeros(1,size(fused_score,1));
for k = 1 : size(fused_score,1)
[~, yhat(k)] = max(fused_score(k,:));
end

% pool outputs of all test subjects
yh{fold} = [yh{fold}; double(yhat')];
end
end

yh = cell2mat(yh);
yt = cell2mat(yt);

[acc, kappa, f1, ~, spec] = calculate_overall_metrics(yt, yh);
[sens, sel] = calculate_classwise_sens_sel(yt, yh);
mean_sens = mean(sens);
mean_sel = mean(sel);
end

Binary file added deepsleepnet/data_split_sleepedf_sc.mat
Binary file not shown.
Binary file added deepsleepnet/data_split_sleepedf_st.mat
Binary file not shown.
25 changes: 25 additions & 0 deletions deepsleepnet/examples_evaluation.m
@@ -0,0 +1,25 @@
%%
% Examples on how to evaluate the performance
%%
clear all
close all
clc

addpath('../metrics');

%% Example 1
% path to tensorflow experiments with SleepEDF-SC and the network output saved in
% test_ret.mat
% finetuning 2chan EEG+EOG experiment is used as the example here
ret_path = './tensorflow/seqsleepnet/finetune_all_2chan/sleepedf_sc/';

[acc, f1, kappa, mean_sens, mean_sel] = compute_sleepedfsc_performance(ret_path);


%% Example 2
% path to tensorflow experiments with SleepEDF-ST and the network output saved in
% test_ret.mat
% finetuning 2chan EEG+EOG experiment is used as the example here
ret_path = './tensorflow/seqsleepnet/finetune_all_2chan/sleepedf_st/';

[acc, f1, kappa, mean_sens, mean_sel] = compute_sleepedfst_performance(ret_path);