First upload

Source, pretrained models, and experimental setup for SleepEDF-SC and SleepEDF-ST
pquochuy · Aug 23, 2019 · c02e831
1 parent d236028
Showing 469 changed files with 15,310 additions and 3 deletions.
diff --git a/.DS_Store b/.DS_Store
diff --git a/README.md b/README.md
@@ -1,4 +1,81 @@
-# sleep_transfer_learning
-This is the place holder for the source code and the pretrained models associated with the work "Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning"
 
-The source code and the pretrained models are being prepared and will be made available soon.
+
+# Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning
+
+This repository contains source code, pretrained models, and experimental setup in the manuscript:
+
+- Huy Phan, Oliver Y. Chén, Philipp Koch, Zongqing Lu, Ian McLoughlin, Alfred Mertins, and Maarten De Vos. [__Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning.__](https://arxiv.org/abs/1907.13177) _arXiv preprint arXiv:1907.13177_, 2019
+
+<img src="figure/Sleep_Transfer.png" class="center" alt="Sleep Transfer Learning" width="450"/>
+
+## Data Preparation with Matlab:
+-------------
+
+### SeqSleepNet
+- Change path to `seqsleepnet/`
+- Run `preprare_data_sleepedf_sc.m` to prepare the SleepEDF-SC data (the path to the data must be provided; refer to the comments in the script). The generated `.mat` files are stored in the `mat/` directory.
+- Run `genlist_sleepedf_sc.m` to generate the list of SleepEDF-SC files for network training based on the data split in `data_split_sleepedf_sc.mat`. The generated files are stored in the `tf_data/` directory.
+- Run `preprare_data_sleepedf_st.m` to prepare the SleepEDF-ST data (the path to the data must be provided; refer to the comments in the script). The generated `.mat` files are stored in the `mat/` directory.
+- Run `genlist_sleepedf_st.m` to generate the list of SleepEDF-ST files for network training based on the data split in `data_split_sleepedf_st.mat`. The generated files are stored in the `tf_data/` directory.
+
+### DeepSleepNet (likewise)
+
+## Network training and evaluation with Tensorflow:
+-------------
+### SeqSleepNet
+- Change path to `seqsleepnet/tensorflow/seqsleepnet/`
+- Run the example bash scripts:
+
+	- `finetune_all.sh`: finetune an entire pretrained network
+	- `finetune_softmax_SPB.sh`: finetune softmax + sequence processing block (SPB)
+	- `finetune_softmax_EPB.sh`: finetune softmax + epoch processing block (EPB)
+	- `finetune_softmax.sh`: finetune softmax
+	- `train_scratch.sh`: train a network from scratch
+
+_Note_: when the `--pretrained_model` parameter is empty, the network is trained from scratch. Otherwise, the specified pretrained model is loaded and finetuned with the strategy specified by the `--finetune_mode` parameter.
+
+### DeepSleepNet (likewise)
+
+_Note_: DeepSleepNet pretrained models are large, so they were uploaded separately and can be downloaded from [https://zenodo.org/record/3375235](https://zenodo.org/record/3375235).
+
+## Evaluation
+After training/finetuning and testing the network on test data:
+
+- Change path to `seqsleepnet/` or `deepsleepnet/`
+- Refer to `examples_evaluation.m` for examples that calculate the performance metrics.
+
+## Some results:
+-------------
+- Finetuning results with _SeqSleepNet_:
+
+![seqsleepnet_results](figure/seqsleepnet_finetuning.png)
+
+- Finetuning results with _DeepSleepNet_:
+
+![deepsleepnet_results](figure/deepsleepnet_finetuning.png)
+
+Environment:
+-------------
+- Matlab v7.3 (for data preparation)
+- Python3
+- Tensorflow GPU versions 1.4 - 1.14 (for network training and evaluation)
+- numpy
+- scipy
+- sklearn
+- h5py
+
+## Note on the SleepEDF Expanded Database:
+
+The SleepEDF Expanded database can be downloaded from https://physionet.org/content/sleep-edfx/1.0.0/. The latest version of this database contains 153 recordings (78 subjects) in the SC subset. The experiments here intentionally used the __previous version__ of the SC subset, which contains only __20 subjects__, to simulate a small cohort. If you download the new version, make sure to use only the 20 subjects __SC400-SC419__.
+
+On the ST subset of the database, the experiments were conducted with the 22 placebo recordings. Refer to https://physionet.org/content/sleep-edfx/1.0.0/ST-subjects.xls to select the right recordings and subjects.
+
+The experiments used only the __in-bed__ parts of the recordings (from _light off_ to _light on_ time) to avoid the dominance of the Wake stage, as suggested in
+
+- S. A. Imtiaz and E. Rodriguez-Villegas, __An open-source toolbox for standardized use of PhysioNet Sleep EDF Expanded Database__. _Proc. EMBC_, pp. 6014-6017, 2015.
+
+Meta information (e.g. the _light off_ and _light on_ times used to extract the __in-bed__ parts from the whole day-night recordings) is provided in `sleepedfx_meta`.
+
+Contact:
+-------------
+Huy Phan <br>
+Email: huy.phan{at}ieee.org or h.phan{at}kent.ac.uk  
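The README states that only the in-bed part of each recording (lights-off to lights-on) is used, with the times provided in `sleepedfx_meta`. The format of that metadata is not described here, so the following Python sketch assumes the recording start time and the lights-off/lights-on wall-clock times are already available as `datetime`/`time` objects, and that hypnograms are scored in 30-s epochs. The functions `next_occurrence`, `in_bed_epoch_range` and `crop_in_bed` are hypothetical helpers, not part of the repository; the sketch keeps epochs whose start falls in [lights-off, lights-on).

```python
import math
from datetime import datetime, time, timedelta

EPOCH_SEC = 30  # assumption: SleepEDF hypnograms are scored in 30-s epochs


def next_occurrence(ref, clock):
    """First datetime at or after `ref` whose wall-clock time equals `clock`.

    Lets a lights-on time after midnight roll over to the next day.
    """
    candidate = datetime.combine(ref.date(), clock)
    if candidate < ref:
        candidate += timedelta(days=1)
    return candidate


def in_bed_epoch_range(rec_start, lights_off, lights_on, n_epochs, epoch_sec=EPOCH_SEC):
    """Half-open range [first, last) of epochs whose start lies in [lights_off, lights_on).

    Assumes lights-off occurs after the recording starts.
    """
    off_dt = next_occurrence(rec_start, lights_off)
    on_dt = next_occurrence(off_dt, lights_on)
    off_sec = (off_dt - rec_start).total_seconds()
    on_sec = (on_dt - rec_start).total_seconds()
    # clip both ends to the number of scored epochs in the recording
    first = min(math.ceil(off_sec / epoch_sec), n_epochs)
    last = min(math.ceil(on_sec / epoch_sec), n_epochs)
    return first, last


def crop_in_bed(labels, rec_start, lights_off, lights_on):
    """Keep only the in-bed part of a per-epoch label sequence."""
    first, last = in_bed_epoch_range(rec_start, lights_off, lights_on, len(labels))
    return labels[first:last]
```

For a recording starting at 16:00 with lights off at 23:00 and lights on at 07:00 the next morning, `in_bed_epoch_range(datetime(2000, 1, 1, 16, 0), time(23, 0), time(7, 0), 2000)` returns `(840, 1800)`, i.e. 960 epochs or 8 hours in bed.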
diff --git a/deepsleepnet/.DS_Store b/deepsleepnet/.DS_Store
diff --git a/deepsleepnet/compute_sleepedfsc_performance.m b/deepsleepnet/compute_sleepedfsc_performance.m
@@ -0,0 +1,97 @@
+function [acc, f1, kappa, mean_sens, mean_sel] = compute_sleepedfsc_performance(ret_path)
+
+    seq_len = 20;
+    Nfold = 20;
+    yh = cell(Nfold,1);
+    yt = cell(Nfold,1);
+    mat_path = './mat/sleepedf_sc/';
+    % load data split
+    load('./data_split_sleepedf_sc.mat');
+
+    for fold = 1 : Nfold
+        fold
+        test_s = test_sub{fold};
+        sample_size = [];
+        for i = 1 : numel(test_s)
+            i
+            for night = 1 : 2
+                sname = ['n', num2str(test_s(i),'%02d'), '_', num2str(night), '_eeg.mat'];
+                % subject 13 does not have 2 nights
+                if(~exist([mat_path, sname], 'file'))
+                    continue
+                end
+                load([mat_path,sname], 'label');
+                % this is the actual number of network outputs (sequences), as
+                % positions at the recording ends which do not constitute a full
+                % sequence were excluded
+                sample_size = [sample_size; numel(label) - (seq_len - 1)]; 
+                % pool ground-truth labels of all test subjects
+                yt{fold} = [yt{fold}; double(label)];
+            end
+        end
+
+
+        if(~exist([ret_path, 'n', num2str(fold),'/test_ret.mat'],'file'))
+            disp('Returned file does not exist:')
+            disp([ret_path, 'n', num2str(fold),'/test_ret.mat'])
+        end
+
+        load([ret_path, 'n', num2str(fold),'/test_ret.mat']);
+        % as we shifted by one PSG epoch when generating sequences, L (sequence
+        % length) decisions are available for each PSG epoch. This segment is
+        % to aggregate the decisions to derive the final one.
+        score_ = cell(1,seq_len);
+        for n = 1 : seq_len
+            score_{n} = softmax(squeeze(score(n,:,:)));
+        end
+        score = score_;
+        clear score_;
+
+        count = 0;
+        for i = 1 : numel(test_s)
+            for night = 1 : 2
+                sname = ['n', num2str(test_s(i),'%02d'), '_', num2str(night), '_eeg.mat'];
+                if(~exist([mat_path, sname], 'file'))
+                    continue
+                end
+                count = count + 1;
+                % start and end positions of current test subject's output
+                start_pos = sum(sample_size(1:count-1)) + 1;
+                end_pos = sum(sample_size(1:count-1)) + sample_size(count);
+                score_i = cell(1,seq_len);
+                for n = 1 : seq_len
+                    % pad with ones (log(1) = 0) for positions not covered by
+                    % a full sequence, then shift so that row t holds the score
+                    % for PSG epoch t from the sequence where it sits at position n
+                    score_i{n} = [ones(seq_len-1,5); score{n}(start_pos:end_pos, :)];
+                    score_i{n} = circshift(score_i{n}, -(seq_len - n), 1);
+                end
+
+                % multiplicative probabilistic smoothing for aggregation,
+                % which is equivalent to summation in the log domain
+                fused_score = log(score_i{1});
+                for n = 2 : seq_len
+                    fused_score = fused_score + log(score_i{n});
+                end
+
+                % the final output labels via likelihood maximization
+                yhat = zeros(1,size(fused_score,1));
+                for k = 1 : size(fused_score,1)
+                    [~, yhat(k)] = max(fused_score(k,:));
+                end
+
+                % pool outputs of all test subjects
+                yh{fold} = [yh{fold}; double(yhat')];
+            end
+        end
+    end
+
+    yh = cell2mat(yh);
+    yt = cell2mat(yt);
+
+    [acc, kappa, f1, ~, spec] = calculate_overall_metrics(yt, yh);
+    [sens, sel]  = calculate_classwise_sens_sel(yt, yh);
+    mean_sens = mean(sens);
+    mean_sel = mean(sel);
+end
+
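Because sequences are generated with a stride of one PSG epoch, each interior epoch receives `seq_len` probability vectors, and the aggregation loop in `compute_sleepedfsc_performance.m` fuses them by multiplying the probabilities (summing their logs) before taking the argmax. The ones-padding and `circshift` bookkeeping can be hard to follow, so here is a Python sketch of the same rule written directly in terms of epoch indices. It assumes the scores are already softmax-normalized and stored as a nested list indexed `[sequence][position][class]`, which differs from the MATLAB `score{n}(sequence, class)` layout; `fuse_sequence_scores` is a hypothetical name and the returned labels are 0-based.

```python
import math


def fuse_sequence_scores(scores):
    """Fuse overlapping sequence outputs into one label per PSG epoch.

    scores[j][n] is the class-probability vector that sequence j (covering
    epochs j .. j + L - 1, stride one epoch) assigns to its n-th epoch.
    Each epoch t sums the logs of every probability vector it receives
    (i.e. multiplies the probabilities) and takes the argmax. Missing
    contributions near the recording ends simply add nothing, which matches
    padding with probability 1 (log 1 = 0) in the MATLAB code.
    """
    if not scores:
        return [], []
    n_seq = len(scores)
    seq_len = len(scores[0])
    n_classes = len(scores[0][0])
    n_epochs = n_seq + seq_len - 1
    fused = [[0.0] * n_classes for _ in range(n_epochs)]
    for j in range(n_seq):
        for n in range(seq_len):
            t = j + n  # epoch index covered by position n of sequence j
            for c in range(n_classes):
                fused[t][c] += math.log(scores[j][n][c])
    labels = [max(range(n_classes), key=lambda c: row[c]) for row in fused]
    return labels, fused
```

Fusing in the log domain avoids numerical underflow when `seq_len` (20 here) probabilities are multiplied together.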
diff --git a/deepsleepnet/compute_sleepedfst_performance.m b/deepsleepnet/compute_sleepedfst_performance.m
@@ -0,0 +1,93 @@
+function [acc, f1, kappa, mean_sens, mean_sel] = compute_sleepedfst_performance(ret_path)
+
+    seq_len = 20;
+    Nfold = 11;
+    yh = cell(Nfold,1);
+    yt = cell(Nfold,1);
+    mat_path = './mat/sleepedf_st/';
+    % load data split
+    load('./data_split_sleepedf_st.mat');
+
+    for fold = 1 : Nfold
+        fold
+        test_s = test_sub{fold};
+        sample_size = [];
+        for i = 1 : numel(test_s)
+            i
+            sname = ['n', num2str(test_s(i),'%02d'), '_eeg.mat'];
+            if(~exist([mat_path, sname], 'file'))
+                continue
+            end
+            load([mat_path,sname], 'label');
+            % this is the actual number of network outputs (sequences), as
+            % positions at the recording ends which do not constitute a full
+            % sequence were excluded
+            sample_size = [sample_size; numel(label) - (seq_len - 1)]; 
+            % pool ground-truth labels of all test subjects
+            yt{fold} = [yt{fold}; double(label)];
+        end
+
+
+        if(~exist([ret_path, 'n', num2str(fold),'/test_ret.mat'],'file'))
+            disp('Returned file does not exist:')
+            disp([ret_path, 'n', num2str(fold),'/test_ret.mat'])
+        end
+
+        load([ret_path, 'n', num2str(fold),'/test_ret.mat']);
+        % as we shifted by one PSG epoch when generating sequences, L (sequence
+        % length) decisions are available for each PSG epoch. This segment is
+        % to aggregate the decisions to derive the final one.
+        score_ = cell(1,seq_len);
+        for n = 1 : seq_len
+            score_{n} = softmax(squeeze(score(n,:,:)));
+        end
+        score = score_;
+        clear score_;
+
+        count = 0;
+        for i = 1 : numel(test_s)
+            sname = ['n', num2str(test_s(i),'%02d'), '_eeg.mat'];
+            if(~exist([mat_path, sname], 'file'))
+                continue
+            end
+            count = count + 1;
+            % start and end positions of current test subject's output
+            start_pos = sum(sample_size(1:count-1)) + 1;
+            end_pos = sum(sample_size(1:count-1)) + sample_size(count);
+            score_i = cell(1,seq_len);
+            for n = 1 : seq_len
+                % pad with ones (log(1) = 0) for positions not covered by
+                % a full sequence, then shift so that row t holds the score
+                % for PSG epoch t from the sequence where it sits at position n
+                score_i{n} = [ones(seq_len-1,5); score{n}(start_pos:end_pos, :)];
+                score_i{n} = circshift(score_i{n}, -(seq_len - n), 1);
+            end
+
+            % multiplicative probabilistic smoothing for aggregation,
+            % which is equivalent to summation in the log domain
+            fused_score = log(score_i{1});
+            for n = 2 : seq_len
+                fused_score = fused_score + log(score_i{n});
+            end
+
+            % the final output labels via likelihood maximization
+            yhat = zeros(1,size(fused_score,1));
+            for k = 1 : size(fused_score,1)
+                [~, yhat(k)] = max(fused_score(k,:));
+            end
+
+            % pool outputs of all test subjects
+            yh{fold} = [yh{fold}; double(yhat')];
+        end
+    end
+
+    yh = cell2mat(yh);
+    yt = cell2mat(yt);
+
+    [acc, kappa, f1, ~, spec] = calculate_overall_metrics(yt, yh);
+    [sens, sel]  = calculate_classwise_sens_sel(yt, yh);
+    mean_sens = mean(sens);
+    mean_sel = mean(sel);
+end
+
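Both performance functions rely on the same bookkeeping: the network outputs for all test recordings of a fold are stored back to back in one `test_ret.mat`, and a recording with T epochs contributes T - (`seq_len` - 1) full sequences (`sample_size`), so `start_pos`/`end_pos` are cumulative sums of those sizes. A Python sketch of that slicing, assuming the recordings are listed in the same order used when the test list was generated; `split_pooled_outputs` is a hypothetical helper, not part of the repository, and uses 0-based half-open ranges instead of MATLAB's 1-based inclusive ones.

```python
from itertools import accumulate


def split_pooled_outputs(pooled, n_epochs_per_recording, seq_len):
    """Split fold-level network outputs back into per-recording blocks.

    A recording with T epochs yields T - (seq_len - 1) full sequences, so
    recording k occupies rows [offsets[k], offsets[k + 1]) of `pooled`.
    """
    sizes = [t - (seq_len - 1) for t in n_epochs_per_recording]
    if any(s <= 0 for s in sizes):
        raise ValueError("every recording must be at least seq_len epochs long")
    if sum(sizes) != len(pooled):
        raise ValueError("pooled output length does not match the recordings")
    offsets = [0, *accumulate(sizes)]  # running start positions
    return [pooled[offsets[k]:offsets[k + 1]] for k in range(len(sizes))]
```

The length check catches a common failure mode, such as a missing night (e.g. the SleepEDF-SC subject without a second night) being skipped when building the test list but not when computing sizes.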
diff --git a/deepsleepnet/data_split_sleepedf_sc.mat b/deepsleepnet/data_split_sleepedf_sc.mat
diff --git a/deepsleepnet/data_split_sleepedf_st.mat b/deepsleepnet/data_split_sleepedf_st.mat
diff --git a/deepsleepnet/examples_evaluation.m b/deepsleepnet/examples_evaluation.m
@@ -0,0 +1,25 @@
+%% 
+% Examples on how to evaluate the performance
+%% 
+clear all
+close all
+clc
+
+addpath('../metrics');
+
+%% Example 1
+% path to tensorflow experiments with SleepEDF-SC and the network output saved in
+% test_ret.mat
+% finetuning 2chan EEG+EOG experiment is used as the example here
+ret_path = './tensorflow/seqsleepnet/finetune_all_2chan/sleepedf_sc/';
+
+[acc, f1, kappa, mean_sens, mean_sel] = compute_sleepedfsc_performance(ret_path);
+
+
+%% Example 2
+% path to tensorflow experiments with SleepEDF-ST and the network output saved in
+% test_ret.mat
+% finetuning 2chan EEG+EOG experiment is used as the example here
+ret_path = './tensorflow/seqsleepnet/finetune_all_2chan/sleepedf_st/';
+
+[acc, f1, kappa, mean_sens, mean_sel] = compute_sleepedfst_performance(ret_path);
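`examples_evaluation.m` reports accuracy, F1, Cohen's kappa, mean sensitivity and mean selectivity, but the helpers `calculate_overall_metrics` and `calculate_classwise_sens_sel` in `../metrics` are not part of this commit excerpt. For cross-checking numbers in Python, here is a sketch using standard definitions, assuming F1 is macro-averaged over the five classes, sensitivity is per-class recall, selectivity is per-class precision, and labels are 1-based as in the MATLAB code. `sleep_metrics` is a hypothetical function and its results may differ in detail from the repository's MATLAB helpers.

```python
from collections import Counter


def sleep_metrics(y_true, y_pred, classes=(1, 2, 3, 4, 5)):
    """Return (accuracy, macro F1, Cohen's kappa, mean sensitivity, mean selectivity)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(y_true)
    pairs = Counter(zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)

    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # expected chance agreement from the marginal label distributions
    p_e = sum(true_counts[c] * pred_counts[c] for c in classes) / (n * n)
    kappa = (acc - p_e) / (1 - p_e) if p_e < 1 else 1.0

    sens, sel, f1 = [], [], []
    for c in classes:
        tp = pairs[(c, c)]
        recall = tp / true_counts[c] if true_counts[c] else 0.0
        precision = tp / pred_counts[c] if pred_counts[c] else 0.0
        sens.append(recall)
        sel.append(precision)
        f1.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    k = len(classes)
    return acc, sum(f1) / k, kappa, sum(sens) / k, sum(sel) / k
```

The return order mirrors `[acc, f1, kappa, mean_sens, mean_sel]` in the MATLAB examples, and the metrics are computed on labels pooled over all folds, as `compute_sleepedfsc_performance` does with `cell2mat`.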