How to preprocess SS2016 and test this dataset at different SNR levels? #2

John-666-git · 2024-10-30T03:10:55Z

In your paper, you mentioned the preprocessing operations for the SS2016 dataset. I am unsure how to perform these operations specifically, and I couldn't find the preprocessing code on GitHub. Could you please share the code for the preprocessing of this dataset, or upload it to GitHub? I would greatly appreciate it.

I have sent you an email, and you can also communicate with me via email.

I sincerely hope to get your guidance. Warm regards, and thank you for your attention to this matter.

woldier · 2024-10-30T03:28:28Z

Hi @John-666-git .

I'm glad you noticed our work. We actually provide the processed data, and have open sourced it to the huggingface hub.
Given your curiosity about the data preprocessing process, we will be uploading the preprocessing code in the future.

John-666-git · 2024-10-30T04:12:54Z

Dear woldier:

Thank you very much for your reply.

Your preprocessing workflow has been incredibly enlightening, and seeing the process in code form would provide even greater clarity. If it would be possible to upload the code at your earliest convenience, I would be truly grateful.

Thank you very much for your time and consideration!

woldier · 2024-10-30T08:09:48Z

Exp. Doc.

The SS2016 EOG dataset, proposed in "A semi-simulated EEG/EOG dataset for the comparison of EOG artifact rejection techniques," provides clean EEG signals together with their counterparts contaminated by ocular artifacts, collected from 54 participants during a closed-eye experiment. The clean EEG signals are free from ocular artifacts and were recorded from 19 electrodes placed according to the International 10-20 system, with the number of sampling points per recording ranging from 5600 to 8400. To facilitate processing, the data were segmented into non-overlapping segments of 512 samples each, resulting in a total of 11495 segments. image (1) shows the 0th-7th segments of data 0 of sim1.
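The segmentation step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's preprocessing code: the function name `segment_recording`, the `(channels, samples)` layout, and the choice to drop trailing samples that do not fill a whole segment are all assumptions, and the input is synthetic data rather than SS2016.

```python
import numpy as np

SEGMENT_LEN = 512  # samples per segment, as stated in the text


def segment_recording(recording: np.ndarray, segment_len: int = SEGMENT_LEN) -> np.ndarray:
    """Split a (channels, samples) recording into non-overlapping segments.

    Assumption: trailing samples that do not fill a whole segment are dropped.
    Returns an array of shape (channels * n_segments, segment_len).
    """
    n_channels, n_samples = recording.shape
    n_segments = n_samples // segment_len
    trimmed = recording[:, : n_segments * segment_len]
    # Row-major reshape keeps each channel's segments in temporal order.
    return trimmed.reshape(n_channels * n_segments, segment_len)


# Synthetic stand-in for one participant: 19 electrodes, 5600 samples.
rng = np.random.default_rng(0)
fake_recording = rng.standard_normal((19, 5600))
segments = segment_recording(fake_recording)
print(segments.shape)  # (190, 512)
```

The same function would be applied to the clean and contaminated recordings alike so that the resulting segment pairs stay aligned.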

image (1). The recorded signals from the first electrode of Participant 1 in SS2016 are displayed in sequence as segments 1 through 6. Each segment comprises 512 samples, with a sampling rate of 200 SPS.

Initially, we partitioned this dataset into training and test sets using an 8:2 split ratio and proceeded with training. However, upon evaluating the training results (refer to image (2)), we observed that the overall noise level in the dataset was relatively low, which could make the network's denoising performance look weaker than it is. Additionally, the distribution of SNR values (as shown in image (3)) indicated a predominance of samples within the -5 to 15 dB range, which could bias the network learning process.
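For readers who want to reproduce an 8:2 split, one possible sketch is shown below. The helper `train_test_split_indices`, the fixed seed, and the decision to split at the segment level (rather than by participant) are assumptions; the text does not say which was used.

```python
import numpy as np


def train_test_split_indices(n_items: int, train_fraction: float = 0.8, seed: int = 0):
    """Return shuffled, disjoint train and test index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    n_train = int(round(n_items * train_fraction))
    return order[:n_train], order[n_train:]


# 11495 is the segment count quoted in the text.
train_idx, test_idx = train_test_split_indices(11495)
print(len(train_idx), len(test_idx))  # 9196 2299
```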

image (2). Visualization of EEG signal denoising results of EEGDiR on SS2016 test set. Each segment comprises 512 samples, with a sampling rate of 200 SPS.

image (3). The SNR distribution of the SS2016 dataset after signal segmentation is depicted in the figure. It's important to highlight that the SNR values are rounded up for statistical simplicity. The horizontal axis represents various SNR levels ranging from -10 to 20, while the vertical axis indicates the number of segments corresponding to each SNR level.
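An SNR histogram of this kind could be computed along the following lines. This is a hedged sketch: it assumes the noise of each pair is the contaminated segment minus the clean segment, that SNR is the power ratio expressed in dB, and that "rounded up" means taking the ceiling. The function names are illustrative and the demo data is synthetic.

```python
import numpy as np


def snr_db(clean: np.ndarray, contaminated: np.ndarray) -> np.ndarray:
    """Per-segment SNR in dB for arrays of shape (n_segments, n_samples)."""
    noise = contaminated - clean  # assumption: additive noise
    signal_power = np.mean(clean ** 2, axis=-1)
    noise_power = np.mean(noise ** 2, axis=-1)
    return 10.0 * np.log10(signal_power / noise_power)


def snr_histogram(clean: np.ndarray, contaminated: np.ndarray) -> dict:
    """Count segments per integer SNR level, rounding each SNR up."""
    rounded = np.ceil(snr_db(clean, contaminated)).astype(int)
    levels, counts = np.unique(rounded, return_counts=True)
    return dict(zip(levels.tolist(), counts.tolist()))


# Synthetic demo: the noise is a scaled copy of the clean segment, so the
# SNR of each pair is known in advance (0 dB and about 6.02 dB).
rng = np.random.default_rng(0)
demo_clean = rng.standard_normal((3, 512))
scales = np.array([[1.0], [0.5], [0.5]])
demo_contaminated = demo_clean + scales * demo_clean
histogram = snr_histogram(demo_clean, demo_contaminated)
print(histogram)
```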

To address these concerns, we followed the data processing approach used for the EEGdenoiseNet dataset to make the dataset more robust. Specifically, we separated the noise from each clean/contaminated signal pair and introduced a scaling parameter λ to synthesize varying noise levels (-7 to 2 dB), ensuring a uniform distribution of samples across the noise levels. This augmentation of the dataset is crucial for improving the network's resilience to diverse noise conditions.
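The λ-based mixing can be illustrated with a short sketch. It assumes an additive model, contaminated = clean + λ · noise, and defines SNR as 20·log10(RMS(clean) / RMS(λ · noise)); the exact SNR convention should be checked against the repository's preprocessing code, since some references use a factor of 10 with the RMS ratio instead. The names `rms` and `mix_at_snr` are hypothetical, and the data is synthetic.

```python
import numpy as np


def rms(x: np.ndarray) -> np.ndarray:
    """Root mean square along the last axis, kept as a column for broadcasting."""
    return np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True))


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, target_snr_db):
    """Scale `noise` by lambda so that clean + lambda * noise hits the target SNR.

    Assumed convention: SNR_dB = 20 * log10(rms(clean) / rms(lambda * noise)).
    """
    lam = rms(clean) / (rms(noise) * 10.0 ** (np.asarray(target_snr_db) / 20.0))
    return clean + lam * noise, lam


rng = np.random.default_rng(0)
clean = rng.standard_normal((10, 512))
noise = rng.standard_normal((10, 512))  # stand-in for (contaminated - clean)

# One integer level per segment, covering -7 to 2 dB uniformly.
targets = np.arange(-7, 3, dtype=float).reshape(-1, 1)
mixed, lam = mix_at_snr(clean, noise, targets)

# Verify the achieved SNR by recovering the scaled noise from the mixture.
achieved = 20.0 * np.log10(rms(clean) / rms(mixed - clean))
print(np.round(achieved.ravel(), 2))
```

Note that λ grows as the original noise RMS shrinks, so pairs that start out with very little noise need a large λ to reach a low target SNR.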

However, during training, we encountered instances where λ values became excessively large for signal pairs with high SNR, resulting in unrealistic training samples. To mitigate this, we filtered out sample pairs with SNR higher than 5 dB. The visualization results of the processed signals are presented in image (4), further validating the efficacy of our approach.
We believe that the incorporation of the SS2016 EOG dataset, along with comprehensive data processing and augmentation strategies, strengthens the credibility of our proposed method. We appreciate your insightful feedback, which has contributed to enhancing the rigor and validity of our study.
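The filtering step (dropping pairs whose original SNR exceeds 5 dB) might look like the sketch below. As before, this assumes additive noise and a power-ratio SNR in dB; `filter_high_snr` is a hypothetical helper and the demo pairs are synthetic.

```python
import numpy as np

SNR_CEILING_DB = 5.0  # threshold quoted in the text


def filter_high_snr(clean: np.ndarray, contaminated: np.ndarray,
                    max_snr_db: float = SNR_CEILING_DB):
    """Keep only the pairs whose SNR is at most `max_snr_db`."""
    noise = contaminated - clean  # assumption: additive noise
    snr = 10.0 * np.log10(np.mean(clean ** 2, axis=-1) / np.mean(noise ** 2, axis=-1))
    keep = snr <= max_snr_db
    return clean[keep], contaminated[keep], snr[keep]


# Synthetic pairs with known SNRs of 0 dB, about 6 dB, and 20 dB.
rng = np.random.default_rng(0)
pair_clean = rng.standard_normal((3, 512))
noise_scales = np.array([[1.0], [0.5], [0.1]])
pair_contaminated = pair_clean + noise_scales * pair_clean

kept_clean, kept_contaminated, kept_snr = filter_high_snr(pair_clean, pair_contaminated)
print(kept_clean.shape)  # (1, 512): only the 0 dB pair survives
```

Filtering before mixing keeps λ bounded, because the pairs that would need the largest scaling factors are the ones removed.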

image (4). Visualization of EEG signal denoising results of EEGDiR on SS2016 test set. Note that the dataset is filtered and processed. Each segment comprises 512 samples, with a sampling rate of 200 SPS.

woldier · 2024-10-30T08:44:20Z

Hi @John-666-git:

plz check code and dev doc.

Sincerely,
woldier

John-666-git · 2024-10-30T08:50:52Z

Dear woldier:

Thank you so much for your quick and helpful response! I really appreciate the time you took to address my questions.

woldier pinned this issue Oct 30, 2024

woldier linked a pull request Oct 30, 2024 that will close this issue

update dataset preprocessing #3

Merged

woldier closed this as completed in #3 Oct 30, 2024
