
How to preprocess SS2016 and test this dataset at different SNR levels? #2

Closed
John-666-git opened this issue Oct 30, 2024 · 5 comments · Fixed by #3

Comments

@John-666-git

In your paper, you mentioned the preprocessing operations for the SS2016 dataset. I am unsure how to perform these operations specifically, and I couldn't find the preprocessing code on GitHub. Could you please share the code for the preprocessing of this dataset, or upload it to GitHub? I would greatly appreciate it.

I have sent you an email, and you can also communicate with me via email.

I sincerely hope to get your guidance. Warm regards, and thank you for your attention to this matter.

@woldier
Owner

woldier commented Oct 30, 2024

Hi @John-666-git .

I'm glad you noticed our work. We already provide the processed data, which is open-sourced on the Hugging Face Hub.
Since you are interested in the preprocessing itself, we will also upload the preprocessing code.

@John-666-git
Author

Dear woldier:

Thank you very much for your reply.

Your preprocessing workflow has been incredibly enlightening, and seeing the process in code form would provide even greater clarity. If it would be possible to upload the code at your earliest convenience, I would be truly grateful.

Thank you very much for your time and consideration!

@woldier
Owner

woldier commented Oct 30, 2024

Exp. Doc.

The SS2016 EOG dataset, introduced in "A semi-simulated EEG/EOG dataset for the comparison of EOG artifact rejection techniques," pairs clean EEG signals with counterparts contaminated by ocular artifacts. The data were collected from 54 participants during an eyes-closed experiment. The clean EEG signals are free from ocular artifacts and were recorded from 19 electrodes placed according to the International 10-20 system, with recording lengths ranging from 5600 to 8400 sampling points. To facilitate processing, the data were split into non-overlapping segments of 512 samples each, giving 11495 segments in total. Image (1) shows the 0th-7th segments of data 0 of sim1.

image (1). The recorded signals from the first electrode of Participant 1 in SS2016 are displayed in sequence as segments 1 through 6. Each segment comprises 512 samples, with a sampling rate of 200 SPS.
image1
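
The segmentation step described above can be sketched roughly as follows. This is a minimal illustration rather than the repository's actual preprocessing code: the function name `segment_signal`, the choice to drop the incomplete tail of each recording, and the random stand-in data are all assumptions.

```python
import numpy as np


def segment_signal(signal: np.ndarray, seg_len: int = 512) -> np.ndarray:
    """Split the last (time) axis into non-overlapping segments.

    Assumption: samples that do not fill a whole segment are dropped.
    """
    n_segments = signal.shape[-1] // seg_len
    trimmed = signal[..., : n_segments * seg_len]
    return trimmed.reshape(*signal.shape[:-1], n_segments, seg_len)


# Stand-in for one recording: 19 electrodes x 5600 sampling points.
rng = np.random.default_rng(0)
recording = rng.standard_normal((19, 5600))

segments = segment_signal(recording)
print(segments.shape)  # (19, 10, 512)
```

Applying the same function to the clean and the contaminated recording keeps the two sets of segments aligned pair by pair.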

Initially, we partitioned this dataset into training and test sets using an 8:2 split and trained on it. However, when evaluating the results (see image (2)), we observed that the overall noise level in the dataset was relatively low, which may underrepresent the network's denoising performance. In addition, the distribution of SNR values (see image (3)) showed that most samples fall within the -5 to 15 dB range, which could bias the network's learning.

image (2). Visualization of EEG signal denoising results of EEGDiR on SS2016 test set. Each segment comprises 512 samples, with a sampling rate of 200 SPS.
image2
image (3). SNR distribution of the SS2016 dataset after signal segmentation. The SNR values are rounded up for statistical simplicity. The horizontal axis shows SNR levels from -10 to 20 dB, and the vertical axis shows the number of segments at each SNR level.
image3
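
For readers who want to reproduce a distribution like the one in image (3), here is a hedged sketch of a per-segment SNR count. The SNR convention is an assumption: it uses the power ratio, i.e. `20 * log10(RMS(clean) / RMS(noise))`, and the thread does not state which convention the authors used. The helper names and the synthetic data are also hypothetical.

```python
import numpy as np


def rms(x: np.ndarray) -> np.ndarray:
    """Root mean square along the last (time) axis."""
    return np.sqrt(np.mean(np.square(x), axis=-1))


def snr_db(clean: np.ndarray, contaminated: np.ndarray) -> np.ndarray:
    """Per-segment SNR in dB; the noise is taken as contaminated - clean."""
    noise = contaminated - clean
    return 20.0 * np.log10(rms(clean) / rms(noise))


def snr_histogram(clean: np.ndarray, contaminated: np.ndarray) -> dict:
    """Count segments per SNR level, rounding each SNR up to an integer."""
    levels = np.ceil(snr_db(clean, contaminated)).astype(int)
    values, counts = np.unique(levels, return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))


# Synthetic stand-in for 100 segment pairs of 512 samples each.
rng = np.random.default_rng(0)
clean = rng.standard_normal((100, 512))
scales = rng.uniform(0.2, 3.0, size=(100, 1))
contaminated = clean + scales * rng.standard_normal((100, 512))

histogram = snr_histogram(clean, contaminated)
for level, count in histogram.items():
    print(f"{level:>4} dB: {count}")
```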

To address these concerns, we followed the data processing approach used for the EEGDenoiseNet dataset to make the dataset more robust. Specifically, we separated the noise from each clean/contaminated signal pair and introduced a parameter λ that scales the noise to produce varying noise levels (-7 to 2 dB), with samples distributed uniformly across the noise levels. This augmentation is crucial for improving the network's resilience to diverse noise conditions.
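
A minimal sketch of the λ-based remixing is shown below. It assumes the power-ratio convention `SNR = 20 * log10(RMS(x) / RMS(λ·n))`, integer target levels from -7 to 2 dB, and a round-robin assignment of levels to segments; none of these details are confirmed in this thread, and `mix_at_snr` is a hypothetical name.

```python
import numpy as np


def rms(x: np.ndarray) -> np.ndarray:
    """Root mean square along the last (time) axis."""
    return np.sqrt(np.mean(np.square(x), axis=-1))


def mix_at_snr(clean, noise, target_snr_db):
    """Return clean + lam * noise, with lam chosen to hit the target SNR.

    Solving 20 * log10(rms(clean) / (lam * rms(noise))) = target for lam
    gives lam = rms(clean) / (rms(noise) * 10 ** (target / 20)).
    """
    target = np.asarray(target_snr_db, dtype=float)
    lam = np.asarray(rms(clean) / (rms(noise) * 10.0 ** (target / 20.0)))
    return clean + lam[..., None] * noise, lam


# Synthetic stand-in: 20 clean segments and their separated noise.
rng = np.random.default_rng(0)
clean = rng.standard_normal((20, 512))
noise = 0.3 * rng.standard_normal((20, 512))

# Assumption: integer levels -7..2 dB, cycled so each level is used equally.
levels = np.arange(-7, 3)
targets = levels[np.arange(clean.shape[0]) % levels.size]

mixed, lam = mix_at_snr(clean, noise, targets)
achieved = 20.0 * np.log10(rms(clean) / rms(mixed - clean))
```

Because λ is inversely proportional to `rms(noise)`, pairs whose original noise is very small need a very large λ to reach a low target SNR.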

However, during training we found that λ became excessively large for signal pairs with high SNR, producing unrealistic training samples. To mitigate this, we filtered out sample pairs with an SNR higher than 5 dB. The processed signals are visualized in image (4), which further supports the efficacy of our approach.
We believe that incorporating the SS2016 EOG dataset, together with these data processing and augmentation strategies, strengthens the credibility of our proposed method. We appreciate your insightful feedback, which has contributed to enhancing the rigor and validity of our study.

image (4). Visualization of EEG signal denoising results of EEGDiR on SS2016 test set. Note that the dataset is filtered and processed. Each segment comprises 512 samples, with a sampling rate of 200 SPS.
image4
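
The filtering of high-SNR pairs mentioned above might look like the following sketch. It assumes the same power-ratio SNR convention (`20 * log10` of the RMS ratio) and that the 5 dB threshold is applied to the original clean/contaminated pairs before remixing; the function names and toy data are hypothetical.

```python
import numpy as np


def rms(x: np.ndarray) -> np.ndarray:
    """Root mean square along the last (time) axis."""
    return np.sqrt(np.mean(np.square(x), axis=-1))


def snr_db(clean: np.ndarray, contaminated: np.ndarray) -> np.ndarray:
    """Per-segment SNR in dB; the noise is taken as contaminated - clean."""
    return 20.0 * np.log10(rms(clean) / rms(contaminated - clean))


def filter_high_snr(clean, contaminated, max_snr_db: float = 5.0):
    """Keep only pairs whose SNR is at most max_snr_db."""
    keep = snr_db(clean, contaminated) <= max_snr_db
    return clean[keep], contaminated[keep], keep


# Toy data: four pairs with SNRs of roughly -6, 0, 20 and 40 dB.
rng = np.random.default_rng(0)
clean = rng.standard_normal((4, 512))
noise_scale = np.array([2.0, 1.0, 0.1, 0.01])[:, None]
contaminated = clean + noise_scale * clean[:, ::-1]  # same RMS as clean, scaled

kept_clean, kept_contaminated, keep = filter_high_snr(clean, contaminated)
print(keep)  # [ True  True False False]
```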

woldier pinned this issue Oct 30, 2024
woldier linked a pull request Oct 30, 2024 that will close this issue
@woldier
Owner

woldier commented Oct 30, 2024

Hi @John-666-git:

Please check the code and dev doc.

Sincerely,
woldier

@John-666-git
Author

Dear woldier:

Thank you so much for your quick and helpful response! I really appreciate the time you took to address my questions.
